Hi all,
Is there a way to skip non-sequential lines using the "skip" argument
in the scan function?
E.g., I have a matrix with 100 rows and 1e7 columns. I open a
connection and want to read only lines 5, 7, 9, etc [i.e.,
seq(5,99,2)]
It might seem that the syntax to do this would be something like this
(if only the "skip" allowed vectors in the same way colClasses does in
read.table):
con <- file("bigfile",open="r")
rows.I.want <- seq(5,99,2)
new <- scan(con,what="character",skip=rows.I.want-1,nlines=rows.I.want)
The above doesn't work - it would read lines 5, 6, 7, ...
length(seq(5,99,2)) rather than 5, 7, 9, ... 99. Yes, I know I can
accomplish this by looping, but with the huge datasets I'll be working
with, I'd like to try to save time by doing it all at once. Any ideas?
Matt
--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
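For reference, the looping approach mentioned in the question can be sketched roughly as follows. This is only an illustration under assumptions: the demo file, the `gaps` helper vector, and the use of `readLines` instead of `scan` are not from the original post. It relies on an open connection keeping its position between reads, so each step only needs to discard the lines between two wanted rows.

```r
# Build a small demo file with 100 lines (stand-in for the real "bigfile").
tmp <- tempfile()
writeLines(paste("row", 1:100, "value"), tmp)

rows.I.want <- seq(5, 99, 2)

# Number of unwanted lines to discard before each wanted line,
# counted from the connection's current position.
gaps <- diff(c(0, rows.I.want)) - 1

con <- file(tmp, open = "r")
new <- vapply(gaps, function(g) {
  readLines(con, n = g)   # read and discard the lines we do not want
  readLines(con, n = 1)   # keep the next line
}, character(1))
close(con)
```

Each kept line is still a single string here; it could be split into fields afterwards, for example with `strsplit` or `scan(text = ...)`.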
Simplest thing is to read in all 100 rows and then just select the ones you want:
x <- scan(........)
x <- lapply(x, '[', seq(5,99,2))
On Nov 8, 2007 4:19 AM, Matthew Keller <mckellercran at gmail.com> wrote:
> Is there a way to skip non-sequential lines using the "skip" argument
> in the scan function?
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
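A minimal sketch of the read-everything-then-subset idea, assuming a purely numeric file whose column count is known in advance. The demo matrix `m`, the temporary file, and the reshaping through `matrix()` are illustrative assumptions, not part of the reply above, which subsets a list returned by `scan` instead. The whole file has to fit in memory for this to work.

```r
# Write a small 10 x 4 numeric matrix to a temporary file (demo data only).
tmp <- tempfile()
m <- matrix(1:40, nrow = 10, ncol = 4, byrow = TRUE)
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

# Read every value, rebuild the matrix row by row, then keep the wanted rows.
x <- scan(tmp, quiet = TRUE)
full <- matrix(x, ncol = 4, byrow = TRUE)

rows.I.want <- seq(5, 9, 2)
new <- full[rows.I.want, ]
```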
Also check out: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/92525.html
--
Jim Holtman
I don't know if SQLite can handle that many columns, but if it can, and if
the file is in an acceptable format, then sqldf simplifies reading it: it
automatically creates an SQLite database on the fly, loads the file into it,
and then gets a subset out of it into R. (If the data will fit into memory
you can omit the dbname= argument.)
library(sqldf)
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
myfile <- file("myfile.dat")
sqldf("select * from myfile where rowid % 2 = 0 and rowid >= 5",
dbname = tempfile())
See example 6 on the home page:
http://sqldf.googlecode.com