Hi all,

Is there a way to skip non-sequential lines using the "skip" argument in the scan function?

E.g., I have a matrix with 100 rows and 1e7 columns. I open a connection and want to read only lines 5, 7, 9, etc. [i.e., seq(5,99,2)].

It might seem that the syntax to do this would be something like this (if only "skip" allowed vectors in the same way colClasses does in read.table):

  con <- file("bigfile",open="r")
  rows.I.want <- seq(5,99,2)
  new <- scan(con,what="character",skip=rows.I.want-1,nlines=rows.I.want)

The above doesn't work - it would read lines 5, 6, 7, ... length(seq(5,99,2)) rather than 5, 7, 9, ... 99. Yes, I know I can accomplish this by looping, but with the huge datasets I'll be working with, I'd like to try to save time by doing it all at once. Any ideas?

Matt

-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
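For reference, the looping approach mentioned above might look roughly like the sketch below. It relies on the fact that scan() on an already-open connection continues from the current position, so each call only needs to skip the lines between one wanted row and the next. The demo file, its three-field layout, and the names path and gaps are assumptions made for illustration; they are not from the original post.

```r
# Hypothetical demo file: 100 lines, each with 3 space-separated fields.
path <- tempfile()
writeLines(sprintf("r%d_a r%d_b r%d_c", 1:100, 1:100, 1:100), path)

rows.I.want <- seq(5, 99, 2)

# Number of lines to skip before each wanted line, counted from wherever
# the open connection currently is: 4 before line 5, then 1 between rows.
gaps <- diff(c(0, rows.I.want)) - 1

con <- file(path, open = "r")
new <- vector("list", length(rows.I.want))
for (i in seq_along(rows.I.want)) {
  # Each call skips gaps[i] lines, then reads exactly one line of fields.
  new[[i]] <- scan(con, what = "character", skip = gaps[i], nlines = 1,
                   quiet = TRUE)
}
close(con)
```

The file is still traversed only once; the cost of the loop is one scan() call per wanted row rather than a re-read from the top each time.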
The simplest thing is to read in all 100 rows and then just select the ones you want:

  x <- scan(........)
  x <- lapply(x, '[', seq(5,99,2))

On Nov 8, 2007 4:19 AM, Matthew Keller <mckellercran at gmail.com> wrote:
> Is there a way to skip non-sequential lines using the "skip" argument
> in the scan function?
> [...]

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
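A minimal sketch of the read-everything-then-select suggestion, under the assumption that scan() is given a list for "what" so that it returns one vector per column. The demo file and its three-column layout are hypothetical and stand in for the real data.

```r
# Hypothetical demo file: 100 rows, 3 space-separated columns.
path <- tempfile()
writeLines(sprintf("r%d_a r%d_b r%d_c", 1:100, 1:100, 1:100), path)

# With what = list(...), scan() returns a list with one vector per column,
# each of length 100 (one element per row).
x <- scan(path, what = list("", "", ""), quiet = TRUE)

# Keep only rows 5, 7, ..., 99 within every column.
x <- lapply(x, `[`, seq(5, 99, 2))
```

This only illustrates the indexing step. For a file with 1e7 columns, a "what" list with one element per column may well be impractical, so whether the approach scales to the real data is an open question.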
Also check out:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/92525.html

-- 
Jim Holtman
I don't know whether SQLite can handle that many columns, but if it can, and if the file is in an acceptable format, then sqldf simplifies the job: it reads the file into an SQLite database that it creates automatically on the fly and then pulls a subset of it into R. (If the data will fit into memory you can omit the dbname= argument.)

  library(sqldf)
  source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
  myfile <- file("myfile.dat")
  sqldf("select * from myfile where rowid % 2 = 1 and rowid >= 5", dbname = tempfile())

See example 6 on the home page: http://sqldf.googlecode.com