Collins, Stephen
2013-Jan-21 15:56 UTC
[R] how to break while loop reading from connection with read.csv
Hello,

I'm trying to read a file a few rows at a time, so as to not read the entire file into memory. When reading the "connections" and "readLines" help, and the R-help archive, it seems this should be possible with read.csv and a file connection, making use of the "nrows" argument, and checking whether the nrow() of the new batch is zero.

From certain posts, it seemed that read.csv should return "character(0)" when the end of file is reached and there are no more rows to read. Instead, I get an error that there are "no lines available for input." Have I made a mistake with the file, or in calling read.csv?

What is the proper way to check the end-of-file condition with read.csv, such that I could break a while loop reading the data in?

#example, make a test file
con <- file("test.csv","wt")
cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",file=con)
unlink(con)

#show the file is valid
con <- file("test.csv","rt")
read.csv(con,header=T)
unlink(con)

#show that readLines ends with "character(0)", like expected
con <- file("test.csv","rt")
readLines(con,n=10)
readLines(con,n=10)
unlink(con)

#show that read.csv ends with error
con <- file("test.csv","rt")
read.csv(con,header=T,nrows=10)
read.csv(con,header=F,nrows=10)
unlink(con)

Sincerely,
Stephen Collins
Predictive Modeler
Allstate Insurance Company

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.0
Berend Hasselman
2013-Jan-21 16:34 UTC
[R] how to break while loop reading from connection with read.csv
On 21-01-2013, at 16:56, "Collins, Stephen" <Stephen.Collins at allstate.com> wrote:

> Hello,
>
> I'm trying to read a file a few rows at a time, so as to not read the entire file into memory. When reading the "connections" and "readLines" help, and the R-help archive, it seems this should be possible with read.csv and a file connection, making use of the "nrows" argument, and checking whether the nrow() of the new batch is zero.
>
> From certain posts, it seemed that read.csv should return "character(0)" when the end of file is reached and there are no more rows to read. Instead, I get an error that there are "no lines available for input." Have I made a mistake with the file, or in calling read.csv?
>
> What is the proper way to check the end-of-file condition with read.csv, such that I could break a while loop reading the data in?
>
> #example, make a test file
> con <- file("test.csv","wt")
> cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",file=con)
> unlink(con)
>
> #show the file is valid
> con <- file("test.csv","rt")
> read.csv(con,header=T)
> unlink(con)
>
> #show that readLines ends with "character(0)", like expected
> con <- file("test.csv","rt")
> readLines(con,n=10)
> readLines(con,n=10)
> unlink(con)
>
> #show that read.csv ends with error
> con <- file("test.csv","rt")
> read.csv(con,header=T,nrows=10)
> read.csv(con,header=F,nrows=10)
> unlink(con)

How about:

con <- file("test.csv","rt")
while( length(tmp <- readLines(con,n=10)) > 0 ) {
    qq <- read.csv(text=tmp, header=TRUE)
    # do something with qq
}
unlink(con)
qq

Berend
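One thing to watch with the readLines() approach when the file spans more than one batch: parsing every batch with header=TRUE would treat the first line of each batch as column names. A minimal sketch of one way around that, assuming a simple comma-separated header with no quoted fields, is to read the header line once and pass it to each batch via col.names. The names path, chunk_size and total_rows are made up for illustration, and the batch size is deliberately tiny so the loop runs several times.

```r
# Build a small test file with the same data as in the original post.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,6,5", "4,3,2", "3,2,1"), path)

con <- file(path, "rt")

# Read the header line once; the naive split assumes no quoted commas.
header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

chunk_size <- 2   # hypothetical batch size, small to force several passes
total_rows <- 0

# readLines() returns character(0) at end of file, which ends the loop.
while (length(lines <- readLines(con, n = chunk_size)) > 0) {
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  # do something with chunk
  total_rows <- total_rows + nrow(chunk)
}

close(con)   # close() releases the connection; unlink() deletes files
```

Note that column types are guessed separately for each batch, so supplying colClasses may be worthwhile if the batches are to be combined later.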
Duncan Murdoch
2013-Jan-21 16:41 UTC
[R] how to break while loop reading from connection with read.csv
On 13-01-21 10:56 AM, Collins, Stephen wrote:

> Hello,
>
> I'm trying to read a file a few rows at a time, so as to not read the entire file into memory. When reading the "connections" and "readLines" help, and the R-help archive, it seems this should be possible with read.csv and a file connection, making use of the "nrows" argument, and checking whether the nrow() of the new batch is zero.
>
> From certain posts, it seemed that read.csv should return "character(0)" when the end of file is reached and there are no more rows to read. Instead, I get an error that there are "no lines available for input." Have I made a mistake with the file, or in calling read.csv?
>
> What is the proper way to check the end-of-file condition with read.csv, such that I could break a while loop reading the data in?
>
> #example, make a test file
> con <- file("test.csv","wt")
> cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",file=con)
> unlink(con)

I don't think this is causing your problem, but unlink() seems like the wrong function to use here. Don't you mean close()?

> #show the file is valid
> con <- file("test.csv","rt")
> read.csv(con,header=T)
> unlink(con)
>
> #show that readLines ends with "character(0)", like expected
> con <- file("test.csv","rt")
> readLines(con,n=10)
> readLines(con,n=10)
> unlink(con)
>
> #show that read.csv ends with error
> con <- file("test.csv","rt")
> read.csv(con,header=T,nrows=10)
> read.csv(con,header=F,nrows=10)
> unlink(con)

See the Value section of ?read.csv. In particular,

"Empty input is an error unless col.names is specified, when a 0-row data frame is returned: similarly giving just a header line if header = TRUE results in a 0-row data frame. Note that in either case the columns will be logical unless colClasses was supplied."

Duncan Murdoch
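Following the documented behaviour quoted above, a loop along these lines might work: supply col.names on every call after the first, so that hitting end of file yields a 0-row data frame that can be tested with nrow(), and use close() on the connection. This is only a sketch under those assumptions; the names path, cols and total_rows and the batch size of 2 are invented for illustration.

```r
# Build a small test file with the same data as in the original post.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,6,5", "4,3,2", "3,2,1"), path)

con <- file(path, "rt")   # open explicitly so the read position is kept

# First batch: consume the header line and remember the column names.
chunk <- read.csv(con, header = TRUE, nrows = 2)
cols <- names(chunk)
total_rows <- nrow(chunk)

repeat {
  # With col.names supplied, empty input gives a 0-row data frame
  # instead of an error (see the Value section of ?read.csv).
  chunk <- read.csv(con, header = FALSE, nrows = 2, col.names = cols)
  if (nrow(chunk) == 0) break
  # do something with chunk
  total_rows <- total_rows + nrow(chunk)
}

close(con)   # close the connection; unlink() would not do this
```

As the quoted documentation warns, the columns of the final 0-row data frame will be logical unless colClasses is supplied, so it is safest to test nrow() before doing anything else with the batch.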