Hi all, is there a way to read a data file into R line-by-line, akin
to what fscanf does in C, say?

It seems that "scan" and "read.table" both read the entire data file
in at once, whereas "readLines" allows one to read a file partially,
but doesn't quite read line-by-line either.

I guess what I was hoping to do is something like this:

while( linebyline( "file" ) != end of file )
{
   process each line ....
}

Thanks.
On Wed, 30 Apr 2003 11:51:00 +0000, you wrote:

>Hi all, is there a way to read a data file into R line-by-line, akin
>to what fscanf does in C, say?
>
>It seems that "scan" and "read.table" both read the entire data file
>in at once, whereas "readLines" allows one to read a file partially,
>but doesn't quite read line-by-line either.

That's what readLines(con, n=1) is supposed to do; in what way does it
not quite work?

Duncan
Maybe I'm missing something. When I do readLines( "file", n = 1) and
repeat, it always reads the first line of "file". I have to be able to
advance to the next line, no?

I'll take a look at the command file(), as someone else suggested.

Thanks.

>From: Duncan Murdoch <dmurdoch at pair.com>
>To: "R A F" <raf1729 at hotmail.com>
>CC: R-help at stat.math.ethz.ch
>Subject: Re: [R] Scanning data files line-by-line
>Date: Wed, 30 Apr 2003 08:26:26 -0400
>
>On Wed, 30 Apr 2003 11:51:00 +0000, you wrote:
>
> >Hi all, is there a way to read a data file into R line-by-line, akin
> >to what fscanf does in C, say?
> >
> >It seems that "scan" and "read.table" both read the entire data file
> >in at once, whereas "readLines" allows one to read a file partially,
> >but doesn't quite read line-by-line either.
>
>That's what readLines(con, n=1) is supposed to do; in what way does it
>not quite work?
>
>Duncan
Thanks very much. I guess the answer leads to more questions:

(a) What if I don't know the number of lines? So I would like to use
    a while loop until readLines hits an EOF character. Would that
    be possible?

(b) When readLines is used, a string is returned. I'd like to split
    the string into fields, and Andy Liaw suggested strsplit, but the
    number of spaces between fields is variable. So for example, one
    line could be 1 space 2 space space 3 and the next line could be
    4 space space 5 space 6, so I could not do a strsplit using " ".

    Really what I know is the variable type of each field -- for
    example, each line is double, string, then double, etc. How
    would one use this information to split the string given by
    readLines?

Thanks very much again!

>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: R A F <raf1729 at hotmail.com>
>CC: R-help at stat.math.ethz.ch
>Subject: Re: [R] Scanning data files line-by-line
>Date: Wed, 30 Apr 2003 14:13:26 +0100 (BST)
>
>It's open() you need, as in
>
>con <- file("file")
>open(con)
>for(i in 1:10) print(readLines(con, n=1))
>close(con)
>
>In C you would need to (f)open a file to read it line-by-line, just as
>here.
>
>The first two lines can be collapsed to
>
>con <- file("file", "r")
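For readers following along, here is a minimal sketch of the difference between passing a file name and passing an open connection to readLines, and of looping until the end of the file when the number of lines is not known. The temporary file and its contents are made up for illustration, as are the variable names; only file(), readLines() and close() come from the thread.

```r
# Hypothetical sample data, written to a temporary file so that the
# sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("1 A  2", "4  B 5", "7 C 8"), path)

# With a file name, readLines() opens and closes the file on every
# call, so each call starts again from the first line.
first_twice <- c(readLines(path, n = 1), readLines(path, n = 1))

# With an open connection, the read position is kept between calls.
con <- file(path, "r")
lines_read <- character(0)
while (length(line <- readLines(con, n = 1)) > 0) {
  # character(0) is returned once the end of the file is reached,
  # which ends the loop; until then, process one line at a time.
  lines_read <- c(lines_read, line)
}
close(con)
```

Collecting the lines into lines_read is only there to make the result easy to inspect; in real use the loop body would process each line directly.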
Hi all, thanks to everyone again for helping out. I don't want to
generate too many messages, but this problem seems common enough that
maybe it's worth a summary.

What I can do is this. Let's say "file" has lines of double, string,
double with a variable number of spaces between fields, followed by EOF.

aaa <- file( "file", "r" )

while( length( ( x <- scan( aaa, nlines = 1, list( 0, "", 0 ) ) )[[1]] ) > 0 )
{
   # check that x is non-empty (length( x[[1]] ) > 0), since the read
   # at end of file still returns an x, just with no data in it

   # if not empty, start processing x here
}

close( aaa )

Here x is a list and x[[1]] is the first field, etc.

Professor Ripley also suggested textConnections, but I didn't
experiment -- I'm usually happy to find something that works. :-)

Thanks again.

>From: Spencer Graves <spencer.graves at pdf.com>
>To: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>CC: R-help at stat.math.ethz.ch, R A F <raf1729 at hotmail.com>
>Subject: Re: [R] Scanning data files line-by-line
>Date: Wed, 30 Apr 2003 07:28:03 -0700
>
>With a "connection" instead of a "file", there is no counterpart to
>"count.fields" to summarize what's available?
>
>Thanks,
>Spencer Graves
>
>Prof Brian Ripley wrote:
>>On Wed, 30 Apr 2003, R A F wrote:
>>
>>>Thanks very much. I guess the answer leads to more questions:
>>>
>>>(a) What if I don't know the number of lines? So I would like to use
>>>    a while loop until readLines hits an EOF character. Would that
>>>    be possible?
>>
>>Yes. After you reach the end of the file you will get character(0) since
>>
>>Value:
>>
>>     A character vector of length the number of lines read.
>>
>>and zero lines would have been read.
>>
>>>(b) When readLines is used, a string is returned.
>>
>>Not quite: a character vector is returned.
>>
>>>I'd like to split
>>>    the string into fields, and Andy Liaw suggested strsplit, but the
>>>    number of spaces between fields is variable. So for example, one
>>>    line could be 1 space 2 space space 3 and the next line could be
>>>    4 space space 5 space 6, so I could not do a strsplit using " ".
>>>
>>>    Really what I know is the variable type of each field -- for
>>>    example, each line is double, string, then double, etc. How
>>>    would one use this information to split the string given by
>>>    readLines?
>>
>>You could use scan on the line: it works on textConnections.
>>
>>>Thanks very much again!
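The textConnection suggestion quoted above was not tried in this thread, so the following is only a sketch of how it might look: each line is read with readLines and then parsed with scan on a textConnection, using the known field types (double, string, double). The sample data and the names path, records and rec are assumptions made for illustration.

```r
# Hypothetical sample data with a variable number of spaces between fields.
path <- tempfile(fileext = ".txt")
writeLines(c("1.5 A  2", "4   B 5.25"), path)

con <- file(path, "r")
records <- list()
while (length(line <- readLines(con, n = 1)) > 0) {
  # Wrap the single line in a text connection and let scan() split it
  # on whitespace, converting each field to the type given in 'what'.
  tc <- textConnection(line)
  rec <- scan(tc, what = list(0, "", 0), quiet = TRUE)
  close(tc)
  records[[length(records) + 1]] <- rec
}
close(con)
```

Because scan treats any run of whitespace as a single separator by default, the variable spacing needs no special handling here.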
I'm sorry to have to ask another question related to this. What if
the lines have a variable number of fields? For example, all I know is
that each line starts with double, string, double, say, but some lines
may have some more fields afterwards.

So the data file may look like 1 A 2 3 B 4 5 6 DD 7 etc.

If I use scan with list( 0, "", 0 ), each line is treated as if it has
a multiple of 3 elements, but really, what I want is to discard all
fields after the third.

I tried the nmax = 3 option but that did not seem to work. Maybe I'm
doing this wrong.

Thanks again!

>From: "R A F" <raf1729 at hotmail.com>
>To: spencer.graves at pdf.com, ripley at stats.ox.ac.uk
>CC: R-help at stat.math.ethz.ch
>Subject: Re: [R] Scanning data files line-by-line
>Date: Wed, 30 Apr 2003 15:21:23 +0000
>
>Hi all, thanks to everyone again for helping out. I don't want to
>generate too many messages, but this problem seems common enough that
>maybe it's worth a summary.
>
>What I can do is this. Let's say "file" has lines of double, string,
>double with variable number of spaces between fields followed by EOF.
>
>aaa <- file( "file", "r" )
>
>while( length( ( x <- scan( aaa, nlines = 1, list( 0, "", 0 ) ) )[[1]] ) > 0 )
>{
>   check to see if x is empty again (by length( x[[1]] ) > 0 ) since
>   we would read in the EOF character into x still
>
>   if not empty
>      start processing
>}
>
>close( aaa )
>
>Here x is a list and x[[1]] is the first field, etc.
>
>Professor Ripley also suggested textConnections, but I didn't
>experiment -- I'm usually happy to find something that works. :-)
>
>Thanks again.
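One possible approach to the extra-fields question, offered as a sketch and not as an answer given in the thread: scan has a flush argument which, when TRUE, skips to the end of the line after the last requested field has been read. The sample file below is invented for illustration (it is not the poster's data), as are the names firsts, labels and thirds.

```r
# Hypothetical sample data: every line starts with double, string,
# double; some lines carry extra fields that should be discarded.
path <- tempfile(fileext = ".txt")
writeLines(c("1 A 2", "3 B 4 5 6", "7 DD 8 extra"), path)

con <- file(path, "r")
firsts <- numeric(0)
labels <- character(0)
thirds <- numeric(0)
repeat {
  # flush = TRUE drops everything after the third field on the line.
  x <- scan(con, what = list(0, "", 0), nlines = 1,
            flush = TRUE, quiet = TRUE)
  if (length(x[[1]]) == 0) break   # nothing read: end of file
  firsts <- c(firsts, x[[1]])
  labels <- c(labels, x[[2]])
  thirds <- c(thirds, x[[3]])
}
close(con)
```

With flush = TRUE there can be only one record per line, which matches the layout described in the question.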