readLines() does not work for me since it breaks up
multiline fields that are enclosed in quotes. E.g., the
text file line
   A "Two line\nentry"
should be imported as 2 strings, the second being
"Two line\nentry", not "\"Two line" with the next call to
readLines bringing in "entry\"".

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Oct 15, 2015 at 1:44 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> I've always used system("wc -l myfile") to get the number of lines in
> advance. But here are two other R-only options, both using readLines
> instead of scan. There's probably something more efficient, too.
>
> Your setup:
> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
> tfile <- tempfile()
> cat(t, file=tfile)
> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>
> readLines() produces character(0) for nonexistent lines and "" for empty lines.
>
> > readLines(tcon, n=1)
> [1] "A \"Two line"
> > readLines(tcon, n=1)
> [1] "entry\""
> > readLines(tcon, n=1)
> [1] ""
> > readLines(tcon, n=1)
> [1] "\"Three"
> > readLines(tcon, n=1)
> [1] "line"
> > readLines(tcon, n=1)
> [1] "entry\" D E"
> > readLines(tcon, n=1)
> character(0)
> > readLines(tcon, n=1)
> character(0)
>
> Or if the file isn't too large for memory, you can read the whole
> thing in then process it line by line:
>
> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
> allfile <- readLines(tcon, n=10000)
>
> > length(allfile)
> [1] 6
>
> On Thu, Oct 15, 2015 at 4:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> I would like to read a connection line by line with scan but
>> don't know how to tell when to quit trying. Is there any
>> way that you can ask the connection object if it is at the end?
>>
>> E.g.,
>>
>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>> tfile <- tempfile()
>> cat(t, file=tfile)
>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>> scan(tcon, what="", nlines=1)
>> #Read 2 items
>> #[1] "A" "Two line\nentry"
>> scan(tcon, what="", nlines=1) # empty line
>> #Read 0 items
>> #character(0)
>> scan(tcon, what="", nlines=1)
>> #Read 3 items
>> #[1] "Three\nline\nentry" "D" "E"
>> scan(tcon, what="", nlines=1) # end of file
>> #Read 0 items
>> #character(0)
>> scan(tcon, what="", nlines=1) # end of file
>> #Read 0 items
>> #character(0)
>>
>> I am reading virtual line by virtual line because the lines
>> may have different numbers of fields.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
> --
> Sarah Goslee
> http://www.functionaldiversity.org
Thus the post-processing, which I assume you'd have to do with scan() as well.

> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
> allfile <- readLines(tcon, n=10000)
> strsplit(paste(allfile, collapse="\n"), "\"")
[[1]]
[1] "A "  "Two line\nentry"  "\n\n"  "Three\nline\nentry"
[5] " D E"

Or similar, depending on exactly what you want the result to look like.

On Thu, Oct 15, 2015 at 4:56 PM, William Dunlap <wdunlap at tibco.com> wrote:
> readLines() does not work for me since it breaks up
> multiline fields that are enclosed in quotes. E.g., the
> text file line
>    A "Two line\nentry"
> should be imported as 2 strings, the second being
> "Two line\nentry", not "\"Two line" with the next call to
> readLines bringing in "entry\"".
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
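The post-processing idea above can also be done so that the record boundaries survive, which strsplit() on the quote character does not give you. What follows is only a sketch under stated assumptions: join_quoted_lines() is a hypothetical helper name (it is not part of base R), it assumes fields are delimited by plain double quotes with no escaped quotes inside them, and it takes the six physical lines that readLines() returned in the example as its input.

```r
# Hypothetical helper (not base R): glue physical lines together until the
# number of double quotes seen so far is even, i.e. no quoted field is open.
# Assumes plain '"' delimiters and no escaped quotes inside fields.
join_quoted_lines <- function(lines) {
  records <- character(0)
  pending <- NULL
  for (ln in lines) {
    pending <- if (is.null(pending)) ln else paste(pending, ln, sep = "\n")
    n_quotes <- lengths(regmatches(pending, gregexpr('"', pending, fixed = TRUE)))
    if (n_quotes %% 2 == 0) {        # quotes balanced: the record is complete
      records <- c(records, pending)
      pending <- NULL
    }
  }
  if (!is.null(pending)) records <- c(records, pending)  # unbalanced tail, kept as-is
  records
}

# The physical lines readLines() produced for the example file.
lines <- c('A "Two line', 'entry"', '', '"Three', 'line', 'entry" D E')

recs <- join_quoted_lines(lines)

# Split each logical record into fields; scan() handles the quoting.
fields <- lapply(recs, function(r) scan(text = r, what = "", quiet = TRUE))
```

This keeps one list element per logical record, so records with different numbers of fields stay separate. It does read the whole file into memory first, which is the trade-off already mentioned for the readLines() approach.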
scan(nlines=) does this post-processing, which is why I'm using
it instead of readLines.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Oct 15, 2015 at 2:06 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Thus the post-processing, which I assume you'd have to do with scan() as well.
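One way to combine the two observations in this thread (readLines() returns character(0) only at the end of the input, while scan(nlines=1) parses quoted multiline fields) might be to peek with readLines() and hand the line back with pushBack() before each scan() call. This is a sketch rather than a confirmed answer from the thread: it assumes a text-mode connection opened for reading, that scan() sees pushed-back text the same way it sees the file, and the variable names txt, peek and records are illustrative.

```r
# Same sample data as in the thread, written to a temporary file.
txt <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(txt, file = tfile)
tcon <- file(tfile, "r")

records <- list()
repeat {
  # Peek at the next physical line; character(0) means end of input,
  # whereas an empty line comes back as "".
  peek <- readLines(tcon, n = 1)
  if (length(peek) == 0) break
  # Hand the line back so that scan() parses it, including any quoted
  # field that continues onto following lines.
  pushBack(peek, tcon)
  records[[length(records) + 1]] <- scan(tcon, what = "", nlines = 1, quiet = TRUE)
}

close(tcon)
unlink(tfile)
```

With this loop an empty line still yields a zero-length record, so it can be told apart from the end of the file, which is the ambiguity in the original scan() calls. Whether the peek costs too much on very large inputs would need measuring.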