I would like to read a connection line by line with scan but
don't know how to tell when to quit trying. Is there any
way that you can ask the connection object if it is at the end?

E.g.,

t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(t, file=tfile)
tcon <- file(tfile, "r") # or tcon <- textConnection(t)
scan(tcon, what="", nlines=1)
#Read 2 items
#[1] "A" "Two line\nentry"
scan(tcon, what="", nlines=1) # empty line
#Read 0 items
#character(0)
scan(tcon, what="", nlines=1)
#Read 3 items
#[1] "Three\nline\nentry" "D" "E"
scan(tcon, what="", nlines=1) # end of file
#Read 0 items
#character(0)
scan(tcon, what="", nlines=1) # end of file
#Read 0 items
#character(0)

I am reading virtual line by virtual line because the lines
may have different numbers of fields.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
I've always used system("wc -l myfile") to get the number of lines in
advance. But here are two other R-only options, both using readLines
instead of scan. There's probably something more efficient, too.

Your setup:
t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(t, file=tfile)
tcon <- file(tfile, "r") # or tcon <- textConnection(t)

readLines() produces character(0) for nonexistent lines and "" for empty lines.

> readLines(tcon, n=1)
[1] "A \"Two line"
> readLines(tcon, n=1)
[1] "entry\""
> readLines(tcon, n=1)
[1] ""
> readLines(tcon, n=1)
[1] "\"Three"
> readLines(tcon, n=1)
[1] "line"
> readLines(tcon, n=1)
[1] "entry\" D E"
> readLines(tcon, n=1)
character(0)
> readLines(tcon, n=1)
character(0)

Or if the file isn't too large for memory, you can read the whole
thing in then process it line by line:

tcon <- file(tfile, "r") # or tcon <- textConnection(t)
allfile <- readLines(tcon, n=10000)

> length(allfile)
[1] 6

On Thu, Oct 15, 2015 at 4:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I would like to read a connection line by line with scan but
> don't know how to tell when to quit trying. Is there any
> way that you can ask the connection object if it is at the end?

--
Sarah Goslee
http://www.functionaldiversity.org
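The end-of-input test that readLines() makes possible can be written as a loop. This is only a sketch on the sample data from this thread: the variable name `physical` is made up for illustration, and the loop counts physical lines, not quoted multi-line entries.

```r
# Sample data from the thread, written to a temporary file.
tfile <- tempfile()
cat('A "Two line\nentry"\n\n"Three\nline\nentry" D E\n', file = tfile)

con <- file(tfile, "r")
physical <- character(0)   # hypothetical name: the physical lines seen so far
repeat {
  line <- readLines(con, n = 1)
  # character(0) means nothing is left to read; "" is an empty line.
  if (length(line) == 0L) break
  physical <- c(physical, line)
}
close(con)
unlink(tfile)

length(physical)
```

The loop stops on the first zero-length result, so an empty line (returned as "") does not end it early.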
readLines() does not work for me since it breaks up multiline fields
that are enclosed in quotes. E.g., the text file line
   A "Two line\nentry"
should be imported as 2 strings, the second being "Two line\nentry",
not "\"Two line" with the next call to readLines bringing in "entry\"".

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Oct 15, 2015 at 1:44 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> I've always used system("wc -l myfile") to get the number of lines in
> advance. But here are two other R-only options, both using readLines
> instead of scan. There's probably something more efficient, too.
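The difference between the two readers can be seen side by side on the first entry of the sample data. This is a small sketch; the names `first_physical` and `first_logical` are made up for illustration.

```r
tfile <- tempfile()
cat('A "Two line\nentry"\n', file = tfile)

# readLines() stops at the first newline, even inside the quotes.
con <- file(tfile, "r")
first_physical <- readLines(con, n = 1)
close(con)

# scan() keeps the quoted field together, embedded newline included.
con <- file(tfile, "r")
first_logical <- scan(con, what = "", nlines = 1, quiet = TRUE)
close(con)
unlink(tfile)

print(first_physical)
print(first_logical)
```

Here readLines() returns one string holding the first physical line, while scan() returns the two fields of the entry.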
This is a problem in C as well... and the solution is to read the lines yourself and then give those lines to scan.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On October 15, 2015 1:16:58 PM PDT, William Dunlap <wdunlap at tibco.com> wrote:
>I would like to read a connection line by line with scan but
>don't know how to tell when to quit trying. Is there any
>way that you can ask the connection object if it is at the end?
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
C can tell when it hits the end of input. Reading the lines with
readLines and passing them to scan() does not help - it is the same
as having scan read the original file.

My problem is that the file (or other connection) has a variable number
of fields on each "line", and perhaps no fields on some lines. Fields
enclosed in quotes may include newline characters. I want to read this
file into a list of character vectors, the n'th element of the list
being the fields on the n'th "line" of the file.

Repeating
   scan(connection, nlines=1, what="")
does everything right except for telling me when it has read everything
the connection has to offer.
   scan(connection, what="")
manages to figure out where the end of the file is, but does not tell me
the line number associated with each character string.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Oct 15, 2015 at 2:57 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> This is a problem in C as well... and the solution is to read the lines yourself and then give those lines to scan.
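To illustrate the second half of that trade-off, here is a sketch of a whole-file scan() on the sample data from this thread. The name `all_fields` is made up for illustration.

```r
tfile <- tempfile()
cat('A "Two line\nentry"\n\n"Three\nline\nentry" D E\n', file = tfile)

# One call reads every field and stops at end of file on its own,
# but the result is a flat vector with no record of line boundaries.
all_fields <- scan(tfile, what = "", quiet = TRUE)
unlink(tfile)

print(all_fields)
length(all_fields)
```

Nothing in the flat result says which fields shared a line, or that there was an empty line between the two entries.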
On Thu, Oct 15, 2015 at 1:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I would like to read a connection line by line with scan but
> don't know how to tell when to quit trying. Is there any
> way that you can ask the connection object if it is at the end?

I found that adding the argument blank.lines.skip=FALSE to the scan
command will do the trick when nlines=1. If the line is empty it returns
a vector of one empty character string; at end-of-file it returns
a zero-long character vector. E.g.,

> cat(t <- "OneA OneB\n\n\"Third\nLine A\" \"Line3B\" Line3C\n", file= tf <- tempfile())
> str({tcon <- file(tf, "r"); lapply(1:5, function(i)scan(tcon, nlines=1, what=list(""), blank.lines.skip=FALSE, quiet=TRUE))})
List of 5
 $ :List of 1
  ..$ : chr [1:2] "OneA" "OneB"
 $ :List of 1
  ..$ : chr ""
 $ :List of 1
  ..$ : chr [1:3] "Third\nLine A" "Line3B" "Line3C"
 $ :List of 1
  ..$ : chr(0)
 $ :List of 1
  ..$ : chr(0)
> str({tcon <- file(tf, "r"); lapply(1:5, function(i)scan(tcon, nlines=1, what=list(""), blank.lines.skip=TRUE, quiet=TRUE))})
List of 5
 $ :List of 1
  ..$ : chr [1:2] "OneA" "OneB"
 $ :List of 1
  ..$ : chr(0)
 $ :List of 1
  ..$ : chr [1:3] "Third\nLine A" "Line3B" "Line3C"
 $ :List of 1
  ..$ : chr(0)
 $ :List of 1
  ..$ : chr(0)
> str({tcon <- file(tf, "r"); lapply(1:5, function(i)scan(tcon, nlines=1, what="", blank.lines.skip=FALSE, quiet=TRUE))})
List of 5
 $ : chr [1:2] "OneA" "OneB"
 $ : chr ""
 $ : chr [1:3] "Third\nLine A" "Line3B" "Line3C"
 $ : chr(0)
 $ : chr(0)
> str({tcon <- file(tf, "r"); lapply(1:5, function(i)scan(tcon, nlines=1, what="", blank.lines.skip=TRUE, quiet=TRUE))})
List of 5
 $ : chr [1:2] "OneA" "OneB"
 $ : chr(0)
 $ : chr [1:3] "Third\nLine A" "Line3B" "Line3C"
 $ : chr(0)
 $ : chr(0)

Bill Dunlap
TIBCO Software
wdunlap tibco.com
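The blank.lines.skip=FALSE behaviour reported in this thread can be wrapped in a loop that stops at end of input. What follows is a sketch, not code from the thread: `read_logical_lines` is a hypothetical helper, it assumes a file connection already opened for reading, and it relies on the reported behaviour that an empty line yields "" while end of file yields a zero-length vector.

```r
# Hypothetical helper (not from the thread): collect the fields of each
# logical line of an open connection into a list of character vectors.
read_logical_lines <- function(con) {
  out <- list()
  repeat {
    fields <- scan(con, what = "", nlines = 1,
                   blank.lines.skip = FALSE, quiet = TRUE)
    # A zero-length result is treated as end of input.
    if (length(fields) == 0L) break
    # A blank line comes back as one empty string; store it as "no fields".
    if (identical(fields, "")) fields <- character(0)
    out[[length(out) + 1L]] <- fields
  }
  out
}

# Same sample data as in the message above.
tf <- tempfile()
cat("OneA OneB\n\n\"Third\nLine A\" \"Line3B\" Line3C\n", file = tf)
tcon <- file(tf, "r")
res <- read_logical_lines(tcon)
close(tcon)
unlink(tf)

str(res)
```

One caveat of this sketch: a line whose only field is a quoted empty string may be indistinguishable from a blank line, so drop the `identical(fields, "")` step if that distinction matters.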