R gurus,

My use of scan() seems to be dropping the first digit of sequential scans on a connection. It looks like it happens only within a line:

> cat("TITLE extra line", "235 335 535 735", "115 135 175", file="ex.data", sep="\n")
> cn.x <- file("ex.data", open="r")
> a <- scan(cn.x, skip=1, n=2)
Read 2 items
> a
[1] 235 335
> b <- scan(cn.x, n=2)
Read 2 items
> b
[1] 35 735
> c <- scan(cn.x, n=2)
Read 2 items
> c
[1] 115 135
> d <- scan(cn.x, n=1)
Read 1 items
> d
[1] 75
>

Note in b, I should get 535, not 35 as the first value. In d, I should get 175. Does anyone know how to get these digits?

The reason I'm not scanning the entire file at once is that my real dataset is much larger than a Gig and I'll need to pull only portions of the file in at once. I got readLines to work, but then I have to figure out how to convert each entire line into a data.frame. Scan seems a lot cleaner, with the exception of the funny character dropping issue.

Thanks so much!
Tim Howard
Dear Tim

You can use an explicit sep = " " in each call to scan():

cat("TITLE extra line", "235 335 535 735", "115 135 175", file="ex.data", sep="\n")
cn.x <- file("ex.data", open="r")
a <- scan(cn.x, skip=1, n=2, sep = " ")
> Read 2 items
a
> [1] 235 335
b <- scan(cn.x, n=2, sep = " ")
> Read 2 items
b
> [1] 535 735
c <- scan(cn.x, n=2, sep = " ")
> Read 2 items
c
> [1] 115 135
d <- scan(cn.x, n=1, sep = " ")
> Read 1 items
d
> [1] 175

Regards,

Christoph Buser

--
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
This is because scan() has a private pushback. Either:

1) Read the file a whole line at a time: I cannot see why you need to do so here nor in your sketched application.

or

2) Use an explicit separator, e.g. " " in your example.

scan() is not designed to read parts of lines of a file.

On Tue, 18 Jan 2005, Tim Howard wrote:

> My use of scan() seems to be dropping the first digit of sequential
> scans on a connection. It looks like it happens only within a line:
> [...]

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
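
Option 1 above (whole lines at a time) can be sketched with the nlines argument of scan(), which stops each call at a line boundary, so no call ends in the middle of a line. This is only an illustration on the toy data from the original post; the temporary file and the names con, row1 and row2 are assumptions made for the example, not part of anyone's actual setup. The help page for scan() also carries a note that reading partial lines from an open connection can lose characters and that an explicit separator avoids this.

```r
# Toy file from the original post, written to a temporary location
tmp <- tempfile()
cat("TITLE extra line", "235 335 535 735", "115 135 175",
    file = tmp, sep = "\n")

con <- file(tmp, open = "r")

# Skip the title line, then read exactly one complete data line
row1 <- scan(con, skip = 1, nlines = 1, quiet = TRUE)

# The next call starts at the beginning of the following line
row2 <- scan(con, nlines = 1, quiet = TRUE)

close(con)

print(row1)
print(row2)
```

Because every call consumes a full line, including its line break, the next call begins on a fresh line and there is nothing left over from the previous read.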
Thank you Dr. Ripley and Christoph Buser for your explanations and help.

Using sep = " " within scan worked within lines of my file, but then I gained an NA record when wrapping from one line to the next (because the linebreak character is no longer recognized as a sep?). So, I'll continue by ensuring each group I read ends at the end of a line (as scan was designed), and by using scan without the sep option.

FYI, here's how the NA showed up; each line is 800 numbers long:

> test4 <- scan(cn.test, n=1600, sep = " ")
> test5 <- scan(cn.test, n=1600)
> test4[797:803]
[1] 81.00000 81.08746 81.89484 82.00000 NA 580.09030 576.90300
> test5[797:803]
[1] 81.01944 81.62060 81.96495 82.00000 82.00000 567.91840 563.10470

Thanks again.
Tim

>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 01/19/05 03:42AM >>>
This is because scan() has a private pushback. [...]
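
The approach Tim settles on, reading groups that always end at the end of a line and using no sep, might look like the sketch below, which also turns each group into a data.frame. Everything specific here is a stand-in: the small file, n_cols (800 in the real data), chunk_lines and the chunks list are hypothetical names and values chosen for illustration, and the sketch assumes every data line holds the same number of values.

```r
# Hypothetical small file: one title line, then rows of 4 numbers each
tmp <- tempfile()
cat("TITLE extra line",
    "1 2 3 4", "5 6 7 8", "9 10 11 12", "13 14 15 16", "17 18 19 20",
    file = tmp, sep = "\n")

n_cols      <- 4  # values per line (800 in the real file)
chunk_lines <- 2  # complete lines to read per call

con <- file(tmp, open = "r")
chunks <- list()
skip <- 1  # skip the title line on the first read only

repeat {
  # nlines makes every read end at a line boundary
  vals <- scan(con, skip = skip, nlines = chunk_lines, quiet = TRUE)
  skip <- 0
  if (length(vals) == 0) break  # nothing left to read

  # One row per input line, one column per value
  chunks[[length(chunks) + 1]] <-
    as.data.frame(matrix(vals, ncol = n_cols, byrow = TRUE))
}
close(con)

print(chunks[[1]])
```

In a real run each data.frame would be processed and discarded inside the loop rather than stored, so that only one group is held in memory at a time.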