thr3ads.net - R help - [R] Dropping a digit with scan() on a connection [Jan 2005]

Tim Howard

2005-Jan-18 20:36 UTC

[R] Dropping a digit with scan() on a connection

R gurus,

My use of scan() seems to be dropping the first digit of sequential
scans on a connection. It looks like it happens only within a line:

> cat("TITLE extra line", "235 335 535 735", "115 135 175",
+     file="ex.data", sep="\n")
> cn.x <- file("ex.data", open="r")
> a <- scan(cn.x, skip=1, n=2)
Read 2 items
> a
[1] 235 335
> b <- scan(cn.x, n=2)
Read 2 items
> b
[1]  35 735
> c <- scan(cn.x, n=2)
Read 2 items
> c
[1] 115 135
> d <- scan(cn.x, n=1)
Read 1 items
> d
[1] 75

Note in b, I should get 535, not 35 as the first value. In d, I should
get 175.  Does anyone know how to get these digits?

The reason I'm not scanning the entire file at once is that my real
dataset is much larger than a Gig and I'll need to pull only portions of
the file in at once. I got readLines to work, but then I have to figure
out how to convert each entire line into a data.frame. Scan seems a lot
cleaner, with the exception of the funny character dropping issue.

Thanks so much!
Tim Howard

Christoph Buser

2005-Jan-19 08:20 UTC

Dear Tim

You can use

cat("TITLE extra line", "235 335 535 735", "115 135 175",
    file="ex.data", sep="\n")
cn.x <- file("ex.data", open="r")

a <- scan(cn.x, skip=1, n=2, sep = " ")
> Read 2 items
a
> [1] 235 335
b <- scan(cn.x, n=2, sep = " ")
> Read 2 items
b
> [1] 535 735
c <- scan(cn.x, n=2, sep = " ")
> Read 2 items
c
> [1] 115 135
d <- scan(cn.x, n=1, sep = " ")
> Read 1 items
d
> [1] 175

Regards,

Christoph Buser

-- 
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-1-632-5414		fax: 632-1228
http://stat.ethz.ch/~buser/



Prof Brian Ripley

2005-Jan-19 08:42 UTC

This is because scan() has a private pushback.
Either:

1) Read the file a whole line at a time: I cannot see why you need to do 
so here nor in your sketched application.

or

2) Use an explicit separator, e.g. " " in your example.

scan() is not designed to read parts of lines of a file.
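
Option 1 can be sketched with the nlines argument of scan(), which stops each call at the end of a line so that no partly read token is left behind. This is a minimal illustration using the example data from this thread; the temporary file and the variable names are assumptions made for the sketch, not part of the original posts.

```r
# Recreate the example file from the thread in a temporary location
tmp <- tempfile()
cat("TITLE extra line", "235 335 535 735", "115 135 175",
    file = tmp, sep = "\n")

cn <- file(tmp, open = "r")
# Skip the title line, then read exactly one complete line per call
first  <- scan(cn, skip = 1, nlines = 1, quiet = TRUE)
second <- scan(cn, nlines = 1, quiet = TRUE)
close(cn)

print(first)   # all four values of the first data line
print(second)  # all three values of the second data line
```

Because every call ends at a line break, the next call starts cleanly at the beginning of the following line.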


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Tim Howard

2005-Jan-19 12:42 UTC

Thank you Dr. Ripley and Christoph Buser for your explanations and help.

Using sep = " " within scan worked within lines of my file, but then I
gained an NA record when wrapping from one line to the next (because the
linebreak character is no longer recognized as a sep?).  So, I'll
continue by ensuring each group I read ends at the end of a line (as
scan was designed), and by using scan without the sep option.
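
The approach described here, reading groups that always end at a line boundary, might look like the following loop. It is only a sketch under stated assumptions: the small temporary file, the chunk size lines_per_chunk, and the known number of values per line n_per_line are all hypothetical stand-ins for the real data set.

```r
# Small stand-in file: one title line, then five lines of three numbers
tmp <- tempfile()
cat("TITLE extra line", "1 2 3", "4 5 6", "7 8 9", "10 11 12", "13 14 15",
    file = tmp, sep = "\n")

lines_per_chunk <- 2  # hypothetical: whole lines to read per call
n_per_line      <- 3  # assumed known number of values on each line

cn <- file(tmp, open = "r")
invisible(readLines(cn, n = 1))  # discard the title line

chunks <- list()
repeat {
  # Each call reads whole lines only, so no call stops mid-line
  vals <- scan(cn, nlines = lines_per_chunk, quiet = TRUE)
  if (length(vals) == 0) break   # nothing left to read
  chunks[[length(chunks) + 1]] <- matrix(vals, ncol = n_per_line, byrow = TRUE)
}
close(cn)

print(length(chunks))  # number of chunks read
```

In a real application each chunk would be processed and discarded inside the loop rather than accumulated, so that memory use stays bounded.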

FYI, Here's how the NA showed up, each line is 800 numbers long:

> test4 <- scan(cn.test, n=1600, sep = " ")
> test5 <- scan(cn.test, n=1600)
> test4[797:803]
[1]  81.00000  81.08746  81.89484  82.00000        NA 580.09030 576.90300
> test5[797:803]
[1]  81.01944  81.62060  81.96495  82.00000  82.00000 567.91840 563.10470
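
One plausible cause of such an NA, not confirmed anywhere in this thread, is a trailing space at the end of a line: with sep = " " every single space delimits a field, so a space just before the line break leaves an empty field, which scan() reads as NA. The sketch below uses made-up two-line input to show the effect; whether the real file has trailing spaces is an assumption.

```r
# Hypothetical input: the first line ends with a trailing space
txt <- c("1 2 ", "3 4")

# With an explicit separator, the empty field after the trailing space
# is kept and becomes NA
with_sep <- scan(text = txt, sep = " ", quiet = TRUE)

# With the default whitespace separator, runs of blanks and line breaks
# are all treated as delimiters, so no empty field arises
without_sep <- scan(text = txt, quiet = TRUE)

print(with_sep)
print(without_sep)
```

If this is the cause, the line break itself is still recognized as a separator; it is the extra space before it that creates the additional field.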

Thanks again.
Tim

