thr3ads.net - R help - [R] read.table() versus scan() [Jan 2011]


H Roark

2011-Jan-28 04:23 UTC

[R] read.table() versus scan()

I need to import a large number of simple, space-delimited text files with a few
columns of data each. The one quirk is that some rows are missing data and some
contain junk text at the end of the line. A typical file might look like:

a b c d
1 2 3 x
4 5 6
7 8 9 x
1 2 3 x c c
4 5 6 x
7 8 9 x

I'm trying to avoid having to pre-process the text files, as they all sit on
an ftp site that I don't manage.  My initial approach was just to read the
files using a read.table() statement with the arguments flush and fill set to
TRUE. For example, to import the above text file I tried:

read.table(file="ftp://ftp.example.dta", header=T, row.names=NULL,
fill=T, flush=T)

However, R throws the error "more columns than column names" and
won't import the file.

Interestingly, if I move the extra text "c c" from line 5 to line 6 in
the data file, read.table() reads the file just fine, and ignores the "c
c".  So, my first question is, why does simply moving these data down a row
solve this problem?

Next, I decided to try reading the file with the scan() function and it worked
perfectly:

data.frame(scan(file="ftp://ftp.example.dta", what=list(a=0, b=0, c=0,
d=""), sep=" ", skip=1, flush=T, fill=T))
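
For anyone wanting to try the scan() call without access to the FTP site, here is a minimal sketch. It assumes the sample file shown above is written to a temporary file; the name tmp is made up for illustration, and the scan() arguments are the ones from the post.

```r
# Recreate the sample file locally (tmp is a placeholder path).
tmp <- tempfile(fileext = ".dat")
writeLines(c("a b c d",
             "1 2 3 x",
             "4 5 6",
             "7 8 9 x",
             "1 2 3 x c c",
             "4 5 6 x",
             "7 8 9 x"), tmp)

# what= fixes the number and type of fields per record, so scan() does not
# have to guess the width of the file. fill=TRUE pads the short line and
# flush=TRUE discards anything after the fourth field.
cols <- scan(file = tmp, what = list(a = 0, b = 0, c = 0, d = ""),
             sep = " ", skip = 1, flush = TRUE, fill = TRUE, quiet = TRUE)
DF <- data.frame(cols)
unlink(tmp)
```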

I'm new to R, but as I understand it read.table() is based on the scan()
function. This makes me wonder if there is an additional argument I can add to
read.table() to make it import the file successfully, as scan() was able to do. 
Any help in this regard would be very much appreciated.  I'd also really
like to hear folks' perspectives on the merits of scan() versus read.table()
(e.g. when is scan() the best option?).

Cheers
 		 	   		  

Tal Galili

2011-Jan-28 08:23 UTC

Hi Roark,
From my experience, this error is caused by a problem with reading the headers, or a problem with the "sep" parameter in read.table.
Try something like
read.table(... ,sep ="\t")  (This is for tab delimited files)

Others might give more ideas.

Cheers,
Tal




Gabor Grothendieck

2011-Jan-28 13:09 UTC

Read the header into nms and then the data into DF and then put them together:

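con <- file("myfile.dat", "r")           # open explicitly so the read position is kept
nms <- scan(con, what = "", nlines = 1)  # header line only
DF <- read.table(con, fill = TRUE)       # continues with the lines after the header
close(con)
DF <- setNames(DF[seq_along(nms)], nms)  # keep only the columns named in the header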

or just read it twice: first the one line of the header and then the data:

nms <- unlist(read.table("myfile.dat", nrows = 1, colClasses = "character"))
DF <- read.table("myfile.dat", fill = TRUE, skip = 1)
DF <- setNames(DF[seq_along(nms)], nms)
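
Since the question is about many files, the second approach could be wrapped in a small helper. The following is only a sketch: the function name read_ragged and the temporary file are hypothetical, and flush = TRUE is added on the strength of the behaviour reported in the original post (junk text that first appears after the fifth line is then dropped rather than kept).

```r
# Hypothetical helper: read the header and the body separately, then keep
# only as many columns as the header has names.
read_ragged <- function(path) {
  nms <- scan(path, what = "", nlines = 1, quiet = TRUE)
  DF <- read.table(path, skip = 1, fill = TRUE, flush = TRUE,
                   stringsAsFactors = FALSE)
  setNames(DF[seq_along(nms)], nms)
}

# Try it on a local copy of the sample file from the question.
tmp <- tempfile(fileext = ".dat")
writeLines(c("a b c d",
             "1 2 3 x",
             "4 5 6",
             "7 8 9 x",
             "1 2 3 x c c",
             "4 5 6 x",
             "7 8 9 x"), tmp)
DF <- read_ragged(tmp)
str(DF)
unlink(tmp)
```

With a character vector of file names or URLs, the helper could then be applied to each one with lapply().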


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Peter Ehlers

2011-Jan-28 13:44 UTC

Note this comment in the Details section of ?read.table:

    "The number of data columns is determined by looking
     at the first five lines of input ..."
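
One way to see this rule at work is to count the fields on each line with count.fields(). The sketch below assumes the sample file from the question is saved to a temporary file; tmp, junk1 and junk2 are placeholder names, not part of the original data. The same help page also says the column count can come from col.names when it is specified and is longer, which is what the last step relies on.

```r
# Recreate the sample file locally (tmp is a placeholder path).
tmp <- tempfile(fileext = ".dat")
writeLines(c("a b c d",
             "1 2 3 x",
             "4 5 6",
             "7 8 9 x",
             "1 2 3 x c c",
             "4 5 6 x",
             "7 8 9 x"), tmp)

# Number of whitespace-separated fields on every line, header included.
n_fields <- count.fields(tmp)

# read.table() inspects only the first five lines. Here the fifth line is
# the one with the junk text, so the data look wider than the header.
max(n_fields[1:5])   # compare with n_fields[1], the header width

# Supplying col.names wide enough for the longest line avoids the guess.
# "junk1" and "junk2" are made-up names for the unwanted trailing fields.
DF <- read.table(tmp, skip = 1, fill = TRUE,
                 col.names = c("a", "b", "c", "d", "junk1", "junk2"))
DF <- DF[1:4]   # drop the junk columns
unlink(tmp)
```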

Peter Ehlers