thr3ads.net - R help - [R] read.table() with "\t" as separator, all other programs report equal fields each row, read.table() returns unequal row length error [Mar 2011]

Yong Wang

2011-Mar-16 16:37 UTC

[R] read.table() with "\t" as separator, all other programs report equal fields each row, read.table() returns unequal row length error

Hi, list,

R is undoubtedly my favorite statistics tool; however, the data
input part has long been a pain. Most of the data I have to deal with is
irregular and contains special characters.

Recently I got a tab-delimited data file. read.table(filename,sep="\t")
constantly returns errors saying that certain rows do not have xyz elements,
while all other programs, such as Perl, Python, and awk, report equal row
lengths when "\t" is used as the separator.

I scouted through the problematic rows. Sometimes the cause is that a row
contains a "#", so I go back and specify comment.char="".
Next it will be some other problem, and for some rows I simply can't
figure out what the problem is.

Can any guru suggest a way to save me this pain now and in the
future? Is CSV a safer format? Or can anyone let me know the
fundamental principles I must bear in mind when doing preliminary data
processing with other programs such as Perl, to ensure the output can
be readily fed into R?

Best,

Yong

peter dalgaard

2011-Mar-16 19:20 UTC

On Mar 16, 2011, at 17:37, Yong Wang wrote:
> Hi, list,
>
> R is undoubtedly my favorite statistics tool; however, the data
> input part has long been a pain. Most of the data I have to deal with is
> irregular and contains special characters.
>
> Recently I got a tab-delimited data file. read.table(filename,sep="\t")
> constantly returns errors saying that certain rows do not have xyz elements,
> while all other programs, such as Perl, Python, and awk, report equal row
> lengths when "\t" is used as the separator.
>
> I scouted through the problematic rows. Sometimes the cause is that a row
> contains a "#", so I go back and specify comment.char="".
> Next it will be some other problem, and for some rows I simply can't
> figure out what the problem is.
>
> Can any guru suggest a way to save me this pain now and in the
> future? Is CSV a safer format? Or can anyone let me know the
> fundamental principles I must bear in mind when doing preliminary data
> processing with other programs such as Perl, to ensure the output can
> be readily fed into R?

A couple of other things can get messed up, e.g. quote symbols. Does
read.delim()/read.delim2() perhaps work better?

With CSV, you generally get the same sort of issues, just with ","
instead of "\t".

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
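
As an illustration of the quote-symbol and "#" issues raised in this thread, here is a minimal sketch in R. The sample rows, the temporary file, and the variable names are made up for the example; count.fields(), read.table(), and read.delim() are standard R functions. It assumes the mismatch comes from quote or comment characters inside fields, which may not be the only cause in a real file.

```r
# Made-up sample: three tab-separated fields per line, with apostrophes
# on one row and a "#" in the middle of another.
rows <- c("id\tnote\tname",
          "1\tdon't\tO'Brien",
          "2\titem #4\tSmith",
          "3\tplain\tJones")
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# With quote and comment handling switched off, only "\t" splits a line,
# which is how awk or Perl would typically count the fields.
n_plain <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# With the defaults (quote = "\"'", comment.char = "#") the same lines can
# be counted differently; those are the lines worth inspecting.
n_default <- count.fields(path, sep = "\t")
suspect <- which(is.na(n_default) | n_default != n_plain)
print(rows[suspect])

# Reading with the same switches off keeps every field as literal text.
dat <- read.table(path, sep = "\t", header = TRUE, quote = "",
                  comment.char = "", stringsAsFactors = FALSE)

# read.delim() defaults to sep = "\t", quote = "\"", comment.char = "",
# so apostrophes and "#" are left alone here as well.
dat2 <- read.delim(path, stringsAsFactors = FALSE)
```

One point to keep in mind: read.delim() also sets fill = TRUE by default, so rows that really are short get padded instead of reported. Checking the raw file with count.fields() first is the more cautious route.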
