Yong Wang
2011-Mar-16 16:37 UTC
[R] read.table() with "\t" as separator, all other programs report equal fields each row, read.table() returns unequal row length error
hi, list

R is undoubtedly my favorite statistics tool; however, the data input part has long been a pain. Most of the data I have to deal with is irregular and contains special characters.

Recently I got a tab-delimited data file. read.table(filename, sep="\t") constantly returns errors saying that certain rows do not have xyz elements, while all other programs such as perl, python, and awk report equal row lengths when "\t" is used as the separator.

I scouted through the problematic rows. Sometimes it is because a row contains a "#", so I go back and specify comment.char="". Next it will be some other problem, and for some rows I simply can't figure out what the problem is.

Can I have any guru's suggestions to save this pain now and in the future? Is CSV a safer format? Or can anyone let me know what fundamental principles I must bear in mind when doing preliminary data processing with other programs such as perl, to ensure the output can be readily fed into R?

best
yong
Peter Dalgaard
2011-Mar-16 19:20 UTC
On Mar 16, 2011, at 17:37, Yong Wang wrote:

> Recently I got a tab-delimited data file. read.table(filename, sep="\t")
> constantly returns errors saying that certain rows do not have xyz elements,
> while all other programs such as perl, python, and awk report equal row
> lengths when "\t" is used as the separator.
>
> I scouted through the problematic rows. Sometimes it is because a row
> contains a "#", so I go back and specify comment.char="". Next it will be
> some other problem, and for some rows I simply can't figure out what the
> problem is.
>
> Can I have any guru's suggestions to save this pain now and in the future?
> Is CSV a safer format?

A couple of other things can get messed up, e.g. quote symbols. Does read.delim()/read.delim2() perhaps work better? With CSV, you generally get the same sort of issues, just with "," instead of "\t".

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
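To illustrate the point about quote symbols, here is a minimal sketch, not taken from the thread. By default read.table() treats both " and ' as quote characters and # as a comment character, whereas tools that simply split on "\t" do not, which is one plausible reason the field counts disagree. The sample rows, the temporary file, and the variable names below are made up for illustration; the original poster's actual data is unknown.

```r
# Hypothetical sample data: three tab-separated lines with three fields each.
# One row contains an apostrophe and another contains a '#'.
lines <- c("id\tname\tnote",
           "1\tO'Brien\tfirst",
           "2\tSmith\titem #2")
path <- tempfile(fileext = ".tsv")
writeLines(lines, path)

# Count fields per line the way a plain "split on tab" tool would:
# no quote characters and no comment character.
raw_counts <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Any line whose count differs from the first line is a candidate for inspection.
bad_lines <- which(raw_counts != raw_counts[1])

# Read the file with the same settings, so apostrophes and '#' are kept
# as ordinary characters inside the fields.
dat <- read.table(path, sep = "\t", header = TRUE, quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
```

If the fields in a file really are wrapped in double quotes, quote = "\"" (the read.delim() default) is probably the better choice than quote = "". Running count.fields() first gives the line numbers to look at before read.table() stops with an error.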