Yong Wang
2007-May-22 12:04 UTC
[R] rewrite a data file use write.table(), count.fields() show different pattern, any suggestion appreciated.
Dear all:

I read in a tab-delimited dataset and then wrote it out as another file, as follows. I did this simply to make sure I understand the behavior of this command.

data <- read.table(file, header=F, sep="\t", fill=T, colClasses="character")
write.table(data, file="newdata.txt", eol="\n", sep="\t", quote=F, row.names=F)

cf1 <- count.fields("newdata.txt", sep="\t")
table(cf1)
 13  17   23
 10 126 5445

# is different from

cf2 <- count.fields(file, sep="\t")
 13  17   23  33
 10 106 5433  32

The worst problem is that the maximal value of cf1 (33) is larger than the maximal value of cf2 (23), which is the right number of fields for most rows in the original file.

I need to use write.table for some important data manipulation work, so your suggestions are highly appreciated.

Best Regards
Prof Brian Ripley
2007-May-22 12:12 UTC
If you write out unquoted fields, how do you know they do not contain tabs?

The default is quote=TRUE for a good reason.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
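The point about unquoted fields can be illustrated with a minimal sketch. The data frame and file names below (`df`, `unquoted`, `quoted`) are made up for illustration and are not from the original post; the only assumption is that at least one field contains a literal tab.

```r
# Hypothetical toy data: the "note" field in row 2 contains a literal tab.
df <- data.frame(id   = c("a", "b"),
                 note = c("plain", "has\ttab"),
                 stringsAsFactors = FALSE)

unquoted <- tempfile(fileext = ".txt")
quoted   <- tempfile(fileext = ".txt")

# Same data, written once without and once with quoting.
write.table(df, file = unquoted, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(df, file = quoted, sep = "\t", quote = TRUE,
            row.names = FALSE, col.names = FALSE)

# Without quotes, the embedded tab is indistinguishable from a separator.
cf_unquoted <- count.fields(unquoted, sep = "\t")
# With quotes, the tab stays inside one field.
cf_quoted   <- count.fields(quoted, sep = "\t")

print(cf_unquoted)  # 2 3
print(cf_quoted)    # 2 2
```

With quote=FALSE the second row gains a field, which is the kind of inflation in field counts that the question describes.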
Yong Wang
2007-May-22 14:30 UTC
Thank you for the suggestion, Dr. Ripley.

However, I am a little confused. My understanding is that you suspect the should-be-quoted fields (factor or character fields) contain tabs. If that is the case, count.fields() should detect the tabs, read.table(sep="\t") should read with the same awareness, and if write.table(sep="\t") writes and separates with tabs those fields as acknowledged by read.table(sep="\t"), the two field counts should be the same.

Anyway, I will try to redo it per your suggestion.

Regards
yong
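One way to reconcile this reasoning with differing counts, offered as a sketch and not as a diagnosis of the actual file: both count.fields() and read.table() default to quote = "\"'", so a tab inside a quoted field is not treated as a separator on input. The example assumes the original file contains such a quoted field, which the thread does not confirm; the names `orig`, `dat`, and `has_tab` are hypothetical.

```r
# Hypothetical input file: row 2 has a quoted field with an embedded tab.
orig <- tempfile(fileext = ".txt")
writeLines(c("a\tplain\tx",
             "b\t\"has\ttab\"\tx"), orig)

# Default quote handling: the embedded tab is protected by the quotes.
cf_default <- count.fields(orig, sep = "\t")
# Quote handling switched off: every tab counts as a separator.
cf_raw     <- count.fields(orig, sep = "\t", quote = "")

# read.table() also honours quotes, so the tab ends up inside a cell.
dat <- read.table(orig, header = FALSE, sep = "\t", fill = TRUE,
                  colClasses = "character")

# Locate cells that contain a literal tab (logical matrix, rows x columns).
has_tab <- sapply(dat, function(col) grepl("\t", col, fixed = TRUE))
print(which(has_tab, arr.ind = TRUE))
```

Comparing `cf_default` with `cf_raw` on the real input shows whether quoting hides any tabs, and `has_tab` points to the cells that would split into extra fields if written back with quote=FALSE.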