Hello,

I have a data frame consisting of four columns and would like to sort based on the first column and then write the sorted data frame to a file.

> df <- read.table("file.txt", sep="\t")

where file.txt is simply a tab-delimited file containing 4 columns of data (first 2 numeric, second 2 character). I then do,

> df_ordered <- df[order(df$V1), ]

OR, I assume equivalently,

> df_ordered <- df[ do.call(order, df), ]

and then,

> write.table(df_ordered, file="newfile.txt", ...)

The input data file looks like this:

0.083044 375.276 680220 majority
5.50816e-09 2.48914e-05 26377 conformation
0.000169618 0.766505 1546938 interaction
3.90425e-05 0.176433 1655338 vitamin
0.0378182 170.9 1510941 array
3.00359e-07 0.00135732 69421 oligo(dT)-cellulose
1.01517e-13 4.58754e-10 699918 elastase
...

I'd like the output file to look the same except sorted by the first column. The output of the commands above gives me something that is sorted in some places but not sorted in others:

[sorted section]
...
1.87276e-07 0.000846299 1142090 vitamin K
1.89026e-07 0.000854207 917889 leader peptide
1.90884e-07 0.000862605 31206 s
0.00536062 24.2246 1706420 prevent
5.42648e-05 0.245223 1513041 measured
5.42648e-05 0.245223 1513040 measured
0.019939 90.1044 12578 fly
0.00135512 6.12377 61688 GPI
0.00124421 5.62257 681915 content
0.0128271 57.9655 681916 estimated
...
[sorted section]
...
[unsorted section]
...
[etc]

I'm not sure if this is a problem with the input data or with order() or what. I am only doing this in R because many of my numeric values are expressed in exponential notation and UNIX sort does not handle this to my knowledge, but this behavior baffles me. I am pretty new to R so it's possible I'm missing something.

Any insight would be greatly appreciated!

Thanks,
-Shirley
graduate student
Stanford University

works fine by me with the data you supplied:

> x
           V1          V2      V3                  V4
1 8.30440e-02 3.75276e+02  680220            majority
2 5.50816e-09 2.48914e-05   26377        conformation
3 1.69618e-04 7.66505e-01 1546938         interaction
4 3.90425e-05 1.76433e-01 1655338             vitamin
5 3.78182e-02 1.70900e+02 1510941               array
6 3.00359e-07 1.35732e-03   69421 oligo(dT)-cellulose
7 1.01517e-13 4.58754e-10  699918            elastase
> x[order(x$V1),]
           V1          V2      V3                  V4
7 1.01517e-13 4.58754e-10  699918            elastase
2 5.50816e-09 2.48914e-05   26377        conformation
6 3.00359e-07 1.35732e-03   69421 oligo(dT)-cellulose
4 3.90425e-05 1.76433e-01 1655338             vitamin
3 1.69618e-04 7.66505e-01 1546938         interaction
5 3.78182e-02 1.70900e+02 1510941               array
1 8.30440e-02 3.75276e+02  680220            majority
>

BTW, these two are not equivalent:

> df_ordered <- df[order(df$V1), ]

OR, I assume equivalently,

> df_ordered <- df[ do.call(order, df), ]

since you did not specify the column in the second case; you did not indicate exactly which one was giving you problems.

On Mon, Mar 24, 2008 at 9:13 PM, Shirley Wu <shwu19 at stanford.edu> wrote:
> Hello,
>
> I have a data frame consisting of four columns and would like to sort
> based on the first column and then write the sorted data frame to a
> file.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
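
To make the difference between the two calls concrete, here is a minimal sketch on made-up data (the values below are invented for illustration and are not from the poster's file). order(df$V1) sorts on the first column only, while do.call(order, df) hands every column to order(), so the later columns act as tie-breakers. The two results can therefore differ only where V1 has tied values.

```r
# Toy data with a tie in V1 (rows 1 and 3); values are invented.
df <- data.frame(
  V1 = c(2e-3, 5e-9, 2e-3),
  V2 = c(9, 1, 3),
  V3 = c("b", "a", "c"),
  stringsAsFactors = FALSE
)

# Sort key is V1 only: tied rows keep their original relative order.
by_first <- order(df$V1)

# Sort keys are V1, then V2, then V3: ties in V1 are broken by V2.
by_all <- do.call(order, df)

print(by_first)
print(by_all)

# The primary ordering on V1 is the same in both cases.
print(identical(df$V1[by_first], df$V1[by_all]))
```

Neither call should leave V1 itself out of order as long as V1 really is numeric, which suggests looking at how the column was read in rather than at order().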

Hi Shirley,

You can use the function sort_df() from the reshape package to sort an entire data.frame based on one column.

cheers,
Paul

Shirley Wu wrote:
> Hello,
>
> I have a data frame consisting of four columns and would like to sort
> based on the first column and then write the sorted data frame to a
> file.

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +31302535773
Fax: +31302531145
http://intamap.geo.uu.nl/~paul

On Mon, 24 Mar 2008, Shirley Wu wrote:

> I'm not sure if this is a problem with the input data or with order()
> or what. I am only doing this in R because many of my numeric values
> are expressed in exponential notation and UNIX sort does not handle
> this to my knowledge, but this behavior baffles me. I am pretty new
> to R so it's possible I'm missing something.
>
> Any insight would be greatly appreciated!

I suspect that the first column contains something that cannot be rendered as a numeric value. Probably you have leading blanks before some of the numbers.

The result is that the first column is a factor, which will be ordered according to the character collating sequence in your locale after coercing it to character. (I am guessing here, but it appears to do this on my PC.)

Try

	df <- read.table("file.txt", sep="\t", strip.white=TRUE )

and see if the ordering agrees with your intuition.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
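
One way to check the suspicion above is to look at the class of the first column and at which entries fail to convert. The sketch below uses a tiny invented input in which one value carries a stray character; the actual cause in the poster's file is unknown, so treat this as an illustration of the diagnosis rather than of the real data. stringsAsFactors = TRUE is passed explicitly to mimic the read.table() default of the time (since R 4.0.0 the column would come back as character instead, and would still sort as text).

```r
# Invented input: the third value ends in a stray "x", so the whole
# column cannot be read as numeric.
txt <- paste(
  "0.083044\tmajority",
  "5.50816e-09\tconformation",
  "0.00536062x\tprevent",
  "1.9e-07\ts",
  sep = "\n"
)

df <- read.table(text = txt, sep = "\t", stringsAsFactors = TRUE)

# Step 1: inspect the column types; V1 is a factor here, not numeric.
str(df)

# Step 2: convert via character. Entries that cannot be parsed become NA,
# which points at the rows to inspect in the input file.
v1_num <- suppressWarnings(as.numeric(as.character(df$V1)))
bad_rows <- which(is.na(v1_num))
print(df[bad_rows, ])

# Step 3: order on the numeric version (NA entries are placed last).
df_ordered <- df[order(v1_num), ]
print(df_ordered)
```

If str() reports V1 as num, the column was read correctly and the cause lies elsewhere. If bad_rows is non-empty, fixing those entries in the file (or reading with strip.white=TRUE, if blanks turn out to be the cause) should let read.table() produce a numeric column directly.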