thr3ads.net - R help - [R] Output of order() incorrectly ordered? [Mar 2008]

Shirley Wu

2008-Mar-25 02:13 UTC

[R] Output of order() incorrectly ordered?

Hello,

I have a data frame consisting of four columns and would like to sort  
based on the first column and then write the sorted data frame to a  
file.

 > df <- read.table("file.txt", sep="\t")
where file.txt is simply a tab-delimited file containing 4 columns of  
data (first 2 numeric, second 2 character). I then do,

 > df_ordered <- df[order(df$V1), ]

OR, I assume equivalently,

 > df_ordered <- df[ do.call(order, df), ]

and then,

 > write.table(df_ordered, file="newfile.txt", ...)

The input data file looks like this:

0.083044        375.276 680220  majority
5.50816e-09     2.48914e-05     26377   conformation
0.000169618     0.766505        1546938 interaction
3.90425e-05     0.176433        1655338 vitamin
0.0378182       170.9   1510941 array
3.00359e-07     0.00135732      69421   oligo(dT)-cellulose
1.01517e-13     4.58754e-10     699918  elastase
...

I'd like the output file to look the same except sorted by the first
column. The output of the commands above gives me something that is
sorted in some places but not sorted in others:

[sorted section]
...
1.87276e-07     0.000846299     1142090 vitamin K
1.89026e-07     0.000854207     917889  leader peptide
1.90884e-07     0.000862605     31206   s
0.00536062      24.2246 1706420 prevent
5.42648e-05     0.245223        1513041 measured
5.42648e-05     0.245223        1513040 measured
0.019939        90.1044 12578   fly
0.00135512      6.12377 61688   GPI
0.00124421      5.62257 681915  content
0.0128271       57.9655 681916  estimated
...
[sorted section]
...
[unsorted section]
...
[etc]

I'm not sure if this is a problem with the input data or with order()  
or what. I am only doing this in R because many of my numeric values  
are expressed in exponential notation and UNIX sort does not handle  
this to my knowledge, but this behavior baffles me. I am pretty new  
to R so it's possible I'm missing something.

Any insight would be greatly appreciated!

Thanks,
-Shirley
graduate student
Stanford University

jim holtman

2008-Mar-25 09:20 UTC

[R] Output of order() incorrectly ordered?

works fine by me with the data you supplied:
> x
           V1          V2      V3                  V4
1 8.30440e-02 3.75276e+02  680220            majority
2 5.50816e-09 2.48914e-05   26377        conformation
3 1.69618e-04 7.66505e-01 1546938         interaction
4 3.90425e-05 1.76433e-01 1655338             vitamin
5 3.78182e-02 1.70900e+02 1510941               array
6 3.00359e-07 1.35732e-03   69421 oligo(dT)-cellulose
7 1.01517e-13 4.58754e-10  699918            elastase
> x[order(x$V1),]
           V1          V2      V3                  V4
7 1.01517e-13 4.58754e-10  699918            elastase
2 5.50816e-09 2.48914e-05   26377        conformation
6 3.00359e-07 1.35732e-03   69421 oligo(dT)-cellulose
4 3.90425e-05 1.76433e-01 1655338             vitamin
3 1.69618e-04 7.66505e-01 1546938         interaction
5 3.78182e-02 1.70900e+02 1510941               array
1 8.30440e-02 3.75276e+02  680220            majority

BTW, these two are not equivalent:

 > df_ordered <- df[order(df$V1), ]

 > df_ordered <- df[ do.call(order, df), ]

since you did not specify the column in the second case: do.call(order, df)
passes every column to order(), so V1 is the primary key and the remaining
columns are used to break ties. You did not indicate exactly which one was
giving you problems.
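
A minimal sketch of that difference, using a small made-up data frame (the values are borrowed from the rows quoted in the original post; the column subset and the names by_first and by_all are only for illustration). With tied values in V1, order(df$V1) leaves the tied rows in their input order, while do.call(order, df) falls through to the later columns:

```r
# Illustrative data: rows 1 and 3 are tied on V1.
df <- data.frame(
  V1 = c(5.42648e-05, 0.00124421, 5.42648e-05),
  V3 = c(1513041, 681915, 1513040),
  V4 = c("measured", "content", "measured"),
  stringsAsFactors = FALSE
)

# Sort key is V1 only; tied rows keep their original relative order.
by_first <- order(df$V1)

# Every column is a sort key, equivalent to order(df$V1, df$V3, df$V4);
# the tie on V1 is broken by V3.
by_all <- do.call(order, df)

print(by_first)  # 1 3 2
print(by_all)    # 3 1 2
```

Neither form can produce the large out-of-order blocks shown in the original post when V1 is really numeric, since V1 is the leading key in both.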



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Paul Hiemstra

2008-Mar-25 10:06 UTC

[R] Output of order() incorrectly ordered?

Hi Shirley,

You can use the function sort_df() from the reshape package to sort an 
entire data.frame based on one column.

cheers,
Paul

-- 
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: 	+31302535773
Fax:	+31302531145
http://intamap.geo.uu.nl/~paul

Charles C. Berry

2008-Mar-25 17:03 UTC

[R] Output of order() incorrectly ordered?

On Mon, 24 Mar 2008, Shirley Wu wrote:
> Any insight would be greatly appreciated!
I suspect that the first column contains something that cannot be rendered 
as a numeric value. Probably you have leading blanks before some of the 
numbers.

The result is that the first column is a factor, which will be ordered 
according to the character collating sequence in your locale after 
coercing it to character. (I am guessing here, but it appears to do this 
on my PC.)

Try

df <- read.table("file.txt", sep="\t", strip.white=TRUE )

and see if the ordering agrees with your intuition.
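
A minimal sketch of how one might check that suspicion, assuming the cause is an entry in the first column that is not a number. The inline data, the "n/a" entry, and the names txt, num and clean are made up for illustration; stringsAsFactors = FALSE is set explicitly so the column arrives as character rather than factor, and either way it is ordered as text instead of by value:

```r
# Hypothetical input: the third row has a non-numeric first field.
txt <- "0.083044\t375.276\t680220\tmajority
5.50816e-09\t2.48914e-05\t26377\tconformation
n/a\t0.766505\t1546938\tinteraction
3.90425e-05\t0.176433\t1655338\tvitamin"

df <- read.table(text = txt, sep = "\t", stringsAsFactors = FALSE)

# If this is not "numeric", order(df$V1) compares strings, not numbers.
print(class(df$V1))  # "character"

# Locate the entries that fail to convert to numbers.
num <- suppressWarnings(as.numeric(df$V1))
print(df$V1[is.na(num)])  # "n/a"

# Drop (or repair) the offending rows, convert, then sort numerically.
clean <- df[!is.na(num), ]
clean$V1 <- num[!is.na(num)]
print(clean[order(clean$V1), ])
```

Running str(df) right after read.table() shows the type of every column at once, which is usually the quickest way to spot this kind of problem.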

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
