thr3ads.net - R help - [R] Largest N Values Efficiently? [Nov 2007]

David Katz

2007-Nov-11 23:43 UTC

[R] Largest N Values Efficiently?

What is the most efficient alternative to x[order(x)][1:n] where
length(x)>>n?
I also need the positions of the mins/maxs perhaps by preserving names.

 Thanks for any suggestions.
-- 
View this message in context:
http://www.nabble.com/Largest-N-Values-Efficiently--tf4788033.html#a13697535
Sent from the R help mailing list archive at Nabble.com.

Gabor Grothendieck

2007-Nov-11 23:53 UTC

[R] Largest N Values Efficiently?

You can apply 1:n to order(x) so you don't wind up
subscripting x by every element in order(x).

o <- head(order(x), n) # positions
x[o]
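
To also cover the request for positions and names, the same idea can be sketched on a small named vector. The vector `x` and its names below are made-up illustration data, not from the original post, and `decreasing = TRUE` is added to get the largest values, as the subject line asks.

```r
# Illustrative data (hypothetical): a small named numeric vector
x <- c(a = 5, b = 4, c = 6, d = 3, e = 10, f = 1, g = 12, h = 2)
n <- 3

# Positions of the n smallest values; only n indices are used to subscript x
o_min <- head(order(x), n)
smallest <- x[o_min]          # names are preserved by subscripting

# Positions of the n largest values
o_max <- head(order(x, decreasing = TRUE), n)
largest <- x[o_max]

print(o_min)     # integer positions in x
print(smallest)  # named values, ascending
print(largest)   # named values, descending
```

Because `order()` returns positions rather than values, the names come along for free when `x` is subscripted.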

A completely different approach, if X is a data frame with d as a data
column is this where the row names give the positions (don't know
about speed):
> library(sqldf)
> n <- 3
> X <- data.frame(d = c(5, 4, 6, 3, 10, 1, 12, 2))
> sqldf(paste("select * from X order by d limit", n), row.names = TRUE)
  d
6 1
8 2
4 3

On Nov 11, 2007 6:43 PM, David Katz <david at davidkatzconsulting.com> wrote:
>
> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?
> I also need the positions of the mins/maxs perhaps by preserving names.
>
>  Thanks for any suggestions.

jim holtman

2007-Nov-12 03:46 UTC

[R] Largest N Values Efficiently?

Try:

sort(x, partial=n)[1:n]
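
A note on what this returns, as a small sketch: with `partial = n`, `sort()` only guarantees that the element at position n is the one a full sort would place there, with nothing larger before it. The first n elements are therefore the n smallest values, but not necessarily in sorted order, and `?sort` notes that names are discarded for partial sorting, so positions are not recovered this way. The data below are made up for illustration.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(1000)
n <- 5

part <- sort(x, partial = n)[1:n] # n smallest values, order not guaranteed
full <- sort(x)[1:n]              # n smallest values, ascending

# Same set of values; sort the small result if ordered output is needed
stopifnot(isTRUE(all.equal(sort(part), full)))

# For the largest n, negate (numeric x without NAs assumed)
top <- -sort(-x, partial = n)[1:n]
stopifnot(isTRUE(all.equal(sort(top, decreasing = TRUE),
                           sort(x, decreasing = TRUE)[1:n])))
```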

On Nov 11, 2007 6:43 PM, David Katz <david at davidkatzconsulting.com> wrote:
>
> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?
> I also need the positions of the mins/maxs perhaps by preserving names.
>
>  Thanks for any suggestions.


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Prof Brian Ripley

2007-Nov-12 06:56 UTC

[R] Largest N Values Efficiently?

What is 'x' here?  What type?  Does it contain NAs?  Are there ties?  R's
ordering functions are rather general, and you can gain efficiency by
ruling some of these out.

See ?sort, look at the 'partial' argument, including the comments in the
Details.  And also look at ?sort.list.

sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is 
more efficient than x[order(x)][1:n] for most types.
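
Putting these hints together, one possible way to get positions without ordering the whole vector is to find the n-th smallest value with a partial sort and then order only the elements at or below it. This is a sketch, not something proposed in the thread: the helper name `smallest_n_idx` is hypothetical, and it assumes a numeric `x` with no NAs and `n <= length(x)`.

```r
# Hypothetical helper: positions of the n smallest values of x
# (assumes numeric x, no NAs, 1 <= n <= length(x))
smallest_n_idx <- function(x, n) {
  thr  <- sort.int(x, partial = n)[n]       # n-th smallest value
  cand <- which(x <= thr, useNames = FALSE) # at least n candidates (more if ties)
  cand[order(x[cand])][seq_len(n)]          # order only the candidates
}

set.seed(42)                            # arbitrary seed for the illustration
x <- rnorm(1e5)
names(x) <- paste0("obs", seq_along(x)) # made-up names
n <- 10

idx <- smallest_n_idx(x, n)
print(x[idx])                           # named values, ascending

# Agrees with the full ordering (no ties expected in continuous random data)
stopifnot(identical(idx, order(x)[1:n]))
```

For the largest n, the same helper can be applied to `-x`. Whether this beats `order(x)[1:n]` in practice depends on the data and R version, so it is worth timing before adopting it.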

Finally, does efficiency matter?  As the examples in ?sort show, R can 
sort a vector of length 2000 in well under 1ms, and 1e7 random normals in 
less time than they take to generate.  There are not many tasks where 
gaining efficiency over x[order(x)][1:n] will be important.  E.g.
> system.time(x <- rnorm(1e6))
   user  system elapsed
   0.44    0.00    0.44
> system.time(x[order(x)][1:4])
   user  system elapsed
   1.72    0.00    1.72
> system.time(x2 <- sort.int(x, method = "quick")[1:4])
   user  system elapsed
   0.31    0.00    0.32
> system.time(min(x))
   user  system elapsed
   0.02    0.00    0.02
> system.time(x2 <- sort.int(x, partial=1)[1])
   user  system elapsed
   0.07    0.00    0.07

and do savings of tenths of a second matter?  (There is also 
quantreg::kselect, if you work out how to use it, which apparently is 
a bit faster at partial sorting on MacOS X but not elsewhere.)

On Sun, 11 Nov 2007, David Katz wrote:
>
> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?
That is the smallest n values, pace your subject line.
> I also need the positions of the mins/maxs perhaps by preserving names.
>
> Thanks for any suggestions.
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Frede Aakmann Tøgersen

2007-Nov-12 07:35 UTC

[R] Largest N Values Efficiently?

On my system
> system.time(x1 <- sort(x,decreasing=TRUE)[1:1000])
   user  system elapsed
   0.03    0.00    0.03

whereas
> system.time(x1 <- x[order(x)][1:1000])
   user  system elapsed
   0.11    0.00    0.11


I.e. using sort is roughly 3 to 4 times faster (0.03 s versus 0.11 s).
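
Note that the two expressions timed above do not return the same values: `sort(x, decreasing=TRUE)[1:1000]` gives the 1000 largest, while `x[order(x)][1:1000]` gives the 1000 smallest. A like-for-like comparison could be sketched as follows. Here `x` is assumed to be a vector of 1e6 random normals as in the earlier message, and no timings are shown because they vary by machine and R version.

```r
x <- rnorm(1e6)   # assumed: same kind of data as in the earlier message
n <- 1000

# Both expressions return the n largest values in decreasing order
t_sort  <- system.time(a <- sort(x, decreasing = TRUE)[1:n])
t_order <- system.time(b <- x[order(x, decreasing = TRUE)[1:n]])

stopifnot(identical(a, b))  # same result, so the timings are comparable
print(t_sort)
print(t_order)
```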


Best regards

Frede Aakmann Tøgersen
Scientist


UNIVERSITY OF AARHUS
Faculty of Agricultural Sciences
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone:   +45 8999 1900
Direct:  +45 8999 1878

E-mail:  FredeA.Togersen at agrsci.dk
Web:	   http://www.agrsci.org				



 
> -----Original message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On behalf of David Katz
> Sent: 12 November 2007 00:44
> To: r-help at r-project.org
> Subject: [R] Largest N Values Efficiently?
> 
> 
> What is the most efficient alternative to x[order(x)][1:n] 
> where length(x)>>n?
> I also need the positions of the mins/maxs perhaps by 
> preserving names.
> 
>  Thanks for any suggestions.
