What is the most efficient alternative to x[order(x)][1:n] where
length(x) >> n?
I also need the positions of the mins/maxs, perhaps by preserving names.

Thanks for any suggestions.
--
View this message in context: http://www.nabble.com/Largest-N-Values-Efficiently--tf4788033.html#a13697535
Sent from the R help mailing list archive at Nabble.com.
You can apply 1:n to order(x) so that you don't wind up subscripting x by every element of order(x):

    o <- head(order(x), n)  # positions
    x[o]                    # values at those positions

A completely different approach, if X is a data frame with d as a data column, is the following, where the row names give the positions (I don't know about speed):

    > library(sqldf)
    > n <- 3
    > X <- data.frame(d = c(5, 4, 6, 3, 10, 1, 12, 2))
    > sqldf(paste("select * from X order by d limit", n), row.names = TRUE)
      d
    6 1
    8 2
    4 3

On Nov 11, 2007 6:43 PM, David Katz <david at davidkatzconsulting.com> wrote:
> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?
> I also need the positions of the mins/maxs perhaps by preserving names.
>
> Thanks for any suggestions.
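As a small illustration of the head(order(x), n) suggestion combined with the request to keep positions and names, here is a minimal sketch. The named vector x is made up for the example, and it is assumed to contain no NAs.

```r
# Hypothetical data: a named numeric vector (the names are illustrative only)
x <- c(a = 5, b = 4, c = 6, d = 3, e = 10, f = 1, g = 12, h = 2)
n <- 3

# Positions of the n smallest values; only n elements of x get subscripted
o_min <- head(order(x), n)
smallest <- x[o_min]   # subscripting keeps the names

# Positions of the n largest values
o_max <- head(order(x, decreasing = TRUE), n)
largest <- x[o_max]

print(o_min)     # positions in x
print(smallest)  # values, labelled with their names
print(largest)
```

Note that order() still does a full ordering of x here; the saving is only in the subscripting step.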
Try:

    sort(x, partial=n)[1:n]

On Nov 11, 2007 6:43 PM, David Katz <david at davidkatzconsulting.com> wrote:
> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?
> I also need the positions of the mins/maxs perhaps by preserving names.
>
> Thanks for any suggestions.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
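A partial sort returns values only, and the first n values are not guaranteed to be in order among themselves. One possible way to get both sorted values and positions is sketched below. The data are simulated, the variable names are mine, and the sketch assumes x has no NAs; ties at the cut-off are handled by ordering the candidates and keeping n of them.

```r
set.seed(1)        # arbitrary seed so the example is repeatable
x <- rnorm(1e5)    # hypothetical data, no NAs
n <- 5

# Partial sort: element n is in its final place and smaller values precede it,
# but the first n elements need not be in order among themselves
p <- sort(x, partial = n)[1:n]
smallest <- sort(p)              # fully sorting just n values is cheap

# Recover positions by comparing against the n-th smallest value
threshold <- smallest[n]
pos <- which(x <= threshold)     # can hold more than n entries if there are ties
pos <- pos[order(x[pos])][1:n]   # order the candidates and keep n of them

print(pos)
print(x[pos])
```

The which() step is one extra linear pass over x, so whether this beats head(order(x), n) in practice would need to be timed on the data at hand.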
What is 'x' here? What type? Does it contain NAs? Are there ties? R's ordering functions are rather general, and you can gain efficiency by ruling some of these out.

See ?sort, look at the 'partial' argument, including the comments in the Details. And also look at ?sort.list.

sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is more efficient than x[order(x)][1:n] for most types.

Finally, does efficiency matter? As the examples in ?sort show, R can sort a vector of length 2000 in well under 1ms, and 1e7 random normals in less time than they take to generate. There are not many tasks where gaining efficiency over x[order(x)][1:n] will be important. E.g.

    > system.time(x <- rnorm(1e6))
       user  system elapsed
       0.44    0.00    0.44
    > system.time(x[order(x)][1:4])
       user  system elapsed
       1.72    0.00    1.72
    > system.time(x2 <- sort.int(x, method = "quick")[1:4])
       user  system elapsed
       0.31    0.00    0.32
    > system.time(min(x))
       user  system elapsed
       0.02    0.00    0.02
    > system.time(x2 <- sort.int(x, partial=1)[1])
       user  system elapsed
       0.07    0.00    0.07

and do savings of tenths of a second matter?

(There is also quantreg::kselect, if you work out how to use it, which apparently is a bit faster at partial sorting on MacOS X but not elsewhere.)

On Sun, 11 Nov 2007, David Katz wrote:

> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?

That is the smallest n values, pace your subject line.

> I also need the positions of the mins/maxs perhaps by preserving names.
>
> Thanks for any suggestions.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
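Since the subject line asks for the largest values while x[order(x)][1:n] gives the smallest, here is a rough sketch of how the 'partial' argument might be used for the largest n. It assumes a numeric x without NAs; the data are simulated and the names k, top and largest are mine. A full ordering is used only as a cross-check and as a source of positions.

```r
set.seed(42)       # arbitrary seed
x <- rnorm(1e5)    # hypothetical data, assumed free of NAs
n <- 4
N <- length(x)

# After sort.int(x, partial = k), element k is in its final position and
# everything to its right is at least as large, so k:N holds the n largest
k <- N - n + 1
top <- sort.int(x, partial = k)[k:N]
largest <- sort(top, decreasing = TRUE)

# Reference result from a full ordering, which also gives the positions
o <- order(x, decreasing = TRUE)[1:n]

stopifnot(identical(largest, x[o]))
print(largest)
print(o)
```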
On my system

    > system.time(x1 <- sort(x,decreasing=TRUE)[1:1000])
       user  system elapsed
       0.03    0.00    0.03

whereas

    > system.time(x1 <- x[order(x)][1:1000])
       user  system elapsed
       0.11    0.00    0.11

I.e. using sort is about 4 times faster (0.03 s versus 0.11 s).

Best regards

Frede Aakmann Tøgersen
Scientist
UNIVERSITY OF AARHUS
Faculty of Agricultural Sciences
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele
Phone: +45 8999 1900
Direct: +45 8999 1878
E-mail: FredeA.Togersen at agrsci.dk
Web: http://www.agrsci.org

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On behalf of David Katz
> Sent: 12 November 2007 00:44
> To: r-help at r-project.org
> Subject: [R] Largest N Values Efficiently?
>
> What is the most efficient alternative to x[order(x)][1:n]
> where length(x)>>n?
> I also need the positions of the mins/maxs perhaps by
> preserving names.
>
> Thanks for any suggestions.
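The two timings above are not quite like for like: sort(x, decreasing=TRUE)[1:1000] returns the largest values, while x[order(x)][1:1000] returns the smallest. A sketch of a comparison where both expressions return the same result might look like this. The vector x is simulated here, since the original x is not shown, and timings will vary by machine and R version.

```r
set.seed(7)        # arbitrary seed
x <- rnorm(1e6)    # hypothetical data; the x used in the post is unknown
n <- 1000

# Both expressions return the n largest values in decreasing order
t_sort  <- system.time(a <- sort(x, decreasing = TRUE)[1:n])
t_order <- system.time(b <- x[order(x, decreasing = TRUE)[1:n]])

stopifnot(identical(a, b))   # same values, so the timings are comparable

cat("sort:  ", t_sort[["elapsed"]], "s elapsed\n")
cat("order: ", t_order[["elapsed"]], "s elapsed\n")
```

Only the order() variant yields positions as well, which the original question asked for.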