Kevin Ummel
2011-Apr-14 19:34 UTC
[R] Find number of elements less than some number: Elegant/fast solution needed
Take vector x and a subset y: x=1:10 y=c(4,5,7,9) For each value in 'x', I want to know how many elements in 'y' are less than 'x'. An example would be: sapply(x,FUN=function(i) {length(which(y<i))}) [1] 0 0 0 0 1 2 2 3 3 4 But this solution is far too slow when x and y have lengths in the millions. I'm certain an elegant (and computationally efficient) solution exists, but I'm in the weeds at this point. Any help is much appreciated. Kevin University of Manchester Take two vectors x and y, where y is a subset of x: x=1:10 y=c(2,5,6,9) If y is removed from x, the original x values now have a new placement (index) in the resulting vector (new): new=x[-y] index=1:length(new) The challenge is: How can I *quickly* and *efficiently* deduce the new 'index' value directly from the original 'x' value -- using only 'y' as an input? In practice, I have very large matrices containing the 'x' values, and I need to convert them to the corresponding 'index' if the 'y' values are removed. [[alternative HTML version deleted]]
Richard M. Heiberger
2011-Apr-14 20:35 UTC
[R] Find number of elements less than some number: Elegant/fast solution needed
This is faster for small vectors> system.time(for (i in 1:1000)+ colSums(outer(y, x, `<`)) + ) user system elapsed 0.11 0.00 0.11> system.time(for (i in 1:1000)+ sapply(x,FUN=function(i) {length(which(y<i))}) + ) user system elapsed 0.14 0.00 0.16> > colSums(outer(y, x, `<`))[1] 0 0 0 0 1 2 2 3 3 4> sapply(x,FUN=function(i) {length(which(y<i))})[1] 0 0 0 0 1 2 2 3 3 4>Rich On Thu, Apr 14, 2011 at 3:34 PM, Kevin Ummel <kevinummel@gmail.com> wrote:> Take vector x and a subset y: > > x=1:10 > > y=c(4,5,7,9) > > For each value in 'x', I want to know how many elements in 'y' are less > than 'x'. > > An example would be: > > sapply(x,FUN=function(i) {length(which(y<i))}) > [1] 0 0 0 0 1 2 2 3 3 4 > > But this solution is far too slow when x and y have lengths in the > millions. > > I'm certain an elegant (and computationally efficient) solution exists, but > I'm in the weeds at this point. > > Any help is much appreciated. > > Kevin > > University of Manchester > > > > > > > > > Take two vectors x and y, where y is a subset of x: > > x=1:10 > > y=c(2,5,6,9) > > If y is removed from x, the original x values now have a new placement > (index) in the resulting vector (new): > > new=x[-y] > > index=1:length(new) > > The challenge is: How can I *quickly* and *efficiently* deduce the new > 'index' value directly from the original 'x' value -- using only 'y' as an > input? > > In practice, I have very large matrices containing the 'x' values, and I > need to convert them to the corresponding 'index' if the 'y' values are > removed. > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Marc Schwartz
2011-Apr-14 20:37 UTC
[R] Find number of elements less than some number: Elegant/fast solution needed
On Apr 14, 2011, at 2:34 PM, Kevin Ummel wrote:> Take vector x and a subset y: > > x=1:10 > > y=c(4,5,7,9) > > For each value in 'x', I want to know how many elements in 'y' are less than 'x'. > > An example would be: > > sapply(x,FUN=function(i) {length(which(y<i))}) > [1] 0 0 0 0 1 2 2 3 3 4 > > But this solution is far too slow when x and y have lengths in the millions. > > I'm certain an elegant (and computationally efficient) solution exists, but I'm in the weeds at this point. > > Any help is much appreciated. > > Kevin > > University of Manchester >I started working on a solution to your problem above and then noted the one below. Here is one approach to the above:> colSums(outer(y, x, "<"))[1] 0 0 0 0 1 2 2 3 3 4> > Take two vectors x and y, where y is a subset of x: > > x=1:10 > > y=c(2,5,6,9) > > If y is removed from x, the original x values now have a new placement (index) in the resulting vector (new): > > new=x[-y] > > index=1:length(new) > > The challenge is: How can I *quickly* and *efficiently* deduce the new 'index' value directly from the original 'x' value -- using only 'y' as an input? > > In practice, I have very large matrices containing the 'x' values, and I need to convert them to the corresponding 'index' if the 'y' values are removed.Something like the following might work, if I correctly understand the problem:> match(x, x[-y])[1] 1 NA 2 3 NA NA 4 5 NA 6 HTH, Marc Schwartz
William Dunlap
2011-Apr-14 21:45 UTC
[R] Find number of elements less than some number: Elegant/fastsolution needed
> -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Kevin Ummel > Sent: Thursday, April 14, 2011 12:35 PM > To: r-help at r-project.org > Subject: [R] Find number of elements less than some number: > Elegant/fastsolution needed > > Take vector x and a subset y: > > x=1:10 > > y=c(4,5,7,9) > > For each value in 'x', I want to know how many elements in > 'y' are less than 'x'. > > An example would be: > > sapply(x,FUN=function(i) {length(which(y<i))}) > [1] 0 0 0 0 1 2 2 3 3 4 > > But this solution is far too slow when x and y have lengths > in the millions. > > I'm certain an elegant (and computationally efficient) > solution exists, but I'm in the weeds at this point.If x is sorted findInterval(x, y) would do it for <= but you want strict <. Try f2 <- function(x, y) length(y) - findInterval(-x, rev(-sort(y))) where your version is f0 <- function(x, y) sapply(x,FUN=function(i) {length(which(y<i))}) E.g., > x <- sort(sample(1e6, size=1e5)) > y <- sample(1e6, size=1e4, replace=TRUE) > system.time(r0 <- f0(x,y)) user system elapsed 7.900 0.310 8.211 > system.time(r2 <- f2(x,y)) user system elapsed 0.000 0.000 0.004 > identical(r0, r2) [1] TRUE Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> > Any help is much appreciated. > > Kevin > > University of Manchester > > > > > > > > > Take two vectors x and y, where y is a subset of x: > > x=1:10 > > y=c(2,5,6,9) > > If y is removed from x, the original x values now have a new > placement (index) in the resulting vector (new): > > new=x[-y] > > index=1:length(new) > > The challenge is: How can I *quickly* and *efficiently* > deduce the new 'index' value directly from the original 'x' > value -- using only 'y' as an input? > > In practice, I have very large matrices containing the 'x' > values, and I need to convert them to the corresponding > 'index' if the 'y' values are removed. > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Seemingly Similar Threads
- [Rcpp-devel] Find number of elements less than some number: Elegant/fastsolution needed
- Match numeric vector against rows in a matrix?
- Need very fast application of 'diff' - ideas?
- Modal Window Stealing Elements from Form. Need Elegant Solution.
- Fast method to compute average values of duplicated IDs