thr3ads.net - R help - [R] Find number of elements less than some number: Elegant/fast solution needed [Apr 2011]

If this information is useful, please help other people find it:
Share via:

Kevin Ummel

2011-Apr-14 19:34 UTC

[R] Find number of elements less than some number: Elegant/fast solution needed

Take vector x and a subset y:

x=1:10

y=c(4,5,7,9)

For each value in 'x', I want to know how many elements in 'y'
are less than 'x'.

An example would be:

sapply(x,FUN=function(i) {length(which(y<i))})
 [1] 0 0 0 0 1 2 2 3 3 4

But this solution is far too slow when x and y have lengths in the millions.

I'm certain an elegant (and computationally efficient) solution exists, but
I'm in the weeds at this point.

Any help is much appreciated.

Kevin

University of Manchester








Take two vectors x and y, where y is a subset of x:

x=1:10

y=c(2,5,6,9)

If y is removed from x, the original x values now have a new placement (index)
in the resulting vector (new):

new=x[-y]

index=1:length(new)

The challenge is: How can I *quickly* and *efficiently* deduce the new
'index' value directly from the original 'x' value -- using only
'y' as an input?

In practice, I have very large matrices containing the 'x' values, and I
need to convert them to the corresponding 'index' if the 'y'
values are removed.




	[[alternative HTML version deleted]]

Richard M. Heiberger

2011-Apr-14 20:35 UTC

head link

[R] Find number of elements less than some number: Elegant/fast solution needed

This is faster for small vectors

> system.time(for (i in 1:1000)+ colSums(outer(y, x, `<`))
+ )
   user  system elapsed
   0.11    0.00    0.11> system.time(for (i in 1:1000)+ sapply(x,FUN=function(i) {length(which(y<i))})
+ )
   user  system elapsed
   0.14    0.00    0.16>
> colSums(outer(y, x, `<`))
 [1] 0 0 0 0 1 2 2 3 3 4> sapply(x,FUN=function(i) {length(which(y<i))})
 [1] 0 0 0 0 1 2 2 3 3 4>
Rich



On Thu, Apr 14, 2011 at 3:34 PM, Kevin Ummel <kevinummel@gmail.com> wrote:
> Take vector x and a subset y:
>
> x=1:10
>
> y=c(4,5,7,9)
>
> For each value in 'x', I want to know how many elements in
'y' are less
> than 'x'.
>
> An example would be:
>
> sapply(x,FUN=function(i) {length(which(y<i))})
>  [1] 0 0 0 0 1 2 2 3 3 4
>
> But this solution is far too slow when x and y have lengths in the
> millions.
>
> I'm certain an elegant (and computationally efficient) solution exists,
but
> I'm in the weeds at this point.
>
> Any help is much appreciated.
>
> Kevin
>
> University of Manchester
>
>
>
>
>
>
>
>
> Take two vectors x and y, where y is a subset of x:
>
> x=1:10
>
> y=c(2,5,6,9)
>
> If y is removed from x, the original x values now have a new placement
> (index) in the resulting vector (new):
>
> new=x[-y]
>
> index=1:length(new)
>
> The challenge is: How can I *quickly* and *efficiently* deduce the new
> 'index' value directly from the original 'x' value -- using
only 'y' as an
> input?
>
> In practice, I have very large matrices containing the 'x' values,
and I
> need to convert them to the corresponding 'index' if the
'y' values are
> removed.
>
>
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
>
http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Marc Schwartz

2011-Apr-14 20:37 UTC

head link

[R] Find number of elements less than some number: Elegant/fast solution needed

On Apr 14, 2011, at 2:34 PM, Kevin Ummel wrote:
> Take vector x and a subset y:
> 
> x=1:10
> 
> y=c(4,5,7,9)
> 
> For each value in 'x', I want to know how many elements in
'y' are less than 'x'.
> 
> An example would be:
> 
> sapply(x,FUN=function(i) {length(which(y<i))})
> [1] 0 0 0 0 1 2 2 3 3 4
> 
> But this solution is far too slow when x and y have lengths in the
millions.
> 
> I'm certain an elegant (and computationally efficient) solution exists,
but I'm in the weeds at this point.
> 
> Any help is much appreciated.
> 
> Kevin
> 
> University of Manchester
> 

I started working on a solution to your problem above and then noted the one
below.

Here is one approach to the above:
> colSums(outer(y, x, "<")) [1] 0 0 0 0 1 2 2 3 3 4


> 
> Take two vectors x and y, where y is a subset of x:
> 
> x=1:10
> 
> y=c(2,5,6,9)
> 
> If y is removed from x, the original x values now have a new placement
(index) in the resulting vector (new):
> 
> new=x[-y]
> 
> index=1:length(new)
> 
> The challenge is: How can I *quickly* and *efficiently* deduce the new
'index' value directly from the original 'x' value -- using only
'y' as an input?
> 
> In practice, I have very large matrices containing the 'x' values,
and I need to convert them to the corresponding 'index' if the
'y' values are removed.

Something like the following might work, if I correctly understand the problem:
> match(x, x[-y]) [1]  1 NA  2  3 NA NA  4  5 NA  6


HTH,

Marc Schwartz

William Dunlap

2011-Apr-14 21:45 UTC

head link

[R] Find number of elements less than some number: Elegant/fastsolution needed

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Kevin Ummel
> Sent: Thursday, April 14, 2011 12:35 PM
> To: r-help at r-project.org
> Subject: [R] Find number of elements less than some number: 
> Elegant/fastsolution needed
> 
> Take vector x and a subset y:
> 
> x=1:10
> 
> y=c(4,5,7,9)
> 
> For each value in 'x', I want to know how many elements in 
> 'y' are less than 'x'.
> 
> An example would be:
> 
> sapply(x,FUN=function(i) {length(which(y<i))})
>  [1] 0 0 0 0 1 2 2 3 3 4
> 
> But this solution is far too slow when x and y have lengths 
> in the millions.
> 
> I'm certain an elegant (and computationally efficient) 
> solution exists, but I'm in the weeds at this point.
If x is sorted findInterval(x, y) would do it for <= but you
want strict <.  Try
  f2 <- function(x, y) length(y) - findInterval(-x, rev(-sort(y)))
where your version is
  f0 <- function(x, y) sapply(x,FUN=function(i) {length(which(y<i))})
E.g.,
  > x <- sort(sample(1e6, size=1e5))
  > y <- sample(1e6, size=1e4, replace=TRUE)
  > system.time(r0 <- f0(x,y))
     user  system elapsed
    7.900   0.310   8.211
  > system.time(r2 <- f2(x,y))
     user  system elapsed
    0.000   0.000   0.004
  > identical(r0, r2)
  [1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> 
> Any help is much appreciated.
> 
> Kevin
> 
> University of Manchester
> 
> 
> 
> 
> 
> 
> 
> 
> Take two vectors x and y, where y is a subset of x:
> 
> x=1:10
> 
> y=c(2,5,6,9)
> 
> If y is removed from x, the original x values now have a new 
> placement (index) in the resulting vector (new): 
> 
> new=x[-y]
> 
> index=1:length(new)
> 
> The challenge is: How can I *quickly* and *efficiently* 
> deduce the new 'index' value directly from the original 'x'
> value -- using only 'y' as an input?
> 
> In practice, I have very large matrices containing the 'x' 
> values, and I need to convert them to the corresponding 
> 'index' if the 'y' values are removed.
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Seemingly Similar Threads

Search for more seemingly similar threads

R help - Apr 2011 - Find number of elements less than some number: Elegant/fast solution needed

[R] Find number of elements less than some number: Elegant/fast solution needed

[R] Find number of elements less than some number: Elegant/fast solution needed

[R] Find number of elements less than some number: Elegant/fast solution needed

[R] Find number of elements less than some number: Elegant/fastsolution needed

Seemingly Similar Threads