thr3ads.net - R help - [R] Fast ave for sorted data? [Feb 2009]

Zhou Fang

2009-Feb-15 17:46 UTC

[R] Fast ave for sorted data?

Hi,

This is probably really obvious, but I can't seem to find anything on it.

Is there a fast version of ave for when the data is already sorted in 
terms of the factor, or if the breaks are already known?

Basically, I have:
X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
Y = 223, 434, 343, 544, 231.... etc
of the same, admittedly large length.

Now note that some of the values of X are repeated. What I want to do 
is, for those X that are repeated, take the corresponding values of Y 
and change them to the average for that particular X.

So, ave(Y,X) will work. But it's very slow, and certainly not suited to 
my problem, where Y changes and X stays the same and I need to 
repeatedly recalculate the averaging of Y. Ave also does not take 
advantage of the sorting of the data.

So, is there an alternative? (Presumably avoiding loops.)

Thanks,

Zhou Fang
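
To make the request concrete, here is a minimal sketch of what ave(Y, X) 
returns, using only the first five values of X and Y quoted above. The 
names X5, Y5 and averaged are illustrative, not from the post.

```r
# First five values of X and Y as given in the question.
X5 <- c(0.1, 0.2, 0.32, 0.32, 0.4)
Y5 <- c(223, 434, 343, 544, 231)

# ave() replaces each Y by the mean of all Y sharing the same X.
# Only X = 0.32 is repeated, so only positions 3 and 4 change:
# (343 + 544) / 2 = 443.5
averaged <- ave(Y5, X5)
print(averaged)
```

Values of Y whose X occurs only once come back unchanged, which is what 
makes it worthwhile to treat the unique X's separately.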

Charles C. Berry

2009-Feb-15 19:08 UTC

On Sun, 15 Feb 2009, Zhou Fang wrote:
> Hi,
>
> This is probably really obvious, but I can't seem to find anything on it.
>
> Is there a fast version of ave for when the data is already sorted in terms
> of the factor, or if the breaks are already known?
>
If all you want are means, you can use rle() and colMeans() to good 
effect:

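foo2 <- function(x, y)
{
    # x must be sorted; replaces y by its mean within each run of equal x
    reps <- rle(x)$lengths        # length of each run of equal x
    lens <- rep(reps, reps)       # run length, repeated for every element
    uniqLens <- unique(lens)
    # Runs of length 1 are already their own mean, so skip them.
    for (i in uniqLens[uniqLens != 1]) {
        # One column per run of length i, so colMeans() gives the run means.
        y[lens == i] <-
            rep(colMeans(matrix(y[lens == i], nrow = i)), each = i)
    }
    y
}
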
> x <- sort( round( runif(100000, 0 , 1 ), 5) )
> y <- sample(1000000,100000)
> all.equal(ave(y,x),foo2(x,y))
[1] TRUE
> system.time(foo2(x,y))
   user  system elapsed
  0.087   0.029   0.117
> system.time(ave(y,x))
   user  system elapsed
  1.933   0.030   1.980

If, as in your example, a substantial fraction of the X's are unique, and 
if you want to generalize to more than means, then you can still gain a 
lot by treating the unique and non-unique values separately like this:

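foo <- function(x, y)
{
    # x must be sorted
    reps <- rle(x)$lengths
    # TRUE for every element whose x value occurs more than once
    len.not.1 <- rep(reps, reps) != 1
    # Only the repeated values need ave(); unique ones are left untouched.
    y[len.not.1] <- ave(y[len.not.1], x[len.not.1])
    y
}
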
> y <- sample(1000000,100000)
> x <- sort( round( runif(100000, 0 , 2 ), 5) )
> system.time(foo(x,y))
   user  system elapsed
  0.577   0.027   0.628
> system.time(ave(y,x))
   user  system elapsed
  2.513   0.038   2.545
> table(table(x))

    1     2     3     4     5     6
60526 15161  2578   318    28     1
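
As a rough sketch of the "more than means" generalization, the same split 
between unique and repeated X's can be combined with the FUN argument of 
ave(). The name foo_fun is hypothetical. The sketch assumes x is sorted 
and that FUN returns a single value that equals its input on a 
length-one vector (true for mean, median, min and max, but not for sd).

```r
# Hypothetical generalization of foo(): apply FUN within runs of equal x.
# Assumes x is sorted and FUN(v) == v whenever length(v) == 1.
foo_fun <- function(x, y, FUN = mean) {
    reps <- rle(x)$lengths
    # TRUE for every element whose x value occurs more than once
    repeated <- rep(reps, reps) != 1
    # Unique x's are skipped; ave() only sees the repeated ones.
    y[repeated] <- ave(y[repeated], x[repeated], FUN = FUN)
    y
}

# Small check against plain ave() with the same FUN.
x_demo <- c(1, 1, 2, 3, 3, 3)
y_demo <- c(2, 4, 5, 1, 2, 9)
med_demo <- foo_fun(x_demo, y_demo, FUN = median)
print(all.equal(med_demo, ave(y_demo, x_demo, FUN = median)))
```

How much this gains over calling ave() directly depends on the fraction 
of X's that are unique, as in the timings above.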

And if neither of these is quite good enough, a line or two of C code 
should do the trick. See package 'inline'.


HTH,

Chuck
> Basically, I have:
> X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
> Y = 223, 434, 343, 544, 231.... etc
> of the same, admittedly large length.
>
> Now note that some of the values of X are repeated. What I want to do is,
> for those X that are repeated, take the corresponding values of Y and
> change them to the average for that particular X.
>
> So, ave(Y,X) will work. But it's very slow, and certainly not suited to
> my problem, where Y changes and X stays the same and I need to repeatedly
> recalculate the averaging of Y. Ave also does not take advantage of the
> sorting of the data.
>
> So, is there an alternative? (Presumably avoiding loops.)
>
> Thanks,
>
> Zhou Fang
>
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

Zhou Fang

2009-Feb-15 19:26 UTC

Thanks! That does exactly what I want. (Heck, maybe this should be
included as a default sorted alternative to ave.)

I was thinking of doing it another way using cumsums, but maybe this
method is faster.

Zhou
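
For comparison, the cumsum idea might look something like the sketch 
below. It is not from the thread and has not been timed against foo2. 
The name make_sorted_ave is hypothetical. It assumes x is sorted and 
does the rle() work once, so that only the cumsum step is repeated when 
Y changes and X stays the same.

```r
# Hypothetical helper: precompute the run structure of a sorted x once,
# then return a function that averages any y of the same length by run.
make_sorted_ave <- function(x) {
    reps <- rle(x)$lengths    # length of each run of equal x
    ends <- cumsum(reps)      # index of the last element of each run
    function(y) {
        # as.numeric() avoids integer overflow in the running total
        cs <- cumsum(as.numeric(y))
        # Difference of running totals at run ends gives per-run sums.
        sums <- diff(c(0, cs[ends]))
        rep(sums / reps, reps)
    }
}

# Check against ave() on random sorted data with ties.
set.seed(1)
x_sorted <- sort(round(runif(1000, 0, 1), 2))
y_vals <- sample(1000000, 1000)
ave_sorted <- make_sorted_ave(x_sorted)
print(all.equal(ave_sorted(y_vals), ave(y_vals, x_sorted)))
```

One caveat: differencing large running totals can lose a little 
floating-point precision on very long vectors, so results are best 
compared with all.equal() rather than identical().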
