> n=10;t=3
> d<-cbind(id=rep(1:n,each=t),y=rnorm(n*t),x=rnorm(n*t),z=rnorm(n*t))
> head(d)
     id          y           x          z
[1,]  1 -2.1725379  0.07629954 -0.3985258
[2,]  1 -1.2383038 -2.49667038  0.6966127
[3,]  1 -1.2642401 -0.50613307  0.4895856
[4,]  2  0.2171246  0.86711864 -0.6660036
[5,]  2  2.2765760 -0.48547142 -1.4496664
[6,]  2  0.5985345 -1.06427035  2.1761071

First, I want to get the group mean of each variable, which I can do with:

> d<-data.frame(d)
> aggregate(d,list(d$id),mean)[,-1]
   id           y          x           z
1   1 -1.55836060 -0.9755013  0.26255754
2   2  1.03074502 -0.2275410  0.02014565
3   3  0.20700121 -0.7159450  1.35890176
4   4  0.17839650  1.2575891  0.04135165
5   5 -0.20012508  0.4310221  0.55458899
6   6 -0.13084185 -0.2953392  0.28229068
7   7  0.20737288 -0.8863761 -0.50793880
8   8  0.07512612 -0.6591304 -0.21656533
9   9  0.94727796 -0.6108891  0.13529884
10 10 -0.04434875  0.1332086 -0.88229808

Then I want the group-mean deviation data, like:

> head(sapply(d[,2:4],function(x) x-ave(x,d$id)))
              y          x          z
[1,] -0.6141773  1.0518008 -0.6610833
[2,]  0.3200568 -1.5211691  0.4340552
[3,]  0.2941205  0.4693682  0.2270281
[4,] -0.8136205  1.0946597 -0.6861493
[5,]  1.2458310 -0.2579304 -1.4698121
[6,] -0.4322105 -0.8367293  2.1559614

Both of the above are what I want. I can compute them with the function
below, but if n is quite large, say n=1000 and t=3, it takes too much
time, so I would like to know a more efficient way to do it.

myfun <- function(x, id)
{
    x  <- as.matrix(x)
    id <- as.factor(id)
    ## column means within each group
    xm <- apply(x, 2, function(y, z) tapply(y, z, mean), z = id)
    ## deviations from the group means
    xdm <- x - xm[id, ]
    list(xm = xm, xdm = xdm)
}
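For reference, the timings I mention can be reproduced along these lines
(a rough sketch using the myfun() and data-generation code above; res is
just a throwaway name):

n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t),
           y = rnorm(n*t), x = rnorm(n*t), z = rnorm(n*t))
system.time(res <- myfun(d[, -1], d[, 1]))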
> if n is quite large, say n=1000 and t=3, it takes too much time, so I
> would like to know a more efficient way to do it.

Why is about 0.4 second (which is what it takes on my system) too long?
Given that you want to operate on 3000 cells, a second does not look
unreasonable.

This is a toy problem, and it is unclear what the real problem is (if
any). Since you have the same number of replications for each cell
(group-variable combination), I would use this as an n x 3 x t array (a
simple call to dim and aperm). Then rowMeans will find the group means,
and you can just subtract those to get the deviations from the means,
making use of recycling.

E.g.

D <- d[,-1]
dim(D) <- c(t,n,3)
D <- aperm(D, c(2,3,1))
gmeans <- rowMeans(D, dims=2)
d[,-1] - rep(gmeans, each=3)

That takes under 10ms for n=1000.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
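As a quick sanity check (an illustration added here, not from the
original reply; it assumes d is still the numeric matrix built with
cbind(), before the data.frame conversion), the array-based deviations
agree with the ave()-based ones up to rounding:

dev_array <- d[, -1] - rep(gmeans, each = t)   # each = t (3 above)
dev_ave   <- apply(d[, -1], 2,
                   function(col) col - ave(col, d[, "id"]))
all.equal(unname(dev_array), unname(dev_ave))  # should be TRUE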
Yeah, I meant n=10000 but missed a zero. If n=10000 and t=3, it takes
about 3 seconds; if n=2000 and t=7, it takes about 10 seconds. I want to
write a function to fit a model, and the data may be quite large (maybe
n <= 20000, t <= 10); as n and t grow, the time gets much longer. The
timing is of course reasonable, but I think there should be neater code
for this job, which is why I posted here. What I really want to know is
how to optimize the code for this purpose. Of course I can still fit my
model with the code I have, and I still like R very much, as it is free,
flexible and powerful.

>> if n is quite large, say n=1000 and t=3, it takes too much time, so
>> I would like to know a more efficient way to do it.
>
> Why is about 0.4 second (which is what it takes on my system) too
> long?
>
> Given that you want to operate on 3000 cells, a second does not look
> unreasonable.
>
> This is a toy problem, and it is unclear what the real problem is (if
> any). Since you have the same number of replications for each cell
> (group-variable combination)

I want to deal with the case with a different number of replications
for each cell too.

> I would use this as an n x 3 x t array (a simple call to dim and
> aperm). Then rowMeans will find the group means, and you can just
> subtract those to get the deviations from the means, making use of
> recycling.
>
> E.g.
>
> D <- d[,-1]
> dim(D) <- c(t,n,3)
> D <- aperm(D, c(2,3,1))
> gmeans <- rowMeans(D, dims=2)
> d[,-1] - rep(gmeans, each=3)
>
> That takes under 10ms for n=1000
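For the unbalanced case mentioned above, one possibility is base R's
rowsum(), which computes per-group column sums in compiled code. This is
only a rough sketch, not from the thread, and grp_dev is just an
illustrative name:

grp_dev <- function(x, id)
{
    x   <- as.matrix(x)
    id  <- factor(id)                  # factor() drops unused levels,
                                       # so every level occurs in id
    cnt <- as.vector(table(id))        # replications per group
    xm  <- rowsum(x, id) / cnt         # group means, one row per level
    xdm <- x - xm[id, , drop = FALSE]  # deviations via row indexing
    list(xm = xm, xdm = xdm)
}

For balanced data, grp_dev(d[, -1], d[, 1]) should reproduce myfun()'s
result, and it works unchanged when the groups have unequal sizes.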