> n=10;t=3
> d<-cbind(id=rep(1:n,each=t),y=rnorm(n*t),x=rnorm(n*t),z=rnorm(n*t))
> head(d)
     id          y           x          z
[1,]  1 -2.1725379  0.07629954 -0.3985258
[2,]  1 -1.2383038 -2.49667038  0.6966127
[3,]  1 -1.2642401 -0.50613307  0.4895856
[4,]  2  0.2171246  0.86711864 -0.6660036
[5,]  2  2.2765760 -0.48547142 -1.4496664
[6,]  2  0.5985345 -1.06427035  2.1761071

First, I want to get the group mean of each variable, which I can do with:

> d<-data.frame(d)
> aggregate(d,list(d$id),mean)[,-1]
   id           y          x           z
1   1 -1.55836060 -0.9755013  0.26255754
2   2  1.03074502 -0.2275410  0.02014565
3   3  0.20700121 -0.7159450  1.35890176
4   4  0.17839650  1.2575891  0.04135165
5   5 -0.20012508  0.4310221  0.55458899
6   6 -0.13084185 -0.2953392  0.28229068
7   7  0.20737288 -0.8863761 -0.50793880
8   8  0.07512612 -0.6591304 -0.21656533
9   9  0.94727796 -0.6108891  0.13529884
10 10 -0.04434875  0.1332086 -0.88229808

Then I want the group-mean deviation data, like:

> head(sapply(d[,2:4],function(x) x-ave(x,d$id)))
              y          x          z
[1,] -0.6141773  1.0518008 -0.6610833
[2,]  0.3200568 -1.5211691  0.4340552
[3,]  0.2941205  0.4693682  0.2270281
[4,] -0.8136205  1.0946597 -0.6861493
[5,]  1.2458310 -0.2579304 -1.4698121
[6,] -0.4322105 -0.8367293  2.1559614

Both of the above are what I want. I can compute them with the function
below, but if n is quite large, say n=1000 and t=3, it takes too much
time, so I would like to know a more efficient way to do it.

myfun <- function(x, id)
{
    x  <- as.matrix(x)
    id <- as.factor(id)
    ## column means within each group
    xm <- apply(x, 2, function(y, z) tapply(y, z, mean), z = id)
    ## deviations from the group means
    xdm <- x - xm[id, ]
    list(xm = xm, xdm = xdm)
}
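For reference, the timings I mention can be reproduced along these lines
(a rough sketch using the myfun() and data-generation code above; res is
just a throwaway name):

n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t),
           y = rnorm(n*t), x = rnorm(n*t), z = rnorm(n*t))
system.time(res <- myfun(d[, -1], d[, 1]))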
> if n is quite large, say n=1000 and t=3, it takes too much time, so I
> would like to know a more efficient way to do it.

Why is about 0.4 second (which is what it takes on my system) too long?
Given that you want to operate on 3000 cells, a second does not look
unreasonable.

This is a toy problem, and it is unclear what the real problem is (if
any). Since you have the same number of replications for each cell
(group-variable combination), I would use this as an n x 3 x t array (a
simple call to dim and aperm). Then rowMeans will find the group means,
and you can just subtract those to get the deviations from the means,
making use of recycling.

E.g.

D <- d[,-1]
dim(D) <- c(t,n,3)
D <- aperm(D, c(2,3,1))
gmeans <- rowMeans(D, dims=2)
d[,-1] - rep(gmeans, each=3)

That takes under 10ms for n=1000.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
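As a quick sanity check (an illustration added here, not from the
original reply; it assumes d is still the numeric matrix built with
cbind(), before the data.frame conversion), the array-based deviations
agree with the ave()-based ones up to rounding:

dev_array <- d[, -1] - rep(gmeans, each = t)   # each = t (3 above)
dev_ave   <- apply(d[, -1], 2,
                   function(col) col - ave(col, d[, "id"]))
all.equal(unname(dev_array), unname(dev_ave))  # should be TRUE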
Yeah, I meant n=10000 but missed a zero. If n=10000 and t=3, it takes
about 3 seconds; if n=2000 and t=7, it takes about 10 seconds. I want to
write a function to fit a model, and the data may be quite large (maybe
n <= 20000, t <= 10); as n and t grow, the time gets much longer. The
timing is of course reasonable, but I think there should be neater code
for this job, which is why I posted here. What I really want to know is
how to optimize the code for this purpose. Of course I can still fit my
model with the code I have, and I still like R very much, as it is free,
flexible and powerful.

>> if n is quite large, say n=1000 and t=3, it takes too much time, so
>> I would like to know a more efficient way to do it.
>
> Why is about 0.4 second (which is what it takes on my system) too
> long?
>
> Given that you want to operate on 3000 cells, a second does not look
> unreasonable.
>
> This is a toy problem, and it is unclear what the real problem is (if
> any). Since you have the same number of replications for each cell
> (group-variable combination)

I want to deal with the case with a different number of replications
for each cell too.

> I would use this as an n x 3 x t array (a simple call to dim and
> aperm). Then rowMeans will find the group means, and you can just
> subtract those to get the deviations from the means, making use of
> recycling.
>
> E.g.
>
> D <- d[,-1]
> dim(D) <- c(t,n,3)
> D <- aperm(D, c(2,3,1))
> gmeans <- rowMeans(D, dims=2)
> d[,-1] - rep(gmeans, each=3)
>
> That takes under 10ms for n=1000
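For the unbalanced case mentioned above, one possibility is base R's
rowsum(), which computes per-group column sums in compiled code. This is
only a rough sketch, not from the thread, and grp_dev is just an
illustrative name:

grp_dev <- function(x, id)
{
    x   <- as.matrix(x)
    id  <- factor(id)                  # factor() drops unused levels,
                                       # so every level occurs in id
    cnt <- as.vector(table(id))        # replications per group
    xm  <- rowsum(x, id) / cnt         # group means, one row per level
    xdm <- x - xm[id, , drop = FALSE]  # deviations via row indexing
    list(xm = xm, xdm = xdm)
}

For balanced data, grp_dev(d[, -1], d[, 1]) should reproduce myfun()'s
result, and it works unchanged when the groups have unequal sizes.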