Hello list!

I have a huge data.frame with several variables observed on about 3000 persons. For every person (row) there is a variable called GROUP which indicates the group the person belongs to. There is also another variable AV for each person. Now I want to create a new variable which holds the group mean of AV as a value for each person.

With tapply(AV,GROUP,mean) I get the means for each level of GROUP, but I cannot find out how to give every person the group mean as a value (every person should have the same value as every other person in the same group).

Does anybody have any ideas how to do that?

Yours sincerely,
Felix Eschenburg
predict(lm(AV~as.factor(GROUP)))

Felix Eschenburg <Atropin75 <at> t-online.de> writes:

:
: Hello list !
:
: I have a huge data.frame with several variables observed on about 3000
: persons. For every person (row) there is variable called GROUP which indices
: the group the person belongs to. There is also another variable AV for each
: person. Now i want to create a new variable which holds the group mean of AV
: as a value for each person.
: With tapply(AV,GROUP,mean) i get the means for each level of GROUP, but i
: cannot find out, how to give every person the groupmean as a value (every
: person should have the same value as every other person in the same group).
:
: Has anybody any ideas how to do that ?
:
: Yours sincerly
: Felix Eschenburg
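A quick way to see why this one-liner answers the question: with a single factor as the only predictor, the least-squares fitted values are the group means, one per row. The sketch below checks that on made-up data (the toy AV and GROUP vectors are invented for illustration, not Felix's data) and compares the result with base R's ave(), whose default FUN is mean.

```r
# Toy data, invented for illustration only
AV    <- c(1, 2, 3, 10, 20, 6)
GROUP <- c(2, 1, 2, 3, 3, 1)

# Fitted values of a one-way model: one group mean per person
m1 <- unname(predict(lm(AV ~ as.factor(GROUP))))

# ave() returns a vector as long as AV, filled with the group means
m2 <- ave(AV, GROUP)

# Both approaches agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(m1, m2)))
```

If the data live in a data frame, the same call can be assigned to a new column, for example with a hypothetical data frame d: d$AVMEAN <- ave(d$AV, d$GROUP).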
On Sat, 8 May 2004, Gabor Grothendieck wrote:

> predict(lm(AV~as.factor(GROUP)))

If Felix actually has a "huge" data frame this will be slow. Instead try

  # rowsum() returns group sums, so divide by the group sizes to get means
  groupsums <- rowsum(AV, GROUP, reorder=FALSE)
  groupmeans <- groupsums / rowsum(rep(1, length(AV)), GROUP, reorder=FALSE)
  individual.means <- groupmeans[match(GROUP, unique(GROUP))]

It uses hashing and takes roughly O(MGlogG) time for M measurements on G groups, whereas the lm solution takes O(MG^3) [and the space requirements are O(MG) and O(MG^2)].

Admittedly, with only 3000 observations either one will be fast enough.

	-thomas

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
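A small runnable sketch of the rowsum() idea on made-up data may help here. Since rowsum() returns group sums, the sketch divides by per-group counts to get means. The toy vectors and the names sums and counts are assumptions for illustration, not part of the original post.

```r
# Toy data, invented for illustration only
AV    <- c(1, 2, 3, 10, 20, 6)
GROUP <- c("b", "a", "b", "c", "c", "a")

# Per-group sums; reorder = FALSE keeps groups in order of first
# appearance, which is the same order unique(GROUP) uses
sums   <- rowsum(AV, GROUP, reorder = FALSE)
counts <- rowsum(rep(1, length(AV)), GROUP, reorder = FALSE)
groupmeans <- sums / counts

# Expand back to one value per person
individual.means <- groupmeans[match(GROUP, unique(GROUP))]

# Cross-check against tapply(), looked up by group name
check <- as.vector(tapply(AV, GROUP, mean)[GROUP])
stopifnot(isTRUE(all.equal(individual.means, check)))
```

The match(GROUP, unique(GROUP)) lookup is only safe here because reorder = FALSE makes the rows of groupmeans follow unique(GROUP); with the default reorder = TRUE the rows would be sorted instead.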
Both of you might have missed my question from Friday: For very long `x' (e.g., length=50000), indexing by names can take a long time. See that thread for detail. (For small data, you can hardly tell the difference.)

Also, I'm trying to write the function in a way that one can pass in more than one grouping variable in a list, much like tapply. The version I showed is a simplified version to demonstrate the `problem' I had. I obviously missed the fact that tapply returns a 1D array...

Best,
Andy

> From: kjetil at acelerate.com
>
> On 10 May 2004 at 10:09, Christophe Pallier wrote:
>
> > Liaw, Andy wrote:
> >
> > >Suppose I define the function:
> > >
> > >fun <- function(x, f) {
> > >    m <- tapply(x, f, mean)
> > >    ans <- x - m[match(f, unique(f))]
> > >    names(ans) <- names(x)
> > >    ans
> > >}
> >
> > May I ask what is the purpose of match(f,unique(f)) ?
> >
> > To remove the group means, I have been using:
> >
> > x-tapply(x,f,mean)[f]
> >
> > for a while, (and I am now changing to
> > x-tapply(x,f,mean)[as.character(f)] because of the peculiarities of
>
> wouldn't
> sweep(as.array(x), 1, tapply(x,f,mean)[as.character(f)] , "-")
>
> be more natural?
>
> Kjetil Halvorsen
>
> > indexing named vectors with factors )
> >
> > The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular
> > order in the result of tapply, no? It seems a bit dangerous to me.
> >
> > Christophe Pallier
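To make the ordering concern concrete, here is a minimal sketch on made-up data (the toy x and f are invented for illustration). tapply() orders its result by levels(f), while match(f, unique(f)) counts groups in order of first appearance, so the two only line up when the data happen to arrive in level order. The sketch also shows lookup by name and by the factor's integer codes; the latter is positional and does not match on names.

```r
# Toy data, invented for illustration only
x <- c(1, 2, 3, 10, 20, 6)
f <- factor(c("b", "a", "b", "c", "c", "a"))

m <- tapply(x, f, mean)   # ordered by levels(f): a, b, c

# Positions follow first appearance (b, a, c), not levels(f), so rows
# in groups "a" and "b" pick up each other's mean
wrong <- as.vector(m[match(f, unique(f))])

# Lookup by level name does not depend on the order of the data
by.name <- as.vector(m[as.character(f)])

# The integer codes of f index levels(f) directly, with no name matching
by.code <- as.vector(m[as.integer(f)])

stopifnot(isTRUE(all.equal(by.name, by.code)))
stopifnot(!isTRUE(all.equal(wrong, by.name)))
```

as.vector() is used here because tapply() returns a 1D array, and subsetting it keeps the array attributes.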