Dimitri Liakhovitski
2010-Jan-20 22:37 UTC
[R] standardizing one variable by dividing each value by the mean - but within levels of a factor
Hello!

I have a data frame with a factor and a numeric variable:

x<-data.frame(factor=c("b","b","d","d","e","e"),values=c(1,2,10,20,100,200))

For each level of "factor", I would like to divide each value of "values" by the mean of "values" within that level. In other words, I would like to get a new variable equal to:

1/1.5
2/1.5
10/15
20/15
100/150
200/150

I realize I could do it with tapply, starting with:

factor.level.means<-tapply(x$values,x$factor,mean)

... etc. But that seems clunky to me. Is there a more elegant way of doing it?

Thanks a lot!

-- 
Dimitri Liakhovitski
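For reference, the tapply route mentioned in the question can be finished in one more step by looking up each row's level mean by name. This is a minimal base R sketch; the names `level.means` and `std` are illustrative and not from the post.

```r
x <- data.frame(factor = c("b","b","d","d","e","e"),
                values = c(1,2,10,20,100,200))

# One mean per level, returned as a 1-d array named by level ("b", "d", "e")
level.means <- tapply(x$values, x$factor, mean)

# Look up each row's level mean by name and divide;
# as.character() makes the lookup go by name even if x$factor is a factor
std <- x$values / level.means[as.character(x$factor)]

# Drop the dim/dimnames attributes carried over from the tapply result
std <- as.vector(std)
```

The result is 1/1.5, 2/1.5, 10/15, 20/15, 100/150, 200/150, i.e. the variable asked for. It works, but it needs the extra lookup step that the grouped approaches in this thread avoid.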
Chuck Cleland
2010-Jan-20 22:56 UTC
On 1/20/2010 5:37 PM, Dimitri Liakhovitski wrote:
> Is there a more elegant way of doing it?

  with(x, ave(x=values, factor, FUN=function(x){x/mean(x)}))
  [1] 0.6666667 1.3333333 0.6666667 1.3333333 0.6666667 1.3333333

?ave

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
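ave() returns a vector the same length as its first argument: FUN is applied to each group separately and the results are put back at the original row positions. The sketch below spells that idea out with split() and unsplit() for comparison; it only illustrates the behaviour and is not meant as a description of how ave() is implemented internally. `scale.by.mean`, `via.ave` and `via.split` are illustrative names.

```r
x <- data.frame(factor = c("b","b","d","d","e","e"),
                values = c(1,2,10,20,100,200))

scale.by.mean <- function(v) v / mean(v)

# Grouped call, as in the reply above
via.ave <- with(x, ave(values, factor, FUN = scale.by.mean))

# Same idea spelled out: split the values by level, apply the function
# to each piece, then unsplit() puts every result back at its original row
pieces <- lapply(split(x$values, x$factor), scale.by.mean)
via.split <- unsplit(pieces, x$factor)
```

Both give 0.6666667 1.3333333 repeated three times. Because FUN here returns a vector as long as its input, each row gets its own scaled value rather than a single group summary.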
Dimitri Liakhovitski
2010-Jan-20 23:34 UTC
Thanks a lot for your helpful suggestion!
Dimitri

-- 
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
Dimitri Liakhovitski
2010-Jan-21 00:13 UTC
One follow-up question. The proposed solution was the following (note that this time I am introducing one NA into data frame "x"):

x<-data.frame(factor=c("b","b","d","d","e","e"),values=c(1,NA,10,20,100,200))
x$std.via.ave<-ave(x$values, x$factor, FUN=function(x)x/mean(x))

I compared the result to my own clumsy solution:

factor.level.means<-as.data.frame(tapply(x$values,x$factor,mean, na.rm=T))
factor.level.means$factor<-row.names(factor.level.means)
names(factor.level.means)[1]<-"means"
factor.level.means

x$std<-NA
for(i in 1:nrow(x)){
  x[i,"std"]<-factor.level.means[factor.level.means$factor==x[i,"factor"],"means"]
}
x$std<-x$values/x$std

If one compares x$std to x$std.via.ave, one notices that ave returns NA for the very first observation, because it seems to use na.rm=F when it calculates the means. Is there a way to fix that in the ave solution?

Thank you!
Dimitri

On Wed, Jan 20, 2010 at 5:55 PM, William Dunlap <wdunlap at tibco.com> wrote:
> ave() can do it:
>   ave(x$values, x$factor, FUN=function(x)x/mean(x))
>   [1] 0.6666667 1.3333333 0.6666667 1.3333333 0.6666667 1.3333333
> The plyr package has functions which extend this
> sort of thing.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
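The for-loop in the clumsy solution above only looks up each row's level mean, so it can be replaced by a single vectorized match(). This is a sketch that keeps the post's factor.level.means data frame as is and swaps only the loop; it assumes the same data with the NA.

```r
x <- data.frame(factor = c("b","b","d","d","e","e"),
                values = c(1,NA,10,20,100,200))

# Per-level means with NAs dropped, built as in the post
factor.level.means <- as.data.frame(tapply(x$values, x$factor, mean, na.rm = TRUE))
factor.level.means$factor <- row.names(factor.level.means)
names(factor.level.means)[1] <- "means"

# match() gives, for each row of x, the row of factor.level.means with the
# same level, so the whole loop collapses into one indexing step
idx <- match(as.character(x$factor), factor.level.means$factor)
x$std <- x$values / factor.level.means$means[idx]
```

x$std comes out as 1, NA, 2/3, 4/3, 2/3, 4/3: the mean of level "b" is 1 once the NA is dropped, and only the NA value itself stays NA.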
Jason Smith
2010-Jan-21 14:28 UTC
Dimitri Liakhovitski <ld7631 <at> gmail.com> writes:
> Is there a way to fix that in the ave solution?

I think you are asking how to have the first observation in the ave solution be calculated as 1 instead of NA. As you noted, the ave solution currently uses mean()'s default na.rm=F (see ?mean). Simply pass na.rm=T to mean() inside your custom function: the NA is then dropped when the mean of level "b" is computed, that mean becomes 1, and the first observation is standardized to 1/1 = 1:

x$std.via.ave<-ave(x$values, x$factor, FUN=function(x)x/mean(x,na.rm=T))

--jason
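If this is needed repeatedly, the fix can be wrapped in a small function that forwards na.rm to mean(). `scale.within` is a hypothetical helper written for illustration, not something from the thread or from any package.

```r
x <- data.frame(factor = c("b","b","d","d","e","e"),
                values = c(1,NA,10,20,100,200))

# Hypothetical helper: divide each value by the mean of its group.
# na.rm is passed on to mean(), so NAs are skipped when the group means are computed.
scale.within <- function(v, g, na.rm = TRUE) {
  ave(v, g, FUN = function(z) z / mean(z, na.rm = na.rm))
}

with.na.rm    <- scale.within(x$values, x$factor)                 # 1, NA, 2/3, 4/3, 2/3, 4/3
without.na.rm <- scale.within(x$values, x$factor, na.rm = FALSE)  # all of group "b" becomes NA
```

With na.rm = FALSE the whole "b" group is NA, because mean(c(1, NA)) is NA. With na.rm = TRUE, a group consisting only of NAs would still come out as NaN, since the mean of an empty vector is NaN.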
hadley wickham
2010-Jan-21 14:44 UTC
On Wed, Jan 20, 2010 at 4:37 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> Is there a more elegant way of doing it?

Here's one way with the plyr package:

library(plyr)
ddply(x, "factor", transform, scaled = values / mean(values, na.rm = T))

Hadley

-- 
http://had.co.nz/
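One practical difference between the ddply() and ave() answers is row order. ddply() works split/apply/combine style and, as far as I know, returns the rows grouped by level, while ave() keeps the original order. The example data happen to be sorted by level already, so both orders coincide there. The sketch below uses a hypothetical shuffled copy of the data and a base R split/lapply/rbind analogue of the ddply call (plyr itself is not used) to show the difference.

```r
# Hypothetical unsorted version of the example data (assumption, not from the thread)
x <- data.frame(factor = c("e","b","d","b","e","d"),
                values = c(200, 1, 10, 2, 100, 20))

# Base R analogue of ddply(x, "factor", transform, scaled = ...):
# split the data frame by level, add the column within each piece, stack the pieces
pieces <- lapply(split(x, x$factor), function(d) {
  d$scaled <- d$values / mean(d$values, na.rm = TRUE)
  d
})
by.group <- do.call(rbind, pieces)
rownames(by.group) <- NULL  # drop the "b.2", "b.4", ... row names that rbind creates

# ave() instead keeps every value in its original row
x$scaled <- ave(x$values, x$factor,
                FUN = function(v) v / mean(v, na.rm = TRUE))
```

by.group comes back ordered b, b, d, d, e, e, whereas x$scaled lines up with the original rows, so it can be assigned straight back as a new column.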