Dimitri Liakhovitski
2010-Jan-20 22:37 UTC
[R] standardizing one variable by dividing each value by the mean - but within levels of a factor
Hello!
I have a data frame with a factor and a numeric variable:
x<-data.frame(factor=c("b","b","d","d","e","e"),values=c(1,2,10,20,100,200))
For each level of "factor" - I would like to divide each value of
"values" by the mean of "values" that corresponds to the level of
"factor".
In other words, I would like to get a new variable that is equal to:
1/1.5
2/1.5
10/15
20/15
100/150
200/150
I realize I could do it through tapply starting with:
factor.level.means<-tapply(x$values,x$factor,mean) ... etc.
But it seems clunky to me.
Is there a more elegant way of doing it?
Thanks a lot!
--
Dimitri Liakhovitski
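
A minimal sketch of how the tapply route hinted at above could be finished without a loop. It assumes the data frame "x" from the post; the names "level.means" and "std" are made up for illustration. tapply returns the means named by level, so indexing that result by each row's level lines the means up with the rows.

```r
x <- data.frame(factor = c("b", "b", "d", "d", "e", "e"),
                values = c(1, 2, 10, 20, 100, 200))

# Per-level means, named by level: b = 1.5, d = 15, e = 150
level.means <- tapply(x$values, x$factor, mean)

# Index the named means by each row's level, then divide;
# as.numeric() drops the names and array attributes from the result
x$std <- as.numeric(x$values / level.means[as.character(x$factor)])
```

This is only one way to complete the idea; the replies in the thread reach the same result more directly.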
Chuck Cleland
2010-Jan-20 22:56 UTC
[R] standardizing one variable by dividing each value by the mean - but within levels of a factor
On 1/20/2010 5:37 PM, Dimitri Liakhovitski wrote:
> But it seems clunky to me.
> Is there a more elegant way of doing it?

with(x, ave(x=values, factor, FUN=function(x){x/mean(x)}))
[1] 0.6666667 1.3333333 0.6666667 1.3333333 0.6666667 1.3333333

?ave

> Thanks a lot!

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
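
A small sketch of what ave is doing here: it applies FUN to the values of each group and puts the results back in the positions of the original rows, so the output can be stored directly as a new column. The interleaved row order and the column name "std" are assumptions chosen to show that the groups do not have to be contiguous.

```r
# Same values as in the thread, but with the groups interleaved
x <- data.frame(factor = c("b", "d", "e", "b", "d", "e"),
                values = c(1, 10, 100, 2, 20, 200))

# FUN sees one group's values at a time; ave() returns a vector
# of the same length as x$values, in the original row order
x$std <- ave(x$values, x$factor, FUN = function(v) v / mean(v))
```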
Dimitri Liakhovitski
2010-Jan-20 23:34 UTC
[R] standardizing one variable by dividing each value by the mean - but within levels of a factor
Thanks a lot for your helpful suggestion!
Dimitri

--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
Dimitri Liakhovitski
2010-Jan-21 00:13 UTC
[R] standardizing one variable by dividing each value by the mean -but within levels of a factor
One follow-up question. The proposed solution was the following (note
that this time I am introducing one NA into data frame "x"):
x<-data.frame(factor=c("b","b","d","d","e","e"),values=c(1,NA,10,20,100,200))
x$std.via.ave<-ave(x$values, x$factor, FUN=function(x)x/mean(x))
I compared the result to my own clumsy solution:
# Per-level means, ignoring NAs, as a data frame with one row per level
factor.level.means<-as.data.frame(tapply(x$values,x$factor,mean, na.rm=TRUE))
factor.level.means$factor<-row.names(factor.level.means)
names(factor.level.means)[1]<-"means"
factor.level.means
# Look up the mean of each row's level, then divide by it
x$std<-NA
for(i in seq_len(nrow(x))){
x[i,"std"]<-factor.level.means[factor.level.means$factor==x[i,"factor"],"means"]
}
x$std<-x$values/x$std
If one compares x$std to x$std.via.ave, one notices that ave returns
an NA for the very first observation, because mean() is being called
with its default na.rm=FALSE when the group means are calculated.
Is there a way to fix that in the ave solution?
Thank you!
Dimitri
On Wed, Jan 20, 2010 at 5:55 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> For each level of "factor" - I would like to divide each value of
>> "values" by the mean of "values" that corresponds to the level of
>> "factor"
>
> ave() can do it:
>   > ave(x$values, x$factor, FUN=function(x)x/mean(x))
>   [1] 0.6666667 1.3333333 0.6666667 1.3333333 0.6666667 1.3333333
> The plyr package has functions which extend this
> sort of thing.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
Jason Smith
2010-Jan-21 14:28 UTC
[R] standardizing one variable by dividing each value by the mean -but within levels of a factor
Dimitri Liakhovitski <ld7631 <at> gmail.com> writes:
> If one compares x$std to x$std.via.ave - one notices that ave results
> in an NA for the very first observation - because it seems to be using
> na.rm=F when it calculates the means.
> Is there a way to fix that in the ave solution?

I think you are asking how to have the first observation in the ave
solution be calculated as 1 instead of NA. As you noted, mean() inside
the ave solution is using its default na.rm=FALSE (see ?mean). Simply
pass na.rm=TRUE to mean() in your custom function so that it removes the
NA; the group mean is then 1, and the first observation comes out as 1:

x$std.via.ave<-ave(x$values, x$factor, FUN=function(x)x/mean(x,na.rm=TRUE))

--jason
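
A short sketch of why the NA appears and what the fix changes, using the data frame with one NA from the follow-up question. The argument name "v" is arbitrary. Note that the row holding the NA itself still comes out as NA, since NA divided by anything is NA.

```r
x <- data.frame(factor = c("b", "b", "d", "d", "e", "e"),
                values = c(1, NA, 10, 20, 100, 200))

# A single NA makes the default mean NA, which spreads to the whole group
mean.default.b <- mean(c(1, NA))                 # NA
mean.narm.b    <- mean(c(1, NA), na.rm = TRUE)   # 1

# With na.rm = TRUE inside FUN, only the missing observation stays NA
x$std.via.ave <- ave(x$values, x$factor,
                     FUN = function(v) v / mean(v, na.rm = TRUE))
```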
hadley wickham
2010-Jan-21 14:44 UTC
[R] standardizing one variable by dividing each value by the mean - but within levels of a factor
On Wed, Jan 20, 2010 at 4:37 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> But it seems clunky to me.
> Is there a more elegant way of doing it?

Here's one way with the plyr package:

library(plyr)
ddply(x, "factor", transform, scaled = values / mean(values, na.rm = TRUE))

Hadley

--
http://had.co.nz/
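
For readers without plyr installed, here is a rough base R sketch in the same spirit: transform adds the column, and ave supplies each row's group mean. This is an alternative put together from the functions already discussed in the thread, not what ddply does internally; the column name "scaled" is borrowed from the plyr call above.

```r
x <- data.frame(factor = c("b", "b", "d", "d", "e", "e"),
                values = c(1, 2, 10, 20, 100, 200))

# When FUN returns a single number, ave() repeats it for every row
# of the group, giving a vector of group means aligned with the rows
x <- transform(x,
               scaled = values / ave(values, factor,
                                     FUN = function(v) mean(v, na.rm = TRUE)))
```

The rows stay in their original order, and the result is a data frame with the extra column, as in the plyr version.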