Coghlan, Avril
2011-Aug-25 15:02 UTC
[R] within-groups variance and between-groups variance
Hello, I have been looking for functions for calculating the within-groups variance and between-groups variance, for the case where you have several numerical variables describing samples from a number of groups. I didn't find such functions in R, so wrote my own versions myself (see below). I can calculate the within- and between-groups variance for the Sepal.length variable (iris[1]) in the "iris" data set, by typing:> calcWithinGroupsVariance(iris[1],iris[5])[1] 0.2650082> calcBetweenGroupsVariance(iris[1],iris[5])[1] 0.4300145 I am wondering however if there are functions for doing this already in R? I would prefer to use a standard R function if one exists. Kind Regards, Avril Within-Groups Variance: ====================== calcWithinGroupsVariance <- function(variable,groupvariable) { # find out how many values the group variable can take groupvariable2 <- as.factor(groupvariable[[1]]) levels <- levels(groupvariable2) numlevels <- length(levels) # get the mean and standard deviation for each group: numtotal <- 0 denomtotal <- 0 for (i in 1:numlevels) { leveli <- levels[i] levelidata <- variable[groupvariable==leveli,] levelilength <- length(levelidata) # get the mean and standard deviation for group i: meani <- mean(levelidata) sdi <- sd(levelidata) numi <- (levelilength - 1)*(sdi * sdi) denomi <- levelilength numtotal <- numtotal + numi denomtotal <- denomtotal + denomi } # calculate the within-groups variance Vw <- numtotal / (denomtotal - numlevels) return(Vw) } Between-Groups-Variance: ======================= calcBetweenGroupsVariance <- function(variable,groupvariable) { # find out how many values the group variable can take groupvariable2 <- as.factor(groupvariable[[1]]) levels <- levels(groupvariable2) numlevels <- length(levels) # calculate the overall grand mean: grandmean <- mean(variable) # get the mean and standard deviation for each group: numtotal <- 0 denomtotal <- 0 for (i in 1:numlevels) { leveli <- levels[i] levelidata <- variable[groupvariable==leveli,] levelilength <- length(levelidata) # get the mean and standard deviation for group i: meani <- mean(levelidata) sdi <- sd(levelidata) numi <- levelilength * ((meani - grandmean)^2) denomi <- levelilength numtotal <- numtotal + numi denomtotal <- denomtotal + denomi } # calculate the between-groups variance Vb <- numtotal / (denomtotal - numlevels) Vb <- Vb[[1]] return(Vb) }
Daniel Malter
2011-Aug-25 20:02 UTC
[R] within-groups variance and between-groups variance
#Here is I think an easier way of coding all the components you need. #The within-group variances, you get with this function: apply(iris[,1:4],2,function(x) tapply(x,iris$Species,var)) #You can get what you computed by taking the column means. apply(apply(iris[,1:4],2,function(x) tapply(x,iris$Species,var)),2,mean) #HOWEVER, note that this takes unweighted column means. If your groups contain an UNequal number of observations, you do not want to take the UNweighted means. #Here is how you get the group and grand means: apply(iris[,1:4],2,function(x) tapply(x,iris$Species,mean)) #group means apply(iris[,1:4],2,mean) #grand means #And you also need the number of observations in each bin, I guess: apply(iris[,1:4],2,function(x) tapply(x,iris$Species,length)) #I don't want to ruin all the fun for you. HTH, Daniel Coghlan, Avril wrote:> > Hello, > > I have been looking for functions for calculating the within-groups > variance and between-groups variance, for the case where you have > several numerical variables describing samples from a number of groups. > > I didn't find such functions in R, so wrote my own versions myself (see > below). I can calculate the within- and between-groups variance for the > Sepal.length variable (iris[1]) in the "iris" data set, by typing: >> calcWithinGroupsVariance(iris[1],iris[5]) > [1] 0.2650082 >> calcBetweenGroupsVariance(iris[1],iris[5]) > [1] 0.4300145 > > I am wondering however if there are functions for doing this already in > R? > I would prefer to use a standard R function if one exists. > > Kind Regards, > Avril > > > Within-Groups Variance: > ======================> > calcWithinGroupsVariance <- function(variable,groupvariable) > { > # find out how many values the group variable can take > groupvariable2 <- as.factor(groupvariable[[1]]) > levels <- levels(groupvariable2) > numlevels <- length(levels) > # get the mean and standard deviation for each group: > numtotal <- 0 > denomtotal <- 0 > for (i in 1:numlevels) > { > leveli <- levels[i] > levelidata <- variable[groupvariable==leveli,] > levelilength <- length(levelidata) > # get the mean and standard deviation for group i: > meani <- mean(levelidata) > sdi <- sd(levelidata) > numi <- (levelilength - 1)*(sdi * sdi) > denomi <- levelilength > numtotal <- numtotal + numi > denomtotal <- denomtotal + denomi > } > # calculate the within-groups variance > Vw <- numtotal / (denomtotal - numlevels) > return(Vw) > } > > Between-Groups-Variance: > =======================> > calcBetweenGroupsVariance <- function(variable,groupvariable) > { > # find out how many values the group variable can take > groupvariable2 <- as.factor(groupvariable[[1]]) > levels <- levels(groupvariable2) > numlevels <- length(levels) > # calculate the overall grand mean: > grandmean <- mean(variable) > # get the mean and standard deviation for each group: > numtotal <- 0 > denomtotal <- 0 > for (i in 1:numlevels) > { > leveli <- levels[i] > levelidata <- variable[groupvariable==leveli,] > levelilength <- length(levelidata) > # get the mean and standard deviation for group i: > meani <- mean(levelidata) > sdi <- sd(levelidata) > numi <- levelilength * ((meani - grandmean)^2) > denomi <- levelilength > numtotal <- numtotal + numi > denomtotal <- denomtotal + denomi > } > # calculate the between-groups variance > Vb <- numtotal / (denomtotal - numlevels) > Vb <- Vb[[1]] > return(Vb) > } > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- View this message in context: http://r.789695.n4.nabble.com/within-groups-variance-and-between-groups-variance-tp3769027p3769248.html Sent from the R help mailing list archive at Nabble.com.
> -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Coghlan, Avril > Sent: 25 August 2011 16:02 > To: r-help at r-project.org > Cc: Coghlan, Avril > Subject: [R] within-groups variance and between-groups variance > > Hello, > > I have been looking for functions for calculating the > within-groups variance and between-groups variance, for the > case where you have several numerical variables describing > samples from a number of groups.There doesn't seem to be a function in the core distribution for estimating variance components from linear model anova. If I want to do this from a balanced nested design I usually do something like a <- anova(lm(Sepal.Length~Species, data=iris)) a s2.within<-a[2,3] s2.between <- (a[1,3]-a[2,3])/( (1+a[1,1]+a[2,1])/(a[1,1]+1)) This is just the usual balanced one-way ANOVA estimate of variance components from the expected mean squares, with the within-group replicate number calculated from the degrees of freedom. However, you can very easily, and more generally, use lme to obtain the REML estimates,: library(nlme) iris.lme <- lme(Sepal.Length~1, random=~1|Species, data=iris) VarCorr(iris.lme) In this instance, because of the model specification the (Intercept) term gives you the between-species component of variance, and the residual term gives the within-species term. For this particular example, too, the REML estimates are identical to the linear model version above to six or so places. S Ellison LGC ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}}