Aline Santos
2011-Mar-11 09:32 UTC
[R] How to calculate means for multiple variables in samples with different sizes
Hello R-helpers:

I have data like this:

sample  replicate  height  weight  age
A       1.00       12.0    0.64    6.00
A       2.00       12.2    0.38    6.00
A       3.00       12.4    0.49    6.00
B       1.00       12.7    0.65    4.00
B       2.00       12.8    0.78    5.00
C       1.00       11.9    0.45    6.00
C       2.00       11.84   0.44    2.00
C       3.00       11.43   0.32    3.00
C       4.00       10.24   0.84    4.00
D       1.00       14.2    0.54    2.00
D       2.00       15.67   0.67    7.00
D       3.00       15.11   0.81    7.00

Now, how can I calculate the mean for each condition (height, weight, age) in each sample, considering that the samples have different numbers of replicates?

The final matrix should look like:

sample  height  weight  age
A       12.20   0.50    6.00
B       12.75   0.72    4.50
C       11.35   0.51    3.75
D       14.99   0.67    5.33

This is a simplified version of my dataset, which consists of 100 samples (unequally distributed in 530 replicates) for 600 different conditions.

I appreciate all the help.

A.S.
jim holtman
2011-Mar-11 10:51 UTC
[R] How to calculate means for multiple variables in samples with different sizes
Use the package 'data.table':

> x <- read.table(textConnection("sample replicate height weight age
+ A 1.00 12.0 0.64 6.00
+ A 2.00 12.2 0.38 6.00
+ A 3.00 12.4 0.49 6.00
+ B 1.00 12.7 0.65 4.00
+ B 2.00 12.8 0.78 5.00
+ C 1.00 11.9 0.45 6.00
+ C 2.00 11.84 0.44 2.00
+ C 3.00 11.43 0.32 3.00
+ C 4.00 10.24 0.84 4.00
+ D 1.00 14.2 0.54 2.00
+ D 2.00 15.67 0.67 7.00
+ D 3.00 15.11 0.81 7.00"), header = TRUE)
> closeAllConnections()
> require(data.table)
> x.dt <- data.table(x)  # convert
> x.dt[, list(height = mean(height)
+           , weight = mean(weight)
+           , age = mean(age)
+           ), by = sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Berend Hasselman
2011-Mar-11 11:11 UTC
[R] How to calculate means for multiple variables in samples with different sizes
con.data <- textConnection("sample replicate height weight age
A 1.00 12.0 0.64 6.00
A 2.00 12.2 0.38 6.00
A 3.00 12.4 0.49 6.00
B 1.00 12.7 0.65 4.00
B 2.00 12.8 0.78 5.00
C 1.00 11.9 0.45 6.00
C 2.00 11.84 0.44 2.00
C 3.00 11.43 0.32 3.00
C 4.00 10.24 0.84 4.00
D 1.00 14.2 0.54 2.00
D 2.00 15.67 0.67 7.00
D 3.00 15.11 0.81 7.00
")
df <- read.table(con.data, header = TRUE)
close(con.data)

aggregate(df[, !names(df) %in% c("sample", "replicate")], by = list(sample = df$sample), FUN = mean)

best regards

Berend
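On the worry about unequal replicate counts: mean() is applied to each sample's own rows, so every group is divided by its own count and no special handling is needed. A minimal sketch of that check, assuming the example data from the original post (the names x, n_per_sample, by_hand and agg are only illustrative):

```r
# Example data from the original post; samples have 3, 2, 4 and 3 replicates.
x <- read.table(text = "sample replicate height weight age
A 1.00 12.0 0.64 6.00
A 2.00 12.2 0.38 6.00
A 3.00 12.4 0.49 6.00
B 1.00 12.7 0.65 4.00
B 2.00 12.8 0.78 5.00
C 1.00 11.9 0.45 6.00
C 2.00 11.84 0.44 2.00
C 3.00 11.43 0.32 3.00
C 4.00 10.24 0.84 4.00
D 1.00 14.2 0.54 2.00
D 2.00 15.67 0.67 7.00
D 3.00 15.11 0.81 7.00", header = TRUE)

# Number of replicates in each sample.
n_per_sample <- table(x$sample)

# Group mean computed by hand: per-sample sum divided by that sample's count.
by_hand <- tapply(x$height, x$sample, sum) / as.vector(n_per_sample)

# The same grouping done by aggregate(), for all three measured columns.
agg <- aggregate(x[c("height", "weight", "age")],
                 by = list(sample = x$sample), FUN = mean)

# Both routes give the same height means.
stopifnot(isTRUE(all.equal(as.numeric(by_hand), agg$height)))
print(agg)
```

The same reasoning should carry over to the full data set, provided every condition column is numeric.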
Dennis Murphy
2011-Mar-11 11:13 UTC
[R] How to calculate means for multiple variables in samples with different sizes
Hi:

Here are a few one-liners. Calling your data frame dd,

aggregate(cbind(height, weight, age) ~ sample, data = dd, FUN = mean)
  sample   height    weight      age
1      A 12.20000 0.5033333 6.000000
2      B 12.75000 0.7150000 4.500000
3      C 11.35250 0.5125000 3.750000
4      D 14.99333 0.6733333 5.333333

With package doBy:

library(doBy)
summaryBy(height + weight + age ~ sample, data = dd, FUN = mean)
  sample height.mean weight.mean age.mean
1      A    12.20000   0.5033333 6.000000
2      B    12.75000   0.7150000 4.500000
3      C    11.35250   0.5125000 3.750000
4      D    14.99333   0.6733333 5.333333

With package plyr:

library(plyr)
ddply(dd, .(sample), colwise(mean, .(height, weight, age)))
  sample   height    weight      age
1      A 12.20000 0.5033333 6.000000
2      B 12.75000 0.7150000 4.500000
3      C 11.35250 0.5125000 3.750000
4      D 14.99333 0.6733333 5.333333

Dennis
Matthew Dowle
2011-Mar-11 12:53 UTC
[R] How to calculate means for multiple variables in samples with different sizes
Hi,

One-liners in data.table are:

> x.dt[,lapply(.SD,mean),by=sample]
     sample replicate   height    weight      age
[1,]      A       2.0 12.20000 0.5033333 6.000000
[2,]      B       1.5 12.75000 0.7150000 4.500000
[3,]      C       2.5 11.35250 0.5125000 3.750000
[4,]      D       2.0 14.99333 0.6733333 5.333333

Without the replicate column:

> x.dt[,lapply(list(height,weight,age),mean),by=sample]
     sample       V1        V2       V3
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

One (long) way to retain the column names:

> x.dt[,lapply(list(height=height,weight=weight,age=age),mean),by=sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

Or this is shorter:

> ans = x.dt[,lapply(.SD,mean),by=sample]
> ans$replicate = NULL
> ans
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

Or another way:

> mycols = c("height","weight","age")
> x.dt[,lapply(.SD[,mycols,with=FALSE],mean),by=sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

Or another way:

> x.dt[,lapply(.SD[,list(height,weight,age)],mean),by=sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

The way Jim showed:

> x.dt[, list(height = mean(height)
+           , weight = mean(weight)
+           , age = mean(age)
+           ), by = sample]

is the more flexible syntax for when you want different functions on different columns, easily, and as a bonus is fast.

Matthew
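To illustrate what "different functions on different columns" can look like, here is a rough base R counterpart rather than data.table code. It is only a sketch: the choice of median for weight and max for age is hypothetical, as are the names per_sample and res, and it will not match data.table for speed.

```r
# Example data from the original post.
x <- read.table(text = "sample replicate height weight age
A 1.00 12.0 0.64 6.00
A 2.00 12.2 0.38 6.00
A 3.00 12.4 0.49 6.00
B 1.00 12.7 0.65 4.00
B 2.00 12.8 0.78 5.00
C 1.00 11.9 0.45 6.00
C 2.00 11.84 0.44 2.00
C 3.00 11.43 0.32 3.00
C 4.00 10.24 0.84 4.00
D 1.00 14.2 0.54 2.00
D 2.00 15.67 0.67 7.00
D 3.00 15.11 0.81 7.00", header = TRUE)

# Split the rows by sample, then summarise each piece with a different
# function per column (the functions chosen here are illustrative only).
per_sample <- lapply(split(x, x$sample), function(g) {
  data.frame(sample        = g$sample[1],
             height_mean   = mean(g$height),
             weight_median = median(g$weight),
             age_max       = max(g$age))
})

# Stack the one-row summaries into a single data frame.
res <- do.call(rbind, per_sample)
rownames(res) <- NULL
print(res)
```

Each named element of the inner data.frame() call plays the role of one entry in Jim's list(...), so swapping a function for one column does not affect the others.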
Henrique Dallazuanna
2011-Mar-11 12:58 UTC
[R] How to calculate means for multiple variables in samples with different sizes
Try this:

aggregate(. ~ sample, x[-2], FUN = mean)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
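In that call, x[-2] drops the second column (replicate) by position, and the dot on the left of the formula stands for every remaining column other than sample. With 600 condition columns, dropping the column by name may be less fragile if the column order ever changes. A small sketch, assuming the example data from the original post (by_position and by_name are illustrative names):

```r
# Example data from the original post.
x <- read.table(text = "sample replicate height weight age
A 1.00 12.0 0.64 6.00
A 2.00 12.2 0.38 6.00
A 3.00 12.4 0.49 6.00
B 1.00 12.7 0.65 4.00
B 2.00 12.8 0.78 5.00
C 1.00 11.9 0.45 6.00
C 2.00 11.84 0.44 2.00
C 3.00 11.43 0.32 3.00
C 4.00 10.24 0.84 4.00
D 1.00 14.2 0.54 2.00
D 2.00 15.67 0.67 7.00
D 3.00 15.11 0.81 7.00", header = TRUE)

# Drop the replicate column by position, as in the reply above.
by_position <- aggregate(. ~ sample, data = x[-2], FUN = mean)

# Drop it by name instead; all other columns are averaged per sample.
by_name <- aggregate(. ~ sample,
                     data = x[setdiff(names(x), "replicate")],
                     FUN = mean)

# Both selections produce the same table of means.
stopifnot(isTRUE(all.equal(by_position, by_name)))
print(by_name)
```

Note that the formula method of aggregate() uses na.action = na.omit by default, so a row with a missing value in any condition is dropped from all of the means; that matters only if the real data contain NAs.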