thr3ads.net - R help - [R] How to calculate means for multiple variables in samples with different sizes [Mar 2011]


Aline Santos

2011-Mar-11 09:32 UTC

[R] How to calculate means for multiple variables in samples with different sizes

Hello R-helpers:

I have data like this:

sample    replicate    height    weight    age
A    1.00    12.0    0.64    6.00
A    2.00    12.2    0.38    6.00
A    3.00    12.4    0.49    6.00
B    1.00    12.7    0.65    4.00
B    2.00    12.8    0.78    5.00
C    1.00    11.9    0.45    6.00
C    2.00    11.84    0.44    2.00
C    3.00    11.43    0.32    3.00
C    4.00    10.24    0.84    4.00
D    1.00    14.2    0.54    2.00
D    2.00    15.67    0.67    7.00
D    3.00    15.11    0.81    7.00

Now, how can I calculate the mean of each condition (height, weight, age)
in each sample, given that the samples have different numbers of replicates?


The final matrix should look like:

sample    height    weight    age
A         12.20     0.50      6.00
B         12.75     0.72      4.50
C         11.35     0.51      3.75
D         14.99     0.67      5.33

This is a simplified version of my dataset, which consists of 100 samples
(unequally distributed across 530 replicates) for 600 different conditions.

I appreciate all the help.

A.S.


jim holtman

2011-Mar-11 10:51 UTC


use the package 'data.table'

> x <- read.table(textConnection("sample    replicate    height    weight    age
+ A    1.00    12.0    0.64    6.00
+ A    2.00    12.2    0.38    6.00
+ A    3.00    12.4    0.49    6.00
+ B    1.00    12.7    0.65    4.00
+ B    2.00    12.8    0.78    5.00
+ C    1.00    11.9    0.45    6.00
+ C    2.00    11.84    0.44    2.00
+ C    3.00    11.43    0.32    3.00
+ C    4.00    10.24    0.84    4.00
+ D    1.00    14.2    0.54    2.00
+ D    2.00    15.67    0.67    7.00
+ D    3.00    15.11    0.81    7.00"), header = TRUE)
> closeAllConnections()
> require(data.table)
> x.dt <- data.table(x)  # convert
> x.dt[, list(height = mean(height)
+            , weight = mean(weight)
+            , age = mean(age)
+            ), by = sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

Berend Hasselman

2011-Mar-11 11:11 UTC

con.data <- textConnection("sample    replicate    height    weight    age
A    1.00    12.0    0.64    6.00
A    2.00    12.2    0.38    6.00
A    3.00    12.4    0.49    6.00
B    1.00    12.7    0.65    4.00
B    2.00    12.8    0.78    5.00
C    1.00    11.9    0.45    6.00
C    2.00    11.84    0.44    2.00
C    3.00    11.43    0.32    3.00
C    4.00    10.24    0.84    4.00
D    1.00    14.2    0.54    2.00
D    2.00    15.67    0.67    7.00
D    3.00    15.11    0.81    7.00")

df <- read.table(con.data, header=TRUE)
close(con.data)
aggregate(df[,!names(df) %in% c("sample","replicate")],
          by=list(sample=df$sample), FUN=mean)
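Unequal replicate counts should need no special handling here, since mean() is simply applied to whatever rows fall into each sample. A minimal base-R sketch that checks this on the example data; the names value_cols, n_per_sample, means and height_check are made up for illustration, and the data frame is rebuilt with read.table(text = ...) so the snippet stands alone:

```r
# Rebuild the example data (same values as in the original post).
df <- read.table(text = "sample replicate height weight age
A 1 12.0  0.64 6
A 2 12.2  0.38 6
A 3 12.4  0.49 6
B 1 12.7  0.65 4
B 2 12.8  0.78 5
C 1 11.9  0.45 6
C 2 11.84 0.44 2
C 3 11.43 0.32 3
C 4 10.24 0.84 4
D 1 14.2  0.54 2
D 2 15.67 0.67 7
D 3 15.11 0.81 7", header = TRUE)

# Number of replicates per sample (the groups are deliberately unequal).
n_per_sample <- table(df$sample)

# Every column except the grouping and replicate columns is a condition,
# so the same call would cover 600 condition columns as well as 3.
value_cols <- setdiff(names(df), c("sample", "replicate"))
means <- aggregate(df[value_cols], by = list(sample = df$sample), FUN = mean)

# Cross-check one column independently with tapply().
height_check <- tapply(df$height, df$sample, mean)

print(n_per_sample)
print(means)
```

Selecting the columns by name with setdiff() rather than by position is a design choice for wide data: it keeps working if the column order changes.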

best regards

Berend


Dennis Murphy

2011-Mar-11 11:13 UTC


Hi:

Here are a few one-liners. Calling your data frame dd,

aggregate(cbind(height, weight, age) ~ sample, data = dd, FUN = mean)
  sample   height    weight      age
1      A 12.20000 0.5033333 6.000000
2      B 12.75000 0.7150000 4.500000
3      C 11.35250 0.5125000 3.750000
4      D 14.99333 0.6733333 5.333333

With package doBy:

library(doBy)
summaryBy(height + weight + age ~ sample, data = dd, FUN = mean)
  sample height.mean weight.mean age.mean
1      A    12.20000   0.5033333 6.000000
2      B    12.75000   0.7150000 4.500000
3      C    11.35250   0.5125000 3.750000
4      D    14.99333   0.6733333 5.333333

With package plyr:

library(plyr)
ddply(dd, .(sample), colwise(mean, .(height, weight, age)))
  sample   height    weight      age
1      A 12.20000 0.5033333 6.000000
2      B 12.75000 0.7150000 4.500000
3      C 11.35250 0.5125000 3.750000
4      D 14.99333 0.6733333 5.333333
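The question asks for a matrix rounded to two decimals, whereas the calls above return a data frame at full precision. One possible way to get from one to the other, sketched in base R; the names res and m are illustrative only, and dd is rebuilt here so the snippet runs on its own:

```r
# Rebuild the example data frame under the name used above.
dd <- read.table(text = "sample replicate height weight age
A 1 12.0  0.64 6
A 2 12.2  0.38 6
A 3 12.4  0.49 6
B 1 12.7  0.65 4
B 2 12.8  0.78 5
C 1 11.9  0.45 6
C 2 11.84 0.44 2
C 3 11.43 0.32 3
C 4 10.24 0.84 4
D 1 14.2  0.54 2
D 2 15.67 0.67 7
D 3 15.11 0.81 7", header = TRUE)

# Per-sample means, as in the aggregate() one-liner.
res <- aggregate(cbind(height, weight, age) ~ sample, data = dd, FUN = mean)

# Drop the grouping column, convert to a numeric matrix, round to 2 decimals,
# and carry the sample labels over as row names.
m <- round(as.matrix(res[, -1]), 2)
rownames(m) <- as.character(res$sample)

print(m)
```

Rounding is best left to this last step, so that any further calculations use the unrounded means in res.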

Dennis


Matthew Dowle

2011-Mar-11 12:53 UTC


Hi,

One-liners in data.table are:

> x.dt[,lapply(.SD,mean),by=sample]
     sample replicate   height    weight      age
[1,]      A       2.0 12.20000 0.5033333 6.000000
[2,]      B       1.5 12.75000 0.7150000 4.500000
[3,]      C       2.5 11.35250 0.5125000 3.750000
[4,]      D       2.0 14.99333 0.6733333 5.333333

without the replicate column:

> x.dt[,lapply(list(height,weight,age),mean),by=sample]
     sample       V1        V2       V3
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

one (long) way to retain the column names:

> x.dt[,lapply(list(height=height,weight=weight,age=age),mean),by=sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

or this is shorter:

> ans = x.dt[,lapply(.SD,mean),by=sample]
> ans$replicate = NULL
> ans
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

or another way:

> mycols = c("height","weight","age")
> x.dt[,lapply(.SD[,mycols,with=FALSE],mean),by=sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

or another way:

> x.dt[,lapply(.SD[,list(height,weight,age)],mean),by=sample]
     sample   height    weight      age
[1,]      A 12.20000 0.5033333 6.000000
[2,]      B 12.75000 0.7150000 4.500000
[3,]      C 11.35250 0.5125000 3.750000
[4,]      D 14.99333 0.6733333 5.333333

The way Jim showed:

> x.dt[, list(height = mean(height)
+            , weight = mean(weight)
+            , age = mean(age)
+            ), by = sample]

is the more flexible syntax when you want to apply different functions to
different columns, and as a bonus it is fast.
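As a sketch of that flexibility: the list() form lets each column get its own summary function, while current data.table releases also accept a .SDcols argument that restricts .SD to chosen columns (it may not have been available in the version used in this thread). The variable names by_cols and mixed are illustrative, and the code is wrapped in a requireNamespace() check only so that it is skipped where data.table is not installed:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Rebuild the example data as a data.table.
  x.dt <- as.data.table(read.table(text = "sample replicate height weight age
A 1 12.0  0.64 6
A 2 12.2  0.38 6
A 3 12.4  0.49 6
B 1 12.7  0.65 4
B 2 12.8  0.78 5
C 1 11.9  0.45 6
C 2 11.84 0.44 2
C 3 11.43 0.32 3
C 4 10.24 0.84 4
D 1 14.2  0.54 2
D 2 15.67 0.67 7
D 3 15.11 0.81 7", header = TRUE))

  # Same function on a chosen set of columns: .SDcols limits what .SD holds,
  # so the replicate column never enters the calculation.
  mycols <- c("height", "weight", "age")
  by_cols <- x.dt[, lapply(.SD, mean), by = sample, .SDcols = mycols]

  # Different functions on different columns, plus the group size (.N).
  mixed <- x.dt[, list(height = mean(height),
                       weight = median(weight),
                       n      = .N),
                by = sample]

  print(by_cols)
  print(mixed)
}
```

With 600 condition columns, mycols could be built programmatically, for example with setdiff(names(x.dt), c("sample", "replicate")).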

Matthew

Henrique Dallazuanna

2011-Mar-11 12:58 UTC


Try this:

aggregate(. ~ sample, x[-2], FUN = mean)
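In this call the dot stands for every column of the data other than sample, and x[-2] drops the second column (replicate) by position so that it is not averaged. A self-contained sketch of the same idea that drops the column by name instead, which may be safer when the column order is not guaranteed; by_position and by_name are illustrative names:

```r
# Rebuild the example data under the name x, as used in the call above.
x <- read.table(text = "sample replicate height weight age
A 1 12.0  0.64 6
A 2 12.2  0.38 6
A 3 12.4  0.49 6
B 1 12.7  0.65 4
B 2 12.8  0.78 5
C 1 11.9  0.45 6
C 2 11.84 0.44 2
C 3 11.43 0.32 3
C 4 10.24 0.84 4
D 1 14.2  0.54 2
D 2 15.67 0.67 7
D 3 15.11 0.81 7", header = TRUE)

# Positional version: remove column 2 (replicate), then average the rest.
by_position <- aggregate(. ~ sample, data = x[-2], FUN = mean)

# Name-based version: remove the column called "replicate", wherever it is.
by_name <- aggregate(. ~ sample, data = x[names(x) != "replicate"], FUN = mean)

print(by_name)
```

Both calls average all remaining columns, so neither needs the 600 condition names to be typed out.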



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
