thr3ads.net - R help - [R] means, SD's and tapply [Feb 2011]


Christopher R. Dolanc

2011-Feb-25 20:09 UTC

[R] means, SD's and tapply

I'm trying to use tapply to output means and SD or SE for my data but 
seem to be limited by how many times I can subset it.  Here's a snippet 
of my data

 > stems353[1:10,]
     Time DataSource   Plot Elevation Aspect Slope     Type Species SizeClass Stems
1  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABCO    Class1     3
2  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABMA    Class1     0
3  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ACMA    Class1     0
4  Modern    Cameron 70F221      1730    ESE    20 Hardwood    AECA    Class1     0
5  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ARME    Class1     0
6  Modern    Cameron 70F221      1730    ESE    20  Conifer    CADE    Class1    15
7  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CELE    Class1     0
8  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CONU    Class1     0
9  Modern    Cameron 70F221      1730    ESE    20  Conifer    JUCA    Class1     0
10 Modern    Cameron 70F221      1730    ESE    20  Conifer    JUOC    Class1     0

I'd like to see means/SD of "Stems" stratified by "Species", "Time" and
"SizeClass".  I can get R to give me this for means by species:

 > tapply(stems353$Stems, stems353$Species, mean)
        ABCO         ABMA         ACMA         AECA         ARME         CADE         CELE
0.7305240793 0.8569405099 0.0003541076 0.0010623229 0.0017705382 0.4684844193 0.0063739377
        CONU         JUCA         JUOC         LIDE         PIAL         PICO         PIJE
0.0017705382 0.0003541076 0.0959631728 0.0138101983 0.3905807365 1.5651558074 0.2315864023
        PILA         PIMO        PIMO2         PIPO         PISA         POTR         PSME
0.1774079320 0.1880311615 0.0311614731 0.6735127479 0.0237252125 0.0506373938 0.2000708215
        QUCH         QUDO         QUDU         QUKE         QULO         QUWI        Salix
0.0474504249 0.1203966006 0.0000000000 0.2071529745 0.0003541076 0.0548866856 0.0003541076
        SEGI         TSME
0.0021246459 0.5017705382
 >

but I really need to see each species by SizeClass and Time so that each 
value would be labeled something like "ABCOSizeClass1TimeModern".  
Adding 2 variables to the function doesn't seem to work

 > tapply(stems353$Stems, stems353$Species, stems353$SizeClass, 
stems353$Time, mean)
Error in match.fun(FUN) :
   'stems353$SizeClass' is not a function, character or symbol

I've already created proper subsets for each of these groups, e.g. one 
subset is called "stems353ABCO1" and I can run analyses on this.  But,
trying to extract means straight from those subsets doesn't seem to work

 > mean(stems353ABCO1)
[1] NA
Warning message:
In mean.default(stems353ABCO1) :
   argument is not numeric or logical: returning NA
 >

Thanks,
Chris Dolanc

-- 
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)



Scott Chamberlain

2011-Feb-25 21:01 UTC


Chris, it seems like you need the plyr package, especially ddply. For example:

library(plyr)

stems353 <- data.frame(Time = rep(c("Modern", "Old"), 4),
                       SizeClass = rep(c("class1", "class2"), each = 4),
                       Species = rep(c("a", "b"), each = 4),
                       Stems = seq(1, 8, 1))

ddply(stems353, .(Species, SizeClass, Time), summarise,
      mean = mean(Stems))


David Winsemius

2011-Feb-25 21:04 UTC


On Feb 25, 2011, at 3:09 PM, Christopher R. Dolanc wrote:
>> tapply(stems353$Stems, stems353$Species, stems353$SizeClass,
> stems353$Time, mean)
Some functions let you put an arbitrary number of items after the
first (aggregate() always confuses me because it _does_ this), but
tapply expects the grouping variables together in a single list (or a
single vector) as its second argument, so try:

with( stems353, tapply(Stems, list(Species, SizeClass, Time) , mean) )

with() improves readability.
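
A minimal sketch of that call on made-up data may help show what comes back. The data frame toy, its values, and the "Historic" level of Time are assumptions for illustration only, not the poster's data. With three grouping variables, tapply returns a three-dimensional array indexed by the factor levels; the last two steps show two ways to get one labeled value per group.

```r
# Made-up data: two observations per Species x SizeClass x Time cell.
toy <- data.frame(
  Species   = rep(c("ABCO", "CADE"), each = 8),
  Time      = rep(rep(c("Historic", "Modern"), each = 4), times = 2),
  SizeClass = rep(rep(c("Class1", "Class2"), each = 2), times = 4),
  Stems     = c(3, 1, 0, 2, 5, 7, 2, 2, 15, 9, 1, 3, 9, 11, 4, 0)
)

# Naming the list elements gives the result named dimensions.
groups <- with(toy, list(Species = Species, SizeClass = SizeClass, Time = Time))

means <- tapply(toy$Stems, groups, mean)  # 2 x 2 x 2 array of group means
sds   <- tapply(toy$Stems, groups, sd)    # same shape, group SDs

# Index one cell by its level names.
means["ABCO", "Class1", "Modern"]

# Long format: one row per combination, with the mean in its own column.
long <- as.data.frame(as.table(means), responseName = "mean")

# Flat named vector with combined labels such as "ABCO.Class1.Modern".
flat <- with(toy, tapply(Stems, interaction(Species, SizeClass, Time, sep = "."), mean))
flat["ABCO.Class1.Modern"]
```

The interaction() form is probably closest to the "ABCOSizeClass1TimeModern" style of label asked for; the array form is easier to slice.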
> Error in match.fun(FUN) :
>   'stems353$SizeClass' is not a function, character or symbol
The third item in your arguments got matched to what tapply was  
expecting to be a function name.
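
The argument matching can be checked on a small made-up data frame (toy below is a hypothetical stand-in for stems353). tapply's first three arguments are X, INDEX and FUN, so a second grouping vector passed on its own is taken as FUN:

```r
toy <- data.frame(
  Species   = c("ABCO", "ABCO", "CADE", "CADE"),
  SizeClass = c("Class1", "Class2", "Class1", "Class2"),
  Stems     = c(3, 0, 15, 1)
)

# Two grouping vectors passed separately: the second one lands in the
# FUN slot, so tapply() stops with an error, which is captured here.
bad <- tryCatch(
  tapply(toy$Stems, toy$Species, toy$SizeClass, mean),
  error = function(e) e
)
inherits(bad, "error")

# Wrapping the grouping vectors in list() fills INDEX only, so mean is FUN.
good <- tapply(toy$Stems, list(toy$Species, toy$SizeClass), mean)
good["CADE", "Class1"]
```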

David Winsemius, MD
West Hartford, CT

Dennis Murphy

2011-Feb-25 21:04 UTC


Hi:

On Fri, Feb 25, 2011 at 12:09 PM, Christopher R. Dolanc <
crdolanc@ucdavis.edu> wrote:
There are several approaches here, including the aggregate() function in
base R, the doBy package or the plyr package, among others:

# Requires R 2.11.0 or above:
aggregate(Stems ~ Species + Time + SizeClass, data = stems353, FUN = mean)

# To get more than one output per group, one can use either of the above packages:

library(plyr)
ddply(stems353, .(Species, Time, SizeClass), summarise,
      avgStems = mean(Stems), sdStems = sd(Stems))

library(doBy)
f <- function(x) c(mean = mean(x), sd = sd(x))
summaryBy(Stems ~ Species + Time + SizeClass, data = stems353, FUN = f)

# Another possibility is package data.table:
library(data.table)
dt <- data.table(stems353, key = c("Species", "Time", "SizeClass"))
dt[, list(avgStems = mean(Stems), sdStems = sd(Stems)),
   by = c("Species", "Time", "SizeClass")]

All of this is untested, so caveat emptor. Other possibilities include
package sqldf, if you are comfortable with SQL syntax, package remix or
package Hmisc. In other words, R has a number of efficient ways to summarize
data.
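
For a base-R-only route to several statistics per group, one possible sketch is to have aggregate() call a function that returns a named vector. The data frame toy, the helper stats, and the "Historic" level are made up for illustration; SE is computed here as sd / sqrt(n), which is an assumption about the definition wanted.

```r
# Made-up data: two observations per Species x Time x SizeClass cell.
toy <- data.frame(
  Species   = rep(c("ABCO", "CADE"), each = 8),
  Time      = rep(rep(c("Historic", "Modern"), each = 4), times = 2),
  SizeClass = rep(rep(c("Class1", "Class2"), each = 2), times = 4),
  Stems     = c(3, 1, 0, 2, 5, 7, 2, 2, 15, 9, 1, 3, 9, 11, 4, 0)
)

# Hypothetical helper: mean, SD and SE of one group's values.
stats <- function(x) c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))

# The Stems column of the result is a matrix with one column per statistic.
res <- aggregate(Stems ~ Species + Time + SizeClass, data = toy, FUN = stats)

# Spread the matrix into ordinary columns: Stems.mean, Stems.sd, Stems.se.
summary_df <- do.call(data.frame, res)
subset(summary_df, Species == "ABCO" & Time == "Modern" & SizeClass == "Class1")
```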

HTH,
Dennis

zem

2011-Feb-25 21:06 UTC


Hi Christopher,

I think you have the same problem I had today :)
See this post:
http://r.789695.n4.nabble.com/group-by-in-data-frame-tc3324240.html
I think you can find the solution there.

zem
