I'm trying to use tapply to output means and SD or SE for my data, but I
seem to be limited by how many times I can subset it. Here's a snippet
of my data:
> stems353[1:10,]
     Time DataSource   Plot Elevation Aspect Slope     Type Species SizeClass Stems
1  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABCO    Class1     3
2  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABMA    Class1     0
3  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ACMA    Class1     0
4  Modern    Cameron 70F221      1730    ESE    20 Hardwood    AECA    Class1     0
5  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ARME    Class1     0
6  Modern    Cameron 70F221      1730    ESE    20  Conifer    CADE    Class1    15
7  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CELE    Class1     0
8  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CONU    Class1     0
9  Modern    Cameron 70F221      1730    ESE    20  Conifer    JUCA    Class1     0
10 Modern    Cameron 70F221      1730    ESE    20  Conifer    JUOC    Class1     0
I'd like to see means/SD of "Stems" stratified by "Species", "Time" and
"SizeClass". I can get R to give me this for means by species:
> tapply(stems353$Stems, stems353$Species, mean)
        ABCO         ABMA         ACMA         AECA         ARME         CADE         CELE
0.7305240793 0.8569405099 0.0003541076 0.0010623229 0.0017705382 0.4684844193 0.0063739377
        CONU         JUCA         JUOC         LIDE         PIAL         PICO         PIJE
0.0017705382 0.0003541076 0.0959631728 0.0138101983 0.3905807365 1.5651558074 0.2315864023
        PILA         PIMO        PIMO2         PIPO         PISA         POTR         PSME
0.1774079320 0.1880311615 0.0311614731 0.6735127479 0.0237252125 0.0506373938 0.2000708215
        QUCH         QUDO         QUDU         QUKE         QULO         QUWI        Salix
0.0474504249 0.1203966006 0.0000000000 0.2071529745 0.0003541076 0.0548866856 0.0003541076
        SEGI         TSME
0.0021246459 0.5017705382
>
But I really need to see each species by SizeClass and Time, so that each
value would be labeled something like "ABCOSizeClass1TimeModern".
Adding 2 more variables to the function call doesn't seem to work:
> tapply(stems353$Stems, stems353$Species, stems353$SizeClass,
stems353$Time, mean)
Error in match.fun(FUN) :
'stems353$SizeClass' is not a function, character or symbol
I've already created proper subsets for each of these groups, e.g. one
subset is called "stems353ABCO1", and I can run analyses on this. But
trying to extract means straight from those subsets doesn't seem to work:
> mean(stems353ABCO1)
[1] NA
Warning message:
In mean.default(stems353ABCO1) :
argument is not numeric or logical: returning NA
>
Thanks,
Chris Dolanc
--
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)
Chris, it seems like you need the plyr package, especially ddply(). For example:
library(plyr)
# small made-up data set with the same column names as yours
stems353 <- data.frame(Time = rep(c("Modern", "Old"), 4),
                       SizeClass = rep(c("class1", "class2"), each = 4),
                       Species = rep(c("a", "b"), each = 4),
                       Stems = seq(1, 8, 1))
# mean of Stems for each Species x SizeClass x Time combination
ddply(stems353, .(Species, SizeClass, Time), summarise,
      mean = mean(Stems)
)
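
A minimal sketch extending the ddply() call above to SD and SE as well, assuming the plyr package is installed. The column names avgStems, sdStems and seStems are made up for illustration, and SE is taken here as sd/sqrt(n), which is an assumption about what is meant by SE in the question.

```r
library(plyr)

# Toy data with the same layout as the example above (not the real stems353)
stems353 <- data.frame(Time      = rep(c("Modern", "Old"), 4),
                       SizeClass = rep(c("class1", "class2"), each = 4),
                       Species   = rep(c("a", "b"), each = 4),
                       Stems     = seq(1, 8, 1))

# One row per Species x SizeClass x Time group, with mean, SD and SE
res <- ddply(stems353, .(Species, SizeClass, Time), summarise,
             avgStems = mean(Stems),
             sdStems  = sd(Stems),
             seStems  = sd(Stems) / sqrt(length(Stems)))
print(res)
```

Only the combinations that actually occur in the data get a row, so empty cells do not show up as NA here.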
On Feb 25, 2011, at 3:09 PM, Christopher R. Dolanc wrote:

> Adding 2 variables to the function doesn't seem to work
>
> > tapply(stems353$Stems, stems353$Species, stems353$SizeClass,
> stems353$Time, mean)

Some functions let you put an arbitrary number of items after the first
(aggregate() always confuses me because it _does_ this), but tapply expects
them to be in a list or vector, so try:

with( stems353, tapply(Stems, list(Species, SizeClass, Time), mean) )

with() improves readability.

> Error in match.fun(FUN) :
> 'stems353$SizeClass' is not a function, character or symbol

The third item in your arguments got matched to what tapply was expecting
to be a function name.

David Winsemius, MD
West Hartford, CT
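
A small base-R sketch of what the list form of tapply() returns, and one way to get a flat, labeled vector closer to the "ABCOSizeClass1TimeModern" style asked for. The toy data frame stems and its values are invented for illustration; interaction() is used here as one possible way to build combined labels, and SE is assumed to mean sd/sqrt(n).

```r
# Toy data: 2 plots x 2 species x 2 size classes x 2 times (values made up)
stems <- expand.grid(Plot      = c("P1", "P2"),
                     Species   = c("ABCO", "ABMA"),
                     SizeClass = c("Class1", "Class2"),
                     Time      = c("Historic", "Modern"),
                     stringsAsFactors = FALSE)
stems$Stems <- c(3, 1, 0, 2, 5, 5, 1, 3, 4, 0, 2, 2, 6, 8, 0, 0)

# A list of grouping variables gives a 3-dimensional array of means
arr <- with(stems, tapply(Stems, list(Species, SizeClass, Time), mean))
dim(arr)
arr["ABCO", "Class2", "Modern"]

# A single combined factor gives a flat vector with one label per group
grp <- with(stems, interaction(Species, SizeClass, Time, sep = ".", drop = TRUE))
flat_mean <- tapply(stems$Stems, grp, mean)
flat_sd   <- tapply(stems$Stems, grp, sd)
flat_se   <- tapply(stems$Stems, grp, function(x) sd(x) / sqrt(length(x)))
flat_mean
```

The array form is indexed by the three level names; the flat form carries names such as "ABCO.Class2.Modern".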
Hi:

On Fri, Feb 25, 2011 at 12:09 PM, Christopher R. Dolanc <crdolanc@ucdavis.edu> wrote:

> I'd like to see means/SD of "Stems" stratified by "Species", "Time" and
> "SizeClass". I can get R to give me this for means by species:
>
> > tapply(stems353$Stems, stems353$Species, mean)

There are several approaches here, including the aggregate() function in
base R, the doBy package or the plyr package, among others:

# Requires R 2.11.0 or above:
aggregate(Stems ~ Species + Time + SizeClass, data = stems353, FUN = mean)

# To get more than one output per group, one can use either of the above
# packages:
library(plyr)
ddply(stems353, .(Species, Time, SizeClass), summarise,
      avgStems = mean(Stems), sdStems = sd(Stems))

library(doBy)
f <- function(x) c(mean = mean(x), sd = sd(x))
summaryBy(Stems ~ Species + Time + SizeClass, data = stems353, FUN = f)

# Another possibility is package data.table:
library(data.table)
dt <- data.table(stems353, key = c("Species", "Time", "SizeClass"))
dt[, list(avgStems = mean(Stems), sdStems = sd(Stems)),
   by = list(Species, Time, SizeClass)]

All of this is untested, so caveat emptor. Other possibilities include
package sqldf, if you are comfortable with SQL syntax, package remix or
package Hmisc. In other words, R has a number of efficient ways to
summarize data.

HTH,
Dennis
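
For the base-R route, a hedged sketch of getting several statistics per group out of aggregate() in one call. The toy data and the helper name f are illustrative only; SE is again assumed to be sd/sqrt(n). When FUN returns a vector, aggregate() stores the results as a matrix inside a single column, so the sketch flattens it with do.call(data.frame, ...).

```r
# Toy data: 2 plots x 2 species x 2 size classes x 2 times (values made up)
stems <- expand.grid(Plot      = c("P1", "P2"),
                     Species   = c("ABCO", "ABMA"),
                     SizeClass = c("Class1", "Class2"),
                     Time      = c("Historic", "Modern"),
                     stringsAsFactors = FALSE)
stems$Stems <- c(3, 1, 0, 2, 5, 5, 1, 3, 4, 0, 2, 2, 6, 8, 0, 0)

# Summary function returning a named vector of statistics
f <- function(x) c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))

# agg$Stems is a 3-column matrix embedded in the data frame
agg <- aggregate(Stems ~ Species + Time + SizeClass, data = stems, FUN = f)

# Flatten to ordinary columns: Stems.mean, Stems.sd, Stems.se
out <- do.call(data.frame, agg)
print(out)
```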
Hi Christopher,

I think you have the same problem I had today :) See this post:
http://r.789695.n4.nabble.com/group-by-in-data-frame-tc3324240.html
I think you can find the solution there.

zem
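
On the last part of the original question: mean() applied to a whole data frame gives NA with that warning, because a data frame is not a numeric vector. A small sketch, assuming stems353ABCO1 is a data-frame subset (the thread does not show how it was built); the toy data and the name abco are hypothetical.

```r
# Toy data standing in for the real stems353 (values made up)
stems <- data.frame(Species = c("ABCO", "ABCO", "ABMA"),
                    Stems   = c(3, 1, 0))
abco <- subset(stems, Species == "ABCO")

# mean() on the data frame itself warns and returns NA
whole <- suppressWarnings(mean(abco))

# Pull out the numeric column first, then summarise it
abco_mean <- mean(abco$Stems)
abco_sd   <- sd(abco$Stems)
```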