I'm trying to use tapply to output means and SD or SE for my data but seem to be limited by how many times I can subset it. Here's a snippet of my data:

> stems353[1:10,]
     Time DataSource   Plot Elevation Aspect Slope     Type Species SizeClass Stems
1  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABCO    Class1     3
2  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABMA    Class1     0
3  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ACMA    Class1     0
4  Modern    Cameron 70F221      1730    ESE    20 Hardwood    AECA    Class1     0
5  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ARME    Class1     0
6  Modern    Cameron 70F221      1730    ESE    20  Conifer    CADE    Class1    15
7  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CELE    Class1     0
8  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CONU    Class1     0
9  Modern    Cameron 70F221      1730    ESE    20  Conifer    JUCA    Class1     0
10 Modern    Cameron 70F221      1730    ESE    20  Conifer    JUOC    Class1     0

I'd like to see means/SD of "Stems" stratified by "Species", "Time" and "SizeClass". I can get R to give me this for means by species:

> tapply(stems353$Stems, stems353$Species, mean)
        ABCO         ABMA         ACMA         AECA         ARME         CADE         CELE
0.7305240793 0.8569405099 0.0003541076 0.0010623229 0.0017705382 0.4684844193 0.0063739377
        CONU         JUCA         JUOC         LIDE         PIAL         PICO         PIJE
0.0017705382 0.0003541076 0.0959631728 0.0138101983 0.3905807365 1.5651558074 0.2315864023
        PILA         PIMO        PIMO2         PIPO         PISA         POTR         PSME
0.1774079320 0.1880311615 0.0311614731 0.6735127479 0.0237252125 0.0506373938 0.2000708215
        QUCH         QUDO         QUDU         QUKE         QULO         QUWI        Salix
0.0474504249 0.1203966006 0.0000000000 0.2071529745 0.0003541076 0.0548866856 0.0003541076
        SEGI         TSME
0.0021246459 0.5017705382

but I really need to see each species by SizeClass and Time so that each value would be labeled something like "ABCOSizeClass1TimeModern". Adding 2 variables to the function doesn't seem to work:

> tapply(stems353$Stems, stems353$Species, stems353$SizeClass, stems353$Time, mean)
Error in match.fun(FUN) :
  'stems353$SizeClass' is not a function, character or symbol

I've already created proper subsets for each of these groups, e.g. one subset is called "stems353ABCO1" and I can run analyses on this. But trying to extract means straight from those subsets doesn't seem to work:

> mean(stems353ABCO1)
[1] NA
Warning message:
In mean.default(stems353ABCO1) :
  argument is not numeric or logical: returning NA

Thanks,
Chris Dolanc

--
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)
Chris,

It seems like you need the plyr package, especially ddply. For example:

    library(plyr)

    # small made-up data frame with the same column names
    stems353 <- data.frame(Time      = rep(c("Modern", "Old"), 4),
                           SizeClass = rep(c("class1", "class2"), each = 4),
                           Species   = rep(c("a", "b"), each = 4),
                           Stems     = seq(1, 8, 1))

    # one row per Species/SizeClass/Time combination
    ddply(stems353, .(Species, SizeClass, Time), summarise,
          mean = mean(Stems))

On Friday, February 25, 2011 at 2:09 PM, Christopher R. Dolanc wrote:
> I'm trying to use tapply to output means and SD or SE for my data but
> seem to be limited by how many times I can subset it.
> [...]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
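Since the question also asks for SD or SE, the ddply() call above can carry extra summary columns. What follows is only a sketch on invented toy data (the `toy` data frame, its values, and the output column names are assumptions, not taken from stems353), and it takes SE to mean sd divided by the square root of the group size:

```r
# Toy stand-in for stems353; the values are invented for illustration.
toy <- data.frame(
  Species   = rep(c("ABCO", "CADE"), each = 4),
  SizeClass = rep(c("Class1", "Class2"), times = 4),
  Time      = "Modern",
  Stems     = c(3, 0, 1, 2, 15, 4, 9, 6)
)

# Guarded so the script still runs where plyr is not installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  library(plyr)
  # One row per Species/SizeClass/Time cell, with several statistics each.
  stem_stats <- ddply(toy, .(Species, SizeClass, Time), summarise,
                      meanStems = mean(Stems),
                      sdStems   = sd(Stems),
                      seStems   = sd(Stems) / sqrt(length(Stems)),
                      n         = length(Stems))
  print(stem_stats)
}
```

Keeping `n` in the output makes it easy to spot cells with a single observation, where sd() returns NA.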
On Feb 25, 2011, at 3:09 PM, Christopher R. Dolanc wrote:

> [...]
> but I really need to see each species by SizeClass and Time so that
> each value would be labeled something like "ABCOSizeClass1TimeModern".
> Adding 2 variables to the function doesn't seem to work
>
>> tapply(stems353$Stems, stems353$Species, stems353$SizeClass,
>>        stems353$Time, mean)

Some functions let you put an arbitrary number of items after the first (aggregate() always confuses me because it _does_ this), but tapply expects the grouping variables together in its second argument, as a list or vector, so try:

    with(stems353, tapply(Stems, list(Species, SizeClass, Time), mean))

with() improves readability.

> Error in match.fun(FUN) :
>   'stems353$SizeClass' is not a function, character or symbol

The third item in your arguments got matched to what tapply was expecting to be a function name.

David Winsemius, MD
West Hartford, CT
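A small sketch of the list form of tapply() on invented data (the `toy` data frame and its values are assumptions). It also shows one way to get a single labelled value per cell with interaction(), and a guess at the last error in the question: if stems353ABCO1 is a data frame, mean() needs the numeric column rather than the whole object.

```r
# Toy stand-in for stems353: two observations per Species/SizeClass/Time cell.
toy <- expand.grid(
  obs       = 1:2,
  Species   = c("ABCO", "CADE"),
  SizeClass = c("Class1", "Class2"),
  Time      = c("Modern", "Old")
)
toy$Stems <- c(3, 1, 15, 9, 0, 2, 4, 6, 2, 4, 10, 12, 1, 1, 5, 7)

# All grouping factors go into ONE list; the result is a
# Species x SizeClass x Time array.
cell_means <- with(toy, tapply(Stems, list(Species, SizeClass, Time), mean))
cell_sds   <- with(toy, tapply(Stems, list(Species, SizeClass, Time), sd))
print(cell_means["ABCO", "Class1", "Modern"])

# interaction() combines the factors into one, so each cell gets a
# label such as "ABCO.Class1.Modern".
flat_means <- with(toy, tapply(Stems,
                               interaction(Species, SizeClass, Time, sep = "."),
                               mean))
print(flat_means["ABCO.Class1.Modern"])

# Assumption: the hand-made subset is a data frame, so take the mean of
# its Stems column instead of the data frame itself.
abco1 <- subset(toy, Species == "ABCO" & SizeClass == "Class1")
print(mean(abco1$Stems))
```

The array form is convenient for looking up one cell by name; the interaction() form is closer to the flat, labelled output the question asks for.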
Hi:

On Fri, Feb 25, 2011 at 12:09 PM, Christopher R. Dolanc <crdolanc@ucdavis.edu> wrote:

> I'd like to see means/SD of "Stems" stratified by "Species", "Time" and
> "SizeClass". I can get R to give me this for means by species:
>
> > tapply(stems353$Stems, stems353$Species, mean)
> [...]

There are several approaches here, including the aggregate() function in base R, the doBy package or the plyr package, among others:

    # Requires R 2.11.0 or above:
    aggregate(Stems ~ Species + Time + SizeClass, data = stems353, FUN = mean)

    # To get more than one output per group, one can use either of the above packages:
    library(plyr)
    ddply(stems353, .(Species, Time, SizeClass), summarise,
          avgStems = mean(Stems), sdStems = sd(Stems))

    library(doBy)
    f <- function(x) c(mean = mean(x), sd = sd(x))
    summaryBy(Stems ~ Species + Time + SizeClass, data = stems353, FUN = f)

    # Another possibility is package data.table:
    library(data.table)
    dt <- data.table(stems353, key = c("Species", "Time", "SizeClass"))
    dt[, list(avgStems = mean(Stems), sdStems = sd(Stems)),
       by = c("Species", "Time", "SizeClass")]

All of this is untested, so caveat emptor. Other possibilities include package sqldf, if you are comfortable with SQL syntax, package remix or package Hmisc. In other words, R has a number of efficient ways to summarize data.

HTH,
Dennis
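For completeness, a base-R-only sketch of the "more than one output per group" case, using aggregate() with a function that returns several named values. The `toy` data and the names `stat_fun`, `stem_table` and `label` are illustrative assumptions, not part of the original data:

```r
# Toy stand-in for stems353: two observations per Species/SizeClass/Time cell.
toy <- expand.grid(
  obs       = 1:2,
  Species   = c("ABCO", "CADE"),
  SizeClass = c("Class1", "Class2"),
  Time      = c("Modern", "Old")
)
toy$Stems <- c(3, 1, 15, 9, 0, 2, 4, 6, 2, 4, 10, 12, 1, 1, 5, 7)

# Summary function returning several named statistics per group.
stat_fun <- function(x) {
  c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
}

# aggregate() stores the three statistics as a matrix column called Stems.
agg <- aggregate(Stems ~ Species + Time + SizeClass, data = toy, FUN = stat_fun)

# do.call(data.frame, ...) splits that matrix into ordinary columns
# named Stems.mean, Stems.sd and Stems.se.
stem_table <- do.call(data.frame, agg)

# One label per row, e.g. "ABCO.Class1.Modern".
stem_table$label <- with(stem_table, paste(Species, SizeClass, Time, sep = "."))
print(stem_table)
```

The result is a long table with one row per group, which is often easier to export or plot than the multi-dimensional array returned by tapply().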
Hi Christopher,

I think you have the same problem I had today :) See this post:
http://r.789695.n4.nabble.com/group-by-in-data-frame-tc3324240.html
I think you can find the solution there.

zem