Ian Chidister
2009-Jul-29 16:57 UTC
[R] - counting factor occurrences within a group: tapply()
Dear List,

I'm an [R] novice starting analysis of an ecological dataset containing the basal areas of different tree species in a number of research plots. Example data follow:

> Trees<-data.frame(SppID=as.factor(c(rep('QUEELL',2), rep('QUEALB',3),'CORAME', 'ACENEG', 'TILAME')), BA=c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3), PlotID=as.factor(c('BU3F10', rep('BU3F11',2), rep('BU3F12',5))))
> Trees
   SppID     BA PlotID
1 QUEELL  907.9 BU3F10
2 QUEELL 1104.4 BU3F11
3 QUEALB  113.0 BU3F11
4 QUEALB  143.1 BU3F12
5 QUEALB  452.3 BU3F12
6 CORAME  638.7 BU3F12
7 ACENEG  791.7 BU3F12
8 TILAME  804.3 BU3F12

Fields are (in order): Tree Species Code, Basal Area, and Plot Code.

I've been successful in computing summary statistics by species or plot groups using tapply():

> tapply(BA, PlotID, sum)
BU3F10 BU3F11 BU3F12
 907.9 1217.4 2830.1

*My Question* I'd like to perform a similar function that tells me how many species are in each plot, I thought this would be possible using something like:

> tapply(SppID, PlotID, nlevels)
BU3F10 BU3F11 BU3F12
     5      5      5

however, this outputs the total number of levels for the factor SppID rather than the number of species in each plot category which would look like:

BU3F10 BU3F11 BU3F12
     1      2      4

I understand, from reading the archive, that this occurs because R does not subset factor levels, but I'm wondering if there's a simple way around this.

Thanks for your help,

Ian Chidister

Environment and Resources
The Nelson Institute for Environmental Studies
University of Wisconsin-Madison, USA
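The behaviour described in the question can be reproduced in a few lines. The sketch below rebuilds the example data and shows that a subset of a factor keeps every level of the parent factor, which is why nlevels() reports 5 for each plot. Re-creating the factor with factor() is one possible way to drop the unused levels; it is offered here as an illustration, not as the answer given on the list. The names one_plot and per_plot are made up for this example, and columns are referenced with Trees$ because the bare names in the post presumably rely on attach(Trees).

```r
# Example data from the original post
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", "ACENEG", "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# Species recorded in a single plot (one_plot is a hypothetical helper name)
one_plot <- Trees$SppID[Trees$PlotID == "BU3F10"]

nlevels(one_plot)          # 5: the subset still carries all parent levels
nlevels(factor(one_plot))  # 1: rebuilding the factor drops unused levels

# The same idea applied per plot
per_plot <- tapply(Trees$SppID, Trees$PlotID, function(x) nlevels(factor(x)))
per_plot                   # BU3F10 = 1, BU3F11 = 2, BU3F12 = 4
```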
jim holtman
2009-Jul-29 17:17 UTC
[R] - counting factor occurrences within a group: tapply()
This is probably what you want; you need to count the number of unique instances:

> tapply(Trees$SppID, Trees$PlotID, function(x) length(unique(x)))
BU3F10 BU3F11 BU3F12
     1      2      4

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
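For readers who want to check the length(unique(x)) approach against an independent calculation, here is a small self-contained sketch. It runs the call from the reply above and compares it with a cross-tabulation of plots by species, counting the non-empty cells in each row. The cross-tabulation is an alternative added for illustration and was not proposed in the thread; n_unique and n_crosstab are hypothetical names.

```r
# Example data from the original post
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", "ACENEG", "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# Approach from the reply: count distinct species within each plot
n_unique <- tapply(Trees$SppID, Trees$PlotID, function(x) length(unique(x)))

# Alternative check: plot-by-species table, then count cells with at least one tree
n_crosstab <- rowSums(table(Trees$PlotID, Trees$SppID) > 0)

n_unique    # BU3F10 = 1, BU3F11 = 2, BU3F12 = 4
n_crosstab  # same counts
```

The table() version also gives the full presence/absence matrix as a by-product, which may be convenient if per-species information is needed later.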
Daniel Malter
2009-Jul-29 17:20 UTC
[R] - counting factor occurrences within a group: tapply()
does "length" instead of "nlevels" do what you want to do?

with(Trees,tapply(SppID,PlotID,unique))

daniel

-------------------------
cuncta stricte discussurus
-------------------------
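The two suggestions in this reply do different things, and a short sketch may make the difference concrete. With the example data, length on its own counts rows (trees) per plot, while the with()/unique call returns the distinct species themselves, one element per plot; taking the length of each element then gives the species count. The final sapply() step is an assumption about how the result would be used and is not part of the reply. The names n_trees, spp_by_plot and n_species are hypothetical.

```r
# Example data from the original post
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", "ACENEG", "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# length alone counts every tree, so repeated species are counted repeatedly
n_trees <- with(Trees, tapply(SppID, PlotID, length))
n_trees      # BU3F10 = 1, BU3F11 = 2, BU3F12 = 5

# unique returns the distinct species codes for each plot (a list-like result)
spp_by_plot <- with(Trees, tapply(SppID, PlotID, unique))

# Assumed follow-up step: count the distinct species in each element
n_species <- sapply(spp_by_plot, length)
n_species    # 1, 2, 4
```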
Ian Chidister
2009-Jul-29 18:13 UTC
[R] - counting factor occurrences within a group: tapply()
Hi All-

Thanks for your quick responses. I was looking for unique instances, so Jim's and Daniel's suggestions got the job done. Using "length" alone didn't discriminate between multiple occurrences of the same species and multiple species.

I do have one follow-up question: my full data set (not the example data) has a number of NAs in the SppID column, and [R] is currently counting the NAs as species occurrences. Using Jim's code, I tried:

> tapply(SppID, PlotID, function(Trees, na.rm=T) length(unique(Trees,na.rm=T)))

and alternately:

> tapply(SppID, PlotID, function(Trees) length(unique(Trees)), na.rm=T)

which doesn't seem to convince [R] to ignore the NAs. What am I doing wrong?

Thanks,
Ian
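One likely reason the attempts above have no effect is that unique() does not have an na.rm argument, so NA is treated as one more distinct value. A possible workaround, sketched below, is to drop the missing values inside the function before counting. This is an illustration only: the reply that resolved the question is not included in this thread, so it should not be read as the solution that was actually sent. The data frame Trees_na and the helper count_species are hypothetical and made up for this example.

```r
# Hypothetical data with missing species codes (not from the original post)
Trees_na <- data.frame(
  SppID  = factor(c("QUEELL", NA, "QUEALB", "QUEALB", NA, "CORAME")),
  PlotID = factor(c("P1", "P1", "P2", "P2", "P2", "P2"))
)

# Without NA handling, NA is counted as if it were a species
with_na <- tapply(Trees_na$SppID, Trees_na$PlotID, function(x) length(unique(x)))
with_na     # P1 = 2, P2 = 3

# Hypothetical helper: remove NAs first, then count distinct values
count_species <- function(x) length(unique(x[!is.na(x)]))

without_na <- tapply(Trees_na$SppID, Trees_na$PlotID, count_species)
without_na  # P1 = 1, P2 = 2
```

Filtering with !is.na(x) inside the function keeps the NA handling local to each plot, so rows with a missing species code are ignored without removing them from the data frame.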
Ian Chidister
2009-Jul-29 18:29 UTC
[R] - counting factor occurrences within a group: tapply()
Jim-

That did the trick- thanks so much for taking the time to help me out.

Sincerely,
Ian Chidister