Hi R-users:
[R 2.2 on OSX 10.4.3]

I have a (sparse) vegetation data frame with 500 rows (sampling units) and 177 columns (plant species) where the data represent % cover. I need to summarize the cover data by returning the names of the most dominant and the second most dominant species per plot. I reduced the data frame to omit cover below 5%; this is what it looks like stacked. I have experimented with tapply(), by(), and some functions mentioned in archived postings, but I haven't seen anything that answers to this directly. Does anybody have any ideas?

   OBJECTID       PolygonID SpeciesCod AbundanceP
1     15006     ANT-CBG-rr1     Leymol    5.00000
3     15008     ANT-CBG-rr1     Ambcha    5.00000
5     15010      ANT-ESH-27     Atrpat   20.00000
6     15011      ANT-ESH-27     Ambcha   10.00000
11    15016      ANT-ESH-28     Salvir   20.00000
14    15019      ANT-ESH-28     Atrpat    5.00000
18    15023 ANT-POR-Rubarm5     Rubarm   60.00000
19    15024 ANT-POR-Rubarm5     Hedhel   40.00000
25    15030      ECO-CBG-A2     Griint    5.00000
27    15032      ECO-CBG-A2     Anngra    5.00000
38    15043      ECO-CBG-A4     Sperub   50.00000

Regards,
Graham Watt-Gremm
Adaikalavan Ramasamy
2005-Nov-09 00:52 UTC
[R] retrieve most abundant species by sample unit
Your example does not appear to match your description of the problem. If you have a 500x177 matrix and want to find the largest and second largest entries in each row, you can try something like

    m <- matrix( sample( 101:115 ), ncol=3 )
    m
         [,1] [,2] [,3]
    [1,]  102  112  110
    [2,]  111  106  104
    [3,]  108  101  103
    [4,]  114  115  105
    [5,]  113  107  109

    t( apply( m, 1, function(x){ r <- rank(-x); c( which(r==1), which(r==2) ) } ) )
         [,1] [,2]
    [1,]    2    3
    [2,]    1    2
    [3,]    1    3
    [4,]    2    1
    [5,]    1    3

This relies on the fact that every entry in a given column refers to the same species. If you have stacked data (especially where the species appear in a non-regular manner), then it becomes slightly more tricky to find an elegant solution.

Regards, Adai

On Tue, 2005-11-08 at 15:46 -0800, Graham Watt-Gremm wrote:
> Hi R-users:
> [R 2.2 on OSX 10.4.3]
> I have a (sparse) vegetation data frame with 500 rows (sampling
> units) and 177 columns (plant species) where the data represent %
> cover. [...]

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
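For the stacked layout shown in the original post, one possible approach is to split the rows by plot and sort each piece by cover. The following is only a minimal sketch: it retypes the sample rows from the post, and the names `veg_long`, `top_two`, and `dominants` are hypothetical, not anything the posters used. It assumes ties can be broken by row order and that a plot with a single recorded species should get NA for the second dominant.

```r
# Sample rows retyped from the original post (stacked format).
veg_long <- data.frame(
  PolygonID  = c("ANT-CBG-rr1", "ANT-CBG-rr1", "ANT-ESH-27", "ANT-ESH-27",
                 "ANT-ESH-28", "ANT-ESH-28", "ANT-POR-Rubarm5",
                 "ANT-POR-Rubarm5", "ECO-CBG-A2", "ECO-CBG-A2", "ECO-CBG-A4"),
  SpeciesCod = c("Leymol", "Ambcha", "Atrpat", "Ambcha", "Salvir", "Atrpat",
                 "Rubarm", "Hedhel", "Griint", "Anngra", "Sperub"),
  AbundanceP = c(5, 5, 20, 10, 20, 5, 60, 40, 5, 5, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the two most abundant species in one plot.
# Ties keep their original row order; a missing second species becomes NA.
top_two <- function(d) {
  o <- order(d$AbundanceP, decreasing = TRUE)
  d$SpeciesCod[o][1:2]
}

# Split by plot, apply the helper, and bind the results into a
# character matrix with one row per plot.
dominants <- t(sapply(split(veg_long, veg_long$PolygonID), top_two))
colnames(dominants) <- c("dom1", "dom2")
dominants
```

Indexing the result by plot name, e.g. `dominants["ANT-ESH-27", ]`, gives that plot's two species codes. Whether row-order tie-breaking is acceptable for plots such as ANT-CBG-rr1 (two species at 5%) is a judgment call for the analyst.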
Graham,

It's relatively easily done, especially the first one. Let's suppose your veg data frame is called veg.

    > dom1 <- apply(veg,1,which.max)

returns a vector with the column number of the species with the highest abundance for each sample (if there are ties, it returns the first one). If you're concerned about ties, you can check how many species share the maximum cover in each sample with

    for (i in 1:nrow(veg)) print(sum(veg[i,]==veg[i,dom1[i]]))

Any sample that prints a count greater than 1 has a tie. There may be ways to eliminate the for loop, but this works.

If you want the names of the species, rather than column number, you can do

    > names(veg)[dom1]

which will return the species names (assuming they are the column names of the data.frame).

Now to get the next most abundant species, zero out the dominant species in a copy and repeat on that copy

    > tmp <- veg
    > for (i in 1:nrow(veg)) tmp[i,dom1[i]] <- 0
    > dom2 <- apply(tmp,1,which.max)

HTH
Dave R

--
David W. Roberts          office 406-994-4548
Professor and Head        FAX    406-994-3190
Department of Ecology     email  droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460
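The two-pass idea above (find the maximum, zero it out, find the maximum again) can also be written as a single pass that sorts each row once. This is only a sketch under stated assumptions: the small `veg` table below is made up for illustration, `top_two_names` and `dom` are hypothetical names, the cover values are assumed to contain no NAs, and a species with zero cover is reported as NA rather than as a dominant.

```r
# Made-up wide table: rows are plots, columns are species (% cover).
veg <- data.frame(
  Atrpat = c(20, 5, 0),
  Ambcha = c(10, 0, 0),
  Salvir = c(0, 20, 0),
  Sperub = c(0, 0, 50),
  row.names = c("plotA", "plotB", "plotC")
)

# Hypothetical helper: names of the two largest entries in one row.
# Ties are broken by column order, as with which.max().
top_two_names <- function(x) {
  o <- order(x, decreasing = TRUE)[1:2]
  out <- names(x)[o]
  out[x[o] <= 0] <- NA   # no cover recorded, so no dominant to report
  out
}

# apply() passes each row as a numeric vector named by species.
dom <- t(apply(as.matrix(veg), 1, top_two_names))
colnames(dom) <- c("dom1", "dom2")
dom
```

Reporting NA for zero cover avoids a pitfall of the zero-out-and-repeat approach: when a plot has only one species, which.max() on an all-zero row still returns column 1, which would name a species that is not actually present.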