I have a data frame with two factors (sampling 'unit', 'species'). I want to calculate the number of unique 'species' per 'unit.' I can calculate the number of unique values for each variable separately, but can't get a count for each 'unit'.

> data=read.csv("C:/Desktop/sr_sort_practice.csv")
> attach(data)

> data[1:10,]
   unit species
1   123    ACMA
2   123    LIDE
3   123    LIDE
4   123    SESE
5   123    SESE
6   123    SESE
7   345    HEAR
8   345    LOHI
9   345    QUAG
10  345    TODI
...

> sr.unique<- lapply (data, unique)
$unit
[1] 123 345 216
$species
 [1] ACMA LIDE SESE HEAR LOHI QUAG TODI UMCA ARSP LIDE

> sapply (sr.unique,length)
   unit species
      3      10

Then, I get stuck here because this unique species count is not given for each 'unit'.
What I'd like to get is:

unit species
123    3
345    4
216    --

Thanks--

--
View this message in context: http://r.789695.n4.nabble.com/Count-of-unique-factors-within-another-factor-tp2253545p2253545.html
Sent from the R help mailing list archive at Nabble.com.

I think ?tapply will help here. But *please* read the posting guide and provide minimal, reproducible examples!

Hi there,

Try

  with(data, tapply(species, unit, function(x) length(unique(x))))

HTH,
Jorge

On Sun, Jun 13, 2010 at 12:07 PM, Birdnerd wrote:
> I have a data frame with two factors (sampling 'unit', 'species'). I want to
> calculate the number of unique 'species' per 'unit.'
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
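
For anyone wanting to try the tapply() suggestion without the original CSV, here is a minimal sketch. The data frame is an assumption: it is rebuilt by hand from the ten rows printed in the question, so unit 216 (whose rows were not shown) is absent. The names `richness` and `result` are made up for illustration.

```r
# Sample data: only the ten rows printed in the original post (assumed, not the real file).
data <- data.frame(
  unit    = c(123, 123, 123, 123, 123, 123, 345, 345, 345, 345),
  species = c("ACMA", "LIDE", "LIDE", "SESE", "SESE", "SESE",
              "HEAR", "LOHI", "QUAG", "TODI")
)

# For each unit, take its species values and count the distinct ones.
richness <- with(data, tapply(species, unit, function(x) length(unique(x))))
print(richness)

# Optional: reshape into the two-column layout asked for in the question.
result <- data.frame(unit = names(richness), species = as.vector(richness))
print(result)
```

tapply() returns a named array indexed by unit, so the last step is only needed if a data frame is wanted.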

You can also use the sqldf package:

> x
   unit species
1   123    ACMA
2   123    LIDE
3   123    LIDE
4   123    SESE
5   123    SESE
6   123    SESE
7   345    HEAR
8   345    LOHI
9   345    QUAG
10  345    TODI
> require(sqldf)
> sqldf('select unit, count(distinct species) as count from x group by unit')
  unit count
1  123     3
2  345     4

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
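
If installing sqldf is not an option, the same "group by unit, count distinct species" idea can probably be expressed in base R with aggregate(). This is a swapped-in technique, not what was posted above, and the sample data frame `x` is again rebuilt by hand from the ten rows shown in the thread; `by_unit` is a made-up name.

```r
# Sample data: the ten rows shown in the thread (assumed, not the real file).
x <- data.frame(
  unit    = c(123, 123, 123, 123, 123, 123, 345, 345, 345, 345),
  species = c("ACMA", "LIDE", "LIDE", "SESE", "SESE", "SESE",
              "HEAR", "LOHI", "QUAG", "TODI")
)

# Base R analogue of: select unit, count(distinct species) ... group by unit
by_unit <- aggregate(species ~ unit, data = x,
                     FUN = function(s) length(unique(s)))
print(by_unit)
```

The result is a data frame with one row per unit, and its `species` column holds the distinct-species count rather than species names.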

On Sun, 13 Jun 2010, Birdnerd wrote:

> I have a data frame with two factors (sampling 'unit', 'species'). I want to
> calculate the number of unique 'species' per 'unit.' I can calculate the
> number of unique values for each variable separately, but can't get a count
> for each 'unit'.

If I understand you:

  colSums( xtabs( ~ species + unit , data ) != 0 )

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
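
To see why the cross-tabulation approach works, it may help to run it in two steps. The sketch below assumes a data frame rebuilt by hand from the ten rows printed in the question; `tab` and `richness` are illustrative names.

```r
# Sample data: the ten rows printed in the question (assumed, not the real file).
data <- data.frame(
  unit    = c(123, 123, 123, 123, 123, 123, 345, 345, 345, 345),
  species = c("ACMA", "LIDE", "LIDE", "SESE", "SESE", "SESE",
              "HEAR", "LOHI", "QUAG", "TODI")
)

# Step 1: species-by-unit contingency table (rows = species, columns = units).
tab <- xtabs(~ species + unit, data)
print(tab)

# Step 2: tab != 0 is TRUE where a species occurs in a unit at all;
# summing each column counts the species present in that unit.
richness <- colSums(tab != 0)
print(richness)
```

Comparing against zero before summing is what turns row counts into presence/absence, so a species recorded six times in a unit still contributes only one.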

Hi:

Another possibility:

  as.data.frame(with(data[!duplicated(data), ], table(unit)))
    unit Freq
  1  123    3
  2  345    4

HTH,
Dennis
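
A step-by-step sketch of the duplicated()/table() idea, again on a data frame rebuilt by hand from the ten rows in the question (an assumption; the real file has more rows). `dedup` and `result` are illustrative names.

```r
# Sample data: the ten rows printed in the question (assumed, not the real file).
data <- data.frame(
  unit    = c(123, 123, 123, 123, 123, 123, 345, 345, 345, 345),
  species = c("ACMA", "LIDE", "LIDE", "SESE", "SESE", "SESE",
              "HEAR", "LOHI", "QUAG", "TODI")
)

# Keep one row per distinct unit/species pair.
dedup <- data[!duplicated(data), ]

# Counting the remaining rows per unit gives the number of distinct species.
result <- as.data.frame(with(dedup, table(unit)))
print(result)
```

Note that duplicated(data) compares whole rows, so if the real data frame carries extra columns it is likely safer to deduplicate on `data[c("unit", "species")]` first.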