Boya Sun
2010-Jul-21 20:55 UTC
[R] Get distribution of positive/negative examples for each cluster
Dear R experts,

I have a labeled data set. Each data is assigned a binary label 0 or 1. Assume that I use some clustering algorithm to group the data by clusters (using some features of the data). Now I want to know how many data are labeled as 0/1 in each cluster.

For example, assume that I have 9 labeled data grouped into three clusters. The ids of the clusters are 1, 2, and 3. The dataset is represented by the following matrix:

      membership  Label
  d1  1           0
  d2  1           0
  d3  1           1
  d4  2           0
  d5  2           1
  d6  2           1
  d7  3           1
  d8  3           1
  d9  3           1

Now I want to get the following output, telling me how many data are labeled as 0 and 1 in each cluster:

  cluster_id  0-data  1-data
  1           2       1
  2           1       2
  3           0       3

The output does not have to be a matrix, it could be a summary of the statistics.

How should I approach this problem? What R functions should I use to get such information?

Thanks so much!

Boya
Phil Spector
2010-Jul-21 21:07 UTC
[R] Get distribution of positive/negative examples for each cluster
Boya -

table() is the function that does what you want:

> cdat = data.frame(membership=rep(1:3,rep(3,3)),
+                   label=as.character(c(0,0,1,0,1,1,1,1,1)))
> table(cdat)
          label
membership 0 1
         1 2 1
         2 1 2
         3 0 3

From there, you can rearrange it in a variety of ways:

> as.data.frame(table(cdat))
  membership label Freq
1          1     0    2
2          2     0    1
3          3     0    0
4          1     1    1
5          2     1    2
6          3     1    3

Or, to conform with your request:

> reshape(as.data.frame(table(cdat)),idvar='membership',
+         v.names='Freq',timevar='label',direction='wide')
  membership Freq.0 Freq.1
1          1      2      1
2          2      1      2
3          3      0      3

- Phil

Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
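
Since the question asks for the distribution of labels per cluster, the same table() result can also be turned into within-cluster proportions. The following is only a minimal base-R sketch under stated assumptions: the data frame is rebuilt from the nine example rows in the question, and the names counts, props and wide, as well as the "0-data"/"1-data" column names, are illustrative choices rather than anything from the original posts.

```r
# Example data rebuilt from the question (illustrative, not the poster's real data).
cdat <- data.frame(
  membership = rep(1:3, each = 3),
  label      = c(0, 0, 1, 0, 1, 1, 1, 1, 1)
)

# Cross-tabulate cluster membership (rows) against label (columns).
counts <- table(cluster_id = cdat$membership, label = cdat$label)

# Row-wise proportions: share of 0s and 1s within each cluster.
props <- prop.table(counts, margin = 1)

# Wide data frame with one row per cluster and one column per label.
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0(names(wide), "-data")

print(counts)
print(round(props, 2))
print(wide)
```

prop.table() with margin = 1 divides each row by its row total, so each cluster's row sums to 1. as.data.frame.matrix() keeps the two-way layout, whereas as.data.frame() on a table gives the long form shown in the reply above.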