thr3ads.net - R help - [R] Get distribution of positive/negative examples for each cluster [Jul 2010]

Boya Sun

2010-Jul-21 20:55 UTC

[R] Get distribution of positive/negative examples for each cluster

Dear R experts,

I have a labeled data set. Each data point is assigned a binary label, 0 or 1.
Assume that I use some clustering algorithm to group the data points into
clusters (using some features of the data). Now I want to know how many data
points are labeled 0 and how many are labeled 1 in each cluster.

For example, assume that I have 9 labeled data points grouped into three
clusters. The ids of the clusters are 1, 2, and 3. The data set is represented
by the following matrix:

      membership    label
d1    1             0
d2    1             0
d3    1             1
d4    2             0
d5    2             1
d6    2             1
d7    3             1
d8    3             1
d9    3             1

Now I want to get the following output, telling me how many data points are
labeled 0 and how many are labeled 1 in each cluster:

cluster_id    0-data    1-data
1             2         1
2             1         2
3             0         3

The output does not have to be a matrix; a summary of the statistics would
also do.

How should I approach this problem? What R functions should I use to get
such information?

Thanks so much!

Boya

Phil Spector

2010-Jul-21 21:07 UTC

Boya-
    table() is the function that does what you want:

> cdat = data.frame(membership=rep(1:3,rep(3,3)),
+                   label=as.character(c(0,0,1,0,1,1,1,1,1)))
> table(cdat)
          label
membership 0 1
         1 2 1
         2 1 2
         3 0 3

From there, you can rearrange it in a variety of ways:

> as.data.frame(table(cdat))
  membership label Freq
1          1     0    2
2          2     0    1
3          3     0    0
4          1     1    1
5          2     1    2
6          3     1    3
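
Since the subject line asks for a distribution, proportions may be as useful as counts. The following is a minimal sketch rather than part of the reply above: it rebuilds the nine-row example from the question, tabulates it with table(), and turns the counts into per-cluster proportions with prop.table(). The object names counts and props are made up for illustration, and wrapping the label in factor() with explicit levels is an assumption added so that both columns appear even if one label never occurs in the data.

```r
# Example data from the question: 9 items, 3 clusters, binary labels
cdat <- data.frame(
  membership = rep(1:3, each = 3),
  label      = c(0, 0, 1, 0, 1, 1, 1, 1, 1)
)

# Counts of each label within each cluster.
# Explicit factor levels keep a column for a label even when its count is 0.
counts <- table(
  membership = cdat$membership,
  label      = factor(cdat$label, levels = c(0, 1))
)
print(counts)

# Row-wise proportions: share of 0s and 1s inside each cluster (rows sum to 1)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

With margin = 1 the proportions are taken within each row, that is, within each cluster; margin = 2 would instead show how each label is spread across the clusters.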

Or, to conform with your request:

> reshape(as.data.frame(table(cdat)),idvar='membership',
+         v.names='Freq',timevar='label',direction='wide')
  membership Freq.0 Freq.1
1          1      2      1
2          2      1      2
3          3      0      3
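
As a possible shortcut to the same wide layout, a two-way table can also be converted directly with as.data.frame.matrix(), which keeps the cross-tabulated shape instead of going through the long format. This is a sketch under the same example data; the names tab and wide and the column names cluster_id, label_0 and label_1 are hypothetical and chosen only to resemble the requested output.

```r
# Example data from the question: 9 items, 3 clusters, binary labels
cdat <- data.frame(
  membership = rep(1:3, each = 3),
  label      = c(0, 0, 1, 0, 1, 1, 1, 1, 1)
)

# Two-way table: rows are clusters, columns are labels
tab <- table(cdat$membership, cdat$label)

# Convert the table to a data frame with one column per label
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("label_", names(wide))   # "0", "1" -> "label_0", "label_1"

# Move the cluster ids from the row names into an ordinary column
wide <- cbind(cluster_id = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```

Compared with reshape(), this avoids the idvar/timevar bookkeeping, at the cost of handling the cluster ids by hand since they start out as row names.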


 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

