Dear members,
I am having trouble understanding the kmeans results in R. I am applying
the k-means algorithm to my large data file, and it produces the
clusters.
Q1) Does anybody know how to find out which data points were assigned to
which cluster (I have fixed the number of clusters at 5)?
COMMAND
(kmeans.results <- kmeans(mydata, centers = 5, iter.max = 1000, nstart = 10000))
Q2) When I print kmeans.results I get the following output:
K-means clustering with 5 clusters of sizes 17, 1, 6, 4, 32
Cluster means:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
1 0 0 0 0 0 0 0 0 0 0 0.0000000 0.0008235294
2 0 0 0 0 0 0 0 0 0 0 0.0000000 0.0000000000
3 0 0 0 0 0 0 0 0 0 0 0.0000000 0.0000000000
4 0 0 0 0 0 0 0 0 0 0 0.0000000 0.0040000000
5 0 0 0 0 0 0 0 0 0 0 0.0003125 0.0003750000
[,13] [,14] [,15] [,16] [,17] [,18]
1 0.0008235294 0.001176471 0.005176471 0.012471295 0.041181652 0.10663935
2 0.0000000000 0.000000000 0.000000000 0.000000000 0.169491525 0.61016949
3 0.0000000000 0.000000000 0.000000000 0.002333333 0.006666667 0.07695015
4 0.0030000000 0.001500000 0.001000000 0.017500000 0.029000000 0.06150000
5 0.0015625000 0.003437500 0.010687500 0.046375000 0.100062500 0.14306250
[,19] [,20] [,21] [,22] [,23] [,24] [,25]
1 0.12946535 1.0017347 0.3360283 0.2455259 0.08565672 0.02553212 0.006000000
2 0.94915254 0.1694915 0.1016949 0.0000000 0.00000000 0.00000000 0.000000000
3 0.09376439 1.3857837 0.2659812 0.1015707 0.03804953 0.02023362 0.007666667
4 0.17100000 0.6665000 0.7860000 0.1860000 0.04650000 0.01450000 0.012000000
5 0.18100000 0.5200625 0.4156875 0.3461250 0.16925000 0.04918750 0.011500000
[,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
1 0.0005882353 0.001176471 0 0 0 0 0 0 0 0
2 0.0000000000 0.000000000 0 0 0 0 0 0 0 0
3 0.0010000000 0.000000000 0 0 0 0 0 0 0 0
4 0.0000000000 0.000000000 0 0 0 0 0 0 0 0
5 0.0013125000 0.000000000 0 0 0 0 0 0 0 0
[,36] [,37] [,38] [,39] [,40]
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
Clustering vector:
[1] 1 5 5 3 1 5 5 5 5 1 4 1 5 5 5 5 4 5 2 3 5 5 1 5 5 5 5 1 3 1 4 5 5 1 5 5 5 1
[39] 3 1 5 5 3 1 1 1 1 5 5 1 4 1 3 5 5 5 5 5 5 1
Within cluster sum of squares by cluster:
[1] 0.6702803 0.0000000 0.2453294 0.1860180 1.3535263
(between_SS / total_SS = 76.8 %)
Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"
Q3) I would like to understand which raw data are in which cluster. Does
anybody know how to access the table of raw data that belong to the same
cluster?
Thanks for your help.
DZU
--
View this message in context:
http://r.789695.n4.nabble.com/K-means-results-understanding-tp4670171.html
Sent from the R help mailing list archive at Nabble.com.
You should read the help page
?kmeans
Especially the section labeled "Value", which tells you what kmeans
returns. You will see that the cluster membership is returned as a
vector of integers called "cluster". If you don't know how to access
that from kmeans.results, you haven't read any of the basic
tutorials on R.
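
As a minimal sketch of what that looks like in practice: the example below
uses a small random matrix as a hypothetical stand-in for mydata (the real
data are not shown in this thread) and a much smaller nstart to keep it
quick. The names membership, cluster1 and by.cluster are illustrative only.

```r
# Hypothetical stand-in for mydata: 60 rows, 4 numeric columns.
set.seed(42)
mydata <- matrix(runif(60 * 4), nrow = 60, ncol = 4)

kmeans.results <- kmeans(mydata, centers = 5, iter.max = 1000, nstart = 25)

# One integer per row of mydata: the cluster that row was assigned to.
membership <- kmeans.results$cluster

# Number of rows in each cluster (same counts as kmeans.results$size).
table(membership)

# Row indices of the observations assigned to cluster 1.
which(membership == 1)

# The raw data rows that belong to cluster 1.
cluster1 <- mydata[membership == 1, , drop = FALSE]

# All raw data split into a list with one data frame per cluster.
by.cluster <- split(as.data.frame(mydata), membership)
```

Because the cluster vector is in the same order as the rows of mydata, it
can also be attached as an extra column, e.g. cbind(mydata, cluster =
membership), to keep each row next to its cluster label.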
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Dzu
Sent: Monday, June 24, 2013 4:25 AM
To: r-help at r-project.org
Subject: [R] K-means results understanding!!!
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.