Dear all,
I am trying to cluster my dataset with k-means clustering in R, but the
results look scattered. My code is pasted below. Please suggest where the
code goes wrong. The data are read in first, before k-means is applied, as
follows.
DMs<-read.table(text="Country DATA
IS -0.0092
BA -0.0235
HK -0.0239
JA -0.0333
KU -0.0022
OM -0.0963
QA -0.0706
SK -0.0322
SA -0.1233
SI -0.0141
TA -0.0142
UAE -0.0656
AUS -0.0230
BEL -0.0006
CYP -0.0085
CR -0.0398
DEN -0.0423
EST -0.0604
FIN -0.0227
FRA -0.0085
GER -0.0272
GrE -0.3519
ICE -0.0210
IRE -0.0057
LAT -0.0595
LITH -0.0451
LUXE -0.0023
MAL -0.0351
NETH -0.0048
NOR -0.0495
POL -0.0081
PORT -0.0044
SLOVA -0.1210
SLOVE -0.0031
SPA -0.0213
SWE -0.0106
SWIT -0.0152
UK -0.0030
HUNG -0.0086
CAN -0.0144
CHIL -0.0078
USA -0.0042
BERM -0.0035
AUST -0.0211
NEWZ -0.0538" ,
header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,46), ylim=c(-0.12,0))
text(1:46,DMs[,2],DMs[,1],col=k1$cluster)
legend(10,1,c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
col=1:2,pch=19)
--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*
Dear all,
I also tried k-means on another part of my data, but I still do not get
the result I expected.
The related code is pasted below. Please suggest where the code goes
wrong.
DMs<-read.table(text="Country Data
Israel 0.087320199
Bahrein 0.37991129
HongKong 0.037552721
Japan 0.235350891
Kuwait 0.286427554
oman 0.400096249
Qatar 0.270693298
SouthKorea 0.007407618
SaudiArabia 0.187578553
Singapore 0.008528448
Taiwan 0.027371676
UAE 0.276795224
Austria 0.015132794
Belgium 0.008513907
Cyprus -0.000938601
CzechRepublic 0.017460065
Denmark 0.029490066
Estonia 0.114144041
Finland 0.016245116
France 0.007217465
Germany 0.00371948
Greece -0.008527501
Iceland 0.748097785
Ireland 0.023309721
Latvia 0.178227267
Lithuania 0.100033752
Luxemborg 0.044546393
Malta 0.128679817
Netherland 0.010188604
Norway 0.003437861
Poland 0.006426383
Portugal 0.00753412
Slovakia 0.505992775
Slovenia 0.162475815
Spain 0.00267973
Sweden 0.009967609
Switzerland 0.020557185
UK 0.009340789
Hungary 0.005389885
Canada -0.000531982
Chile 0.007080471
USA 0.013516878
Bermuda -0.338491435
Australia 0.113039242
Newzealand 0.154508239",
header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,45), ylim=c(-0.001,2.5))
text(1:45+0.5,DMs[,2]+0.05,DMs[,1],col=k1$cluster)
legend(2,1,c("cluster 1: Highly efficient DMs","cluster 2: Less efficient DMs"),
col=1:5,pch=19)
On Mon, Sep 5, 2022 at 5:01 PM Subhamitra Patra <subhamitra.patra at gmail.com> wrote:
> Dear all,
>
> I am about to cluster my datasets by using K-mean clustering techniques in
> R, but getting some type of scattered results. Herewith I pasted my code
> below. Please suggest to me where I am lacking in my code. I was pasting my
> data before applying the K-mean method as follows.
Hi Subhamitra,
I think part of the problem is that you have only one value for each
country, so kmeans() is clustering a single variable (it accepts a
vector and treats it as a one-column matrix). The points plotted will
then be the index on the x-axis and the value for each country on the
y-axis. Passing a value for ylim= means that you are cutting off the
lowest points. Here is an example that will give you two clusters and
show the values for the centers in the middle of the plot. Perhaps this
is all you need, but I suspect there is more work to be done.
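# kmeans() treats the numeric vector as a one-column matrix
k2<-kmeans(DMs[,2],centers=2)
# one x position per country (45 rows, not 46)
idx<-seq_len(nrow(DMs))
# no ylim= here, so none of the points are cut off
plot(idx,DMs[,2],col=k2$cluster,pch=19,xlim=c(1,46))
text(idx,DMs[,2],DMs[,1],col=k2$cluster)
# cluster centers in the middle of the plot, one color per cluster
points(rep(23,2),k2$centers,pch=1:2,cex=2,col=1:2)
# a keyword position keeps the legend inside the plot region
legend("bottomleft",c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
col=1:2,pch=19)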
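To make the ylim= point concrete, here is a minimal sketch on six of the posted values (the subset is only for brevity; the full DMs data frame could be used instead). It lists the countries that ylim=c(-0.12,0) would hide. It also relabels the clusters by their centers, because kmeans() numbers its clusters arbitrarily and a fixed legend text may otherwise end up attached to the wrong group. Treating the cluster with the higher center as "cluster 1" is my assumption, not something stated in the original post, and the seed is arbitrary.

```r
# Subset of the posted values, copied from the first message
DMs <- data.frame(
  Country = c("IS", "OM", "SA", "GrE", "SLOVA", "UK"),
  DATA    = c(-0.0092, -0.0963, -0.1233, -0.3519, -0.1210, -0.0030),
  stringsAsFactors = FALSE
)

# 1. Which countries fall outside ylim = c(-0.12, 0) and would not be drawn?
ylim_used <- c(-0.12, 0)
hidden <- DMs$Country[DMs$DATA < ylim_used[1] | DMs$DATA > ylim_used[2]]
print(hidden)

# 2. Relabel the clusters by center: label 1 = highest center, 2 = lowest.
set.seed(1)  # arbitrary seed, only so that the run is reproducible
km <- kmeans(DMs$DATA, centers = 2, nstart = 25)
ord <- order(km$centers[, 1], decreasing = TRUE)
DMs$cluster <- match(km$cluster, ord)
print(DMs)
```

With the relabeled column, col=DMs$cluster and a legend built with col=1:2 refer to the same groups on every run.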
Jim
On Mon, Sep 5, 2022 at 9:31 PM Subhamitra Patra <subhamitra.patra at gmail.com> wrote:
> Dear all,
>
> I am about to cluster my datasets by using K-mean clustering techniques in
> R, but getting some type of scattered results. Herewith I pasted my code
> below. Please suggest to me where I am lacking in my code. I was pasting my
> data before applying the K-mean method as follows.
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hello,
I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the
changes I have made to your call to kmeans is to subset DMs with
drop = FALSE, so that the dim attribute is not dropped.
library(cluster)
# one-column data frame: drop = FALSE keeps the dim attribute
dm <- DMs[, 2, drop = FALSE]
# total within-cluster sum of squares for k = 1, ..., max_clust
max_clust <- 10
wss <- numeric(max_clust)
for(k in seq_len(max_clust)) {
  km <- kmeans(dm, centers = k, nstart = 25)
  wss[k] <- km$tot.withinss
}
plot(wss, type = "b")
# Where is the elbow, at 2 or at 4?
# (the next two calls need the factoextra package to be installed)
factoextra::fviz_nbclust(dm, kmeans, method = "wss")
factoextra::fviz_nbclust(dm, kmeans, method = "silhouette")
# fit 2, 3 and 4 clusters
k2 <- kmeans(dm, centers = 2, nstart = 25)
k3 <- kmeans(dm, centers = 3, nstart = 25)
k4 <- kmeans(dm, centers = 4, nstart = 25)
main2 <- paste(nrow(k2$centers), "clusters")
main3 <- paste(nrow(k3$centers), "clusters")
main4 <- paste(nrow(k4$centers), "clusters")
# plot the three solutions side by side, then restore the old settings
old_par <- par(mfcol = c(1, 3))
plot(DMs[,2], col = k2$cluster, pch = 19, main = main2)
plot(DMs[,2], col = k3$cluster, pch = 19, main = main3)
plot(DMs[,2], col = k4$cluster, pch = 19, main = main4)
par(old_par)
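As a small illustration of what drop = FALSE changes, the sketch below uses six of the posted values (again only a subset, for brevity) and an arbitrary seed. It compares the two ways of subsetting. kmeans() itself should give the same fit either way, since it coerces its input to a matrix. As far as I can tell, the one-column data frame matters mainly for functions that insist on a matrix or data frame.

```r
# Subset of the posted values, copied from the first message
DMs <- data.frame(
  Country = c("IS", "OM", "SA", "GrE", "SLOVA", "UK"),
  DATA    = c(-0.0092, -0.0963, -0.1233, -0.3519, -0.1210, -0.0030),
  stringsAsFactors = FALSE
)

v  <- DMs[, 2]                # plain numeric vector, dim attribute dropped
dm <- DMs[, 2, drop = FALSE]  # one-column data frame, dim attribute kept

print(dim(v))   # NULL
print(dim(dm))  # 6 1

# Same seed for both calls, so the random starts are identical
set.seed(1)
kv <- kmeans(v, centers = 2, nstart = 25)
set.seed(1)
kd <- kmeans(dm, centers = 2, nstart = 25)

# The total within-cluster sum of squares should agree
print(all.equal(kv$tot.withinss, kd$tot.withinss))
```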
Hope this helps,
Rui Barradas
At 12:31 on 05/09/2022, Subhamitra Patra wrote:
> Dear all,
>
> I am about to cluster my datasets by using K-mean clustering techniques in
> R, but getting some type of scattered results. Herewith I pasted my code
> below. Please suggest to me where I am lacking in my code. I was pasting my
> data before applying the K-mean method as follows.