Dear all,

I am trying to cluster my data sets using k-means clustering in R, but I am getting somewhat scattered results. My code is pasted below. Please let me know where my code is going wrong. I read in my data before applying k-means as follows:

DMs<-read.table(text="Country DATA
IS -0.0092
BA -0.0235
HK -0.0239
JA -0.0333
KU -0.0022
OM -0.0963
QA -0.0706
SK -0.0322
SA -0.1233
SI -0.0141
TA -0.0142
UAE -0.0656
AUS -0.0230
BEL -0.0006
CYP -0.0085
CR -0.0398
DEN -0.0423
EST -0.0604
FIN -0.0227
FRA -0.0085
GER -0.0272
GrE -0.3519
ICE -0.0210
IRE -0.0057
LAT -0.0595
LITH -0.0451
LUXE -0.0023
MAL -0.0351
NETH -0.0048
NOR -0.0495
POL -0.0081
PORT -0.0044
SLOVA -0.1210
SLOVE -0.0031
SPA -0.0213
SWE -0.0106
SWIT -0.0152
UK -0.0030
HUNG -0.0086
CAN -0.0144
CHIL -0.0078
USA -0.0042
BERM -0.0035
AUST -0.0211
NEWZ -0.0538" ,
header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,46), ylim=c(-0.12,0))
text(1:46,DMs[,2],DMs[,1],col=k1$cluster)
legend(10,1,c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
col=1:2,pch=19)

--
Best Regards,
Subhamitra Patra
PhD Research Scholar
Department of Humanities and Social Sciences
Indian Institute of Technology, Kharagpur
INDIA
Dear all,

I also tried the k-means method on another part of my data, but I am still not getting the result I expected. The related code is below. Please let me know where my code is going wrong.

DMs<-read.table(text="Country Data
Israel 0.087320199
Bahrein 0.37991129
HongKong 0.037552721
Japan 0.235350891
Kuwait 0.286427554
oman 0.400096249
Qatar 0.270693298
SouthKorea 0.007407618
SaudiArabia 0.187578553
Singapore 0.008528448
Taiwan 0.027371676
UAE 0.276795224
Austria 0.015132794
Belgium 0.008513907
Cyprus -0.000938601
CzechRepublic 0.017460065
Denmark 0.029490066
Estonia 0.114144041
Finland 0.016245116
France 0.007217465
Germany 0.00371948
Greece -0.008527501
Iceland 0.748097785
Ireland 0.023309721
Latvia 0.178227267
Lithuania 0.100033752
Luxemborg 0.044546393
Malta 0.128679817
Netherland 0.010188604
Norway 0.003437861
Poland 0.006426383
Portugal 0.00753412
Slovakia 0.505992775
Slovenia 0.162475815
Spain 0.00267973
Sweden 0.009967609
Switzerland 0.020557185
UK 0.009340789
Hungary 0.005389885
Canada -0.000531982
Chile 0.007080471
USA 0.013516878
Bermuda -0.338491435
Australia 0.113039242
Newzealand 0.154508239", header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,45), ylim=c(-0.001,2.5))
text(1:45+0.5,DMs[,2]+0.05,DMs[,1],col=k1$cluster)
legend(2,1,c("cluster 1: Highly efficient DMs","cluster 2: Less efficient DMs"),
col=1:5,pch=19)
Hi Subhamitra,

I think the fact that you are passing a vector of values rather than a matrix is part of the problem. As you have only one value for each country, the points plotted will be the index on the x-axis and the value for each country on the y-axis. Passing a value for ylim= means that you are cutting off the lowest points (GrE, at -0.3519, falls well outside ylim=c(-0.12,0)). Here is an example that will give you two clusters and show the values for the centers in the middle of the plot. Perhaps this is all you need, but I suspect there is more work to be done.

k2<-kmeans(DMs[,2],centers=2)
# no ylim, so every point is shown
plot(DMs[,2],col=k2$cluster,pch=19,xlim=c(1,46))
# one label per row (45 countries), rather than 1:46
text(seq_len(nrow(DMs)),DMs[,2],DMs[,1],col=k2$cluster)
# draw the two centers at x = 23, colored 1 and 2 like their clusters
points(rep(23,2),k2$centers,pch=1:2,cex=2,col=1:2)
# y = 1 lies above all the data, so put the legend inside the plot region
legend("bottomright",c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
col=1:2,pch=19)

Jim

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
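Building on Jim's points, here is a minimal sketch of the same two-cluster plot with a few further adjustments: the y-axis range is taken from the data so no country is cut off, the x positions come from seq_len(nrow(DMs)) rather than a hard-coded 1:46 (the data have 45 rows), and the clusters are relabelled by the value of their centers, because kmeans() numbers clusters arbitrarily and "cluster 1" can land on either group from one run to the next. It re-enters the values from the original post so that it runs on its own. The names ord, cl and ctr are illustrative, and treating the cluster whose center is closer to zero as "Highly Integrated" is an assumption made only for this example.

```r
# Values from the original post, re-entered as a data frame (45 countries).
DMs <- data.frame(
  Country = c("IS", "BA", "HK", "JA", "KU", "OM", "QA", "SK", "SA", "SI",
              "TA", "UAE", "AUS", "BEL", "CYP", "CR", "DEN", "EST", "FIN",
              "FRA", "GER", "GrE", "ICE", "IRE", "LAT", "LITH", "LUXE", "MAL",
              "NETH", "NOR", "POL", "PORT", "SLOVA", "SLOVE", "SPA", "SWE",
              "SWIT", "UK", "HUNG", "CAN", "CHIL", "USA", "BERM", "AUST", "NEWZ"),
  DATA = c(-0.0092, -0.0235, -0.0239, -0.0333, -0.0022, -0.0963, -0.0706,
           -0.0322, -0.1233, -0.0141, -0.0142, -0.0656, -0.0230, -0.0006,
           -0.0085, -0.0398, -0.0423, -0.0604, -0.0227, -0.0085, -0.0272,
           -0.3519, -0.0210, -0.0057, -0.0595, -0.0451, -0.0023, -0.0351,
           -0.0048, -0.0495, -0.0081, -0.0044, -0.1210, -0.0031, -0.0213,
           -0.0106, -0.0152, -0.0030, -0.0086, -0.0144, -0.0078, -0.0042,
           -0.0035, -0.0211, -0.0538),
  stringsAsFactors = FALSE
)

set.seed(1)  # reproducible random starting centers
km <- kmeans(DMs$DATA, centers = 2, nstart = 25)

# kmeans() numbers its clusters arbitrarily. Relabel so that cluster 1 is the
# one with the larger center (closer to zero, as all values are negative).
# Calling that cluster "Highly Integrated" is an assumption for illustration.
ord <- order(km$centers[, 1], decreasing = TRUE)
cl  <- match(km$cluster, ord)  # new label for every country
ctr <- km$centers[ord, 1]      # centers in the new label order

n <- nrow(DMs)
x <- seq_len(n)                # 1..45 instead of a hard-coded 1:46
plot(x, DMs$DATA, col = cl, pch = 19,
     xlim = c(1, n + 3),
     ylim = range(DMs$DATA),   # full data range, so GrE is not cut off
     xlab = "Country index", ylab = "DATA")
text(x, DMs$DATA, DMs$Country, col = cl, pos = 4, cex = 0.7)
points(rep(n / 2, 2), ctr, pch = 1:2, cex = 2, col = 1:2)  # the two centers
legend("bottomright",
       legend = c("cluster 1: Highly Integrated", "cluster 2: Less Integrated"),
       col = 1:2, pch = 19)
```

Because the labels are fixed by the order of the centers, the legend text stays attached to the same group however the random starts come out.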
Hello,

I am not at all sure that the following answers the question. The code below tries to find the optimal number of clusters. One of the changes I have made to your call to kmeans is to subset DMs without dropping the dim attribute.

library(cluster)

# elbow plot: total within-cluster sum of squares for k = 1..10
max_clust <- 10
wss <- numeric(max_clust)
for(k in 1:max_clust) {
  km <- kmeans(DMs[,2], centers = k, nstart = 25)
  wss[k] <- km$tot.withinss
}
plot(wss, type = "b")

# keep a one-column matrix instead of a vector
dm <- DMs[, 2, drop = FALSE]

# Where is the elbow, at 2 or at 4?
# (these two calls need the factoextra package installed)
factoextra::fviz_nbclust(dm, kmeans, method = "wss")
factoextra::fviz_nbclust(dm, kmeans, method = "silhouette")

k2 <- kmeans(dm, centers = 2, nstart = 25)
k3 <- kmeans(dm, centers = 3, nstart = 25)
k4 <- kmeans(dm, centers = 4, nstart = 25)

# with a single column, length(centers) is the number of clusters
main2 <- paste(length(k2$centers), "clusters")
main3 <- paste(length(k3$centers), "clusters")
main4 <- paste(length(k4$centers), "clusters")

old_par <- par(mfcol = c(1, 3))
plot(DMs[,2], col = k2$cluster, pch = 19, main = main2)
plot(DMs[,2], col = k3$cluster, pch = 19, main = main3)
plot(DMs[,2], col = k4$cluster, pch = 19, main = main4)
par(old_par)

Hope this helps,

Rui Barradas
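Rui's elbow and silhouette plots rely on the factoextra package. Where that package is not available, roughly the same two diagnostics can be computed with kmeans() and cluster::silhouette() directly. This is a minimal sketch, assuming the values from the original post (re-entered below as a one-column matrix, the same shape as DMs[, 2, drop = FALSE]) and an arbitrary upper limit of max_k = 8 clusters; wss, ks and avg_sil are illustrative names, and the numbers need not match fviz_nbclust() exactly, since that function has its own defaults.

```r
library(cluster)  # for silhouette()

# Values from the original post as a one-column matrix.
dm <- matrix(c(-0.0092, -0.0235, -0.0239, -0.0333, -0.0022, -0.0963, -0.0706,
               -0.0322, -0.1233, -0.0141, -0.0142, -0.0656, -0.0230, -0.0006,
               -0.0085, -0.0398, -0.0423, -0.0604, -0.0227, -0.0085, -0.0272,
               -0.3519, -0.0210, -0.0057, -0.0595, -0.0451, -0.0023, -0.0351,
               -0.0048, -0.0495, -0.0081, -0.0044, -0.1210, -0.0031, -0.0213,
               -0.0106, -0.0152, -0.0030, -0.0086, -0.0144, -0.0078, -0.0042,
               -0.0035, -0.0211, -0.0538), ncol = 1)

set.seed(1)
max_k <- 8

# Elbow method: total within-cluster sum of squares for k = 1..max_k.
wss <- sapply(1:max_k, function(k)
  kmeans(dm, centers = k, nstart = 25)$tot.withinss)

# Silhouette method: average silhouette width for k = 2..max_k
# (silhouette widths are not defined for a single cluster).
ks <- 2:max_k
avg_sil <- sapply(ks, function(k) {
  km  <- kmeans(dm, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, dist(dm))
  mean(sil[, "sil_width"])
})

old_par <- par(mfrow = c(1, 2))
plot(1:max_k, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")
plot(ks, avg_sil, type = "b", xlab = "k", ylab = "Average silhouette width")
par(old_par)

ks[which.max(avg_sil)]  # k with the largest average silhouette width
```

Both criteria are heuristics; with a single variable and a few extreme values such as GrE, it is worth checking whether a suggested cluster consists of just one outlier.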