thr3ads.net - R help - [R] Clustering of datasets [Sep 2022]

Subhamitra Patra

2022-Sep-05 11:31 UTC

[R] Clustering of datasets

Dear all,

I am trying to cluster my datasets using k-means clustering in R, but I am
getting rather scattered results. I have pasted my code below; please let me
know where my code is lacking. Here is my data, followed by the k-means code.

DMs<-read.table(text="Country DATA
                      IS -0.0092
                      BA -0.0235
                      HK -0.0239
                      JA -0.0333
                      KU -0.0022
                      OM -0.0963
                      QA -0.0706
                      SK -0.0322
                      SA -0.1233
                      SI -0.0141
                      TA -0.0142
                      UAE -0.0656
                      AUS -0.0230
                     BEL -0.0006
                     CYP -0.0085
                     CR  -0.0398
                    DEN  -0.0423
                      EST -0.0604
                      FIN -0.0227
                      FRA -0.0085
                     GER -0.0272
                     GrE -0.3519
                     ICE -0.0210
                     IRE -0.0057
                     LAT -0.0595
                    LITH -0.0451
                    LUXE -0.0023
                    MAL  -0.0351
                    NETH -0.0048
                      NOR -0.0495
                      POL -0.0081
                    PORT -0.0044
                    SLOVA -0.1210
                    SLOVE -0.0031
                      SPA -0.0213
                      SWE -0.0106
                    SWIT -0.0152
                      UK -0.0030
                    HUNG -0.0086
                      CAN -0.0144
                    CHIL -0.0078
                      USA -0.0042
                    BERM -0.0035
                    AUST -0.0211
                    NEWZ -0.0538" ,
                 header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,46), ylim=c(-0.12,0))
text(1:46,DMs[,2],DMs[,1],col=k1$cluster)
legend(10,1,c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
       col=1:2,pch=19)
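With a single variable, k-means assigns each country to the nearer of the two centers, so each cluster is a contiguous range of values. Plotted against the row index, as above, that structure can look scattered. A minimal sketch, assuming the DATA column is copied into a named vector dm1 and using an arbitrary set.seed(1) (both hypothetical choices, not part of the original code), that sorts the values and marks the midpoint between the two centers:

```r
# DATA column of the first data set above, as a named vector (hypothetical name dm1)
dm1 <- c(IS = -0.0092, BA = -0.0235, HK = -0.0239, JA = -0.0333, KU = -0.0022,
         OM = -0.0963, QA = -0.0706, SK = -0.0322, SA = -0.1233, SI = -0.0141,
         TA = -0.0142, UAE = -0.0656, AUS = -0.0230, BEL = -0.0006, CYP = -0.0085,
         CR = -0.0398, DEN = -0.0423, EST = -0.0604, FIN = -0.0227, FRA = -0.0085,
         GER = -0.0272, GrE = -0.3519, ICE = -0.0210, IRE = -0.0057, LAT = -0.0595,
         LITH = -0.0451, LUXE = -0.0023, MAL = -0.0351, NETH = -0.0048, NOR = -0.0495,
         POL = -0.0081, PORT = -0.0044, SLOVA = -0.1210, SLOVE = -0.0031, SPA = -0.0213,
         SWE = -0.0106, SWIT = -0.0152, UK = -0.0030, HUNG = -0.0086, CAN = -0.0144,
         CHIL = -0.0078, USA = -0.0042, BERM = -0.0035, AUST = -0.0211, NEWZ = -0.0538)

set.seed(1)  # arbitrary seed, only for reproducibility
k1 <- kmeans(dm1, centers = 2, nstart = 25)

# Nearest-center assignment in 1-D: points below the midpoint of the two
# centers belong to the cluster whose center is lower
cut_point <- mean(k1$centers)
ord <- order(dm1)
print(data.frame(value = dm1[ord], cluster = k1$cluster[ord]))
cat("boundary between the clusters:", round(cut_point, 4), "\n")

# Sorted dot chart: one label per line and a single break between clusters
dotchart(dm1[ord], labels = names(dm1)[ord], color = k1$cluster[ord], pch = 19)
abline(v = cut_point, lty = 2)
```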


-- 
*Best Regards,*
*Subhamitra Patra*
*PhD Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*

Subhamitra Patra

2022-Sep-05 12:04 UTC

Dear all,

I also tried the k-means method on another part of my data, but I am still
not getting the result I expected.

The related code is below. Please let me know where it is lacking.

 DMs<-read.table(text="Country Data
                 Israel 0.087320199
                Bahrein 0.37991129
              HongKong 0.037552721
                  Japan 0.235350891
                Kuwait 0.286427554
                  oman 0.400096249
                  Qatar 0.270693298
            SouthKorea 0.007407618
            SaudiArabia 0.187578553
              Singapore 0.008528448
                Taiwan 0.027371676
                    UAE 0.276795224
                Austria 0.015132794
                Belgium 0.008513907
                Cyprus  -0.000938601
          CzechRepublic 0.017460065
                Denmark 0.029490066
                Estonia 0.114144041
                Finland 0.016245116
                France 0.007217465
                Germany 0.00371948
                Greece -0.008527501
                Iceland 0.748097785
                Ireland 0.023309721
                Latvia 0.178227267
              Lithuania 0.100033752
              Luxemborg 0.044546393
                  Malta 0.128679817
            Netherland 0.010188604
                Norway 0.003437861
                Poland 0.006426383
              Portugal 0.00753412
              Slovakia 0.505992775
              Slovenia 0.162475815
                  Spain 0.00267973
                Sweden 0.009967609
            Switzerland 0.020557185
                    UK 0.009340789
                Hungary 0.005389885
                Canada -0.000531982
                  Chile 0.007080471
                    USA 0.013516878
                Bermuda -0.338491435
              Australia 0.113039242
            Newzealand 0.154508239",
                 header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,45), ylim=c(-0.001,2.5))
text(1:45+0.5,DMs[,2]+0.05,DMs[,1],col=k1$cluster)
legend(2,1,c("cluster 1: Highly efficient DMs","cluster 2: Less efficient DMs"),
       col=1:5,pch=19)
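The ylim = c(-0.001, 2.5) above leaves off the countries with values below -0.001, and most of the vertical range stays empty because the largest value (Iceland) is about 0.75. A hedged sketch, assuming the Data column is copied into a named vector dm2 and an arbitrary seed (hypothetical choices), that lists the points outside those limits and takes the limits from the data with extendrange() instead:

```r
# Data column of the second data set above (hypothetical name dm2)
dm2 <- c(Israel = 0.087320199, Bahrein = 0.37991129, HongKong = 0.037552721,
         Japan = 0.235350891, Kuwait = 0.286427554, oman = 0.400096249,
         Qatar = 0.270693298, SouthKorea = 0.007407618, SaudiArabia = 0.187578553,
         Singapore = 0.008528448, Taiwan = 0.027371676, UAE = 0.276795224,
         Austria = 0.015132794, Belgium = 0.008513907, Cyprus = -0.000938601,
         CzechRepublic = 0.017460065, Denmark = 0.029490066, Estonia = 0.114144041,
         Finland = 0.016245116, France = 0.007217465, Germany = 0.00371948,
         Greece = -0.008527501, Iceland = 0.748097785, Ireland = 0.023309721,
         Latvia = 0.178227267, Lithuania = 0.100033752, Luxemborg = 0.044546393,
         Malta = 0.128679817, Netherland = 0.010188604, Norway = 0.003437861,
         Poland = 0.006426383, Portugal = 0.00753412, Slovakia = 0.505992775,
         Slovenia = 0.162475815, Spain = 0.00267973, Sweden = 0.009967609,
         Switzerland = 0.020557185, UK = 0.009340789, Hungary = 0.005389885,
         Canada = -0.000531982, Chile = 0.007080471, USA = 0.013516878,
         Bermuda = -0.338491435, Australia = 0.113039242, Newzealand = 0.154508239)

# Points that ylim = c(-0.001, 2.5) leaves off the plot
old_ylim <- c(-0.001, 2.5)
outside <- dm2[dm2 < old_ylim[1] | dm2 > old_ylim[2]]
print(outside)

set.seed(1)  # arbitrary seed, only for reproducibility
k2 <- kmeans(dm2, centers = 2, nstart = 25)

# Take the limits from the data, padded slightly so the labels fit
new_ylim <- extendrange(dm2)
idx <- seq_along(dm2)  # one x position per country (45 here)
plot(idx, dm2, col = k2$cluster, pch = 19, ylim = new_ylim,
     xlab = "Country (row order)", ylab = "Data")
text(idx, dm2, names(dm2), col = k2$cluster, pos = 3, cex = 0.7)
legend("topright", legend = paste("cluster", 1:2), col = 1:2, pch = 19)
```

kmeans() numbers its clusters arbitrarily, so it is worth checking k2$centers before deciding which cluster is the "highly efficient" one.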


Jim Lemon

2022-Sep-05 12:06 UTC

Hi Subhamitra,
I think the fact that you are passing a vector of values rather than a
matrix is part of the problem. As you have only one value for each
country, the points plotted will be the index on the x-axis and the
value for each country on the y-axis. Passing ylim=c(-0.12,0) means
that you are cutting off the lowest points. Here is an example that
will give you two clusters and show the values for the centers in the
middle of the plot. Perhaps this is all you need, but I suspect there
is more work to be done.

k2<-kmeans(DMs[,2],centers=2)
plot(DMs[,2],col=k2$cluster,pch=19,xlim=c(1,46))
text(seq_along(DMs[,2]),DMs[,2],DMs[,1],col=k2$cluster)
points(rep(23,2),k2$centers,pch=1:2,cex=2,col=1:2)
legend("bottomright",c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
col=1:2,pch=19)
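Jim's point about ylim can be checked directly. A minimal sketch, assuming the DATA column is copied into a named vector dm1 and an arbitrary seed (hypothetical choices): it lists the values that ylim = c(-0.12, 0) hides, uses one x position per row (there are 45 rows, so 1:46 has one position too many), and colors the two centers with 1:2 rather than with the first two entries of the cluster vector.

```r
# DATA column of the first data set (hypothetical name dm1)
dm1 <- c(IS = -0.0092, BA = -0.0235, HK = -0.0239, JA = -0.0333, KU = -0.0022,
         OM = -0.0963, QA = -0.0706, SK = -0.0322, SA = -0.1233, SI = -0.0141,
         TA = -0.0142, UAE = -0.0656, AUS = -0.0230, BEL = -0.0006, CYP = -0.0085,
         CR = -0.0398, DEN = -0.0423, EST = -0.0604, FIN = -0.0227, FRA = -0.0085,
         GER = -0.0272, GrE = -0.3519, ICE = -0.0210, IRE = -0.0057, LAT = -0.0595,
         LITH = -0.0451, LUXE = -0.0023, MAL = -0.0351, NETH = -0.0048, NOR = -0.0495,
         POL = -0.0081, PORT = -0.0044, SLOVA = -0.1210, SLOVE = -0.0031, SPA = -0.0213,
         SWE = -0.0106, SWIT = -0.0152, UK = -0.0030, HUNG = -0.0086, CAN = -0.0144,
         CHIL = -0.0078, USA = -0.0042, BERM = -0.0035, AUST = -0.0211, NEWZ = -0.0538)

# Values that ylim = c(-0.12, 0) in the original plot() call cuts off
hidden <- dm1[dm1 < -0.12 | dm1 > 0]
print(hidden)

n <- length(dm1)  # number of countries; x positions should be 1:n
set.seed(1)       # arbitrary seed, only for reproducibility
k2 <- kmeans(dm1, centers = 2)

idx <- seq_len(n)
plot(idx, dm1, col = k2$cluster, pch = 19, ylim = extendrange(dm1),
     xlab = "Country (row order)", ylab = "DATA")
text(idx, dm1, names(dm1), col = k2$cluster, pos = 4, cex = 0.7)
# Cluster centers drawn in the middle of the x range, colored by cluster number
points(rep(n / 2, 2), k2$centers[, 1], pch = 1:2, cex = 2, col = 1:2)
```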

Jim

Rui Barradas

2022-Sep-05 13:02 UTC

Hello,

I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the
changes I have made to your call to kmeans is to subset DMs without
dropping the dim attribute (drop = FALSE).


library(cluster)

max_clust <- 10
wss <- numeric(max_clust)

for(k in 1:max_clust) {
   km <- kmeans(DMs[,2], centers = k, nstart = 25)
   wss[k] <- km$tot.withinss
}
plot(wss, type = "b")

dm <- DMs[, 2, drop = FALSE]
# Where is the elbow, at 2 or at 4?
factoextra::fviz_nbclust(dm, kmeans, method = "wss")
factoextra::fviz_nbclust(dm, kmeans, method = "silhouette")

k2 <- kmeans(dm, centers = 2, nstart = 25)
k3 <- kmeans(dm, centers = 3, nstart = 25)
k4 <- kmeans(dm, centers = 4, nstart = 25)

main2 <- paste(length(k2$centers), "clusters")
main3 <- paste(length(k3$centers), "clusters")
main4 <- paste(length(k4$centers), "clusters")

old_par <- par(mfcol = c(1, 3))
plot(DMs[,2], col = k2$cluster, pch = 19, main = main2)
plot(DMs[,2], col = k3$cluster, pch = 19, main = main3)
plot(DMs[,2], col = k4$cluster, pch = 19, main = main4)
par(old_par)
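As a base-R alternative to factoextra::fviz_nbclust(method = "silhouette"), the average silhouette width can be computed with cluster::silhouette() for each candidate k. A hedged sketch, assuming the first data set is rebuilt as DMs from a named vector dm1 (hypothetical name), an arbitrary seed, and an arbitrary search range of k = 2 to 6; it also shows the difference Rui mentions, that DMs[, 2] is a plain vector while DMs[, 2, drop = FALSE] keeps the dim attribute.

```r
library(cluster)  # silhouette(); cluster is a recommended package shipped with R

# First data set (hypothetical name dm1), rebuilt as the DMs data frame
dm1 <- c(IS = -0.0092, BA = -0.0235, HK = -0.0239, JA = -0.0333, KU = -0.0022,
         OM = -0.0963, QA = -0.0706, SK = -0.0322, SA = -0.1233, SI = -0.0141,
         TA = -0.0142, UAE = -0.0656, AUS = -0.0230, BEL = -0.0006, CYP = -0.0085,
         CR = -0.0398, DEN = -0.0423, EST = -0.0604, FIN = -0.0227, FRA = -0.0085,
         GER = -0.0272, GrE = -0.3519, ICE = -0.0210, IRE = -0.0057, LAT = -0.0595,
         LITH = -0.0451, LUXE = -0.0023, MAL = -0.0351, NETH = -0.0048, NOR = -0.0495,
         POL = -0.0081, PORT = -0.0044, SLOVA = -0.1210, SLOVE = -0.0031, SPA = -0.0213,
         SWE = -0.0106, SWIT = -0.0152, UK = -0.0030, HUNG = -0.0086, CAN = -0.0144,
         CHIL = -0.0078, USA = -0.0042, BERM = -0.0035, AUST = -0.0211, NEWZ = -0.0538)
DMs <- data.frame(Country = names(dm1), DATA = unname(dm1))

vec <- DMs[, 2]                # plain numeric vector, no dim attribute
dm  <- DMs[, 2, drop = FALSE]  # one-column data frame, 45 x 1
str(vec)
str(dm)

d  <- dist(dm)  # Euclidean distances, reused for every k
ks <- 2:6       # candidate numbers of clusters (arbitrary range)
set.seed(1)     # arbitrary seed, only for reproducibility
avg_width <- sapply(ks, function(k) {
  km  <- kmeans(dm, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)
  mean(sil[, 3])  # column 3 holds each point's silhouette width
})
names(avg_width) <- ks
print(round(avg_width, 3))
best_k <- ks[which.max(avg_width)]
cat("k with the largest average silhouette width:", best_k, "\n")
```

A larger average width means points sit more clearly inside their own cluster than in the nearest other one; it is one heuristic to read alongside the elbow plot, not a definitive answer.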


Hope this helps,

Rui Barradas

