Good afternoon. I hope I have provided enough info to get my question answered.

I am running Windows 10 -- R 3.5.1 -- RStudio Version 1.1.456

When running a K-Means clustering routine, is it possible to get the actual data from each cluster into a DF?

I have reviewed a number of tutorials and, unless I missed it somewhere, I would like to know if it is possible.

https://www.datacamp.com/community/tutorials/k-means-clustering-r
https://....guru99..../r-k-means-clustering.html
https://datascienceplus.com/k-means-clustering-in-r/
https://datascienceplus.com/finding-optimal-number-of-clusters/
http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/

For example:

I ran the below and get K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797

Can the 1511 values of SavingsReversed and ProviderID, 1610 values of SavingsReversed and ProviderID, etc. be run out into DF's?

Thank you for your help.

WHP

str(rr0)
Classes 'data.table' and 'data.frame': 14355 obs. of 2 variables:
 $ SavingsReversed: num 0 0 61 128 160 ...
 $ ProviderID     : num 113676 113676 116494 116641 116641 ...
 - attr(*, ".internal.selfref")=<externalptr>

head(rr0, n=35)
    SavingsReversed ProviderID
 1:            0.00     113676
 2:            0.00     113676
 3:           61.00     116494
 4:          128.25     116641
 5:          159.60     116641
 6:          372.66     119316
 7:           18.79     121319
 8:           15.64     121319
 9:            0.00     121319
10:           18.79     121319
11:           23.00     121319
12:           18.79     121319
13:            0.00     121319
14:           25.86     121319
15:           14.00     121319
16:          113.00     121545
17:           50.00     121545
18:         1155.32     121545
19:          113.00     121545
20:          197.20     121545
21:            0.00     121780
22:           36.00     122536
23:         1171.32     125198
24:         1171.32     125198
25:           43.00     125303
26:            0.00     125881
27:           69.64     128435
28:          420.18     128435
29:          175.18     128435
30:           71.54     128435
31:           99.85     128435
32:            0.00     128435
33:           42.75     128435
34:          175.18     128435
35:          846.45     128435

set.seed(213)
rr0a <- kmeans(rr0, 10)
View(rr0a)
summary(rr0a)
#              Length Class  Mode
# cluster      14355  -none- numeric
# centers         20  -none- numeric
# totss            1  -none- numeric
# withinss        10  -none- numeric
# tot.withinss     1  -none- numeric
# betweenss        1  -none- numeric
# size            10  -none- numeric
# iter             1  -none- numeric
# ifault           1  -none- numeric

x1 <- as.data.frame(rr0a$centers)
sort(x1)
#    SavingsReversed ProviderID
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 5        101.21070  3558532.7
# 4        103.41147  3893274.4
# 1        105.38310  2241031.2
# 8        114.61562  3240701.5
# 10       121.14184  4718727.6
# 9        153.70536  4470878.9
# 6        156.84426  5560636.6
# 7        185.09745   173732.9

print(rr0a)
# K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797
#
# Cluster means:
#    SavingsReversed ProviderID
# 1        105.38310  2241031.2
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 4        103.41147  3893274.4
# 5        101.21070  3558532.7
# 6        156.84426  5560636.6
# 7        185.09745   173732.9
# 8        114.61562  3240701.5
# 9        153.70536  4470878.9
# 10       121.14184  4718727.6
#
# Within cluster sum of squares by cluster:
#  [1] 74529288379846 25846368411171 4692898666512 6277704963344 8428785199973 90824041558798 1468798013919 12143462193009 5483877005233
# [10] 51547955737867
#  (between_SS / total_SS = 98.7 %)
#
# Available components:
#
# [1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"

Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}

Please see ?kmeans and note the "cluster" component of the returned value that would appear to provide the info you seek.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Dec 8, 2018 at 7:03 AM Bill Poling <Bill.Poling at zelis.com> wrote:
> When running a K-Means clustering routine is it possible to get the actual
> data from each cluster into a DF?

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
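
To make the pointer to the "cluster" component concrete, here is a minimal sketch of what that component holds. It does not use the real rr0: the data frame dat, its simulated values, and the choice of 4 centers are all assumptions made up for illustration.

```r
# Simulated stand-in for rr0 (hypothetical values; the real data are not available)
set.seed(213)
dat <- data.frame(
  SavingsReversed = round(runif(200, 0, 1200), 2),
  ProviderID      = sample(113676:5560636, 200, replace = TRUE)
)

km <- kmeans(dat, centers = 4)

# The cluster component is a vector of labels, one per input row,
# in the same order as the rows passed to kmeans()
labels_per_row <- length(km$cluster) == nrow(dat)
labels_per_row

# Counting the labels reproduces the cluster sizes that print(km) reports
sizes_from_labels <- tabulate(km$cluster, nbins = nrow(km$centers))
sizes_agree <- all(sizes_from_labels == km$size)
sizes_agree
```

Because there is one label per row, the labels can be used directly to index or group the original rows.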

You should also read the manual page for ?split and learn how to work with lists:

# Split the data according to cluster membership
# to create a list of data frames
rr0.clus <- split(rr0, rr0a$cluster)

# The data frame for cluster 1:
rr0.clus[[1]]

--------------------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
Sent: Saturday, December 8, 2018 9:46 AM
To: Bill.Poling at zelis.com
Cc: R-help <r-help at r-project.org>
Subject: Re: [R] Help with K-Means output

Please see ?kmeans and note the "cluster" component of the returned value that would appear to provide the info you seek.

-- Bert
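
As a rough illustration of the split() approach, here is a sketch on a small simulated data frame. The names dat, x, y, and by_cluster are hypothetical and stand in for the poster's rr0 and rr0.clus; a plain data.frame is assumed rather than a data.table.

```r
# Two well-separated simulated groups (hypothetical data, not the poster's)
set.seed(213)
dat <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 10)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 10))
)

km <- kmeans(dat, centers = 2)

# split() returns a list with one data frame per cluster label
by_cluster <- split(dat, km$cluster)

# The list elements are named after the cluster labels
names(by_cluster)

# The number of rows in each piece matches the reported cluster sizes
rows_per_cluster <- unname(sapply(by_cluster, nrow))
rows_per_cluster
```

Each list element is an ordinary data frame, so by_cluster[["2"]] or lapply(by_cluster, summary) work as usual.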

Thank you Bert, I see, so I think this is the process?

set.seed(213)
rr0a1 <- kmeans(rr0, 10)
summary(rr0a1)
# Just the cluster
#         Length Class  Mode
# cluster 14355  -none- numeric

head(rr0a1$cluster, n=35)
# [1] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7

Xcluster <- as.data.frame(rr0a1$cluster)
head(Xcluster, n=5)
#   rr0a1$cluster
# 1             7
# 2             7
# 3             7
# 4             7
# 5             7

tail(Xcluster, n=5)
#       rr0a1$cluster
# 14351             6
# 14352             6
# 14353             6
# 14354             6
# 14355             6

And I can just join this DF with my original DF used for the KMean, correct? The vertical order is the same?

WHP

From: Bert Gunter <bgunter.4567 at gmail.com>
Sent: Saturday, December 8, 2018 10:46 AM
To: Bill Poling <Bill.Poling at zelis.com>
Cc: R-help <r-help at r-project.org>
Subject: Re: [R] Help with K-Means output

Please see ?kmeans and note the "cluster" component of the returned value that would appear to provide the info you seek.

-- Bert
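
On the row-order question: kmeans() does not reorder its input, so the i-th label in the cluster component belongs to the i-th row of the data that was clustered, and the labels can be attached as a new column by position, with no join key. The following is a minimal sketch on simulated data (dat, x, y, and labelled are hypothetical names), with a sanity check that the per-cluster means of the labelled rows reproduce the fitted centers.

```r
# Simulated stand-in for the original data (hypothetical values)
set.seed(213)
dat <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 10)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 10))
)

km <- kmeans(dat, centers = 2)

# Attach the labels by position; this relies on dat not having been
# sorted or filtered between the kmeans() call and this assignment
labelled <- dat
labelled$cluster <- km$cluster

# Sanity check: averaging the labelled rows per cluster should give
# back the centers that kmeans() reports
means_by_label <- aggregate(cbind(x, y) ~ cluster, data = labelled, FUN = mean)
centers_match <- isTRUE(all.equal(
  unname(as.matrix(means_by_label[, c("x", "y")])),
  unname(km$centers)
))
centers_match

# Rows belonging to one cluster, as their own data frame
cluster1 <- labelled[labelled$cluster == 1, ]
```

If the original data are sorted or subset after clustering, the positional match no longer holds, so it is safest to add the column immediately after the kmeans() call.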