Greetings R-users,

I have been using the fpc package in R to cluster my data. Specifically, I am using kmeansruns clustering. I would like to know how to use R to partition the data into clusters. What I am doing is as follows.

# Use csv file as input
#####################
wholeset = read.csv("Spellman800genesImputed.csv")

# exclude first col (gene names)
##########################
wholeset2 = wholeset[,-1]

# Use fpc
###########################
library(fpc)

cl.kmr10 <- kmeansruns(wholeset2,k=10,runs=10)

# append cluster label to original dataset
###################
cl2 <- data.frame(wholeset, cl.kmr10$cluster)

After this step, I write cl2 into a csv file and manually partition the data into its respective clusters using Excel. Then I read the data from each cluster back into R for further analysis.

Can I do the data partitioning directly in R?

TQ

--
Suhaila Zainudin
PhD Candidate
Universiti Teknologi Malaysia
Can you supply a small self-contained example of this?
Hello,

Referring to your email, I attached a rar file containing a sample data file and the R script for clustering the data.

After I finish the clustering process, I write the clustering result into a csv file. To partition the data, I open the csv file in Excel, sort the data by cluster number and put the data for each cluster into a separate csv file.

I am trying to do the above step (which is currently done in Excel) in R.

Any comments are appreciated.

--
Suhaila Zainudin
PhD Candidate
Universiti Teknologi Malaysia
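The sort-and-separate step described above could be sketched in R with base split(), which partitions the rows of a data frame by a grouping vector. This is only a minimal sketch under stated assumptions: the data frame below is made-up toy data rather than the Spellman file, kmeans() from base R stands in for kmeansruns() so the example runs without fpc, and the name my.clusters is chosen for illustration.

```r
# Toy stand-in for the gene expression file (hypothetical data, not the Spellman set)
set.seed(1)
wholeset <- data.frame(gene = paste0("g", 1:30),
                       matrix(rnorm(30 * 4), nrow = 30))

# Exclude the first column (gene names) before clustering
wholeset2 <- wholeset[, -1]

# Assumption: kmeans() is used here in place of fpc::kmeansruns();
# only the $cluster component of the result is needed below
cl <- kmeans(wholeset2, centers = 3, nstart = 10)

# Append the cluster label to the original data set
cl2 <- data.frame(wholeset, cluster = cl$cluster)

# split() returns a list with one data frame per cluster label
my.clusters <- split(cl2, cl2$cluster)

# Rows belonging to the first cluster
my.clusters[[1]]
```

Each element of the list is an ordinary data frame, so further analysis can be run on my.clusters[[1]], my.clusters[[2]] and so on without going through Excel.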
Hi,

Thanks for your reply. I have tried your suggestions with success. TQVM.

I have another query. Say I want to write each cluster into a csv file as follows:

clus1 <- my.clusters[[1]]
write.csv(clus1, file = "clus1.csv")
....
clus10 <- my.clusters[[10]]
write.csv(clus10, file = "clus10.csv")

I can do that for all 10 clusters by repeatedly calling write.csv as above. Is there a more elegant way of doing it, for example by using a loop?

for (n in 1:xx) {
  clusn <- my.clusters[[n]]
  write.csv(clusn, file = "clusn.csv")
}

I am trying something using a list as well, as follows:

names( my.clusters ) <- paste('Cluster_',1:10, sep='')

Now I can use my.clusters$Cluster_1, which returns all members of Cluster_1.

I have an idea to use lapply or sapply to call write.csv on each component (Cluster_1 ... Cluster_n) of my.clusters, which would do the same as the loop example. Maybe this is the better way.

Any comments are appreciated. Thanks!
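Both ideas in the message above (a loop and lapply over the named list) can be sketched as follows. This is a hedged illustration, not a definitive answer: the small data frame is made up, my.clusters is rebuilt with split() so the example is self-contained, and the output directory (tempdir()) and the file name patterns are assumptions. The key point is that the file name has to be built from the index or the list name, for example with paste0(), rather than written as a fixed string.

```r
# Hypothetical stand-in for the list of per-cluster data frames
df <- data.frame(gene = paste0("g", 1:6),
                 value = c(0.1, 0.4, 0.2, 0.9, 0.7, 0.3),
                 cluster = c(1, 2, 1, 3, 2, 3))
my.clusters <- split(df, df$cluster)

# Assumption: files are written to a temporary directory
out.dir <- tempdir()

# Loop version: build each file name from the index
for (i in seq_along(my.clusters)) {
  write.csv(my.clusters[[i]],
            file = file.path(out.dir, paste0("clus", i, ".csv")),
            row.names = FALSE)
}

# lapply version: name the list elements, then use each name as the file name
names(my.clusters) <- paste("Cluster_", seq_along(my.clusters), sep = "")
invisible(lapply(names(my.clusters), function(nm) {
  write.csv(my.clusters[[nm]],
            file = file.path(out.dir, paste0(nm, ".csv")),
            row.names = FALSE)
}))
```

The two versions write the same data; the lapply form avoids hard-coding the number of clusters, and invisible() only suppresses the printed list of NULL return values.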