Greetings R-users,
I have been using the fpc package in R to cluster my data. Specifically, I am
using the kmeansruns function.
I would like to know how to use R to partition the data into clusters. What I
am doing is as follows.
# Use csv file as input
#####################
wholeset = read.csv("Spellman800genesImputed.csv")
# exclude first col (gene names)
##########################
wholeset2 = wholeset[,-1]
#Use fpc
###########################
library(fpc)
cl.kmr10 <- kmeansruns(wholeset2,k=10,runs=10)
#append cluster label to original dataset
###################
cl2 <- data.frame(wholeset, cl.kmr10$cluster)
After this step, I write cl2 to a csv file and manually partition the data
into its respective clusters using Excel.
Then I read the data for each cluster back into R for further analysis.
Can I do the data partitioning directly in R?
TQ
--
Suhaila Zainudin
PhD Candidate
Universiti Teknologi Malaysia
Can you supply a small self-contained example of this?
Hello,
Referring to your email, I have attached a rar file containing a sample data
file and the R script for clustering the data.
After I finish the clustering process, I write the clustering result to a csv
file. To partition the data, I open the csv file in Excel, sort the data by
cluster number and put the data for each cluster into a separate csv file.
I am trying to do this step (which is currently done in Excel) in R.
Any comments are appreciated.
--
Suhaila Zainudin
PhD Candidate
Universiti Teknologi Malaysia
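The sort-and-separate step described above can be sketched with base R's split(), which groups the rows of a data frame by a label vector. This is only one possible approach, not necessarily the one suggested in the thread. The toy data frame and its column names (gene, t1, t2, cluster) are made up for illustration.

```r
# Toy stand-in for the clustered table: the column names and values are
# made up for illustration, not taken from the Spellman data.
cl2 <- data.frame(
  gene    = c("g1", "g2", "g3", "g4", "g5", "g6"),
  t1      = c(0.1, 0.4, -0.2, 1.3, 0.9, -0.7),
  t2      = c(0.3, 0.5, -0.1, 1.1, 1.0, -0.9),
  cluster = c(2, 1, 2, 3, 1, 3)
)

# split() returns a list with one data frame per distinct cluster label.
my.clusters <- split(cl2, cl2$cluster)

# Each component can be used directly, with no round trip through Excel.
my.clusters[["1"]]          # rows assigned to cluster 1
sapply(my.clusters, nrow)   # number of rows in each cluster
```

With the objects from the original script, the equivalent call would presumably be split(wholeset, cl.kmr10$cluster), since cl.kmr10$cluster holds one label per row of wholeset.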
Hi,
Thanks for your reply. I have tried your suggestions with success. TQVM.
I have another query. Say I want to write each cluster to a csv file, as
follows:
clus1 <- my.clusters[[1]]
write.csv(clus1, file = "clus1.csv")
....
.....
.....
clus10 <- my.clusters[[10]]
write.csv(clus10,file = "clus10.csv")
I can do that for all 10 clusters by repeatedly calling write.csv as above.
Is there a more elegant way of doing it, using a loop for example?
for (i in seq_along(my.clusters)) {
  # build one file name per cluster: "clus1.csv", "clus2.csv", ...
  write.csv(my.clusters[[i]], file = paste("clus", i, ".csv", sep = ""))
}
I am trying something using a list as well, as follows:
names( my.clusters ) <- paste('Cluster_',1:10, sep='')
Now I can use
my.clusters$Cluster_1
The above will return all members of Cluster_1.
I have an idea to use lapply or sapply to call write.csv on each component
(Cluster_1 ... Cluster_n) of my.clusters, which would do the same as the loop
example. Maybe this is the better way.
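A minimal sketch of that idea, assuming my.clusters is a named list of data frames as set up above. The toy list, the out.dir variable and the choice of a temporary directory are assumptions made so the example can run on its own.

```r
# Hypothetical stand-in for my.clusters: a named list of small data frames.
my.clusters <- list(
  Cluster_1 = data.frame(gene = c("g2", "g5"), t1 = c(0.4, 0.9)),
  Cluster_2 = data.frame(gene = c("g1", "g3"), t1 = c(0.1, -0.2))
)

# Write into a temporary directory so the example leaves no files behind;
# replace out.dir with a real folder in practice.
out.dir <- tempdir()

# Iterate over the component names and build one file name per cluster.
written <- sapply(names(my.clusters), function(nm) {
  path <- file.path(out.dir, paste(nm, ".csv", sep = ""))
  write.csv(my.clusters[[nm]], file = path, row.names = FALSE)
  path  # return the path so the caller can see what was written
})

written
```

Iterating over names(my.clusters) rather than over the list itself keeps each component's name available inside the function, which is what makes it possible to derive the file name from it.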
Any comments are appreciated.
Thanks!