Hi all,

I would like to use R to perform k-means clustering on my data, which include 33 samples measured on ~1000 variables. I have already used the kmeans() function for this analysis, and it showed that there are 4 clusters in my data. However, it is really difficult to plot these clusters in 2-D because of the huge number of variables. One possible way is to project the multidimensional space onto a 2-D plane, but I could not find a good way to do that. Any suggestions or comments would be really helpful!

Thanks,
Meng
I wonder if it makes sense to reduce the dimensionality of the variables somehow?

David Cross
d.cross at tcu.edu
www.davidcross.us

On May 18, 2011, at 9:41 AM, Meng Wu wrote:
> One possible way is to project the
> multidimensional space into 2-D platform, but I could not find any good way
> to do that. Any suggestions or comments will be really helpful!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Peter Langfelder
2011-May-18 15:25 UTC
[R] Help with 2-D plot of k-mean clustering analysis
On Wed, May 18, 2011 at 7:41 AM, Meng Wu <mengwu1002 at gmail.com> wrote:
> One possible way is to project the
> multidimensional space into 2-D platform, but I could not find any good way
> to do that. Any suggestions or comments will be really helpful!

You could use multidimensional scaling, function cmdscale(), to produce a 2-dimensional representation of your data, then plot it using colors that correspond to the clusters.

For example, suppose your data is stored in a matrix X (1000x33). I assume you clustered the samples, not the variables, so you have a vector label[] of length 33 with values between 1 and 4. Since k-means uses Euclidean distance, you would re-create the distance

    dst = dist(t(X))

then feed it into cmdscale()

    mds = cmdscale(dst);

then plot it:

    plot(mds, col = label)

HTH,
Peter
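The steps Peter describes can be put together into one runnable sketch. The data below are simulated stand-ins (the real X is not available in this thread), and the names `group`, `centers`, and `km` are assumptions made for illustration only; `kmeans()`, `dist()`, `cmdscale()`, and `plot()` are the standard base R functions.

```r
# Simulated stand-in for the real data (assumption): 1000 variables in rows,
# 33 samples in columns, with four hypothetical groups of samples.
set.seed(1)
n_var  <- 1000
n_samp <- 33
group   <- rep(1:4, length.out = n_samp)
centers <- matrix(rnorm(n_var * 4, sd = 2), nrow = n_var, ncol = 4)
X <- centers[, group] + matrix(rnorm(n_var * n_samp), nrow = n_var)

# kmeans() clusters the rows of its input, so transpose X to cluster samples.
km    <- kmeans(t(X), centers = 4, nstart = 25)
label <- km$cluster                 # length 33, values between 1 and 4

# Euclidean distances between samples, then classical MDS down to 2 dimensions.
dst <- dist(t(X))
mds <- cmdscale(dst, k = 2)

# One point per sample, coloured by its k-means cluster.
plot(mds, col = label, pch = 19,
     xlab = "MDS coordinate 1", ylab = "MDS coordinate 2")
legend("topright", legend = paste("cluster", 1:4), col = 1:4, pch = 19)
```

If the real data already have samples in rows, the `t()` calls should be dropped. Cluster numbers from `kmeans()` are arbitrary labels, so the colours only distinguish clusters and carry no order.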
Claudia Beleites
2011-May-18 16:55 UTC
[R] Help with 2-D plot of k-mean clustering analysis
Hi Meng,

> I would like to use R to perform k-means clustering on my data which
> included 33 samples measured with ~1000 variables.

For suggestions, it would be extremely helpful to know what kind of variables your 1000 variables are.

Parallel coordinate plots plot values over (many) variables. Whether this is useful depends very much on your variables. For example, I have spectral channels: they have an intrinsic order, and the values have physically the same meaning (and almost the same range), so the parallel coordinate plot comes naturally (it in fact produces the spectra).

Claudia

--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399
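A minimal sketch of the parallel coordinate idea for ordered, same-scale variables, using base R's `matplot()`. The "spectra" here are simulated (an assumption, since the nature of Meng's variables is not stated), and `channel`, `peak`, and `spectra` are hypothetical names chosen for the illustration.

```r
# Simulated spectra-like data (assumption): 33 samples in rows, 1000 ordered
# channels in columns, each of four hypothetical groups peaking elsewhere.
set.seed(3)
channel <- seq_len(1000)
group   <- rep(1:4, length.out = 33)
peak    <- c(200, 400, 600, 800)[group]

# sapply() returns a 1000 x 33 matrix; transpose to get samples in rows.
spectra <- t(sapply(peak, function(p) 100 * dnorm(channel, mean = p, sd = 40)))
spectra <- spectra + matrix(rnorm(33 * 1000, sd = 0.05), nrow = 33)

# Cluster the samples (rows) with k-means.
label <- kmeans(spectra, centers = 4, nstart = 25)$cluster

# matplot() draws one line per column, so pass the samples as columns.
# Each line is one sample across all variables, coloured by its cluster.
matplot(channel, t(spectra), type = "l", lty = 1, col = label,
        xlab = "variable (channel index)", ylab = "value")
```

This kind of plot is only readable when the variable order and the common value scale are meaningful, as Claudia notes. For unordered variables or variables on different scales, a 2-D projection is likely the better choice.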