Hi folks,

I tried hclust for the first time. Unfortunately, it doesn't seem to work when my data file contains missing values, and I found no information about how to handle missing data.

Omitting all cases with missing values is not really an option, as I would lose too many cases.

Thanks in advance,
Holger

--
View this message in context: http://www.nabble.com/Cluster-analysis-with-missing-data-tp24474486p24474486.html
Sent from the R help mailing list archive at Nabble.com.
vegdist() in the vegan package optionally allows pairwise deletion of missing values when computing dissimilarities. The result can be used as the first argument to hclust(). ('Caveat emptor', of course.)

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Hollix [Holger.steinmetz at web.de]
Sent: 14 July 2009 16:42
To: r-help at r-project.org
Subject: [R] Cluster analysis with missing data

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
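A minimal sketch of the vegdist() route described above, assuming the vegan package is installed. The toy matrix `x` and the choices of `method = "euclidean"` and average linkage are illustrative assumptions, not something taken from the original post. With `na.rm = TRUE`, each pairwise dissimilarity is computed only from the variables observed in both cases, so pairs that share few variables rest on less information.

```r
# Sketch only: requires the 'vegan' package (install.packages("vegan")).
library(vegan)

# Hypothetical toy data: 5 cases, 3 variables, with scattered NAs.
# Every pair of cases shares at least one observed variable.
x <- matrix(c( 1,  2, NA,
               2, NA,  4,
               3,  3,  5,
              NA,  5,  6,
               6,  6,  7),
            ncol = 3, byrow = TRUE,
            dimnames = list(paste0("case", 1:5), c("v1", "v2", "v3")))

# Pairwise deletion of missing values when computing dissimilarities
d <- vegdist(x, method = "euclidean", na.rm = TRUE)

# hclust() cannot handle NA dissimilarities, so check before clustering
stopifnot(!anyNA(d))

hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)  # assign each case to one of two clusters
```

If two cases have no observed variable in common, their dissimilarity may still be NA and hclust() will fail, hence the check. Calling plot(hc) draws the dendrogram.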
On Mon, 2009-07-13 at 23:42 -0700, Hollix wrote:
> Hi folks,
>
> I tried for the first time hclust. Unfortunately, with missing data in my
> data file, it doesn't seem to work. I found no information about how to
> consider missing data.
>
> Omission of all missings is not really an option as I would lose too many
> cases.

Holger,

hclust takes a dissimilarity matrix as input, not your data, so the problem is one of finding an appropriate dissimilarity/distance coefficient that handles missing data. One such measure is Gower's coefficient, which is implemented in function 'daisy' in the recommended package 'cluster'. Try:

    require(cluster)
    ?daisy

to read about it.

Also, 'vegdist' in package 'vegan' can compute dissimilarities while ignoring values that are missing in either member of a pair. See ?vegdist after loading 'vegan', and in particular the 'na.rm' argument.

Whether either of these (i.e. the resulting dissimilarities) makes sense for your particular problem is another matter...

HTH

G

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
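The daisy() suggestion above might look roughly like this in practice. This is a sketch under stated assumptions: the data frame `dat`, its column names, the average linkage and the choice of two clusters are all hypothetical and only there to make the example run. Gower's coefficient handles mixed numeric and factor columns and, for each pair of cases, uses only the variables observed in both.

```r
library(cluster)  # recommended package, shipped with standard R installations

# Hypothetical toy data with mixed types and missing values.
# Every pair of rows shares at least one observed variable.
dat <- data.frame(
  height = c(170, NA, 182, 165, 177, 190),
  weight = c(65, 80, NA, 55, 72, 95),
  group  = factor(c("a", "b", "a", NA, "b", "b"))
)

# Gower dissimilarities; NAs are dropped pair by pair, not case by case
d <- daisy(dat, metric = "gower")

# A pair with no jointly observed variable would give NA, which hclust() rejects
stopifnot(!anyNA(d))

hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)  # cluster membership for each of the 6 cases
```

No case is discarded here, which addresses the original concern about losing too many cases. Whether the resulting dissimilarities are meaningful still depends on how much of each pair's data is missing.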