thr3ads.net - R help - [R] Cluster analysis with missing data [Jul 2009]

Hollix

2009-Jul-14 06:42 UTC

[R] Cluster analysis with missing data

Hi folks,

I tried hclust for the first time. Unfortunately, it doesn't seem to work
with missing data in my data file, and I found no information on how to
handle missing data.

Omitting all cases with missing values is not really an option, as I would
lose too many cases.

Thanks in advance
Holger
-- 
View this message in context:
http://www.nabble.com/Cluster-analysis-with-missing-data-tp24474486p24474486.html
Sent from the R help mailing list archive at Nabble.com.

Bill.Venables at csiro.au

2009-Jul-14 08:10 UTC

[R] Cluster analysis with missing data

vegdist() in the vegan package optionally allows pairwise deletion of missing
values when computing dissimilarities. The result can be used as the first
argument to hclust().

('Caveat emptor', of course.)
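
A minimal sketch of that suggestion, assuming the vegan package is installed.
The toy data frame 'dat' and the choice of Euclidean distance with average
linkage are illustrative assumptions, not part of the original advice:

```r
library(vegan)  # assumes vegan is installed from CRAN

# Hypothetical toy data: 6 cases, 3 numeric variables, scattered NAs
dat <- data.frame(
  x1 = c(1.2, 2.4, NA, 4.1, 5.0, 5.5),
  x2 = c(0.5, NA, 1.9, 2.2, 3.1, 3.0),
  x3 = c(10, 12, 11, NA, 20, 21)
)

# na.rm = TRUE requests pairwise deletion of missing values
d <- vegdist(dat, method = "euclidean", na.rm = TRUE)

# The dissimilarities go in as the first argument to hclust()
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)  # cut the tree into two clusters
print(groups)
```

Every pair of rows in this toy data shares at least one observed variable, so
no dissimilarity is left undefined.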

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Gavin Simpson

2009-Jul-14 08:41 UTC

[R] Cluster analysis with missing data

On Mon, 2009-07-13 at 23:42 -0700, Hollix wrote:
> Hi folks,
> 
> I tried for the first time hclust. Unfortunately, with missing data in my
> data file, it doesn't seem
> to work. I found no information about how to consider missing data.
> 
> Omission of all missings is not really an option as I would loose to many
> cases.

Holger,

hclust takes a dissimilarity matrix as input, not your data, so the
problem is in finding an appropriate dissimilarity/distance coefficient
that handles missing data.

One such measure is Gower's coefficient, which is implemented in the function
'daisy' in the recommended package 'cluster'. Try:

require(cluster)
?daisy

to read about it.
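
A minimal sketch of how this could look in practice. The toy data frame 'dat'
and the average-linkage choice are assumptions made only for illustration:

```r
library(cluster)  # recommended package, shipped with standard R installations

# Hypothetical toy data: 6 cases, 3 numeric variables, scattered NAs
dat <- data.frame(
  x1 = c(1.2, 2.4, NA, 4.1, 5.0, 5.5),
  x2 = c(0.5, NA, 1.9, 2.2, 3.1, 3.0),
  x3 = c(10, 12, 11, NA, 20, 21)
)

# Gower dissimilarities: for each pair of rows, only the variables
# observed in both rows contribute
d <- daisy(dat, metric = "gower")

# daisy() returns an object that also inherits from class "dist",
# so it can be passed straight to hclust()
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)  # cut the tree into two clusters
print(groups)
```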

Also, 'vegdist' in package 'vegan' can ignore missing values pairwise when
computing dissimilarities. See ?vegdist after loading 'vegan', in
particular the 'na.rm' argument.

Whether either of these (i.e. the resulting dissimilarities) makes sense
for your particular problem is another matter...
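
One basic sanity check, sketched under assumptions: if two cases have no
variable observed in both, their dissimilarity is undefined (NA), and hclust
does not accept NA entries. The toy data frame 'dat' below is hypothetical and
built so that rows 1 and 2 share no observed variable:

```r
library(cluster)

# Hypothetical data: row 1 has only 'a', row 2 has only 'b'
dat <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, 2, 3, 5)
)

# Number of variables observed in both rows, for every pair of rows
obs <- !is.na(as.matrix(dat))
shared <- tcrossprod(obs + 0)  # 'obs + 0' converts logical to numeric
print(shared)

# Pairs with zero shared variables get an NA dissimilarity
d <- daisy(dat, metric = "gower")
print(anyNA(d))

# Row indices of pairs with nothing in common (upper triangle only)
problem_pairs <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
print(problem_pairs)
```

Pairs that share only one or two variables still get a number, but one that
rests on very little information, so the 'shared' counts are worth a look
before trusting the resulting tree.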

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
