thr3ads.net - R help - [R] Enquiry about Hierarchical Clustering [Sep 2003]

Setsuko Kinoshita

2003-Sep-27 04:30 UTC

[R] Enquiry about Hierarchical Clustering

Dear Sir,

This is Ms. Setsuko Kinoshita writing from Japan.

I have a question about missing values in hierarchical clustering.
In earlier versions of R, hierarchical clustering was not available for
data with missing values.
I used Euclidean distance and the complete linkage method via
"plot(hclust(dist()),hang=-1)".

How are missing values treated in hierarchical clustering in the latest
version, R 1.7.1?
For example, are they replaced by an average?

Yours Sincerely,

-----
Setsuko Kinoshita

Social and Environmental Medicine,
Graduate School of Comprehensive Human Sciences,
University of Tsukuba
1-1-1, Tennoudai, Tsukuba,
Ibaraki, 305-8575, Japan
Tel&Fax: +81-29-853-3489
E-mail:setsuko at epidemiology.md.tsukuba.ac.jp(office)
E-mail:setsuko at mbj.ocn.ne.jp(private)

kjetil brinchmann halvorsen

2003-Sep-27 06:16 UTC

[R] Enquiry about Hierarchical Clustering

On 27 Sep 2003 at 13:30, Setsuko Kinoshita wrote:

Try package cluster:

library(cluster)
?daisy # computes dissimilarity matrix with missing data
?agnes # agglomerative nesting

Kjetil Halvorsen

Adaikalavan RAMASAMY

2003-Sep-27 09:05 UTC

[R] Enquiry about Hierarchical Clustering

hclust() cannot handle missing values in the output of dist().

The output of dist() will contain missing values if
1. all elements in a row are missing, or
2. for some pair of rows, every column has a missing value in at least
   one of the two rows.

In the former case, it is better to remove the all-missing row, as it is
completely uninformative. The latter is harder to detect and I am not
sure how to deal with it.
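
Both cases can at least be detected before calling dist(). The following is only a rough sketch on made-up data: the matrix `x` and the names `all_missing`, `x_clean`, `obs`, `shared` and `bad_pairs` are invented for illustration and are not from the thread.

```r
# Made-up data: row 1 is entirely missing; rows 2 and 3 share no observed column.
x <- rbind(c(NA, NA, NA),
           c( 1, NA,  3),
           c(NA,  2, NA),
           c( 4,  5,  6))

# Case 1: drop rows in which every element is missing.
all_missing <- rowSums(!is.na(x)) == 0
x_clean <- x[!all_missing, , drop = FALSE]

# Case 2: count, for each pair of rows, the columns observed in both.
obs <- 1 - is.na(x_clean)      # 1 = observed, 0 = missing
shared <- tcrossprod(obs)      # shared[i, j] = columns observed in rows i and j

# Pairs with no shared column; these are the pairs for which dist() gives NA.
bad_pairs <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
print(bad_pairs)
```

What to do with such pairs (drop one of the rows, or drop sparse columns) depends on the data; the sketch only locates them.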

Here is how dist() calculates its output for the following data:

   NA    3    5
    2    4    6

dist( rbind( c(NA, 3, 5) , c(2,4,6) ) ) = 1.732051 
= sqrt( [ (6-5)^2 + (4-3)^2  ] x 3/2 )

The factor 3/2 scales the sum of squared differences up from the 2
columns actually used to all 3 columns, to account for the missing pair.
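
The same arithmetic can be checked directly in R. This is a small sketch using the two rows from the example; the names `used` and `manual` are made up here for illustration.

```r
x <- rbind(c(NA, 3, 5),
           c( 2, 4, 6))

# Distance as computed by dist() (Euclidean is the default method).
d <- as.numeric(dist(x))

# Manual version: use only columns observed in both rows, then scale the
# sum of squares by (total columns) / (columns used) before the square root.
used <- !is.na(x[1, ]) & !is.na(x[2, ])
manual <- sqrt(sum((x[1, used] - x[2, used])^2) * ncol(x) / sum(used))

print(c(dist = d, manual = manual))
```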

Hope this helps.

--
Adaikalavan Ramasamy 


______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Martin Maechler

2003-Sep-27 17:11 UTC

[R] Enquiry about Hierarchical Clustering

>>>>> "Adaikalavan" == Adaikalavan RAMASAMY <ramasamya at gis.a-star.edu.sg>
>>>>>     on Sat, 27 Sep 2003 17:05:43 +0800 writes:
    Adaikalavan> Hclust is unable to handle missing values in
    Adaikalavan> dist().  There will be missing values in dist()
    Adaikalavan> function if 1. all elements in a row are
    Adaikalavan> missing 2. all pairs between any two rows have
    Adaikalavan> at least one missing values.

As Kjetil Halvorsen said, use daisy() from the cluster
package instead of dist().
The daisy() function has two advantages over dist():
1. Handling of missing values
2. Handling of data with continuous *and* categorical variables.

[Btw, this does not really have anything to do with the clustering
 method used *after* the distances have been computed.
 You can use hclust() on a daisy() result if you want.]
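
The last point can be sketched as follows. This is only a minimal illustration on made-up data (the data frame `x` and its values are invented here, not taken from the thread), and it assumes the cluster package is installed. As far as I understand the daisy() documentation, variables that are missing in either row are left out of that pair's dissimilarity.

```r
library(cluster)

# Made-up data: every pair of rows shares at least one observed variable.
x <- data.frame(
  v1 = c( 1,  2, NA,  8),
  v2 = c( 2, NA,  5,  9),
  v3 = c(NA,  6,  7, 10)
)

# Dissimilarities computed despite the NAs.
d <- daisy(x, metric = "euclidean")

# Complete linkage on the daisy() result, as in the original question.
hc <- hclust(d, method = "complete")

# plot(hc, hang = -1)  # dendrogram, as in the original plot() call
```

It is probably still worth checking anyNA(d) before clustering, since hclust() needs a dissimilarity object without missing entries.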

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>
http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
