Hi,

I want to perform a hierarchical clustering using the median as linkage metric. As I understand it, the function hcluster in package amap has this option, but it does not produce the results that I expect.

In the example below, M is a matrix of similarities that is transformed into a matrix of dissimilarities D.

> D
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.9  0.2  0.2  0.1
[2,]  0.9  1.0  0.7  1.0  0.0
[3,]  0.2  0.7  1.0  0.8  0.8
[4,]  0.2  1.0  0.8  1.0  0.5
[5,]  0.1  0.0  0.8  0.5  1.0

Since D[2,5] = 0, objects 2 and 5 should be grouped together in the first step, as is done by the agnes function, but hcluster starts by clustering objects 3 and 4. Why is this?

Regards
Henrik

library(cluster)
library(amap)

# Create matrix M
M <- matrix(nrow=5, ncol=5)
M[,1] <- c(0,1,8,8,9)
M[,2] <- c(1,0,3,0,10)
M[,3] <- c(8,3,0,2,2)
M[,4] <- c(8,0,2,0,5)
M[,5] <- c(9,10,2,5,0)

# Create matrix D
n <- dim(M)[1]
o <- matrix(1,n,n)
mn <- (1/max(M))*M
D <- o-mn

# Clustering using hcluster
ce <- hcluster(D, link="median")
plot(ce)

# Clustering using agnes
av <- agnes(D, diss=TRUE, method="average")
pltree(av)

--
View this message in context: http://r.789695.n4.nabble.com/hcluster-with-linkage-median-tp2715585p2715585.html
Sent from the R help mailing list archive at Nabble.com.
On Mon, Sep 27, 2010 at 8:22 AM, Kennedy <henrik.aldberg at gmail.com> wrote:
> Since [2,5]=0 the objects 2 and 5 should be grouped together in the first
> step as is done by the agnes function but hcluster start by clustering
> objects 3 and 4. Why is this?

From reading the hcluster help file I get the sense that the input is _not_ the distance matrix, but a numeric matrix from which the distance is computed.

I think you should simply look at hclust since that does implement the median method.

Peter
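The difference Peter describes can be checked with base R alone. The sketch below is only an illustration: it assumes hcluster computes Euclidean distances between the rows of its input, and the helper name closest_pair is made up for this example. It compares the closest pair of objects when the rows of D are treated as observations against the closest pair when D is used directly as the dissimilarities.

```r
# Rebuild the dissimilarity matrix D from the original post
M <- matrix(c(0,  1, 8, 8,  9,
              1,  0, 3, 0, 10,
              8,  3, 0, 2,  2,
              8,  0, 2, 0,  5,
              9, 10, 2, 5,  0), nrow = 5, ncol = 5)
D <- 1 - M / max(M)

# Hypothetical helper: indices of the two objects with the smallest distance
closest_pair <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                                  # ignore self-distances
  idx <- which(m == min(m), arr.ind = TRUE)[1, ]  # first minimal entry
  sort(unname(idx))
}

# Rows of D treated as observations (assumed to mirror what hcluster does)
pair_rows <- closest_pair(dist(D))      # 3 4

# D used directly as the dissimilarities
pair_diss <- closest_pair(as.dist(D))   # 2 5

# hclust expects a dist object, so convert D with as.dist() first
hc <- hclust(as.dist(D), method = "median")
first_merge <- hc$merge[1, ]            # negative entries denote single objects
```

Under that assumption, the Euclidean distance between rows 3 and 4 is the smallest one, which matches the first merge reported for hcluster, while as.dist(D) puts objects 2 and 5 together first.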
On Mon, Sep 27, 2010 at 8:22 AM, Kennedy <henrik.aldberg at gmail.com> wrote:
> I want to perform a hierarchical clustering using the median as linkage
> metric. As I understand it the function hcluster in package amap have this
> option but it does not produce the results that I expect.

Also, if you have a large(r) data set, the package flashClust provides a much faster (n^2 vs. n^3) replacement for hclust with exactly the same results.

Peter
Thank you Peter for your help. I had tried hclust before, but I made the mistake of using the D matrix above instead of a dist object. Hence

library(flashClust)

d <- as.dist(D)

# Clustering using hclust
hc <- hclust(d, method="median", members=NULL)

# Clustering using flashClust
fc <- flashClust(d, method="median", members=NULL)

solves the problem I posted.

But another question arises. How is the median linkage calculated? I want it to be like this:

Given clusters C1=(1,2,3) and C2=(4), the distance between C1 and C2 is:

d(C1,C2) = median(d(1,4), d(2,4), d(3,4)) = median(0.2, 1.0, 0.8) = 0.8,

where the values d(1,4), d(2,4) and d(3,4) are taken from the D matrix above. If this is not the case, is there any function that uses this linkage metric?

Thanks
Henrik
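For what it is worth, the hclust help page describes method="median" as WPGMC, a centroid-style update rule, so it is not the median of the pairwise dissimilarities asked about above. The following is a minimal, naive sketch of that pairwise-median linkage written from scratch; the function name median_pairwise_linkage and its return format are invented for illustration and do not come from any package.

```r
# Rebuild the dissimilarity matrix D from the original post
M <- matrix(c(0,  1, 8, 8,  9,
              1,  0, 3, 0, 10,
              8,  3, 0, 2,  2,
              8,  0, 2, 0,  5,
              9, 10, 2, 5,  0), nrow = 5, ncol = 5)
D <- 1 - M / max(M)

# Hypothetical helper: agglomerative clustering where the distance between
# two clusters is the median of all pairwise dissimilarities between them.
median_pairwise_linkage <- function(D) {
  D <- as.matrix(D)
  clusters <- as.list(seq_len(nrow(D)))   # start with one cluster per object
  merges <- list()
  while (length(clusters) > 1) {
    best <- c(NA, NA)
    best_d <- Inf
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        dij <- median(as.vector(D[clusters[[i]], clusters[[j]]]))
        if (dij < best_d) {               # ties keep the first pair found
          best_d <- dij
          best <- c(i, j)
        }
      }
    }
    merges[[length(merges) + 1]] <- list(a = clusters[[best[1]]],
                                         b = clusters[[best[2]]],
                                         height = best_d)
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL           # drop the absorbed cluster
  }
  merges
}

# The worked example from the message: C1 = (1,2,3), C2 = (4)
d_example <- median(D[c(1, 2, 3), 4])     # median(0.2, 1.0, 0.8)

merges <- median_pairwise_linkage(D)
```

The sketch recomputes every cluster-to-cluster median on each pass, so it is only suitable for small inputs, and it returns a plain list of merges rather than an hclust object. Like other linkages that are not guaranteed to be monotone, the merge heights may not increase from step to step.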