Hi,
We are trying to interpret the clusters generated by the hclust function in
R's "stats" package. The problem is that hc$order gives the observations in a
particular order, but that order is lost when we export the results to a
file. Here is the example code and its output:
> hc <- hclust(dist(USArrests), "ave")
> plot(hc)
> hc$label
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia"
[49] "Wisconsin"      "Wyoming"
> hc$order
 [1]  9 33  5 20  3 31  8  1 18 13 32 22 28  2 24 40 47 37 50 36 46 39 21 30 25
[26]  4 42 10  6 43 12 27 17 26 35 44 14 16  7 38 11 48 19 41 34 45 23 49 15 29
> hc$height
[1] 2.291288 3.834058 3.929377 6.236986 6.637771 7.355270
[7] 8.027453 8.537564 10.184218 10.736739 10.771175 11.456439
[13] 12.438692 12.614278 12.878100 13.044922 13.297368 13.352260
[19] 13.896043 14.501034 15.026107 15.122897 15.453120 15.454449
[25] 16.425489 16.891499 18.417331 18.993398 20.198479 20.598507
[31] 21.167192 22.595978 23.972143 26.363428 26.713777 27.779904
[37] 28.012211 28.095803 29.054195 33.117815 38.527912 39.394633
[43] 41.094765 44.283922 44.837933 54.746831 77.605024 89.232093
[49] 152.313999
> hc$merge
[,1] [,2]
[1,] -15 -29
[2,] -17 -26
[3,] -14 -16
[4,] -13 -32
[5,] -35 -44
[6,] -36 -46
[7,] -7 -38
[8,] -19 -41
[9,] -49 1
[10,] -50 6
[11,] -48 8
[12,] -21 -30
[13,] -27 2
[14,] -4 -42
[15,] -37 10
[16,] -34 -45
[17,] -22 -28
[18,] 3 7
[19,] -3 -31
[20,] -6 -43
[21,] -12 13
[22,] 5 18
[23,] -20 19
[24,] -1 -18
[25,] -47 15
[26,] -8 24
[27,] 4 17
[28,] -23 9
[29,] -25 14
[30,] 21 22
[31,] -24 -40
[32,] -39 12
[33,] -10 20
[34,] 26 27
[35,] 25 32
[36,] 16 28
[37,] -5 23
[38,] -2 31
[39,] 29 33
[40,] 11 36
[41,] -9 -33
[42,] 34 38
[43,] -11 40
[44,] 37 42
[45,] 35 39
[46,] 30 43
[47,] 41 44
[48,] 45 46
[49,] 47 48
plot() generates a dendrogram of the clustered nodes. The ideal solution for
us would be a method that generates a matrix with the distance attributes of
each node in the dendrogram. Failing that, any suggestion for a method that
keeps the hc$order structure intact on export would help us a lot.
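To make the request concrete, here is a minimal sketch of the kind of export we are after. It assumes that "distance attributes" can be taken to mean the cophenetic distance (the height at which two observations first end up in the same cluster), which stats::cophenetic() computes from an hclust object. The file names and the use of tempdir() are placeholders of our own, not anything required by hclust.

```r
# Sketch only: assumes the cophenetic distance is the "distance" we want.
hc <- hclust(dist(USArrests), "average")

# Labels in dendrogram (left-to-right) order, next to their original row index.
ordered <- data.frame(
  position = seq_along(hc$order),
  index    = hc$order,
  label    = hc$labels[hc$order]
)

# Cophenetic distance matrix, with rows and columns put in dendrogram order.
coph <- as.matrix(cophenetic(hc))[hc$order, hc$order]

# Write both to CSV; the directory and file names are placeholders.
out_dir <- tempdir()
write.csv(ordered, file.path(out_dir, "hc_order.csv"), row.names = FALSE)
write.csv(coph, file.path(out_dir, "hc_cophenetic.csv"))
```

Because the order is stored as an explicit "position" column and the matrix is reindexed by hc$order before writing, the dendrogram ordering should survive the round trip to a file.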
Our second problem is the interpretation of the matrix returned by
hc$merge.
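Our current reading of the hclust help page is that row i of hc$merge describes merge step i: a negative entry -j stands for the single observation j, and a positive entry j stands for the cluster formed at the earlier step j. The sketch below spells that reading out so it can be confirmed or corrected; describe_entry is a hypothetical helper name of our own, not part of the stats package.

```r
hc <- hclust(dist(USArrests), "average")

# Hypothetical helper: turn one entry of hc$merge into readable text.
# Negative j = single observation -j; positive j = cluster formed at step j.
describe_entry <- function(j, labels) {
  if (j < 0) labels[-j] else paste0("cluster from step ", j)
}

# One row per merge step, with the height at which the merge happened.
steps <- data.frame(
  step   = seq_len(nrow(hc$merge)),
  left   = sapply(hc$merge[, 1], describe_entry, labels = hc$labels),
  right  = sapply(hc$merge[, 2], describe_entry, labels = hc$labels),
  height = hc$height
)
head(steps, 10)
```

Under this reading, row 1 of the output above (-15, -29) would mean Iowa was merged with New Hampshire at height 2.291288, and row 9 (-49, 1) would mean Wisconsin joined the cluster formed at step 1.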
Thank you,
-Tarun