Norm.Good at csiro.au
2007-Jun-13 00:27 UTC
[R] Setting a minimum number of observations within an individual cluster
Hi,

I'm trying to cluster a continuous dataset with a varying number of clusters, with the restriction that each cluster must have more than 'x' observations.

I have tried the clara function, using silhouette to give me the neighbouring cluster medoid of each observation, then merging each observation from a cluster with fewer than 'x' obs. into its neighbour. This comes unstuck if the neighbouring clusters also have fewer than 'x' obs.

So I'm fiddling with dendrogram objects. Is there any way of using the 'members' attribute to cut a dendrogram so that it only includes branches with more than 'x' members?

An example output from clara with a data set of 1000 obs. and 82 clusters:

    > cl$clusinfo
          size   max_diss    av_diss isolation
     [1,]    1 0.00000000 0.00000000 0.0000000
     [2,]    3 1.19840221 0.40837142 5.0938561
     [3,]    4 0.16867940 0.07284916 0.5830662
     [4,]    2 0.13380551 0.06690276 0.5687456
     [5,]    3 0.21862177 0.13428115 1.0371933
     [6,]    5 0.10384573 0.05270335 0.5887887
     [7,]    2 0.08547020 0.04273510 0.4846024
     [8,]    4 0.18615254 0.09545067 0.7396865
     [9,]    7 0.15688781 0.08572887 0.6234016
     . . .
    [75,]   11 0.26963387 0.13985980 1.1447836
    [76,]    6 0.21439705 0.11953365 0.5754212
    [77,]    5 0.21131875 0.12920395 0.5567024
    [78,]    3 0.17126227 0.09685930 0.7160261
    [79,]    2 0.22622024 0.11311012 0.9457984
    [80,]    2 0.10268536 0.05134268 0.5167766
    [81,]    1 0.00000000 0.00000000 0.0000000
    [82,]    2 0.10018837 0.05009419 0.2474480

Note that the observations in cluster 1 are not necessarily all closest to cluster 2.

Cheers
Norm

Norm Good
Statistician
CMIS/e-Health Research Centre
A joint venture between CSIRO and the Queensland Government
Lvl 20, 300 Adelaide Street
BRISBANE QLD 4000
PO Box 10842 Adelaide Street
BRISBANE QLD 4000
Ph: 07 3024 1640
Fx: 07 3024 1690
Em: norm.good at csiro.au
Web: http://e-hrc.net/
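
For the dendrogram question above, here is a minimal sketch of one possible approach, not a tested solution for the real data. It walks the tree from the root and splits a node only when every child branch keeps at least `min_size` members, reading the 'members' attribute at each step. The function name `cut_min_size`, the toy data and the threshold of 10 are all assumptions made for illustration. The rule is "at least `min_size`" rather than strictly "more than 'x'", so `min_size = x + 1` would match the restriction as stated.

```r
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)        # toy data: 100 observations (assumption)
dend <- as.dendrogram(hclust(dist(x)))   # complete linkage, the hclust default

# Hypothetical helper: returns a list of sub-dendrograms, one per cluster.
# A node is split only if every child has at least `min_size` members.
cut_min_size <- function(node, min_size) {
  if (is.leaf(node)) return(list(node))
  sizes <- vapply(seq_along(node),
                  function(i) attr(node[[i]], "members"),
                  numeric(1))
  if (any(sizes < min_size)) return(list(node))   # stop: a child would be too small
  do.call(c, lapply(seq_along(node),
                    function(i) cut_min_size(node[[i]], min_size)))
}

branches <- cut_min_size(dend, min_size = 10)

# Map each observation to the branch it ended up in.
cluster <- integer(nrow(x))
for (k in seq_along(branches)) {
  cluster[order.dendrogram(branches[[k]])] <- k
}
table(cluster)
```

One caveat with this greedy rule: a small branch that separates near the top of the tree (an outlier, say) stops its large sibling from being split any further, so the number of clusters can come out much lower than expected.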