Murk Wuite
2004-May-24 13:58 UTC
[R] non-hierarchical non-exclusive clustering of large data sets
Hi,

I'm trying to use R to cluster words with related meanings. Does anyone know of a non-hierarchical clustering method in R that produces non-exclusive clusters? By non-exclusive, I mean that words should be allowed to be part of multiple clusters.

So my data matrix would look something like:

          T1  T2  T3
CLOWN_N    0   1   0
BANK_N     3   0   2
RIVER_N    0   0   2
FLOW_V     0   0   3
MONEY_N    2   0   0
PAY_V      2   0   0

The first row indicates that the noun "clown" occurred only once in my text collection, namely in text 2. Ideally, the clustering method would produce the clusters [bank_n, river_n, flow_v], [bank_n, money_n, pay_v] and [clown_n].

The data matrix I would use would be much bigger than the one above; its dimensions would be on the order of (100000, 100000). Does anyone know if this would cause practical problems, perhaps very slow clustering?

Best wishes,

Murk Wuite, MA student
Department of Language and Speech
Katholieke Universiteit Nijmegen, The Netherlands
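For a rough sense of scale, the arithmetic below estimates the memory a (100000, 100000) matrix would need if it were stored densely as 8-byte doubles, and the memory needed by any method that keeps all pairwise dissimilarities between 100000 words. This is only a back-of-the-envelope sketch: the variable names are illustrative, and a sparse representation (not shown here) would need far less for a mostly-zero count matrix.

```r
# Back-of-the-envelope memory estimate; assumes dense storage as 8-byte doubles.
n <- 1e5                 # number of words (rows) and texts (columns)
bytes_per_double <- 8

# Full dense n x n matrix.
dense_gb <- n * n * bytes_per_double / 1e9

# Lower triangle of an n x n dissimilarity matrix: n * (n - 1) / 2 entries.
dist_gb <- n * (n - 1) / 2 * bytes_per_double / 1e9

cat("dense matrix:        ", dense_gb, "GB\n")
cat("pairwise distances:  ", dist_gb, "GB\n")
```

Both figures are in decimal gigabytes (1e9 bytes).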
Bhaskar S. Manda
2004-May-24 15:12 UTC
[R] non-hierarchical non-exclusive clustering of large data sets
On Mon, 24 May 2004 15:58:57 +0200, Murk Wuite wrote:
> I'm trying to use R to cluster words with related meanings. Does anyone
> know of a non-hierarchical clustering method in R that produces
> non-exclusive clusters? With non-exclusive, I mean that words should

The "fanny" method in library(cluster) outputs probabilities of membership in each cluster.

> the one above, its dimensions would be in the order of (100000,
> 100000). Does anyone know if this would cause practical problems,
> perhaps very slow clustering?

I had a much smaller matrix, 4000x3, and fanny took about 4 minutes of wall-clock time on a lightly loaded 1.4 GHz Athlon (there were many other processes, but none computational). It was completely CPU-bound.

-- bhaskar
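A minimal sketch of the fanny suggestion, run on the toy matrix from the original post. Several things here are assumptions rather than part of either message: the choice k = 2 (fanny requires k < n/2, so six words cannot give the three clusters hoped for), the default Euclidean metric on raw counts, and the 0.3 cutoff used to turn membership coefficients into overlapping clusters. The names counts, fit, threshold and soft_clusters are illustrative only.

```r
library(cluster)

# Toy term-by-text counts from the original post (rows: words, columns: texts).
counts <- matrix(
  c(0, 1, 0,
    3, 0, 2,
    0, 0, 2,
    0, 0, 3,
    2, 0, 0,
    2, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("CLOWN_N", "BANK_N", "RIVER_N", "FLOW_V", "MONEY_N", "PAY_V"),
    c("T1", "T2", "T3")
  )
)

# Fuzzy clustering; k = 2 is the largest value allowed for six observations.
fit <- fanny(counts, k = 2)

# One row per word, one column per cluster; each row sums to 1.
print(round(fit$membership, 2))

# Assumed rule: a word joins every cluster where its membership is >= threshold,
# so a word can end up in more than one cluster.
threshold <- 0.3
soft_clusters <- lapply(seq_len(ncol(fit$membership)), function(j) {
  rownames(counts)[fit$membership[, j] >= threshold]
})
print(soft_clusters)
```

On data this small fanny may warn that the memberships are close to 1/k; lowering its memb.exp argument makes the memberships crisper. The cutoff controls how much overlap the resulting clusters have and would need tuning on real data.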