thr3ads.net - R help - [R] non-hierarchical non-exclusive clustering of large data sets [May 2004]

Murk Wuite

2004-May-24 13:58 UTC

[R] non-hierarchical non-exclusive clustering of large data sets

Hi,

I'm trying to use R to cluster words with related meanings. Does anyone
know of a non-hierarchical clustering method in R that produces
non-exclusive clusters? By non-exclusive, I mean that words should be
allowed to be part of multiple clusters. My data matrix would look
something like:

          T1   T2   T3
CLOWN_N    0    1    0
BANK_N     3    0    2
RIVER_N    0    0    2
FLOW_V     0    0    3
MONEY_N    2    0    0
PAY_V      2    0    0

The first row indicates that the noun "clown" occurred only once in my
text collection, namely in text 2. Ideally, the clustering method would
produce the clusters [bank_n,river_n,flow_v], [bank_n,money_n,pay_v] and
[clown_n].

The data matrix I would actually use is much bigger than the one above:
its dimensions would be on the order of (100000, 100000). Does anyone
know if this would cause practical problems, perhaps very slow clustering?

Best wishes,

Murk Wuite, MA student
Department of Language and Speech
Katholieke Universiteit Nijmegen, The Netherlands
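
For reference, the toy matrix and the size question in the post above can be sketched in a few lines of base R. This is only a back-of-the-envelope estimate, assuming dense double-precision storage at 8 bytes per cell. The variable names (counts, dense_gb, dist_gb) are illustrative and do not come from the original post.

```r
# Toy word-by-text count matrix from the post (rows: words, columns: texts).
counts <- matrix(
  c(0, 1, 0,
    3, 0, 2,
    0, 0, 2,
    0, 0, 3,
    2, 0, 0,
    2, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("CLOWN_N", "BANK_N", "RIVER_N", "FLOW_V", "MONEY_N", "PAY_V"),
    c("T1", "T2", "T3")
  )
)

# Rough memory needs for the full-size problem, assuming dense storage
# of doubles (8 bytes per cell). These are estimates, not measurements.
n <- 1e5
dense_gb <- n * n * 8 / 1e9            # full 100000 x 100000 matrix: 80 GB
dist_gb  <- n * (n - 1) / 2 * 8 / 1e9  # all pairwise dissimilarities: about 40 GB
```

Under these assumptions the dense matrix alone would need about 80 GB, and a full set of pairwise dissimilarities between 100000 words about 40 GB more, so storage looks like a practical obstacle in its own right, before clustering speed comes into it.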

Bhaskar S. Manda

2004-May-24 15:12 UTC

[R] non-hierarchical non-exclusive clustering of large data sets

On Mon, 24 May 2004 15:58:57 +0200, Murk Wuite wrote:
> I'm trying to use R to cluster words with related meanings. Does anyone
> know of a non-hierarchical clustering method in R that produces
> non-exclusive clusters? With non-exclusive, I mean that words should
The "fanny" method in library(cluster) outputs probabilities of membership in
each cluster.
> the one above, its dimensions would be in the order of (100000,
> 100000). Does anyone know if this would cause practical problems,
> perhaps very slow clustering?
I had a much smaller matrix, 4000x3, and fanny took about 4 minutes of
wall-clock time on a lightly loaded 1.4 GHz Athlon (there were many other
processes, but none computational). It was completely CPU-bound.

--
bhaskar
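
The fanny suggestion in the reply above could be tried on the toy data roughly as follows. This is a minimal sketch, not a recommended analysis. fanny() requires k < n/2, so with only six words at most two clusters can be requested, not the three hoped for in the question. The values memb.exp = 1.5 and cutoff = 0.3 are arbitrary assumptions made for illustration, as is the idea of thresholding the memberships to get overlapping clusters.

```r
library(cluster)  # provides fanny(); shipped with R as a recommended package

# Toy word-by-text count matrix from the original question.
counts <- matrix(
  c(0, 1, 0,
    3, 0, 2,
    0, 0, 2,
    0, 0, 3,
    2, 0, 0,
    2, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("CLOWN_N", "BANK_N", "RIVER_N", "FLOW_V", "MONEY_N", "PAY_V"),
    c("T1", "T2", "T3")
  )
)

# Fuzzy clustering with k = 2 (the largest k allowed for 6 observations).
# memb.exp = 1.5 is an arbitrary choice for this sketch; the default is 2.
fit <- fanny(counts, k = 2, memb.exp = 1.5)

# One row per word, one column per cluster; each row sums to 1.
memb <- fit$membership
print(round(memb, 2))

# Hypothetical post-processing: count a word as a member of every cluster
# in which its membership reaches an arbitrary cutoff.
cutoff <- 0.3
clusters <- lapply(seq_len(ncol(memb)), function(j) {
  rownames(counts)[memb[, j] >= cutoff]
})
print(clusters)
```

Whether a word such as BANK_N actually ends up in more than one cluster depends on the data, the membership exponent and the cutoff, so the result should be inspected before it is relied on.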
