>>>>> "Mark" == Mark Marques <mmarques at
power.inescn.pt>
>>>>> on Fri, 28 Feb 2003 09:51:02 +0000 writes:
Mark> I have "small" problem ...
Mark> with the cluster library each time I try to use
Mark> the "agnes","pam","fanny" functions
Mark> with more than 20000 elements
Mark> I get the following error:
>> Error in vector("double", length) : negative length vectors are not allowed
>> In addition: Warning message:
>> NAs introduced by coercion
"negative" is certainly misleading here; I presume it's an
integer overflow somewhere.
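Just as an illustration (a sketch in plain R only; the real overflow
happens somewhere inside the compiled cluster code, not necessarily in
this exact expression, and the n below is simply chosen large enough to
trigger it):

   ## illustrative only -- n chosen so the product exceeds .Machine$integer.max
   n <- 66000L
   n * n                        # NA, with "NAs produced by integer overflow"
   (n * (n - 1L)) %/% 2L        # intermediate product already overflows -> NA
   as.numeric(n) * (n - 1) / 2  # in double precision: 2177967000, as expected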
But (with agnes()) I could never get close to that size; even
   a <- agnes(dist(cbind(1, rnorm(5000))))
pumps my R up to a memory footprint of 638 MBytes...
Mark> But with the clara function everything works fine...
because clara() is for Clustering LARge Applications !!
In clustering, 20000 is definitely "large".
I would recommend using quite a bit larger `samples' and `sampsize'
than the defaults in clara().
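For instance (the k, samples and sampsize values below are made up for
illustration, not a recommendation from this mail; clara()'s defaults,
in the versions I know, are samples = 5 and sampsize = min(n, 40 + 2*k)):

   library(cluster)
   ## toy stand-in for 20000 elements
   x  <- cbind(rnorm(20000), rnorm(20000))
   cl <- clara(x, k = 4,
               samples  = 50,    # many more subsamples than the default 5
               sampsize = 200)   # larger subsamples than the default
   table(cl$clustering)

Larger values of both make the subsampling less likely to miss a small
cluster, at the price of a longer run time.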
All routines but clara() work with a dissimilarity/distance object
of size n*(n-1)/2 (basically one of the triangles of a symmetric n^2 matrix).
The implementation needs to duplicate this at least once, and
one double is 8 bytes.
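To put rough numbers on that (my arithmetic, nothing more):

   n      <- 20000
   n.diss <- n * (n - 1) / 2    # 199,990,000 pairwise dissimilarities
   bytes  <- 8 * n.diss         # one double = 8 bytes
   bytes / 2^20                 # ~ 1526 MB for a single copy
   2 * bytes / 2^20             # ~ 3052 MB with just one duplicate

i.e. already at or beyond what a 32-bit process can realistically
address, which is why I say below that you have no chance with agnes()
or pam() at n = 20000.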
Mark> What could be wrong ?
You have no chance of getting anything from agnes() or pam()
when you want to cluster 20'000 objects, at least not on 32-bit computers.
It seems, though, that one could carefully change agnes() (e.g.) to use
less duplication of the large objects and save memory.
Martin Maechler <maechler at stat.math.ethz.ch>
http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><