mlennert@ulb.ac.be
2001-Apr-27 09:18 UTC
weighted clustering (was: Re: [R] problems with a large data set)
kmeans and clara work great. Thank you for the tip.

I have another question: is it possible to weight the observations in a
cluster analysis? I haven't found any mention of this in the kmeans or
clara help texts.

Moritz Lennert
Chargé de recherche
IGEAT - ULB
tél: 32-2-650.65.16
fax: 32-2-650.50.92
email: mlennert at ulb.ac.be

> On Wed, 25 Apr 2001, Moritz Lennert wrote:
>
> > Hello,
> >
> > I have trouble with a data set that comprises 2136 lines of 20 columns.
> > I would like to do a hierarchical clustering and I tried the following:
> >
> > ages.hclust <- hclust(dist(ages, method="euclidean"), "ward")
> >
> > but I get the following error message:
> >
> > Error: cannot allocate vector of size 17797 Kb
> >
> > When I try to do the dist() alone first without the hclust(), I get the
> > same type of message.
> >
> > Then I tried with the RPgSQL package by typing
> >
> > > db.connect(dbname="space")
> > Connected to database "space" on "localhost"
> > > bind.db.proxy("ages")
> > > ages.hclust <- hclust(dist(ages, method="euclidean"), "ward")
>
> That does not help. You need to retrieve the data to use it!
>
> > This time I get:
> >
> > Error in dist(ages, method = "euclidean") :
> > NA/NaN/Inf in foreign function call (arg 1)
> > In addition: Warning message:
> > NAs introduced by coercion
> >
> > I've checked, and I can't find any missing values or something similar.
> > Could someone tell me if I'm doing something wrong, or whether this is
> > just too much data for R?
>
> This may be too much data for your computer, but not for R: I've
> just done this in a few seconds. I suggest that you need more memory
> (real or virtual): on my simulation it used about 80Mb.
>
> I should say that doing agglomerative hierarchical clustering on thousands
> of points makes little sense: it is not a good way to find large clusters.
> Try a partitioning method like kmeans or clara (in package cluster).
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
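
For readers following the thread, here is a minimal sketch of the partitioning approach suggested in the quoted reply (kmeans, or clara from package cluster). The real "ages" table is not available, so the matrix below is simulated with the same dimensions (2136 rows, 20 columns). The choice of 4 clusters and the nstart and iter.max values are arbitrary assumptions for illustration, not values taken from the thread.

```r
# Simulated stand-in for the 'ages' table (2136 rows, 20 columns).
# This is NOT the original data; it only has the same shape.
set.seed(1)
ages <- matrix(rnorm(2136 * 20), nrow = 2136, ncol = 20)

# k-means partitioning. The number of clusters (4) is an arbitrary
# illustrative choice; nstart and iter.max are likewise assumptions.
km <- kmeans(ages, centers = 4, nstart = 5, iter.max = 50)
print(table(km$cluster))

# clara() from package 'cluster' works on sub-samples of the rows,
# so it does not build the full 2136 x 2136 dissimilarity matrix
# that dist() needs for hclust().
library(cluster)
cl <- clara(ages, k = 4)
print(table(cl$clustering))
```

Neither call goes through dist() on the full data set, which is the step that ran out of memory in the original attempt. This sketch does not address the weighting question asked above.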