On Tue, 17 Nov 2009, akonla wrote:
>
> Hi,
>
> I am new to clustering in R and I have a dataset with approximately 17,000
> rows and 8 columns with each data point a numerical character with three
> decimal places. I would like to cluster the 8 columns so that I get a
> dendrogram as an output. So, I am simply creating a distance matrix of my
> data, using the 'hclust' function, and then plotting the results (see below,
> my data is contained in the text file).
>
> x<-read.table('SEP_IR_1113_3.txt', header=TRUE, sep="\t")
> x.dist=dist(x)

See

    ?dist

which explains

    This function computes and returns the distance matrix computed by
    using the specified distance measure to compute the distances
    between the rows of a data matrix.

You are trying to cluster 17,000 rows.
No wonder it (dist) is taking its time!
Chuck
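
Since the stated goal is to cluster the 8 columns, one option is to transpose the data before calling dist(), so that the columns become the rows being compared. With 17,000 rows, dist(x) has to compute roughly 144 million pairwise distances; on the transposed data it needs only 28. A minimal sketch, assuming the file reads into an all-numeric table; the simulated `x` below is only a stand-in for SEP_IR_1113_3.txt and is not the poster's data:

```r
# Stand-in for the real data (assumption: 17,000 rows x 8 numeric columns,
# values rounded to three decimal places as described in the question)
set.seed(1)
x <- as.data.frame(matrix(round(rnorm(17000 * 8), 3), ncol = 8))

# dist() compares ROWS, so dist(x) would need n * (n - 1) / 2 distances
n <- nrow(x)
n_pairs <- n * (n - 1) / 2

# Transpose so that the 8 columns become the rows being compared
x.dist <- dist(t(x))

# Same clustering and plot calls as in the original post
hc <- hclust(x.dist, method = "average")
plot(hc, hang = -1)
```

The dendrogram then has 8 leaves, one per column. If the rows really are what should be clustered, this does not help, and the cost of dist() on 17,000 rows remains.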
> hc=hclust(x.dist,method="average")
> plot(hc, hang=-1)
>
> Unfortunately, the hclust function, although it produces no error terms,
> takes a very long time to run (>4 hours) and my computer kills the program
> before it finishes. I don't think this data set is so large to cause such a
> long computing time, and I have plenty of memory since I am running this
> analysis on our university computing cluster.
>
> Has anyone run into this problem before and does anyone have any tips on how
> I can speed up processing? I can provide extra information if necessary
> regarding my problem.
>
> Thank you!
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901