Ben Harrison
2013-Aug-22 11:39 UTC
[R] Error: cannot allocate vector of size 18.4 Gb (NbClust)
I have a 70363 x 5 double matrix that I am playing with.

> head(df)
        GR       SP         SN         LN       NEUT
1 1.458543 1.419946 -0.2928088 -0.2615358 -0.5565227
2 1.432041 1.418573 -0.2942713 -0.2634204 -0.5927334
3 1.406642 1.418226 -0.2958296 -0.2652920 -0.6267121
4 1.382284 1.418843 -0.2974732 -0.2671464 -0.6585127
5 1.358903 1.420360 -0.2991920 -0.2689792 -0.6881888
6 1.336436 1.422717 -0.3009756 -0.2707864 -0.7157941

In an attempt to explore it using clustering, I have tried the NbClust package with the following code:

library(NbClust)
nc <- NbClust(df, min.nc=5, max.nc=7, method="kmeans")

which returns the error

Error: cannot allocate vector of size 18.4 Gb

My workstation is an Intel Xeon with 23.5 GiB of memory.

I am very ignorant of the requirements of the package, but for comparison using stats::kmeans to cluster the data set is no problem.

What is the issue with this? Can anyone spell it out for me, so that perhaps I can do something to reduce the problem a little?
Or offer a solution to work around the memory restrictions?
Should I round off the variables?
Should I sample it, and analyse the sample?

Thanks,
Ben.
Michael Weylandt
2013-Aug-22 11:57 UTC
[R] Error: cannot allocate vector of size 18.4 Gb (NbClust)
On Aug 22, 2013, at 7:39, Ben Harrison <harb at student.unimelb.edu.au> wrote:

> I have a 70363 x 5 double matrix that I am playing with.
> [...]
> library(NbClust)
> nc <- NbClust(df, min.nc=5, max.nc=7, method="kmeans")
>
> which returns the error
> Error: cannot allocate vector of size 18.4 Gb
>
> My workstation is an Intel Xeon with 23.5 GiB of memory.
> [...]
> Or offer a solution to work around the memory restrictions?
> Should I round off the variables?
> Should I sample it, and analyse the sample?

No idea about the problem specifics, but what are your OS and version of R? You might be limited there: a 32-bit build of R, for instance, cannot allocate a vector anywhere near that size.

More likely, however, your problem is just really big: if you need two copies of the allocated vector, you're already in trouble (2 x 18.4 Gb is more than your 23.5 GiB), regardless of everything else you're doing.

You might need to look at some of the specialized big-memory packages listed in the CRAN High-Performance Computing task view.

Michael
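
One possible reading of the 18.4 Gb figure, offered as a sketch and not confirmed anywhere in this thread: it matches the size of a full set of pairwise distances between 70363 rows, stored as n * (n - 1) / 2 doubles the way stats::dist stores them. That NbClust builds such an object internally is an assumption here. The helper names dist_size_gib and max_rows_for are made up for illustration, and the 2 GiB budget is an arbitrary example value.

```r
# Size in GiB of a pairwise-distance object for n rows:
# n * (n - 1) / 2 doubles at 8 bytes each (the layout used by stats::dist).
dist_size_gib <- function(n) {
  n * (n - 1) / 2 * 8 / 2^30
}

n_full <- 70363
dist_size_gib(n_full)   # roughly 18.4, in line with the error message

# Largest row count whose distance object fits within a memory budget.
# Solves n * (n - 1) / 2 * 8 <= bytes for n (hypothetical helper).
max_rows_for <- function(budget_gib) {
  bytes <- budget_gib * 2^30
  floor((1 + sqrt(1 + bytes)) / 2)
}

# Draw a random subset of row indices sized for a 2 GiB budget.
set.seed(1)
idx <- sample(n_full, max_rows_for(2))

# With the poster's data loaded, the subset would then be:
# df_sub <- df[idx, ]
```

If the assumption holds, rounding the variables would not help, because the size depends only on the number of rows. Sampling would, since the memory needed grows with the square of the row count.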