Hi,

I'm playing around with the 'bigmemory' package, and I have finally managed to create some really big matrices. However, only now I realize that there may not be functions made for what I want to do with the matrices...

I would like to perform a cluster analysis based on a big.matrix. Googling around I have found indications that a certain kmeans.big.matrix() function should exist. It is mentioned, among other places, in this document:

http://www.stat.yale.edu/~jay/662/bm-nojss.pdf

Unfortunately, on my computer the following happens:

> require(bigmemory)
Loading required package: bigmemory
> kmeans.big.matrix
Error: object 'kmeans.big.matrix' not found

Does anybody know how to get the kmeans.big.matrix() function? Are there other cluster algorithms out there ready to accept a big.matrix as input?

Thanks!

-- 
Michael Knudsen
micknudsen at gmail.com
http://lifeofknudsen.blogspot.com/
The kmeans.big.matrix function seems to have disappeared between the 2.3 and 3.5 releases of the package; I am not sure why. You can download old versions from CRAN. The default package on Fedora (R-bigmemory-2.3-4.fc11.x86_64 or similar for your platform) also has the function (from version 2.3), and this may also work for other distributions.

In general, most R functions somehow and somewhere rely on being able to convert your matrix to a vector. That is unfortunately going to fail for matrices with more than 2^31-1 elements (on any platform), so such functions are not supported by many of the big data packages.

Allan

On 20/07/09 20:46, Michael Knudsen wrote:
> Hi,
>
> I'm playing around with the 'bigmemory' package, and I have finally
> managed to create some really big matrices. However, only now I
> realize that there may not be functions made for what I want to do
> with the matrices...
>
> I would like to perform a cluster analysis based on a big.matrix.
> Googling around I have found indications that a certain
> kmeans.big.matrix() function should exist. It is mentioned, among
> other places, in this document:
>
> http://www.stat.yale.edu/~jay/662/bm-nojss.pdf
>
> Unfortunately, on my computer the following happens:
>
> > require(bigmemory)
> Loading required package: bigmemory
> > kmeans.big.matrix
> Error: object 'kmeans.big.matrix' not found
>
> Does anybody know how to get the kmeans.big.matrix() function? Are
> there other cluster algorithms out there ready to accept a big.matrix
> as input?
>
> Thanks!
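As a rough illustration of the 2^31-1 element limit Allan describes, here is a minimal sketch in base R. It only does the arithmetic: `fits_in_vector` is a hypothetical helper written for this note, not a function from bigmemory or any other package.

```r
# 2^31 - 1 is the element limit discussed in this thread; it is also
# the largest value R's integer type can hold.
max_elements <- .Machine$integer.max

# Hypothetical helper (not part of bigmemory): TRUE if an nrow x ncol
# matrix has few enough elements to be converted to one ordinary vector
# under that limit.
fits_in_vector <- function(nrow, ncol, limit = max_elements) {
  # Multiply as doubles so the product cannot overflow the integer range.
  as.numeric(nrow) * as.numeric(ncol) <= limit
}

fits_in_vector(1e6, 100)  # 1e8 elements: TRUE
fits_in_vector(5e7, 100)  # 5e9 elements: FALSE
```

A matrix of the second size can exist as a filebacked big.matrix, but any function that first flattens it into a single vector would hit the limit described above.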
This sort of question is ideal to send directly to the maintainer.

We've removed kmeans.big.matrix for the time being and will place it in a new package, bigmemoryAnalytics. bigmemory itself is the core building block and tool, and we don't want to pollute it with lots of extras.

Allan's point is right: big data packages (like bigmemory and ff) can't be used directly with R functions (like lm). And because of R's design you can't extract subsets with more than 2^31-1 elements, even though the big.matrix can be as large as you need (with filebacking).

I hope that helps.

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
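One way to work within the extraction limit Jay mentions is to pull out modest blocks of rows and hand each block to ordinary R functions. The sketch below is an illustration under stated assumptions, not the removed kmeans.big.matrix: it uses an ordinary matrix as a stand-in for a big.matrix (assuming `[` indexing returns an ordinary matrix in the same way), `chunked_col_means` is a hypothetical helper, and the clustering step simply runs the standard `kmeans()` from the stats package on a random sample of rows, which approximates rather than reproduces a k-means fit on the full data.

```r
set.seed(1)

# Stand-in data: an ordinary 2000 x 2 matrix with two well-separated groups.
# Assumption: with bigmemory, x would be a big.matrix and x[rows, ] would
# return an ordinary matrix in the same way.
x <- rbind(matrix(rnorm(2000, mean = 0), ncol = 2),
           matrix(rnorm(2000, mean = 10), ncol = 2))

# Hypothetical helper: column means accumulated one block of rows at a time,
# so no single extraction exceeds chunk_rows * ncol(x) elements.
chunked_col_means <- function(x, chunk_rows = 250) {
  n <- nrow(x)
  totals <- numeric(ncol(x))
  for (start in seq(1, n, by = chunk_rows)) {
    rows <- start:min(start + chunk_rows - 1, n)
    totals <- totals + colSums(x[rows, , drop = FALSE])
  }
  totals / n
}

chunked_col_means(x)

# Cluster a random sample of rows that is small enough to extract at once.
sample_rows <- sort(sample(nrow(x), 500))
fit <- kmeans(x[sample_rows, , drop = FALSE], centers = 2, nstart = 5)
fit$centers
```

The chunk size and sample size here are arbitrary; in practice they would be chosen so that each extracted block fits comfortably in memory.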