Hi R users,

I have a problem with a very large sparse matrix. Is there any way in R to
work efficiently with a matrix in which most elements have the same value
(0 or some X)?

I think every element of a matrix in R currently needs its own memory space.

Thanks,
Martin Gotz

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Mon, 8 Jan 2001, Martin Gotz wrote:

> Hi R users,
>
> I have a problem with a very large sparse matrix. Is there any way in R to
> work efficiently with a matrix in which most elements have the same value
> (0 or some X)?
>
> I think every element of a matrix in R currently needs its own memory space.

A matrix in R is a vector plus a dim attribute. I presume you mean a
numeric matrix (there are other types). Then it needs 8 bytes per entry.

There are other ways to handle matrices: look at package Matrix, for
example.

One obvious representation is to store the non-zero elements and their
locations.

Just how large is this matrix, and does it have a pattern to its sparsity?

And what do you want to do with it? Doing things with sparse matrices has
a tendency to make them less sparse.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
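To illustrate the two points in the reply above, here is a minimal base-R sketch. It shows that a numeric matrix is a double vector with a dim attribute costing about 8 bytes per entry, and then builds the "non-zero elements and their locations" representation by hand. The names m, nz and triplet are made up for this illustration; the Matrix package has its own sparse classes, which are not shown here.

```r
# A numeric matrix is a double vector with a "dim" attribute.
v <- numeric(1e6)            # 1,000,000 doubles, all zero
m <- v
dim(m) <- c(1000, 1000)      # now it is a 1000 x 1000 matrix
is.matrix(m)                 # TRUE
attributes(m)                # only $dim

# Storage is about 8 bytes per entry plus a small fixed header.
bytes_per_entry <- as.numeric(object.size(m)) / length(m)
bytes_per_entry

# Hand-rolled triplet representation: keep only the non-zero
# entries together with their row and column indices.
m[cbind(c(1, 500, 1000), c(2, 500, 999))] <- c(3.5, 1.2, 7)
nz <- which(m != 0, arr.ind = TRUE)
triplet <- data.frame(row = nz[, "row"], col = nz[, "col"], value = m[nz])
triplet
```

For this toy matrix the triplet form holds three rows instead of a million entries. Whether that saving survives depends on what is done with the matrix afterwards.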
On Mon, 8 Jan 2001, Prof Brian Ripley wrote:

> A matrix in R is a vector plus a dim attribute. I presume you mean a
> numeric matrix (there are other types). Then it needs 8 bytes per
> entry.

Yes, a numeric matrix.

> There are other ways to handle matrices: look at package Matrix, for
> example.
>
> One obvious representation is to store the non-zero elements and their
> locations.
>
> Just how large is this matrix, and does it have a pattern to its sparsity?

10 000 x 10 000 elements and 50 000 x 50 000 elements.
There is no pattern to the sparsity. The values in the matrix are distances
between words (for 10 000 and 50 000 words).
98% of the elements in the matrix have the same value (7500).

> And what do you want to do with it? Doing things with sparse matrices has
> a tendency to make them less sparse.

Hierarchical clustering. The matrix is the input distance matrix for
hierarchical clustering with the function hclust.

Thank you,
Martin Gotz
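Since 98% of the entries share one value, the "store the non-zero elements and their locations" idea can be applied by treating 7500 as the background value and storing only the entries that differ from it. A rough base-R sketch on a small stand-in matrix follows. The size n = 200, the random data, the list layout, and the helper get_dist are all assumptions for illustration, not part of the original question.

```r
set.seed(1)
n <- 200                      # small stand-in for 10 000 words
default <- 7500               # the value shared by most entries

# Build a symmetric toy distance matrix where 2% of pairs differ.
d <- matrix(default, n, n)
pairs <- which(upper.tri(d))
changed <- sample(pairs, size = round(0.02 * length(pairs)))
d[changed] <- runif(length(changed), 1, 7499)
d[lower.tri(d)] <- t(d)[lower.tri(d)]   # mirror upper triangle
diag(d) <- 0

# Keep only upper-triangle entries that differ from the default.
keep <- which(upper.tri(d) & d != default, arr.ind = TRUE)
sparse <- list(n = n, default = default,
               i = keep[, 1], j = keep[, 2], x = d[keep])

# Hypothetical lookup helper: distance between items a and b.
get_dist <- function(s, a, b) {
  if (a == b) return(0)
  lo <- min(a, b); hi <- max(a, b)
  hit <- which(s$i == lo & s$j == hi)
  if (length(hit) == 1) s$x[hit] else s$default
}

length(sparse$x) / (n * (n - 1) / 2)   # fraction of pairs stored
```

This only reduces storage. hclust itself expects a dense "dist" object, so a compact form like this would have to be paired with a clustering routine that can query distances on demand.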
On Mon, 8 Jan 2001, Martin Gotz wrote:

> On Mon, 8 Jan 2001, Prof Brian Ripley wrote:
>
> > A matrix in R is a vector plus a dim attribute. I presume you mean a
> > numeric matrix (there are other types). Then it needs 8 bytes per
> > entry.
>
> Yes, a numeric matrix.
>
> > There are other ways to handle matrices: look at package Matrix, for
> > example.
> >
> > One obvious representation is to store the non-zero elements and their
> > locations.
> >
> > Just how large is this matrix, and does it have a pattern to its sparsity?
>
> 10 000 x 10 000 elements and 50 000 x 50 000 elements.
> There is no pattern to the sparsity. The values in the matrix are distances
> between words (for 10 000 and 50 000 words).
> 98% of the elements in the matrix have the same value (7500).
>
> > And what do you want to do with it? Doing things with sparse matrices has
> > a tendency to make them less sparse.
>
> Hierarchical clustering. The matrix is the input distance matrix for
> hierarchical clustering with the function hclust.

No chance. You need to find an algorithm that does not store the distance
matrix. I think *any* clustering algorithm on 50 000 elements is going to
be pretty pointless, but other low-storage algorithms do exist.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
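The storage arithmetic behind this answer can be checked directly. The sketch below assumes 8 bytes per double, as stated earlier in the thread, and that hclust takes a "dist" object holding the n(n-1)/2 pairwise distances. The helper names bytes_full and bytes_dist are invented for the example.

```r
# Bytes needed for a full n x n numeric matrix.
bytes_full <- function(n) n^2 * 8

# Bytes needed for the n(n-1)/2 distances of a "dist" object.
bytes_dist <- function(n) n * (n - 1) / 2 * 8

n <- c(10000, 50000)
sizes <- data.frame(
  n       = n,
  full_GB = bytes_full(n) / 1e9,
  dist_GB = bytes_dist(n) / 1e9
)
sizes
```

For 50 000 words this comes to 20 GB as a full matrix and just under 10 GB in the triangular form, before any working storage the clustering itself needs.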