I'm seeking some advice on effectively using the new Matrix library in R1.9.0 for operations with large dense matrices. I'm working on integral operator models (implemented numerically via matrix operations) and except for the way entries are generated, the examples below really are representative of my problem sizes. My main concern is speed of large dense matrix multiplication. In R 1.8.1 (Windows2000 Professional, dual AthlonMP 2800)> a=matrix(rnorm(2500*2500),2500,2500); v=rnorm(2500); > system.time(a%*%v);[1] 0.11 0.00 0.12 NA NA In R 1.9.0, same platform:> a=matrix(rnorm(2500*2500),2500,2500); v=rnorm(2500); > system.time(a%*%v);[1] 0.24 0.00 0.25 NA NA These differences are consistent. But using the Matrix library in 1.9.0, the discrepancy disappears> library(Matrix); > a=Matrix(rnorm(2500*2500),2500,2500); v=Matrix(rnorm(2500),2500,1); > system.time(a%*%v);[1] 0.11 0.00 0.11 NA NA The problem is> b=a/3Error in a/3 : non-numeric argument to binary operator which seems to mean that I can't just rewrite code to use Matrix instead of matrix objects -- I would have to do lots and lots of conversions between Matrix and matrix. Am I missing a trick here somewhere, that would let me use only Matrix objects and do with them the things one can do with matrix objects? Or some other way to avoid the twofold speed hit in moving to 1.9? I've tried using the Rblas.dll for AthlonXP on CRAN, and it doesn't help. Thanks in advance, Steve Stephen P. Ellner (spe2 at cornell.edu) Department of Ecology and Evolutionary Biology Corson Hall, Cornell University, Ithaca NY 14853-2701 Phone (607) 254-4221 FAX (607) 255-8088
Stephen Ellner <spe2 at cornell.edu> writes:> I'm seeking some advice on effectively using the new Matrix > library in R1.9.0 for operations with large dense matrices. I'm working on > integral operator models (implemented numerically via matrix operations) > and except for the way entries are generated, the examples below really are > representative of my problem sizes. > > My main concern is speed of large dense matrix multiplication. > In R 1.8.1 (Windows2000 Professional, dual AthlonMP 2800) > > a=matrix(rnorm(2500*2500),2500,2500); v=rnorm(2500); > > system.time(a%*%v); > [1] 0.11 0.00 0.12 NA NA > > In R 1.9.0, same platform: > > a=matrix(rnorm(2500*2500),2500,2500); v=rnorm(2500); > > system.time(a%*%v); > [1] 0.24 0.00 0.25 NA NA > > These differences are consistent. But using the Matrix library > in 1.9.0, the discrepancy disappears > > library(Matrix); > > a=Matrix(rnorm(2500*2500),2500,2500); v=Matrix(rnorm(2500),2500,1); > > system.time(a%*%v); > [1] 0.11 0.00 0.11 NA NA > > The problem is > > b=a/3 > Error in a/3 : non-numeric argument to binary operator > > which seems to mean that I can't just rewrite code to use Matrix > instead of matrix objects -- I would have to do lots and lots of > conversions between Matrix and matrix. Am I missing a trick > here somewhere, that would let me use only Matrix objects and do > with them the things one can do with matrix objects? Or some other > way to avoid the twofold speed hit in moving to 1.9?The trick is waiting for the author of the Matrix package to write the methods for arithmetic operations or contributing said methods yourself. :-) Actually I want to at least e-discuss the implementation with John Chambers and other members of the R Development Core Team before doing much more implementation. There are some subtle issues about how to arrange the classes and this is usually the point where John can provide invaluable guidance.
>>>>> "Stephen" == Stephen Ellner <spe2 at cornell.edu> >>>>> on Wed, 28 Apr 2004 17:19:38 -0400 writes:Stephen> I'm seeking some advice on effectively using the Stephen> new Matrix library in R1.9.0 for operations with Stephen> large dense matrices. I'm working on integral Stephen> operator models (implemented numerically via matrix Stephen> operations) and except for the way entries are Stephen> generated, the examples below really are Stephen> representative of my problem sizes. Stephen> My main concern is speed of large dense matrix Stephen> multiplication. In R 1.8.1 (Windows2000 Stephen> Professional, dual AthlonMP 2800) >> a=matrix(rnorm(2500*2500),2500,2500); v=rnorm(2500); >> system.time(a%*%v); Stephen> [1] 0.11 0.00 0.12 NA NA Stephen> In R 1.9.0, same platform: >> a=matrix(rnorm(2500*2500),2500,2500); v=rnorm(2500); >> system.time(a%*%v); Stephen> [1] 0.24 0.00 0.25 NA NA <... and then you talk about the Matrix **package** (not `library') ...> Unfortunately, the 1.9.0 vs. 1.8.1 performance comparison is not just on your computer/OS/R compilation/... version, but I see the same phenomenon on my Linux and Solaris clients, e.g., > n <- 2500; set.seed(1); a <- matrix(rnorm(n * n), n); v <- rnorm(n) > gc() used (Mb) gc trigger (Mb) Ncells 435805 11.7 741108 19.8 Vcells 6378701 48.7 19150535 146.2 For R 1.8.1 on an AMD Athlon > tmp <- gc(); system.time(for(i in 1:4) y <- a %*% v) [1] 0.36 0.00 0.39 0.00 0.00 For R 1.9.0 (patched): > tmp <- gc(); system.time(for(i in 1:4) y <- a %*% v) [1] 0.81 0.00 0.87 0.00 0.00 ---- On a fast (hyper threaded) pentium 4 with 2 GB RAM, the efficiency loss factor is even about 3 {with several runs, showing only one here}: R 1.8.1:> tmp <- gc(); system.time(for(i in 1:10) y <- a %*% v)[1] 0.23 0.00 0.23 0.00 0.00 R 1.9.0 (patched):> tmp <- gc(); system.time(for(i in 1:10) y <- a %*% v)[1] 0.75 0.00 0.74 0.00 0.00 --------- So, there's definitely something we (R core) should look into. Martin Maechler