Without reaching out to another package in R, what is the best way to speed up the following toy example? Over the years I have become very comfortable with the apply family of functions, but I am generally not good at finding faster alternatives.

This toy example is small, but my real data have many millions of rows and the same operation is repeated many times, so a less expensive alternative would be helpful.

  mm <- matrix(rnorm(100), ncol = 10)
  rn <- apply(mm, 1, prod)
On Tue, 8 Nov 2016, Doran, Harold wrote:

> [...]
> mm <- matrix(rnorm(100), ncol = 10)
> rn <- apply(mm, 1, prod)

If the real example has only 10 columns, try this:

  y <- mm[,1]
  for (i in 2:10) y[] <- y*mm[,i]
  all.equal(y,rn)

If it has many more columns, I would `reach out' to the inline package and write 3 lines of C or Fortran to do the operation.

HTH,

Chuck
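The column-wise loop above can be written for any number of columns. This is a minimal sketch in base R only; the helper name row_prod_loop is hypothetical and does not appear in the thread.

```r
# Row products computed by looping over columns rather than rows.
# row_prod_loop is a hypothetical helper name, not from the thread.
row_prod_loop <- function(m) {
  y <- m[, 1]                          # start from the first column
  for (j in seq_len(ncol(m))[-1]) {    # remaining columns, if any
    y <- y * m[, j]                    # one vectorised multiply over all rows
  }
  y
}

set.seed(1)
mm <- matrix(rnorm(100), ncol = 10)
rn <- apply(mm, 1, prod)
all.equal(row_prod_loop(mm), rn)       # compares the two results
```

The loop runs once per column, so with few columns and millions of rows almost all of the work happens inside vectorised multiplications.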
> On Nov 8, 2016, at 7:57 AM, Doran, Harold <HDoran at air.org> wrote:
>
> [...]
> mm <- matrix(rnorm(100), ncol = 10)
> rn <- apply(mm, 1, prod)

I believe you will find that a for-loop is faster.

  library(microbenchmark)
  help(package = "microbenchmark")
  microbenchmark(
    # accumulate the row products one column at a time
    forloop = { y <- mm[, 1]; for (i in 2:dim(mm)[2]) y <- y * mm[, i] },
    # row products, as in the original post
    apply = apply(mm, 1, prod)
  )

Timings on the 10 x 10 toy matrix:

  Unit: microseconds
      expr    min      lq     mean  median      uq     max neval cld
   forloop 10.735 11.6450 15.00425 13.1295 13.7455 115.600   100  a
     apply 60.775 63.2025 71.95027 64.4530 71.3525 209.309   100   b

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
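The same comparison can be run without any add-on package by using system.time(). The sketch below is only illustrative: the matrix size (200,000 rows by 7 columns) and the object names are assumptions, and the timings will depend on the machine, so no expected output is shown.

```r
# Base-R timing of apply() versus a column-wise loop.
# The matrix size and all object names are assumptions for illustration.
set.seed(42)
big <- matrix(rnorm(2e5 * 7), ncol = 7)

# One call to prod() per row
t_apply <- system.time(r_apply <- apply(big, 1, prod))

# One vectorised multiply per column
t_loop <- system.time({
  r_loop <- big[, 1]
  for (j in 2:ncol(big)) r_loop <- r_loop * big[, j]
})

# Both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(r_apply, r_loop)))

# Elapsed seconds for each approach
print(c(apply = t_apply[["elapsed"]], loop = t_loop[["elapsed"]]))
```

A single system.time() call is a coarse measurement; repeating the run a few times gives a steadier picture.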
Dear Harold,

If the actual data with which you're dealing are non-negative, you could log all the values, and use colSums() on the logs. That might also have the advantage of greater numerical accuracy than multiplying millions of numbers. Depending on the numbers, the products may be too large or small to be represented. Of course, logs won't work with your toy example, where rnorm() will generate values that are both negative and positive.

I hope this helps,
 John

-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
web: socserv.mcmaster.ca/jfox

________________________________________
From: R-help [r-help-bounces at r-project.org] on behalf of Doran, Harold [HDoran at air.org]
Sent: November 8, 2016 10:57 AM
To: r-help at r-project.org
Subject: [R] Alternative to apply in base R

[...]
mm <- matrix(rnorm(100), ncol = 10)
rn <- apply(mm, 1, prod)
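The log idea can be extended to mixed-sign data by tracking the sign separately. This is a hedged sketch, not something proposed in the thread: the helper name row_prod_log is hypothetical, and because the original post computes one product per row, it uses rowSums() rather than colSums().

```r
# Row products via sums of logs, with the sign handled separately.
# row_prod_log is a hypothetical helper name, not from the thread.
row_prod_log <- function(m) {
  n_neg <- rowSums(m < 0)                    # negative entries per row
  sgn   <- ifelse(n_neg %% 2 == 0, 1, -1)    # even count gives a positive product
  sgn * exp(rowSums(log(abs(m))))            # magnitude from the sum of logs
}

set.seed(1)
mm <- matrix(rnorm(100), ncol = 10)
all.equal(row_prod_log(mm), apply(mm, 1, prod))
```

A zero entry gives log(0) = -Inf and hence a product of 0, as it should. Whether this is faster than a column-wise loop would need timing, since log() and exp() are themselves fairly expensive.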
It's a good suggestion. In this case the multiplication is over 7 columns of the data, but the number of rows is in the millions. Unfortunately, the values can be negative, as these are actually Gauss quadrature nodes used to evaluate a multidimensional integral.

colSums() is faster than something like apply(dat, 2, sum); I was hoping there was something similar to colSums()/rowSums() that uses prod().

On 11/8/16, 3:00 PM, "Fox, John" <jfox at mcmaster.ca> wrote:

>Dear Harold,
>
>If the actual data with which you're dealing are non-negative, you could
>log all the values, and use colSums() on the logs. That might also have
>the advantage of greater numerical accuracy than multiplying millions of
>numbers. Depending on the numbers, the products may be too large or small
>to be represented. Of course, logs won't work with your toy example,
>where rnorm() will generate values that are both negative and positive.
>
>[...]
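Base R has no rowProds() counterpart to rowSums(), but Reduce() with the `*` operator gives a compact equivalent that works with negative values. This is a sketch under assumptions: the helper name row_prod_reduce and the 7-column test matrix are illustrative only.

```r
# Elementwise product across columns, i.e. one product per row.
# row_prod_reduce is a hypothetical helper name, not from the thread.
row_prod_reduce <- function(m) {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])  # list of column vectors
  Reduce(`*`, cols)                                     # multiply them together
}

set.seed(1)
nodes <- matrix(rnorm(1000 * 7), ncol = 7)   # assumed shape: many rows, 7 columns
all.equal(row_prod_reduce(nodes), apply(nodes, 1, prod))
```

This does the same work as an explicit loop over columns, so any speed difference between the two is likely to be small.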