Well, I wish R-help had a "like" button as I would most certainly like this reply :)

As usual, you're right. I should have added a disclaimer that "in this instance" there are 7 columns as the function I wrote evaluates an N-dimensional integral and so as the dimensions change, so do the number of columns in this matrix (plus another factor). But the number of columns is never all that large.

On 11/8/16, 4:37 PM, "peter dalgaard" <pdalgd at gmail.com> wrote:

>> On 08 Nov 2016, at 21:23 , Doran, Harold <HDoran at air.org> wrote:
>>
>> It's a good suggestion. Multiplication in this case is over 7 columns in
>> the data, but the number of rows is millions. Unfortunately, the values
>> are negative as these are actually gauss-quad nodes used to evaluate a
>> multidimensional integral.
>
> If there really are only 7 cols, then there's also the blindingly obvious
>
> mm[,1]*mm[,2]*mm[,3]*mm[,4]*mm[,5]*mm[,6]*mm[,7]
>
> -pd
>
>>
>> colSums is better than something like apply(dat, 2, sum); I was hoping
>> there was something similar to colSums/rowSums using prod().
>>
>> On 11/8/16, 3:00 PM, "Fox, John" <jfox at mcmaster.ca> wrote:
>>
>>> Dear Harold,
>>>
>>> If the actual data with which you're dealing are non-negative, you could
>>> log all the values, and use colSums() on the logs. That might also have
>>> the advantage of greater numerical accuracy than multiplying millions of
>>> numbers. Depending on the numbers, the products may be too large or small
>>> to be represented. Of course, logs won't work with your toy example,
>>> where rnorm() will generate values that are both negative and positive.
>>>
>>> I hope this helps,
>>> John
>>> -----------------------------
>>> John Fox, Professor
>>> McMaster University
>>> Hamilton, Ontario
>>> Canada L8S 4M4
>>> web: socserv.mcmaster.ca/jfox
>>>
>>> ________________________________________
>>> From: R-help [r-help-bounces at r-project.org] on behalf of Doran, Harold
>>> [HDoran at air.org]
>>> Sent: November 8, 2016 10:57 AM
>>> To: r-help at r-project.org
>>> Subject: [R] Alternative to apply in base R
>>>
>>> Without reaching out to another package in R, I wonder what the best way
>>> is to speed enhance the following toy example? Over the years I have
>>> become very comfortable with the family of apply functions and generally
>>> not good at finding an improvement for speed.
>>>
>>> This toy example is small, but my real data has many millions of rows and
>>> the same operations is repeated many times and so finding a less
>>> expensive alternative would be helpful.
>>>
>>> mm <- matrix(rnorm(100), ncol = 10)
>>> rn <- apply(mm, 1, prod)
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
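John Fox's log suggestion in the quoted message can, in principle, be extended to data with negative values by tracking the sign of each row separately. The following is only a rough sketch of that idea, not something proposed in the thread: `row_prod_log` is a hypothetical helper name, and `rowSums()` is used instead of `colSums()` because the toy example multiplies across each row. Whether it is faster or more accurate than direct multiplication on real data is untested here.

```r
# Row products computed on the log scale, with the sign handled separately.
# row_prod_log is a hypothetical name used for illustration only.
row_prod_log <- function(x) {
  # An odd number of negative entries in a row gives a negative product.
  neg <- rowSums(x < 0)
  sgn <- ifelse(neg %% 2 == 1, -1, 1)
  # log(0) is -Inf and exp(-Inf) is 0, so a zero entry yields a zero product.
  sgn * exp(rowSums(log(abs(x))))
}

set.seed(1)
mm <- matrix(rnorm(100), ncol = 10)
rn_log   <- row_prod_log(mm)
rn_apply <- apply(mm, 1, prod)
all.equal(rn_log, rn_apply)
```

The sign bookkeeping adds two extra passes over the matrix, so this is mainly of interest when products might overflow or underflow, which is the situation the log approach was suggested for.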
The version which allows any number of columns does not take much more time than the one that requires exactly 7 columns. If you have a zillion columns then these are not so good.

> f1 <- function(x) x[,1]*x[,2]*x[,3]*x[,4]*x[,5]*x[,6]*x[,7]
> f2 <- function(x) {
+     val <- rep(1, nrow(x))
+     for(i in seq_len(ncol(x))) {
+         val <- val * x[,i]
+     }
+     val
+ }
> z <- matrix(runif(10e6 * 7), ncol=7)
> system.time(v1 <- f1(z))
   user  system elapsed
  0.686   0.140   0.826
> system.time(v2 <- f2(z))
   user  system elapsed
  0.663   0.196   0.860
> all.equal(v1,v2,tolerance=0)
[1] TRUE

You might speed up f2 a tad by special-casing the ncol==0, ncol==1, and ncol>1 cases.

The versions that call prod() nrow(x) times take about 25 seconds on this machine and dataset.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
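One possible reading of the special-casing remark about f2() is sketched below. The name `row_prod` and the `stopifnot()` check are assumptions added for illustration; the loop body itself follows f2() from the message. The only change for the ncol>1 case is that the accumulator starts from the first column, which saves the initial multiplication by a vector of ones.

```r
# Column-wise accumulation of row products, with the three cases split out.
# row_prod is a hypothetical name; the loop mirrors f2() from the message.
row_prod <- function(x) {
  stopifnot(is.matrix(x))
  nc <- ncol(x)
  if (nc == 0L) return(rep(1, nrow(x)))  # empty product is 1 for every row
  if (nc == 1L) return(x[, 1])           # a single column is already the answer
  val <- x[, 1]                          # start from column 1, not from ones
  for (i in 2:nc) {
    val <- val * x[, i]
  }
  val
}

set.seed(1)
z <- matrix(runif(70), ncol = 7)
all.equal(row_prod(z), apply(z, 1, prod))
```

Any gain from this is likely to be small next to the difference between the column loop and the per-row prod() calls, so it is worth timing on the real data before adopting it.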
The speed enhancement using your f2() is remarkable when compared to the apply() method I implemented. In the larger context of my actual problem, this essentially solves the big computational hog, and I can now do some real work in a meaningful timeframe as a result.

Thank you to everyone on this thread for the suggestions.

From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Tuesday, November 08, 2016 5:14 PM
To: Doran, Harold <HDoran at air.org>
Cc: peter dalgaard <pdalgd at gmail.com>; r-help at r-project.org; Fox, John <jfox at mcmaster.ca>
Subject: Re: [R] Alternative to apply in base R
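For anyone wanting to reproduce the comparison between apply() and the column loop on their own machine, a small self-contained sketch follows. The names `prod_apply` and `prod_loop` are hypothetical, the matrix is deliberately much smaller than the 10e6-row example in the thread so that it runs quickly, and the timings will differ by machine and R version.

```r
# Two ways of computing row products, timed on the same data.
# prod_apply and prod_loop are hypothetical names for illustration.
prod_apply <- function(x) apply(x, 1, prod)   # calls prod() once per row

prod_loop <- function(x) {
  val <- rep(1, nrow(x))
  for (i in seq_len(ncol(x))) {
    val <- val * x[, i]                       # one vectorised multiply per column
  }
  val
}

set.seed(42)
z <- matrix(rnorm(1e5 * 7), ncol = 7)         # negative values are fine here

t_apply <- system.time(r1 <- prod_apply(z))
t_loop  <- system.time(r2 <- prod_loop(z))
print(t_apply)
print(t_loop)
all.equal(r1, r2)
```

The loop does ncol(x) vectorised multiplications, whereas apply() makes nrow(x) separate function calls, which is why the gap widens as the number of rows grows relative to the number of columns.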