thr3ads.net - R help - [R] fast rowCumsums wanted for calculating the cdf [Oct 2010]

If this information is useful, please help other people find it:
Share via:

Gregor

2010-Oct-12 08:24 UTC

[R] fast rowCumsums wanted for calculating the cdf

Dear all,

I am struggling with a (currently) cost-intensive problem: calculating the
(non-normalized) cumulative distribution function, given the (non-normalized)
probabilities. something like:

probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilites
F <- t(apply(probs, 1, cumsum)) #SLOOOW!

One (already faster, but for sure not ideal) solution - thanks to Henrik
Bengtsson:

F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
F[,1] <- probs[,1,drop=TRUE];
for (cc in 2:ncol(F)) {
  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
}

In my case, probs is a (30,000 x 10) matrix, and i need to iterate this step
around
200,000 times, so speed is crucial. I currently can make sure to have no NAs,
but in
order to extend matrixStats, this could be a nontrivial issue.

Any ideas for speeding up this - probably routine - task?

Thanks in advance,
Gregor

Gregor

2010-Oct-15 06:51 UTC

head link

[R] fast rowCumsums wanted for calculating the cdf

Dear all,

Maybe the "easiest" solution: Is there anything that speaks against
generalizing
cumsum from base to cope with matrices (as is done in matlab)? E.g.:

"cumsum(Matrix, 1)"
equivalent to
"apply(Matrix, 1, cumsum)"

The main advantage could be optimized code if the Matrix is extreme nonsquare
(e.g. 100,000x10), but the summation is done over the short side (in this case
10).
apply would practically yield a loop over 100,000 elements, and vectorization
w.r.t.
the long side (loop over 10 elements) provides considerable efficiency gains.

Many regards,
Gregor




On Tue, 12 Oct 2010 10:24:53 +0200
Gregor <mailinglist at gmx.at> wrote:
> Dear all,
> 
> I am struggling with a (currently) cost-intensive problem: calculating the
> (non-normalized) cumulative distribution function, given the
(non-normalized)
> probabilities. something like:
> 
> probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise
probabilites
> F <- t(apply(probs, 1, cumsum)) #SLOOOW!
> 
> One (already faster, but for sure not ideal) solution - thanks to Henrik
Bengtsson:
> 
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> F[,1] <- probs[,1,drop=TRUE];
> for (cc in 2:ncol(F)) {
>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
> }
> 
> In my case, probs is a (30,000 x 10) matrix, and i need to iterate this
step around
> 200,000 times, so speed is crucial. I currently can make sure to have no
NAs, but
> in order to extend matrixStats, this could be a nontrivial issue.
> 
> Any ideas for speeding up this - probably routine - task?
> 
> Thanks in advance,
> Gregor
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Joshua Wiley

2010-Oct-15 07:23 UTC

head link

[R] fast rowCumsums wanted for calculating the cdf

Hi,

You might look at Reduce().  It seems faster.  I converted the matrix
to a list in an incredibly sloppy way (which you should not emulate)
because I cannot think of  the simple way.
> probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise
probabilites
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> F[,1] <- probs[,1,drop=TRUE];
> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
>
>
> system.time(F.slow <- t(apply(probs, 1, cumsum)))   user  system elapsed
 36.758   0.416  42.464>
> system.time(for (cc in 2:ncol(F)) {+  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
+ })
   user  system elapsed
  0.980   0.196   1.328>
> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5],
probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))   user  system elapsed
  0.420   0.072   0.539>
.539 seconds for 10 vectors each with 1 million elements does not seem
bad.  For 30000 each, on my system it took .014 seconds, which for
200,000 iterations, I estimated as:
> (200000*.014)/60/60[1] 0.7777778

(and this is on a stone age system with a single core processor and
only 756MB of RAM)

It should not be difficult to get the list back to a matrix.

Cheers,

Josh



On Thu, Oct 14, 2010 at 11:51 PM, Gregor <mailinglist at gmx.at>
wrote:> Dear all,
>
> Maybe the "easiest" solution: Is there anything that speaks
against generalizing
> cumsum from base to cope with matrices (as is done in matlab)? E.g.:
>
> "cumsum(Matrix, 1)"
> equivalent to
> "apply(Matrix, 1, cumsum)"
>
> The main advantage could be optimized code if the Matrix is extreme
nonsquare
> (e.g. 100,000x10), but the summation is done over the short side (in this
case 10).
> apply would practically yield a loop over 100,000 elements, and
vectorization w.r.t.
> the long side (loop over 10 elements) provides considerable efficiency
gains.
>
> Many regards,
> Gregor
>
>
>
>
> On Tue, 12 Oct 2010 10:24:53 +0200
> Gregor <mailinglist at gmx.at> wrote:
>
>> Dear all,
>>
>> I am struggling with a (currently) cost-intensive problem: calculating
the
>> (non-normalized) cumulative distribution function, given the
(non-normalized)
>> probabilities. something like:
>>
>> probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise
probabilites
>> F <- t(apply(probs, 1, cumsum)) #SLOOOW!
>>
>> One (already faster, but for sure not ideal) solution - thanks to
Henrik Bengtsson:
>>
>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>> F[,1] <- probs[,1,drop=TRUE];
>> for (cc in 2:ncol(F)) {
>> ? F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>> }
>>
>> In my case, probs is a (30,000 x 10) matrix, and i need to iterate this
step around
>> 200,000 times, so speed is crucial. I currently can make sure to have
no NAs, but
>> in order to extend matrixStats, this could be a nontrivial issue.
>>
>> Any ideas for speeding up this - probably routine - task?
>>
>> Thanks in advance,
>> Gregor
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Joshua Wiley

2010-Oct-15 08:17 UTC

head link

[R] fast rowCumsums wanted for calculating the cdf

On Fri, Oct 15, 2010 at 12:23 AM, Joshua Wiley <jwiley.psych at gmail.com>
wrote:>
> Hi,
>
> You might look at Reduce(). ?It seems faster. ?I converted the matrix
> to a list in an incredibly sloppy way (which you should not emulate)
> because I cannot think of ?the simple way.
Dennis provided the answer:  system.time(add(unclass(as.data.frame(probs))))

I probably could have suggested Reduce days ago if I was anywhere near
that efficient in my coding....
>
> > probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise
probabilites
> > F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> > F[,1] <- probs[,1,drop=TRUE];
> > add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
> >
> >
> > system.time(F.slow <- t(apply(probs, 1, cumsum)))
> ? user ?system elapsed
> ?36.758 ? 0.416 ?42.464
> >
> > system.time(for (cc in 2:ncol(F)) {
> + ?F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
> + })
> ? user ?system elapsed
> ?0.980 ? 0.196 ? 1.328
> >
> > system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4],
probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
> ? user ?system elapsed
> ?0.420 ? 0.072 ? 0.539
Josh

R help - Oct 2010 - fast rowCumsums wanted for calculating the cdf

[R] fast rowCumsums wanted for calculating the cdf

[R] fast rowCumsums wanted for calculating the cdf

[R] fast rowCumsums wanted for calculating the cdf

[R] fast rowCumsums wanted for calculating the cdf