Ortiz-Bobea, Ariel
2014-May-01 19:48 UTC
[R] speeding up applying hist() over rows of a matrix
Hello everyone,

I'm trying to construct bins for each row in a matrix, using apply() in combination with hist(). Performing this binning for a 10K-by-50 matrix takes about 5 seconds, but only 0.5 seconds for a 1K-by-500 matrix with the same number of elements. Since the time scales with the number of rows rather than with the amount of data, this suggests the bottleneck is the per-row overhead of apply() rather than the calculations going on inside hist().

My initial idea is to process as many columns at once as makes sense for the intended use. However, I still have a very large number of rows to process, and I would appreciate any feedback on how to speed this up.

Any thoughts?

Thanks,

Ariel

Here is the illustration:

# create data
m1 <- matrix(10*rnorm(50*10^4), ncol=50)    # 10,000 rows x 50 columns
m2 <- matrix(10*rnorm(50*10^4), ncol=500)   # 1,000 rows x 500 columns

# compute bins
bins <- seq(-100, 100, 1)
system.time({ out1 <- t(apply(m1, 1, function(x) hist(x, breaks=bins, plot=FALSE)$counts)) })
system.time({ out2 <- t(apply(m2, 1, function(x) hist(x, breaks=bins, plot=FALSE)$counts)) })

---
Ariel Ortiz-Bobea
Fellow
Resources for the Future
1616 P Street, N.W.
Washington, DC 20036
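One rough way to check that the time goes into the per-row calls rather than the binning arithmetic (a sketch reusing the m1 and bins defined above; this check is not part of the original message) is to compare repeated single-row hist() calls against one vectorized cut() over all of the data:

# Sketch only: isolate the per-call overhead by binning the same 50 values
# once per row of m1 (10,000 hist() calls) ...
x <- m1[1, ]
system.time(for (i in seq_len(nrow(m1))) hist(x, breaks = bins, plot = FALSE))
# ... versus binning all 500,000 values in a single vectorized call.
system.time(cut(as.vector(m1), breaks = bins))

If the per-row overhead dominates, the loop should take on the order of the 5 seconds reported above, while the single cut() call should be far cheaper.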
William Dunlap
2014-May-02 16:23 UTC
[R] speeding up applying hist() over rows of a matrix
Your original code, as a function of 'm' and 'bins', is

f0 <- function (m, bins) {
    t(apply(m, 1, function(x) hist(x, breaks = bins, plot = FALSE)$counts))
}

and the time it takes to run on your m1 is about 5 s. on my machine:

> system.time(r0 <- f0(m1, bins))
   user  system elapsed
   4.95    0.00    5.02

hist(x, breaks=bins) is essentially tabulate(cut(x, bins), nbins=length(bins)-1). See how much it speeds things up by replacing hist() with tabulate(cut()):

f1 <- function (m, bins) {
    nbins <- length(bins) - 1L
    t(apply(m, 1, function(x) tabulate(cut(x, bins), nbins = nbins)))
}

That doesn't help with the time, but it does give the same output:

> system.time(r1 <- f1(m1, bins))
   user  system elapsed
   4.85    0.10    5.35
> identical(r0, r1)
[1] TRUE

Now try speeding it up by calling cut() on the whole matrix first and then applying tabulate() to each row, as in

f2 <- function (m, bins) {
    nbins <- length(bins) - 1L
    m <- array(as.integer(cut(m, bins)), dim = dim(m))
    t(apply(m, 1, tabulate, nbins = nbins))
}

That saves quite a bit of time and gives the same output:

> system.time(r2 <- f2(m1, bins))
   user  system elapsed
   0.25    0.00    0.25
> identical(r0, r2)
[1] TRUE

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 1, 2014 at 12:48 PM, Ortiz-Bobea, Ariel <Ortiz-Bobea at rff.org> wrote:
> [original message quoted above]
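The same idea can be pushed one step further by removing the per-row apply() entirely: bin every element with one cut() call, combine the row index and bin index into a single integer code, and count all (row, bin) pairs with a single tabulate(). This is only a sketch going beyond the thread; the function name f3 and the indexing scheme are not from the original messages.

# Sketch only (not from the thread): fully vectorized row-wise binning.
f3 <- function(m, bins) {
    nbins <- length(bins) - 1L
    idx <- as.integer(cut(m, bins))                  # bin index per element, NA if out of range
    code <- (as.integer(row(m)) - 1L) * nbins + idx  # unique code for each (row, bin) pair
    counts <- tabulate(code, nbins = nrow(m) * nbins)  # NA codes are silently ignored
    matrix(counts, nrow = nrow(m), ncol = nbins, byrow = TRUE)
}

Assuming all values fall inside the bin range (as they do for the m1 above), f3(m1, bins) should give the same counts as f2(m1, bins); elements outside the range would simply be dropped rather than triggering the error that hist() would raise.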