thr3ads.net - R help - [R] How to make the "apply" faster [Jul 2016]

If this information is useful, please help other people find it:
Share via:

Debasish Pai Mazumder

2016-Jul-10 20:38 UTC

[R] How to make the "apply" faster

Thanks for your response. It is faster than before but still very slow. Any
other suggestion ?
-Deb


On Sun, Jul 10, 2016 at 2:13 PM, William Dunlap <wdunlap at tibco.com>
wrote:
> There is no need to test that a logical equals TRUE:
'logicalVector==TRUE'
> is the
> same as just 'logicalVector'.
>
> There is no need to convert logical vectors to numeric, since rle() works
> on both
> types.
>
> There is no need to use length(subset(x, logicalVector)) to count how many
> elements
> in logicalVector are TRUE, just use sum(logicalVector).
>
> There is no need to make a variable, 'ans', then immediately return
it.
>
> Hence your
>
>     b[b == TRUE] = 1
>     y <- rle(b)
>     ans <- length(subset(y$lengths[y$values == 1], y$lengths[y$values
=> 1] >= 2))
>     return(ans)
>
> could be replaced by
>
>     y <- rle(b)
>     sum(y$lengths[y$values] >= 2)
>
> This gives some speedup, mainly for long vectors, but I find it more
> understandable.
> E.g., if f1 is your original function and f2 has the above replacement I
> get:
>   > d <- -sin(1:10000+sqrt(1:4))
>   > system.time(for(i in 1:10000)f1(d,.3))
>      user  system elapsed
>      5.19    0.00    5.19
>   > system.time(for(i in 1:10000)f2(d,.3))
>      user  system elapsed
>      3.65    0.00    3.65
>   > c(f1(d,.3), f2(d,.3))
>   [1] 1492 1492
>   > length(d)
>   [1] 10000
>
> If it were my function, I would also get rid of the part that deals with
> the threshhold
> and direction of the inequality and tell the user to to use f(data <=
0.3)
> instead of
> f(data, .3, "below").  I would also make the spell length an
argument
> instead of
> fixing it at 2.  E.g.
>
>    > f3 <- function (condition, spellLength = 2)
>    {
>        stopifnot(is.logical(condition), !anyNA(condition))
>        y <- rle(condition)
>        sum(y$lengths[y$values] >= spellLength)
>    }
>    > f3( d >= .3 )
>    [1] 1492
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sun, Jul 10, 2016 at 11:58 AM, Debasish Pai Mazumder <pai1981 at
gmail.com
> > wrote:
>
>> Hi Everyone,
>> Thanks for your help. It works. I have similar problem when I am
>> calculating number of spell.
>> I am also calculation spell (definition: period of two or more days
where
>> x
>> exceeds 70) using similar way:
>>
>> *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y, 70)})*
>>
>> where fun.spell.deb.R:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *## Calculate spell durationfun.spell.deb <- function(data,
threshold = 1,
>> direction = c("above", "below")){  #coln <-
grep(weather, names(data))#
>> var <- data[,8]  if(missing(direction)) {direction <-
"above"}
>> if(direction=="below") {b <- (data <= threshold)} else 
{b <- (data >>> threshold)}    b[b==TRUE] = 1  y <-rle(b)  ans
>> <-length(subset((y$lengths[y$values==1]),
(y$lengths[y$values==1])>=2))
>> return(ans)}*
>>
>> Do you have any idea how to make the "apply" faster here?
>>
>> -Deb
>>
>>
>> On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccberry at
ucsd.edu>
>> wrote:
>>
>> > On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote:
>> >
>> > I have 4-dimension array x(lat,lon,time,var)
>> >>
>> >> I am using "apply" to calculate over time
>> >> new = apply(x,c(1,2,4),FUN=function(y)
{length(which(y>=70))})
>> >>
>> >> This is very slow. Is there anyway make it faster?
>> >>
>> >
>> > If dim(x)[3] << prod(dim(x)[-3]),
>> >
>> > new <-  Reduce("+",lapply(1:dim(x)[3],function(z)
x[,,z,]>=70))
>> >
>> > will be faster.
>> >
>> > However, if you can follow Peter Langfelder's suggestion to
use rowSums,
>> > that would be best. Even using
rowSums(aperm(x,c(1,2,4,3)>=70,dims=3)
>> and
>> > paying the price of aperm() might be better.
>> >
>> > Chuck
>> >
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
	[[alternative HTML version deleted]]

William Dunlap

2016-Jul-11 16:42 UTC

head link

[R] How to make the "apply" faster

How fast is fast enough and what size and shape is your dataset
(show the output of str(yourData))?  You will get the fastest execution
time by using C or C++ or Fortran, but you will want to parameterize
the problem well enough that you can amortize the time it takes to
write the code over many problems.

Peter Langflder's suggested you use aperm followed by colSums
for your earlier problem.   For this one you can use aperm followed
by filter (to identify the runs of a given minimum length, column by
column) and then use colSums (to count the number of runs filter
identifies in each column).  E.g.,

  f4 <- function (condition, spellLength = 2)
  {
      # obscure way to count runs of length>spellLength
      # in 3'rd dimension of logical array 'condition'.
      stopifnot(is.logical(condition), !anyNA(condition))
      coef <- c(-1, rep(1, spellLength))
      d <- dim(condition)
      dn <- dimnames(condition)
      tmp <- array(aperm(condition * 2 - 1, c(3, 1, 2, 4)), c(d[3],
          prod(d[-3])))
      fTmp <- filter(rbind(tmp, -1), coef, sides = 1)
     sfTmp <- colSums(fTmp == spellLength + 1, na.rm = TRUE)
     array(sfTmp, dim = d[-3], dimnames = dn[-3])
  }

f4(x>=1) is not a great deal faster (48 s. vs. 67 s.) than
apply(x>=1, c(1,2,4), FUN=f3) where f3 is
  f3 <- function (condition, spellLength = 2)
  {
      stopifnot(is.logical(condition), !anyNA(condition))
      y <- rle(condition)
      sum(y$lengths[y$values] >= spellLength)
  }
and where x has dimensions c(101,107,17,103).

The relative speed will depend on the size and shape of your dataset,
so show the output of str(yourData).





Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Jul 10, 2016 at 1:38 PM, Debasish Pai Mazumder <pai1981 at
gmail.com>
wrote:
> Thanks for your response. It is faster than before but still very slow.
> Any other suggestion ?
> -Deb
>
>
> On Sun, Jul 10, 2016 at 2:13 PM, William Dunlap <wdunlap at
tibco.com> wrote:
>
>> There is no need to test that a logical equals TRUE:
>> 'logicalVector==TRUE' is the
>> same as just 'logicalVector'.
>>
>> There is no need to convert logical vectors to numeric, since rle()
works
>> on both
>> types.
>>
>> There is no need to use length(subset(x, logicalVector)) to count how
>> many elements
>> in logicalVector are TRUE, just use sum(logicalVector).
>>
>> There is no need to make a variable, 'ans', then immediately
return it.
>>
>> Hence your
>>
>>     b[b == TRUE] = 1
>>     y <- rle(b)
>>     ans <- length(subset(y$lengths[y$values == 1],
y$lengths[y$values =>> 1] >= 2))
>>     return(ans)
>>
>> could be replaced by
>>
>>     y <- rle(b)
>>     sum(y$lengths[y$values] >= 2)
>>
>> This gives some speedup, mainly for long vectors, but I find it more
>> understandable.
>> E.g., if f1 is your original function and f2 has the above replacement
I
>> get:
>>   > d <- -sin(1:10000+sqrt(1:4))
>>   > system.time(for(i in 1:10000)f1(d,.3))
>>      user  system elapsed
>>      5.19    0.00    5.19
>>   > system.time(for(i in 1:10000)f2(d,.3))
>>      user  system elapsed
>>      3.65    0.00    3.65
>>   > c(f1(d,.3), f2(d,.3))
>>   [1] 1492 1492
>>   > length(d)
>>   [1] 10000
>>
>> If it were my function, I would also get rid of the part that deals
with
>> the threshhold
>> and direction of the inequality and tell the user to to use f(data
<>> 0.3) instead of
>> f(data, .3, "below").  I would also make the spell length an
argument
>> instead of
>> fixing it at 2.  E.g.
>>
>>    > f3 <- function (condition, spellLength = 2)
>>    {
>>        stopifnot(is.logical(condition), !anyNA(condition))
>>        y <- rle(condition)
>>        sum(y$lengths[y$values] >= spellLength)
>>    }
>>    > f3( d >= .3 )
>>    [1] 1492
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Sun, Jul 10, 2016 at 11:58 AM, Debasish Pai Mazumder <
>> pai1981 at gmail.com> wrote:
>>
>>> Hi Everyone,
>>> Thanks for your help. It works. I have similar problem when I am
>>> calculating number of spell.
>>> I am also calculation spell (definition: period of two or more days
>>> where x
>>> exceeds 70) using similar way:
>>>
>>> *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y, 70)})*
>>>
>>> where fun.spell.deb.R:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *## Calculate spell durationfun.spell.deb <- function(data,
threshold >>> 1,
>>> direction = c("above", "below")){  #coln <-
grep(weather, names(data))#
>>> var <- data[,8]  if(missing(direction)) {direction <-
"above"}
>>> if(direction=="below") {b <- (data <= threshold)}
else  {b <- (data >>>> threshold)}    b[b==TRUE] = 1  y
<-rle(b)  ans
>>> <-length(subset((y$lengths[y$values==1]),
(y$lengths[y$values==1])>=2))
>>> return(ans)}*
>>>
>>> Do you have any idea how to make the "apply" faster here?
>>>
>>> -Deb
>>>
>>>
>>> On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccberry at
ucsd.edu>
>>> wrote:
>>>
>>> > On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote:
>>> >
>>> > I have 4-dimension array x(lat,lon,time,var)
>>> >>
>>> >> I am using "apply" to calculate over time
>>> >> new = apply(x,c(1,2,4),FUN=function(y)
{length(which(y>=70))})
>>> >>
>>> >> This is very slow. Is there anyway make it faster?
>>> >>
>>> >
>>> > If dim(x)[3] << prod(dim(x)[-3]),
>>> >
>>> > new <-  Reduce("+",lapply(1:dim(x)[3],function(z)
x[,,z,]>=70))
>>> >
>>> > will be faster.
>>> >
>>> > However, if you can follow Peter Langfelder's suggestion
to use
>>> rowSums,
>>> > that would be best. Even using
rowSums(aperm(x,c(1,2,4,3)>=70,dims=3)
>>> and
>>> > paying the price of aperm() might be better.
>>> >
>>> > Chuck
>>> >
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
	[[alternative HTML version deleted]]

William Dunlap

2016-Jul-11 20:32 UTC

head link

[R] How to make the "apply" faster

If you use Rcpp::Rcpp.package.skeleton() to make a package out
of the attached C++ code you can speed up the run counting quite
a bit - to 1.4 s. from 48 s. for the 101 x 107 x 17 x 103 example.

The package you make will define an R function called CountColumnRuns
that computes the number of runs of TRUE of a given minimum length
for each column in a matrix.  You can adapt it to your 4-dimensional array
with

f5 <- function (condition, spellLength = 2)
{
    stopifnot(is.logical(condition), !anyNA(condition))
    d <- dim(condition)
    dn <- dimnames(condition)
    tmp <- array(aperm(condition, c(3, 1, 2, 4)), c(d[3], prod(d[-3])))
    runCounts <- RcppCountRuns::CountColumnRuns(tmp, spellLength)
    array(runCounts, dim = d[-3], dimnames = dn[-3])
}

and call it as

f5( x > 1, spellLength=2)

to get the counts of all runs of length at least 2 in the 3rd dimension of
your 4-d array.



#include <Rcpp.h>

int CountRunsRaw(const int *x, int n, int minRun)
{
    int count(0);
    int runLength(0);
    const int* lastX(x + n);
    for( ; x != lastX ; x++)
    {
        if (*x)
        {
            runLength++;
        }
        else
        {
            if (runLength >= minRun)
            {
                count++;
            }
            runLength = 0;
        }
    }
    if (runLength >= minRun)
    {
        count++;
    }
    return count;
}

// [[Rcpp::export]]
int CountRuns(const Rcpp::LogicalVector& x, int minRun)
{
    return CountRunsRaw(&x[0], x.size(), minRun);
}
// [[Rcpp::export]]
Rcpp::IntegerVector CountColumnRuns(const Rcpp::LogicalMatrix& x, int
minRun)
{
    Rcpp::IntegerVector out(x.ncol());
    for(int i = 0 ; i < x.ncol() ; i++)
    {
        out[i] = CountRuns(x(Rcpp::_, i), minRun);
    }
    return out;
}


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jul 11, 2016 at 9:42 AM, William Dunlap <wdunlap at tibco.com>
wrote:
> How fast is fast enough and what size and shape is your dataset
> (show the output of str(yourData))?  You will get the fastest execution
> time by using C or C++ or Fortran, but you will want to parameterize
> the problem well enough that you can amortize the time it takes to
> write the code over many problems.
>
> Peter Langflder's suggested you use aperm followed by colSums
> for your earlier problem.   For this one you can use aperm followed
> by filter (to identify the runs of a given minimum length, column by
> column) and then use colSums (to count the number of runs filter
> identifies in each column).  E.g.,
>
>   f4 <- function (condition, spellLength = 2)
>   {
>       # obscure way to count runs of length>spellLength
>       # in 3'rd dimension of logical array 'condition'.
>       stopifnot(is.logical(condition), !anyNA(condition))
>       coef <- c(-1, rep(1, spellLength))
>       d <- dim(condition)
>       dn <- dimnames(condition)
>       tmp <- array(aperm(condition * 2 - 1, c(3, 1, 2, 4)), c(d[3],
>           prod(d[-3])))
>       fTmp <- filter(rbind(tmp, -1), coef, sides = 1)
>      sfTmp <- colSums(fTmp == spellLength + 1, na.rm = TRUE)
>      array(sfTmp, dim = d[-3], dimnames = dn[-3])
>   }
>
> f4(x>=1) is not a great deal faster (48 s. vs. 67 s.) than
> apply(x>=1, c(1,2,4), FUN=f3) where f3 is
>   f3 <- function (condition, spellLength = 2)
>   {
>       stopifnot(is.logical(condition), !anyNA(condition))
>       y <- rle(condition)
>       sum(y$lengths[y$values] >= spellLength)
>   }
> and where x has dimensions c(101,107,17,103).
>
> The relative speed will depend on the size and shape of your dataset,
> so show the output of str(yourData).
>
>
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sun, Jul 10, 2016 at 1:38 PM, Debasish Pai Mazumder <pai1981 at
gmail.com>
> wrote:
>
>> Thanks for your response. It is faster than before but still very slow.
>> Any other suggestion ?
>> -Deb
>>
>>
>> On Sun, Jul 10, 2016 at 2:13 PM, William Dunlap <wdunlap at
tibco.com>
>> wrote:
>>
>>> There is no need to test that a logical equals TRUE:
>>> 'logicalVector==TRUE' is the
>>> same as just 'logicalVector'.
>>>
>>> There is no need to convert logical vectors to numeric, since rle()
>>> works on both
>>> types.
>>>
>>> There is no need to use length(subset(x, logicalVector)) to count
how
>>> many elements
>>> in logicalVector are TRUE, just use sum(logicalVector).
>>>
>>> There is no need to make a variable, 'ans', then
immediately return it.
>>>
>>> Hence your
>>>
>>>     b[b == TRUE] = 1
>>>     y <- rle(b)
>>>     ans <- length(subset(y$lengths[y$values == 1],
y$lengths[y$values =>>> 1] >= 2))
>>>     return(ans)
>>>
>>> could be replaced by
>>>
>>>     y <- rle(b)
>>>     sum(y$lengths[y$values] >= 2)
>>>
>>> This gives some speedup, mainly for long vectors, but I find it
more
>>> understandable.
>>> E.g., if f1 is your original function and f2 has the above
replacement I
>>> get:
>>>   > d <- -sin(1:10000+sqrt(1:4))
>>>   > system.time(for(i in 1:10000)f1(d,.3))
>>>      user  system elapsed
>>>      5.19    0.00    5.19
>>>   > system.time(for(i in 1:10000)f2(d,.3))
>>>      user  system elapsed
>>>      3.65    0.00    3.65
>>>   > c(f1(d,.3), f2(d,.3))
>>>   [1] 1492 1492
>>>   > length(d)
>>>   [1] 10000
>>>
>>> If it were my function, I would also get rid of the part that deals
with
>>> the threshhold
>>> and direction of the inequality and tell the user to to use f(data
<>>> 0.3) instead of
>>> f(data, .3, "below").  I would also make the spell length
an argument
>>> instead of
>>> fixing it at 2.  E.g.
>>>
>>>    > f3 <- function (condition, spellLength = 2)
>>>    {
>>>        stopifnot(is.logical(condition), !anyNA(condition))
>>>        y <- rle(condition)
>>>        sum(y$lengths[y$values] >= spellLength)
>>>    }
>>>    > f3( d >= .3 )
>>>    [1] 1492
>>>
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Sun, Jul 10, 2016 at 11:58 AM, Debasish Pai Mazumder <
>>> pai1981 at gmail.com> wrote:
>>>
>>>> Hi Everyone,
>>>> Thanks for your help. It works. I have similar problem when I
am
>>>> calculating number of spell.
>>>> I am also calculation spell (definition: period of two or more
days
>>>> where x
>>>> exceeds 70) using similar way:
>>>>
>>>> *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y,
70)})*
>>>>
>>>> where fun.spell.deb.R:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *## Calculate spell durationfun.spell.deb <- function(data,
threshold >>>> 1,
>>>> direction = c("above", "below")){  #coln
<- grep(weather, names(data))#
>>>> var <- data[,8]  if(missing(direction)) {direction <-
"above"}
>>>> if(direction=="below") {b <- (data <=
threshold)} else  {b <- (data >>>>> threshold)}    b[b==TRUE]
= 1  y <-rle(b)  ans
>>>> <-length(subset((y$lengths[y$values==1]),
(y$lengths[y$values==1])>=2))
>>>> return(ans)}*
>>>>
>>>> Do you have any idea how to make the "apply" faster
here?
>>>>
>>>> -Deb
>>>>
>>>>
>>>> On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccberry at
ucsd.edu>
>>>> wrote:
>>>>
>>>> > On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote:
>>>> >
>>>> > I have 4-dimension array x(lat,lon,time,var)
>>>> >>
>>>> >> I am using "apply" to calculate over time
>>>> >> new = apply(x,c(1,2,4),FUN=function(y)
{length(which(y>=70))})
>>>> >>
>>>> >> This is very slow. Is there anyway make it faster?
>>>> >>
>>>> >
>>>> > If dim(x)[3] << prod(dim(x)[-3]),
>>>> >
>>>> > new <- 
Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70))
>>>> >
>>>> > will be faster.
>>>> >
>>>> > However, if you can follow Peter Langfelder's
suggestion to use
>>>> rowSums,
>>>> > that would be best. Even using
rowSums(aperm(x,c(1,2,4,3)>=70,dims=3)
>>>> and
>>>> > paying the price of aperm() might be better.
>>>> >
>>>> > Chuck
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>>
>>>
>>>
>>
>
	[[alternative HTML version deleted]]

R help - Jul 2016 - How to make the "apply" faster

[R] How to make the "apply" faster

[R] How to make the "apply" faster

[R] How to make the "apply" faster