Hi Everyone, Thanks for your help. It works. I have similar problem when I am calculating number of spell. I am also calculation spell (definition: period of two or more days where x exceeds 70) using similar way: *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y, 70)})* where fun.spell.deb.R: *## Calculate spell durationfun.spell.deb <- function(data, threshold = 1, direction = c("above", "below")){ #coln <- grep(weather, names(data))# var <- data[,8] if(missing(direction)) {direction <- "above"} if(direction=="below") {b <- (data <= threshold)} else {b <- (data >threshold)} b[b==TRUE] = 1 y <-rle(b) ans <-length(subset((y$lengths[y$values==1]), (y$lengths[y$values==1])>=2)) return(ans)}* Do you have any idea how to make the "apply" faster here? -Deb On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccberry at ucsd.edu> wrote:> On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote: > > I have 4-dimension array x(lat,lon,time,var) >> >> I am using "apply" to calculate over time >> new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))}) >> >> This is very slow. Is there anyway make it faster? >> > > If dim(x)[3] << prod(dim(x)[-3]), > > new <- Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70)) > > will be faster. > > However, if you can follow Peter Langfelder's suggestion to use rowSums, > that would be best. Even using rowSums(aperm(x,c(1,2,4,3)>=70,dims=3) and > paying the price of aperm() might be better. > > Chuck >[[alternative HTML version deleted]]
There is no need to test that a logical equals TRUE: 'logicalVector==TRUE' is the same as just 'logicalVector'. There is no need to convert logical vectors to numeric, since rle() works on both types. There is no need to use length(subset(x, logicalVector)) to count how many elements in logicalVector are TRUE, just use sum(logicalVector). There is no need to make a variable, 'ans', then immediately return it. Hence your b[b == TRUE] = 1 y <- rle(b) ans <- length(subset(y$lengths[y$values == 1], y$lengths[y$values == 1]>= 2))return(ans) could be replaced by y <- rle(b) sum(y$lengths[y$values] >= 2) This gives some speedup, mainly for long vectors, but I find it more understandable. E.g., if f1 is your original function and f2 has the above replacement I get: > d <- -sin(1:10000+sqrt(1:4)) > system.time(for(i in 1:10000)f1(d,.3)) user system elapsed 5.19 0.00 5.19 > system.time(for(i in 1:10000)f2(d,.3)) user system elapsed 3.65 0.00 3.65 > c(f1(d,.3), f2(d,.3)) [1] 1492 1492 > length(d) [1] 10000 If it were my function, I would also get rid of the part that deals with the threshhold and direction of the inequality and tell the user to to use f(data <= 0.3) instead of f(data, .3, "below"). I would also make the spell length an argument instead of fixing it at 2. E.g. > f3 <- function (condition, spellLength = 2) { stopifnot(is.logical(condition), !anyNA(condition)) y <- rle(condition) sum(y$lengths[y$values] >= spellLength) } > f3( d >= .3 ) [1] 1492 Bill Dunlap TIBCO Software wdunlap tibco.com On Sun, Jul 10, 2016 at 11:58 AM, Debasish Pai Mazumder <pai1981 at gmail.com> wrote:> Hi Everyone, > Thanks for your help. It works. I have similar problem when I am > calculating number of spell. > I am also calculation spell (definition: period of two or more days where x > exceeds 70) using similar way: > > *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y, 70)})* > > where fun.spell.deb.R: > > > > > > > > > > > > > > > > *## Calculate spell durationfun.spell.deb <- function(data, threshold = 1, > direction = c("above", "below")){ #coln <- grep(weather, names(data))# > var <- data[,8] if(missing(direction)) {direction <- "above"} > if(direction=="below") {b <- (data <= threshold)} else {b <- (data >> threshold)} b[b==TRUE] = 1 y <-rle(b) ans > <-length(subset((y$lengths[y$values==1]), (y$lengths[y$values==1])>=2)) > return(ans)}* > > Do you have any idea how to make the "apply" faster here? > > -Deb > > > On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccberry at ucsd.edu> wrote: > > > On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote: > > > > I have 4-dimension array x(lat,lon,time,var) > >> > >> I am using "apply" to calculate over time > >> new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))}) > >> > >> This is very slow. Is there anyway make it faster? > >> > > > > If dim(x)[3] << prod(dim(x)[-3]), > > > > new <- Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70)) > > > > will be faster. > > > > However, if you can follow Peter Langfelder's suggestion to use rowSums, > > that would be best. Even using rowSums(aperm(x,c(1,2,4,3)>=70,dims=3) and > > paying the price of aperm() might be better. > > > > Chuck > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Thanks for your response. It is faster than before but still very slow. Any other suggestion ? -Deb On Sun, Jul 10, 2016 at 2:13 PM, William Dunlap <wdunlap at tibco.com> wrote:> There is no need to test that a logical equals TRUE: 'logicalVector==TRUE' > is the > same as just 'logicalVector'. > > There is no need to convert logical vectors to numeric, since rle() works > on both > types. > > There is no need to use length(subset(x, logicalVector)) to count how many > elements > in logicalVector are TRUE, just use sum(logicalVector). > > There is no need to make a variable, 'ans', then immediately return it. > > Hence your > > b[b == TRUE] = 1 > y <- rle(b) > ans <- length(subset(y$lengths[y$values == 1], y$lengths[y$values => 1] >= 2)) > return(ans) > > could be replaced by > > y <- rle(b) > sum(y$lengths[y$values] >= 2) > > This gives some speedup, mainly for long vectors, but I find it more > understandable. > E.g., if f1 is your original function and f2 has the above replacement I > get: > > d <- -sin(1:10000+sqrt(1:4)) > > system.time(for(i in 1:10000)f1(d,.3)) > user system elapsed > 5.19 0.00 5.19 > > system.time(for(i in 1:10000)f2(d,.3)) > user system elapsed > 3.65 0.00 3.65 > > c(f1(d,.3), f2(d,.3)) > [1] 1492 1492 > > length(d) > [1] 10000 > > If it were my function, I would also get rid of the part that deals with > the threshhold > and direction of the inequality and tell the user to to use f(data <= 0.3) > instead of > f(data, .3, "below"). I would also make the spell length an argument > instead of > fixing it at 2. E.g. > > > f3 <- function (condition, spellLength = 2) > { > stopifnot(is.logical(condition), !anyNA(condition)) > y <- rle(condition) > sum(y$lengths[y$values] >= spellLength) > } > > f3( d >= .3 ) > [1] 1492 > > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Sun, Jul 10, 2016 at 11:58 AM, Debasish Pai Mazumder <pai1981 at gmail.com > > wrote: > >> Hi Everyone, >> Thanks for your help. It works. I have similar problem when I am >> calculating number of spell. >> I am also calculation spell (definition: period of two or more days where >> x >> exceeds 70) using similar way: >> >> *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y, 70)})* >> >> where fun.spell.deb.R: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> *## Calculate spell durationfun.spell.deb <- function(data, threshold = 1, >> direction = c("above", "below")){ #coln <- grep(weather, names(data))# >> var <- data[,8] if(missing(direction)) {direction <- "above"} >> if(direction=="below") {b <- (data <= threshold)} else {b <- (data >>> threshold)} b[b==TRUE] = 1 y <-rle(b) ans >> <-length(subset((y$lengths[y$values==1]), (y$lengths[y$values==1])>=2)) >> return(ans)}* >> >> Do you have any idea how to make the "apply" faster here? >> >> -Deb >> >> >> On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccberry at ucsd.edu> >> wrote: >> >> > On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote: >> > >> > I have 4-dimension array x(lat,lon,time,var) >> >> >> >> I am using "apply" to calculate over time >> >> new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))}) >> >> >> >> This is very slow. Is there anyway make it faster? >> >> >> > >> > If dim(x)[3] << prod(dim(x)[-3]), >> > >> > new <- Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70)) >> > >> > will be faster. >> > >> > However, if you can follow Peter Langfelder's suggestion to use rowSums, >> > that would be best. Even using rowSums(aperm(x,c(1,2,4,3)>=70,dims=3) >> and >> > paying the price of aperm() might be better. >> > >> > Chuck >> > >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > >[[alternative HTML version deleted]]