Hi all,

I have a very large binary vector, and I wish to calculate the number of 1's over sliding windows.

This is my very slow function:

slide<-function(seq,window){
  n<-length(seq)-window
  tot<-c()
  tot[1]<-sum(seq[1:window])
  for (i in 2:n) {
    tot[i]<- tot[i-1]-seq[i-1]+seq[i]
  }
  return(tot)
}

This works well for reasonably sized vectors. Does anybody know a way for large vectors (length = 12 million)? I'm trying to avoid using C.

Thanks,
Chris
You can have a look at the rollapply() function in the zoo package, e.g.,

library(zoo)
x <- rbinom(100, 1, 0.5)
z <- zoo(x)
rollapply(z, 3, sum)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center
> sl <- function(x,z) c(0,cumsum(diff(x)[1:(length(x)-z-1)])) + rep(sum(x[1:z]),length(x)-z)
> x <- rbinom(100000, 1, 0.5)
> system.time(xx1 <- slide(x,12))
   user  system elapsed
  36.86    0.45   37.32
> system.time(xx2 <- sl(x,12))
   user  system elapsed
   0.01    0.00    0.02
> all.equal(xx1,xx2)
[1] TRUE

Jacques VESLOT
CEMAGREF - UR Hydrobiologie
Route de Cézanne - CS 40061
13182 AIX-EN-PROVENCE Cedex 5, France
Hi Veslot,

I'm too tired to even try to figure out why, but I think there is something wrong with your sl function. See below for an empirical proof of that statement. Or maybe your definition of a sliding window is different from rollapply's, but rollapply's answer makes more sense to me.

Output:

> set.seed(1)
> x <- rbinom(24, 1, 0.5)
> print(x)
 [1] 0 0 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0
>
> xx1 <- sl(x,3)
> print(xx1)
 [1] 1 1 2 2 1 2 2 2 2 1 1 1 2 1 2 1 2 2 1 2 2
>
> temp <- zoo(x)
> ans<-rollapply(temp,3,sum)
> print(ans)
 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 1  2  2  2  2  3  3  2  1  0  1  1  2  1  2  2  2  2  2  2  2  1
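One way to settle which answer is the windowed sum is to compute it by brute force in base R. The sketch below assumes the 24-element x printed above (typed in literally rather than regenerated from the seed) and uses embed(), which lays out each run of three consecutive values as one matrix row. It also suggests why sl() agrees with slide(): cumsum(diff(x)) telescopes to x[i] - x[1], so sl(x, z) reduces to sum(x[1:z]) + x[i] - x[1] and never looks at the element entering the window.

```r
# x as printed in the message above (typed in literally)
x <- c(0,0,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0)

# sl() as posted earlier in the thread
sl <- function(x, z) c(0, cumsum(diff(x)[1:(length(x) - z - 1)])) +
  rep(sum(x[1:z]), length(x) - z)

# Brute force: each row of embed(x, 3) holds one window of 3 consecutive values
brute <- rowSums(embed(x, 3))

# What sl() reduces to: a constant plus the input itself
telescoped <- sum(x[1:3]) + x[1:21] - x[1]

print(brute)
print(all.equal(sl(x, 3), telescoped))
```

The brute-force result has 22 entries, one per complete window, and can be compared directly with the rollapply output.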
Hi Chris,

On Tuesday 16 December 2008, Chris Oldmeadow wrote:
> Hi all,
>
> I have a very large binary vector, I wish to calculate the number of
> 1's over sliding windows.
> [...snip...]

Your function does not seem to work very well; could you please offer a self-contained, reproducible example?
When writing a function, it is advisable not to name arguments after existing functions such as "seq" (which is a function itself, generating a sequence of numbers).
In any case, I suspect you might want to take a look at the function "rle". See ?rle

I hope this helps,
Adrian

--
Adrian Dusa
Romanian Social Data Archive
c.oldmeadow at student.qut.edu.au
2008-Dec-16 10:46 UTC
[R] sliding window over a large vector
The function works for me:

s<-rbinom(1000,1,0.5)
t<-slide(s,50)

It is just too slow.

Thanks.
For this particular problem (counting), doesn't cumsum solve it effectively and efficiently? With n as the window width:

vv <- c(0, cumsum(v))
vv[(n+1):length(vv)] - vv[1:(length(vv)-n)]

Of course, this doesn't work for the general case of an arbitrary sliding window function.

-s
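The cumsum idea can be wrapped in a small function and checked against a direct sum over each window. This is only a sketch: window_counts and its argument names are hypothetical, not from the thread, and a zero is prepended to the cumulative sums so that the first window includes v[1].

```r
# Count of 1's in every window of `width` consecutive elements.
# window_counts is a hypothetical name, not from the thread.
window_counts <- function(v, width) {
  stopifnot(width >= 1, width <= length(v))
  n  <- length(v)
  cs <- c(0, cumsum(v))   # cs[k + 1] == sum(v[1:k]), and cs[1] == 0
  cs[(width + 1):(n + 1)] - cs[1:(n - width + 1)]
}

# Check against a direct (slow) sum over each window
set.seed(42)
v <- rbinom(1000, 1, 0.5)
direct <- sapply(1:(length(v) - 12 + 1), function(i) sum(v[i:(i + 12 - 1)]))
print(all.equal(window_counts(v, 12), direct))
```

Everything here is vectorised, so the work is one pass for cumsum and one subtraction, whatever the window width.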
There seems to be something wrong:

> slide(c(1, 1, 0, 1), 2)
[1] 2 2

but the output should be c(2, 1, 1).

At any rate, try this:

library(zoo)
3 * rollmean(x, 3)
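For reference, here is a sketch of how the loop might be repaired, assuming the intent is one total per complete window. slide_fixed is a hypothetical name. The two changes are the number of windows (length - window + 1) and the update step, which should add the element entering the window, x[i + window - 1], rather than x[i]. Preallocating tot also avoids growing the vector on every iteration, which is likely a large part of the original slowness.

```r
# slide_fixed is a hypothetical name for a corrected version of slide()
slide_fixed <- function(x, window) {
  n <- length(x) - window + 1      # number of complete windows
  tot <- numeric(n)                # preallocate instead of growing
  tot[1] <- sum(x[1:window])
  for (i in seq_len(n)[-1]) {
    # subtract the element that left the window, add the one that entered
    tot[i] <- tot[i - 1] - x[i - 1] + x[i + window - 1]
  }
  tot
}

print(slide_fixed(c(1, 1, 0, 1), 2))
```

Summing by hand, the windows (1,1), (1,0) and (0,1) give 2, 1 and 1, which is what the corrected loop returns.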
Because I had too much time on my hands, here's a little function that will do whatever you want over a window you specify. No, I haven't done any time trials :-(

# my own boxcar tool, just because.
# use bfunc to specify what function to apply to the windowed
# region.
boxcar<-function(x, width=5, bfunc='mean'){
  bfunc<-get(bfunc)
  # each window covers exactly `width` elements, starting at shiftx
  boxout<-mapply(function(shiftx) {
      bfunc(window(x,shiftx,shiftx+width-1))
    }
    ,seq(1,(length(x)-width+1)))
  return(invisible(boxout))
}
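A similar general-purpose helper can be sketched with plain indexing and vapply. slide_apply is a hypothetical name, and the sketch assumes the function returns a single number per window. Because it calls an R function once per window, it will probably be far slower than the cumsum approach on a 12-million-element vector, so it suits arbitrary window functions more than plain counting.

```r
# slide_apply is a hypothetical helper, not part of any package
slide_apply <- function(x, width, f = mean) {
  stopifnot(width >= 1, width <= length(x))
  f <- match.fun(f)                       # accepts a function or its name
  starts <- seq_len(length(x) - width + 1)
  # vapply checks that f returns exactly one numeric value per window
  vapply(starts, function(s) f(x[s:(s + width - 1)]), numeric(1))
}

x <- c(1, 1, 0, 1, 0)
print(slide_apply(x, 2, sum))
print(slide_apply(x, 3, "max"))
```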