# Hi there,
# I am trying to apply a function over a moving window for a large number of multivariate time series that are grouped in a nested set of factors. I have spent a few days searching for solutions with no luck, so any suggestions are much appreciated.

# The data I have are for the abundance dynamics of multiple species observed in multiple fixed plots at multiple sites. (In total I have 7 sites, ~3-5 plots/site, ~150 species/plot, for 60 time steps each.) So my data look something like this:
dat<-data.frame(Site=rep(1), Plot=rep(c(rep(1,8),rep(2,8),rep(3,8)),1), Time=rep(c(1,1,2,2,3,3,4,4)), Sp=rep(1:2), Count=sample(24))
dat

# Let the function I want to apply over a right-aligned window of w=2 time steps be:
cv<-function(x){sd(x)/mean(x)}
w<-2

# The final output I want would look something like this (one NA per species at the first time step of each plot):
Out<-data.frame(dat,CV=round(c(NA,NA,runif(6,0,1),NA,NA,runif(6,0,1),NA,NA,runif(6,0,1)),2))

# I could reshape and apply zoo::rollapply() to a given plot at a given site, and reshape again as follows:
library(zoo)
a<-subset(dat,Site==1&Plot==1)
b<-reshape(a[-c(1,2)],v.names='Count',idvar='Time',timevar='Sp',direction='wide')
d<-zoo(b[,-1],b[,1])
d
out<-rollapply(d, w, cv, na.pad=TRUE, align='right')
out

# I would then have to loop through all my sites and plots, which, although it deals with all species at once, still seems exceedingly inefficient.

# So the question is: how do I use something like aggregate.zoo or tapply or even lapply to apply rollapply to each species' time series?

# The closest I've come is the following two approaches:

# First let:
datx<-list(Site=dat$Site,Plot=dat$Plot,Sp=dat$Sp)
daty<-dat$Count

# Method 1.
out1<-tapply(seq(along=daty),datx,function(i,x=daty){
  rollapply(zoo(x[i]), w, cv, na.pad=TRUE, align='right') })
out1
out1[,,1]

# Which "works" in that it gives me the right answers, but in a format from which I can't figure out how to get back into the format I want.

# Method 2.
fun<-function(x){y<-zoo(x);coredata(rollapply(y, w, cv, na.pad=TRUE, align='right'))}
out2<-aggregate(daty,by=datx,fun)
out2

# Which superficially "works" better, but again only in a format I can't figure out how to use, because the output seems to be a mix of data.frame and lists.
out2[1,4]
out2[1,5]
is.data.frame(out2)
is.list(out2)

# The situation is made more problematic by the fact that the time point of first survey can differ between plots (e.g., site1-plot3 may only start at time point 3). As in...
dat2<-dat
dat2<-dat2[-which(dat2$Plot==3 & dat2$Time<3),]
dat2

# I must therefore ensure that I'm keeping track of the true time associated with each value, not just the order of their occurrences. This information is (seemingly) lost by both methods.
datx<-list(Site=dat2$Site,Plot=dat2$Plot,Sp=dat2$Sp)
daty<-dat2$Count

# Method 1.
out3<-tapply(seq(along=daty),datx,function(i,x=daty){
  rollapply(zoo(x[i]), w, cv, na.pad=TRUE, align='right') })
out3
out3[1,3,1]
time(out3[1,3,1])

# Method 2.
out4<-aggregate(daty,by=datx,fun)
out4
time(out4[3,4])

# Am I going about this all wrong? Is there a different package to try? Any thoughts and suggestions are much appreciated!

# R 2.12.2 GUI 1.36 Leopard build 32-bit (5691); zoo 1.6-4

# Thanks!
# -mark

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ecology & Evolutionary Biology
University of California, Santa Cruz
Long Marine Laboratory
100 Shaffer Road
Santa Cruz, CA 95060-5730
Ph: 773-256-8645
Fax: 831-459-3383
http://people.ucsc.edu/~mnovak1/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
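One way to keep the true survey times attached, sketched here under assumptions that are not from the thread (integer, unit-spaced Time values; at most one Count per Time within each Site/Plot/Sp series): place each series on a complete grid of time steps so that a skipped survey shows up as NA, roll over that grid, and read the results back at the observed times. A window that spans a missing survey then yields NA instead of silently pairing non-adjacent surveys. The helper names `roll_cv_by_time` and `add_roll_cv` are hypothetical, and `fill = NA` is the argument newer zoo versions use in place of `na.pad = TRUE`.

```r
library(zoo)

# Coefficient of variation, as defined in the question
cv <- function(x) sd(x) / mean(x)

# Hypothetical helper: right-aligned rolling CV over *true* time steps for one
# series. Assumes integer, unit-spaced Time values with no duplicates.
roll_cv_by_time <- function(count, time, w) {
  grid <- seq(min(time), max(time), by = 1)   # every time step in the span
  if (length(grid) < w) return(rep(NA_real_, length(time)))  # too short
  full <- rep(NA_real_, length(grid))         # skipped surveys stay NA
  full[match(time, grid)] <- count
  r <- rollapply(full, w, cv, fill = NA, align = "right")
  r[match(time, grid)]                        # results at the observed times
}

# Hypothetical wrapper: apply the helper to every Site/Plot/Sp series and
# write the results back by row index, so row order does not matter.
add_roll_cv <- function(dat, w) {
  idx <- split(seq_len(nrow(dat)), dat[c("Site", "Plot", "Sp")], drop = TRUE)
  out <- rep(NA_real_, nrow(dat))
  for (i in idx) {
    out[i] <- roll_cv_by_time(dat$Count[i], dat$Time[i], w)
  }
  dat$CV <- out
  dat
}

# Usage with the question's objects: res <- add_roll_cv(dat2, w)
```

Because results are written back by row index, the data need not be sorted, and a late first survey (as in `dat2`) simply starts that plot's grid later.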
On Sun, Apr 3, 2011 at 11:58 AM, Mark Novak <mnovak1 at ucsc.edu> wrote:
> # So the question is, how do I use something like aggregate.zoo or tapply or
> even lapply to apply rollapply on each species' time series.

Try ave:

dat$cv <- ave(dat$Count, dat[c("Site", "Plot", "Sp")], FUN =
  function(x) rollapply(zoo(x), 2, cv, na.pad = TRUE, align = "right"))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
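A sketch of the `ave()` approach applied to the staggered-start case (`dat2`), assuming there are no gaps inside any series. `ave()` rolls by position within each group, so the rows are sorted by time first. The `length(x) < w` guard is an addition, not part of the reply: it keeps `rollapply()` from being called on groups too short for a window, including the empty Site/Plot/Sp combinations that `ave()` passes to `FUN` when, for example, plot labels differ between sites. Counts are deterministic here (`1:24` instead of `sample(24)`) so the output can be checked by hand.

```r
library(zoo)

cv <- function(x) sd(x) / mean(x)
w <- 2

# Toy data in the shape described in the question
dat <- data.frame(Site = 1, Plot = rep(1:3, each = 8),
                  Time = rep(c(1, 1, 2, 2, 3, 3, 4, 4), 3),
                  Sp = rep(1:2, 12), Count = 1:24)
# Plot 3 first surveyed at time 3, as in dat2 from the question
dat2 <- dat[-which(dat$Plot == 3 & dat$Time < 3), ]

# ave() rolls by position within each group, so sort by time first
dat2 <- dat2[order(dat2$Site, dat2$Plot, dat2$Sp, dat2$Time), ]

dat2$CV <- ave(dat2$Count, dat2[c("Site", "Plot", "Sp")], FUN = function(x) {
  # Short or empty groups get NA instead of reaching rollapply()
  if (length(x) < w) return(rep(NA_real_, length(x)))
  rollapply(x, w, cv, fill = NA, align = "right")
})
dat2
```

Plot 3's series simply begins at time 3, so its first CV is NA and its second uses times 3 and 4. If a series could skip a survey in the middle, positional windows would pair non-adjacent times, and padding each series to a full time grid before rolling would be needed.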