Dear all, I have weekly data by city (variable citycode). I would like to take the average of the previous two, three, four weeks (without the current week) of the variable called value. This is what I have tried to compute the average of the two previous weeks; df = df %>% mutate(value.lag1 = lag(value, n = 1)) %>% mutate(value .2.previous = rollapply(data = value.lag1, width = 2, FUN = mean, align = "right", fill = NA, na.rm = T)) I crated the lag of the variable first and then attempted to compute the average but this does not seem to to what I want. What I am doing wrong? Any help will be appreciated. The data is below. Thank you. Sincerely, Milu dput(droplevels(head(df, 10))) structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", "citycode", "month", "week", "date", "tmin"), row.names = c(NA, 10L), class = "data.frame") [[alternative HTML version deleted]]
There is no `value` column in the `dput` output shown in the question so using `tmin` instead note that the `width=` argument of `rollapply` can be a list containing a vector of offsets (-1 is prior value, -2 is value before that, etc.) and that we can use `rollapplyr` with an `r` on the end to get right alignment. See `?rollapply` library(dplyr) library(zoo) roll <- function(x, k) rollapplyr(x, list(-seq(1:k)), mean, fill = NA) df %>% group_by(citycode) %>% mutate(mean2 = roll(tmin, 2), mean3 = roll(tmin, 3), mean4 roll(tmin, 4)) %>% ungroup (The code above has been indented 2 spaces so you can identify inadvertent line wrapping by the email system.) On Sun, Mar 25, 2018 at 10:48 AM, Miluji Sb <milujisb at gmail.com> wrote:> Dear all, > > I have weekly data by city (variable citycode). I would like to take the > average of the previous two, three, four weeks (without the current week) > of the variable called value. > > This is what I have tried to compute the average of the two previous weeks; > > df = df %>% > mutate(value.lag1 = lag(value, n = 1)) %>% > mutate(value .2.previous = rollapply(data = value.lag1, > width = 2, > FUN = mean, > align = "right", > fill = NA, > na.rm = T)) > > I crated the lag of the variable first and then attempted to compute the > average but this does not seem to to what I want. What I am doing wrong? > Any help will be appreciated. The data is below. Thank you. > > Sincerely, > > Milu > > dput(droplevels(head(df, 10))) > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date > structure(c(1L, > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, > 10L), class = "data.frame") > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
I am sure that this sort of thing has been asked and answered before, so in case my suggestions don't work for you, just search the archives a bit more. I am also sure that it can be handled directly by numerous functions in numerous packages, e.g. via time series methods or by calculating running means of suitably shifted series. However, as it seems to be a straightforward task, I'll provide what I think is a simple solution in base R. Adjust to your situation. ## First I need a little utility function to offset rows. Lots of ways to do this,many nicer than this I'm sure.> shift <- function(x,k)+ ## x is a vector of values -- e.g. of a column in your df + { + sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i))) + }> > > ## Testit > x <- c(1,3,5,7,8:11) > m <- shift(x,3) ## matrix of prior values up to lag 3 > m ## note rows have been omitted where lags don't exist[,1] [,2] [,3] [1,] NA NA NA [2,] 1 NA NA [3,] 3 1 NA [4,] 5 3 1 [5,] 7 5 3 [6,] 8 7 5 [7,] 9 8 7 [8,] 10 9 8> rowMeans(m) ## means of previous 3[1] NA NA NA 3.000000 5.000000 6.666667 8.000000 9.000000> rowMeans(m[,1:2]) ## means of previous 2[1] NA NA 2.0 4.0 6.0 7.5 8.5 9.5 Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <milujisb at gmail.com> wrote:> Dear all, > > I have weekly data by city (variable citycode). I would like to take the > average of the previous two, three, four weeks (without the current week) > of the variable called value. > > This is what I have tried to compute the average of the two previous weeks; > > df = df %>% > mutate(value.lag1 = lag(value, n = 1)) %>% > mutate(value .2.previous = rollapply(data = value.lag1, > width = 2, > FUN = mean, > align = "right", > fill = NA, > na.rm = T)) > > I crated the lag of the variable first and then attempted to compute the > average but this does not seem to to what I want. What I am doing wrong? > Any help will be appreciated. The data is below. Thank you. > > Sincerely, > > Milu > > dput(droplevels(head(df, 10))) > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date > structure(c(1L, > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, > 10L), class = "data.frame") > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Dear Bert, Thank you very much.This works. I was wondering if the fact that I want to create new variables (sorry for not stating that fact) makes any difference? Thank you again. Sincerely, Milu On Sun, Mar 25, 2018 at 10:05 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:> I am sure that this sort of thing has been asked and answered before, > so in case my suggestions don't work for you, just search the archives > a bit more. > I am also sure that it can be handled directly by numerous functions > in numerous packages, e.g. via time series methods or by calculating > running means of suitably shifted series. > > However, as it seems to be a straightforward task, I'll provide what I > think is a simple solution in base R. Adjust to your situation. > > ## First I need a little utility function to offset rows. Lots of ways > to do this,many nicer than this I'm sure. > > > shift <- function(x,k) > + ## x is a vector of values -- e.g. of a column in your df > + { > + sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i))) > + } > > > > > > ## Testit > > x <- c(1,3,5,7,8:11) > > m <- shift(x,3) ## matrix of prior values up to lag 3 > > m ## note rows have been omitted where lags don't exist > [,1] [,2] [,3] > [1,] NA NA NA > [2,] 1 NA NA > [3,] 3 1 NA > [4,] 5 3 1 > [5,] 7 5 3 > [6,] 8 7 5 > [7,] 9 8 7 > [8,] 10 9 8 > > rowMeans(m) ## means of previous 3 > [1] NA NA NA 3.000000 5.000000 6.666667 8.000000 9.000000 > > rowMeans(m[,1:2]) ## means of previous 2 > [1] NA NA 2.0 4.0 6.0 7.5 8.5 9.5 > > > Cheers, > Bert > > > > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <milujisb at gmail.com> wrote: > > Dear all, > > > > I have weekly data by city (variable citycode). I would like to take the > > average of the previous two, three, four weeks (without the current week) > > of the variable called value. > > > > This is what I have tried to compute the average of the two previous > weeks; > > > > df = df %>% > > mutate(value.lag1 = lag(value, n = 1)) %>% > > mutate(value .2.previous = rollapply(data = value.lag1, > > width = 2, > > FUN = mean, > > align = "right", > > fill = NA, > > na.rm = T)) > > > > I crated the lag of the variable first and then attempted to compute the > > average but this does not seem to to what I want. What I am doing wrong? > > Any help will be appreciated. The data is below. Thank you. > > > > Sincerely, > > > > Milu > > > > dput(droplevels(head(df, 10))) > > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, > > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, > > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date > > structure(c(1L, > > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", > > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", > > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), > > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, > > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", > > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, > > 10L), class = "data.frame") > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]