Hello,

I'm struggling with an unexpected interference between the two packages dplyr and plm, or, to be more concrete, with the "lag(x, ...)" function of both packages.

If dplyr is attached, the plm function no longer uses the appropriate lag() function, which accounts for the panel structure.

The following code demonstrates the unexpected behaviour:

## starting from a new R-Session (plm and dplyr unloaded) ##

## generate dataset
set.seed(4711)
df <- data.frame(
  i = rep(1:10, each = 4),
  t = rep(1:4, times = 10),
  y = rnorm(40),
  x = rnorm(40)
)
## manually generated lagged variable
df$lagx <- c(NA, df$x[-40])
df$lagx[df$t == 1] <- NA


require(plm)
summary(plm(y~lagx, data = df, index = c("i", "t")))
summary(plm(y~lag(x, 1), data = df, index = c("i", "t")))
# > this result is expected

require(dplyr)
summary(plm(y~lagx, data = df, index = c("i", "t")))
summary(plm(y~lag(x, 1), data = df, index = c("i", "t")))
# > this result is unexpected

Is there a way to force R to use the "correct" lag function? (Or, at the development level, to harmonise both functions?)

Thank you very much in advance for your answer.

Yours
Constantin

-- 
Weiser, Dr. Constantin (weiserc at hhu.de)
Chair of Statistics and Econometrics
Heinrich Heine-University of Düsseldorf
Universitätsstraße 1, 40225 Düsseldorf, Germany
Oeconomicum (Building 24.31), Room 01.22
Tel: 0049 211 81-15307
Hi,

It shouldn't be entirely unexpected: when I load dplyr, I get a series of messages telling me that certain functions are masked.

The following object is masked from ‘package:plm’:

    between

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

You can see the search path that R uses when looking for a function or other object with search(). In your example, it should look like this:

> search()
 [1] ".GlobalEnv"        "package:dplyr"     "package:plm"       "package:Formula"
 [5] "package:stats"     "package:graphics"  "package:grDevices" "package:utils"
 [9] "package:datasets"  "package:vimcom"    "package:setwidth"  "package:colorout"
[13] "package:methods"   "Autoloads"         "package:base"

So R searches the global environment, then dplyr, and then, farther down the list, stats, which is where the lag function comes from (see the masking messages above).

Once you know where the desired function comes from, you can specify its namespace:

summary(plm(y~lagx, data = df, index = c("i", "t")))
summary(plm(y~stats::lag(x, 1), data = df, index = c("i", "t")))

If you weren't paying attention to the messages at package load, you can also use the getAnywhere function to find out:

> getAnywhere(lag)
2 differing objects matching ‘lag’ were found
in the following places
  package:dplyr
  package:stats
  namespace:dplyr
  namespace:stats

Sarah

On Tue, Nov 29, 2016 at 9:36 AM, Constantin Weiser <weiserc at hhu.de> wrote:
> Hello,
>
> I'm struggling with an unexpected interference between the two packages
> dplyr and plm, or to be more concrete with the "lag(x, ...)" function of
> both packages.

-- 
Sarah Goslee
http://www.functionaldiversity.org
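The masking and the `::` workaround described above can be reproduced without either package. The following is only a base-R sketch: the global `lag` defined here is a hypothetical stand-in for a masking function such as dplyr's (it shifts by position), not dplyr's actual code.

```r
## A time series, so that stats::lag() has a time index to shift
x <- ts(1:5)

## Hypothetical masking function: defined in the global environment,
## which comes before package:stats on the search path
lag <- function(x, k = 1, ...) c(rep(NA, k), head(as.vector(x), -k))

masked    <- lag(x, 1)          # bare name: the global definition is found first
qualified <- stats::lag(x, 1)   # explicit namespace: the search path is bypassed

class(masked)      # a plain vector, shifted by position
class(qualified)   # still a "ts" object; only its time index has moved

rm(lag)            # remove the mask so later calls find stats::lag again
```

The qualified call gives the same result regardless of what is attached or in which order, which is why it is the more robust choice inside a model formula.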
> On Nov 29, 2016, at 6:52 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>
> Once you know where the desired function comes from you can specify
> its namespace:

The other option would be to load dplyr first (which would give the warning that stats::lag was masked) and then later load plm (which should give a further warning that dplyr::lag is masked). Then the plm::lag function will be found first.

-- 
David.

> summary(plm(y~lagx, data = df, index = c("i", "t")))
> summary(plm(y~stats::lag(x, 1), data = df, index = c("i", "t")))
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
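Whichever load order is used, it may be worth checking which definition a bare `lag` actually resolves to before fitting the model. The sketch below uses only base R and utils; the variable names `where` and `owner` are illustrative. Whether plm shows up in the output depends on whether plm exports its own `lag` or only registers a method for the stats generic, which this thread does not establish.

```r
## Search-path entries that contain an object called "lag", in lookup order;
## the first entry is the one a bare lag() call will use
where <- find("lag")
where

## Namespace that owns the definition currently found first
owner <- environmentName(environment(lag))
owner

## S3 methods registered for the generic; a panel-aware method would be
## listed here if the package provides one this way
methods("lag")
```

Running this once after `require(plm)` and again after `require(dplyr)` shows directly how the lookup order changes.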