Hi,

we came across the following unexpected (for us) behavior in terms.formula: When determining whether a term is duplicated, only the order of the arguments in function calls seems to be checked but not their names. Thus the terms f(x, a = z) and f(x, b = z) are deemed to be duplicates, and one of them is dropped.

R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
[1] "f(x, a = z)"

However, changing the arguments or the order of arguments keeps both terms:

R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
[1] "f(x, a = z)"  "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
[1] "f(x, a = z)" "f(b = z, x)"

Is this intended behavior or needed for certain terms?

We came across this problem when setting up certain smooth regressors with different kinds of patterns. As a trivial simplified example we can generate the same kind of problem with rep(). Consider the two dummy variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the response y = 1:8 I get:

R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Coefficients:
           (Intercept)  rep(x = 0:1, each = 4)
                   2.5                     4.0

So while the model is identified because the two regressors are not the same, terms.formula does not recognize this and drops the second regressor. What I would have wanted can be obtained by switching the arguments:

R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Coefficients:
            (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
                      2                        4                        1

Of course, here I could avoid the problem by setting up proper factors etc. But to me this looks like a potential bug in terms.formula...

Thanks in advance for any insights,
Z
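For readers who hit the same issue, the workaround mentioned at the end of the message (setting up the regressors before building the formula) might look like the sketch below. It is only an illustration: the variable names y, x_each and x_times are made up here and do not appear in the original post. Because each term is then a plain variable name, the term labels differ regardless of how terms.formula compares function-call arguments.

```r
# Illustrative workaround: build the regressors first, so that each term in
# the formula is a plain variable name rather than a function call.
# The names y, x_each and x_times are hypothetical, not from the original post.
y <- 1:8
x_each  <- rep(x = 0:1, each = 4)   # 0 0 0 0 1 1 1 1
x_times <- rep(x = 0:1, times = 4)  # 0 1 0 1 0 1 0 1

# Both terms are kept because their labels differ.
labels_kept <- attr(terms(y ~ x_each + x_times), "term.labels")
print(labels_kept)

# Fit the model with both regressors and show the coefficients.
fit <- lm(y ~ x_each + x_times)
print(coef(fit))
```

The fit should reproduce the coefficients 2, 4 and 1 from the second lm() call above, since the two regressors are the same as in that call.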
Dear Achim,

>>>>> Achim Zeileis <Achim.Zeileis at r-project.org>
>>>>>     on Fri, 10 Mar 2017 15:02:38 +0100 writes:

    > [...]
    > Of course, here I could avoid the problem by setting up
    > proper factors etc. But to me this looks a potential bug
    > in terms.formula...
I agree that there is a bug.

According to https://www.r-project.org/bugs.html I have generated an R bugzilla account for you so you can report it there (for "book keeping", posterity, etc).

    > Thanks in advance for any insights, Z

and thank *you* (and Nikolaus ?) for the report!

Best regards,
Martin
Martin,

thanks for the follow-up!

On Mon, 13 Mar 2017, Martin Maechler wrote:

> [...]
> I agree that there is a bug.

OK, good. I just wasn't sure whether I had missed some documentation somewhere that this is intended behavior.

> According to https://www.r-project.org/bugs.html
> I have generated an R bugzilla account for you so you can report
> it there (for "book keeping", posterity, etc).

Thanks, I had already looked at that but waited for feedback on this list first.

> > Thanks in advance for any insights, Z
>
> and thank *you* (and Nikolaus ?) for the report!

No problem. Niki found the problem and I came up with the simplified example.

In any case, I just posted a slightly modified version of my e-mail as #17235 on Bugzilla:
https://bugs.R-project.org/bugzilla/show_bug.cgi?id=17235

Thanks & best wishes,
Z