ggrothendieck@myway.com
2004-Aug-28 06:05 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
Full_Name: Gabor Grothendieck
Version: R version 1.9.1, 2004-08-03
OS: Windows XP
Submission from: (NULL) (207.35.143.52)

The following gives an error:

    > `a(b)` <- 1:4
    > `c(d)` <- (1:4)^2
    > lm(`a(b)` ~ `c(d)`)
    Error in model.matrix.default(mt, mf, contrasts) :
      model frame and formula mismatch in model.matrix()

To fix it, replace this line in model.matrix.default:

    reorder <- match(attr(t, "variables")[-1], names(data))

with these two lines:

    strip.backquote <- function(x) gsub("^`(.*)`", "\\1", x)
    reorder <- match(strip.backquote(attr(t, "variables"))[-1],
                     strip.backquote(names(data)))
Gabor Grothendieck
2004-Aug-28 06:27 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
Sorry, but my solution itself had a shortcoming. The $ at the end of the regular expression was missing, and gsub can simply be sub. Here it is again:

    strip.backquote <- function(x) sub("^`(.*)`$", "\\1", x)
    reorder <- match(strip.backquote(attr(t, "variables"))[-1],
                     strip.backquote(names(data)))
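A minimal sketch of why the trailing $ matters, using base R only. It is an illustration, not the patch that went into R. The name strip.backquote.unanchored is hypothetical and only labels the first version of the pattern for comparison, and the input "`a` + b" is a made-up string chosen to show the difference.

```r
# First version of the pattern: anchored only at the start
# (hypothetical name, used here for comparison only).
strip.backquote.unanchored <- function(x) gsub("^`(.*)`", "\\1", x)

# Corrected version: anchored at both ends, so only a string that is
# backquoted as a whole gets unwrapped.
strip.backquote <- function(x) sub("^`(.*)`$", "\\1", x)

nms <- c("`a(b)`", "x1", "`a` + b")

res.unanchored <- strip.backquote.unanchored(nms)
res.anchored   <- strip.backquote(nms)

print(res.unanchored)  # the third element loses its leading backquoted part
print(res.anchored)    # the third element is left alone
```

Both versions agree on fully backquoted names and on plain names. They differ only on strings that start with a backquoted piece but do not end with a backquote.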
Peter Dalgaard
2004-Aug-28 10:46 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
ggrothendieck@myway.com writes:

> The following gives an error:
>
>     > `a(b)` <- 1:4
>     > `c(d)` <- (1:4)^2
>     > lm(`a(b)` ~ `c(d)`)
>     Error in model.matrix.default(mt, mf, contrasts) :
>       model frame and formula mismatch in model.matrix()
>
> To fix it replace this line in model.matrix.default:
>
>     reorder <- match(attr(t, "variables")[-1], names(data))
>
> with these two lines:
>
>     strip.backquote <- function(x) gsub("^`(.*)`", "\\1", x)
>     reorder <- match(strip.backquote(attr(t, "variables"))[-1],
>                      strip.backquote(names(data)))

Hmm.. Yes, there's a bug (and it's likely not the only one we have relating to odd variable names in model formulas), but I suspect that the fix is wrong.

The backquotes are not part of the variable names, but get added by deparsing, at least sometimes. Other times they are not: try, for instance, as.character(quote(`a(b)`)). (Which is as it should be. Other pieces of logic relating to nonsyntactical names represent some rather awkward compromises.)

When backquotes have found their way into names(data) or the "variables" attribute, I would rather suspect that they were created by the wrong tool and fix that, not cure the symptom by stripping them off at a later stage.

Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
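The distinction drawn above, between a name and its deparsed form, can be sketched as follows. This reflects behaviour in a current R; whether every detail matches R 1.9.1 is an assumption. The variable names nm, plain and quoted are illustrative only.

```r
# A symbol whose name is not syntactically valid.
nm <- quote(`a(b)`)

# Coercing the symbol gives the bare name: the backquotes are not part of it.
plain <- as.character(nm)

# Deparsing with backtick = TRUE is one route by which backquotes get added.
quoted <- deparse(nm, backtick = TRUE)

print(plain)
print(quoted)

# The same symbol can be built from the bare name, with no backquotes involved.
print(identical(nm, as.name("a(b)")))
```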
Gabor Grothendieck
2004-Aug-28 15:36 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
Peter Dalgaard writes:

> When backquotes have found their way into names(data) or the
> "variables" attribute, I would rather suspect that they were created
> by the wrong tool and fix that, not cure the symptom by stripping them
> off at a later stage.

In model.frame.default there is a line:

    varnames <- as.character(vars[-1])

that turns part of a call object, vars, into a character vector. We could change that to:

    varnames <- strip.backquote(as.character(as.list(vars[-1])))

or perhaps as.character should not return the backquotes in the first place, in which case the fix would be to fix as.character.
Gabor Grothendieck
2004-Aug-28 17:15 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:

> "Gabor Grothendieck" <ggrothendieck@myway.com> writes:
>
> > In model.frame.default there is a line:
> >
> >     varnames <- as.character(vars[-1])
> >
> > that turns part of a call object, vars, into a character string.
> > We could change that to:
> >
> >     varnames <- strip.backquote(as.character(as.list(vars[-1])))
> >
> > or perhaps as.character should not return the backquotes in the
> > first place in which case the fix would be to fix as.character.
>
> Or not use it in this way. I forget what the reasoning was behind the
> current behaviour of as.character, but the point is that
>
>     > as.character(attr(terms(`a(b)`~`c(d)`),"variables"))
>     [1] "list" "`a(b)`" "`c(d)`"
>
> whereas for instance
>
>     > sapply(attr(terms(`a(b)`~`c(d)`),"variables")[-1],as.character)
>     [1] "a(b)" "c(d)"

1. That is quite subtle, but a fix based on that would appear to solve it.

2. Your example, and possibly some explanatory text, should be added to ?as.character.

3. In looking for the offending spot, I seem to remember (though I did not keep track of it) that one or more of lm, model.frame.default, terms.formula, etc. had additional applications of as.character directly to a list, as in your first example. These should probably be changed to correspond to your second example as well, where as.character is applied to the elements of the list rather than to the list itself.
Gabor Grothendieck
2004-Aug-30 03:35 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:

> "Gabor Grothendieck" <ggrothendieck@myway.com> writes:
>
> > 1. That is quite subtle but a fix based on that would appear to
> > solve it.
>
> Hmm, not quite. I tried, and terms like offset(foo) get me in
> trouble. Probably, I was fixing the wrong end of the original problem:
> In the comparisons, we can't have one side with backquotes and the
> other without them. That doesn't have to mean that they should be
> removed from both sides, and indeed it would get us in trouble if
> someone was perverse enough to do things like
>
>     y ~ `offset(foo)` + offset(foo)
>
> I.e. perhaps the real issue is that names(data) gets generated without
> backquotes.
>
> Anyways, this is a real can of worms and I'm not sure we're not too
> close to 2.0.0 to start tampering with it...

How about a partial fix that does not address pathological cases where the variable names themselves have embedded backquotes, but does address the common cases such as:

    y <- ts(1:10); x1 <- y^2; x2 <- y^4
    lm(`lag(y)` ~ ., cbind(lag(y), x1, diff(x2)))

without having to resort to:

    lm(lag.y ~ ., cbind(lag.y = lag(y), x1, diff(x2)))
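A small sketch of why terms like offset(foo) are awkward for per-element coercion. The object vars below is a hand-built stand-in for a "variables" attribute, not one produced by terms(), and per.elt is an illustrative name.

```r
# Hand-built stand-in for a "variables" attribute: one nonsyntactic name
# and one call.
vars <- quote(list(`a(b)`, offset(foo)))

# Apply as.character to each element rather than to the whole call.
per.elt <- lapply(as.list(vars)[-1], as.character)

# The symbol gives one string; the call gives one string per component
# (function name, then argument), so the result is no longer one name
# per variable.
print(per.elt)
print(sapply(per.elt, length))
```

This is only meant to show the shape of the problem: a symbol coerces to a single name, while a call coerces to several pieces that would have to be reassembled before they could be matched against names(data).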
Gabor Grothendieck
2004-Aug-30 16:21 UTC
[Rd] model.matrix.default chokes on backquote (PR#7202)
Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:

> "Gabor Grothendieck" <ggrothendieck@myway.com> writes:
>
> > How about a partial fix that does not address pathological
> > cases where the variable names themselves have embedded
> > backquotes but does address the common cases such as:
> >
> >     y <- ts(1:10); x1 <- y^2; x2 <- y^4
> >     lm(`lag(y)` ~ ., cbind(lag(y), x1, diff(x2)) )
> >
> > without having to resort to:
> >
> >     lm(lag.y ~ ., cbind(lag.y = lag(y), x1, diff(x2)) )
>
> Hmmm... Point taken, but I'm not happy about the fact that the
> internals seem unable to distinguish `lag(y)` (the name) from lag(y)
> (the call). One might consider "backtickifying" the names of the data
> matrix instead:
>
>     > bq <- function(x) sapply(x,
>           function(nm) deparse(as.name(nm), backtick=TRUE))
>     > bq(c("a","a(b)"))
>            a     a(b)
>          "a" "`a(b)`"

Just one other comment. The reason that the example I provided arises is that one cannot write:

    lm(lag(y) ~ x1 + diff(x2))

or

    lm(lag(y) ~ x1 + diff(x2), cbind(y, x1, x2))

For these to work, lm would have to align the time scales of the ts objects passed to it.
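A sketch of how the bq() idea could feed a match() like the one in the original report. The vectors data.names and term.labels are made-up inputs, and this is not a claim about how model.matrix.default was eventually changed.

```r
# bq() as suggested in the thread: deparse each name with backtick = TRUE,
# so nonsyntactic names gain backquotes and ordinary names are unchanged.
bq <- function(x) sapply(x, function(nm) deparse(as.name(nm), backtick = TRUE))

# Hypothetical inputs: bare column names on the data side, backquoted
# labels on the formula side.
data.names  <- c("a(b)", "c(d)", "x1")
term.labels <- c("`c(d)`", "x1", "`a(b)`")

# Backquote the data side instead of stripping the formula side, then match.
reorder <- match(term.labels, bq(data.names))

print(bq(data.names))
print(reorder)
```

Adding backquotes on the data side leaves the formula side untouched. A backquoted name such as `lag(y)` therefore stays textually different from the deparsed call lag(y), which is the distinction raised above.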