Paul Johnson
2012-May-29 15:43 UTC
[R] trouble automating formula edits when log or * are present; update trouble
Greetings,

I want to take a fitted regression and replace all uses of a variable in a formula. For example, I'd like to take

    m1 <- lm(y ~ x1, data=dat)

and replace x1 with something else, say x1c, so the formula would become

    m1 <- lm(y ~ x1c, data=dat)

I have working code for that part of the problem, but it fails when the formula is more complicated. If the formula has log(x1) or x1:x2, the update code I'm testing doesn't get it right.

Here's the test code:

    ##PJ
    ## 2012-05-29
    dat <- data.frame(x1=rnorm(100,m=50), x2=rnorm(100,m=50),
                      x3=rnorm(100,m=50), y=rnorm(100))

    m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
    m2 <- lm(y ~ log(x1) + x2*x3, data=dat)

    suffixX <- function(fmla, x, s){
        upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
        update.formula(fmla, upform)
    }

    newFmla <- formula(m2)
    newFmla
    suffixX(newFmla, "x2", "c")
    suffixX(newFmla, "x1", "c")

Here are the last few lines of the output. See how the update misses x1 inside log(x1), and x2 in the interaction?

    > newFmla <- formula(m2)
    > newFmla
    y ~ log(x1) + x2 * x3
    > suffixX(newFmla, "x2", "c")
    y ~ log(x1) + x3 + x2c + x2:x3
    > suffixX(newFmla, "x1", "c")
    y ~ log(x1) + x2 + x3 + x1c + x2:x3

It gets the target if the target is all by itself, but not otherwise.

After messing with this for quite a while, I conclude that update was the wrong way to go because it is geared to replacement of individual terms, not editing all instances of a variable.

So I started studying the structure of formula objects. I noticed this really interesting thing: the newFmla object can be probed recursively to eventually reveal all of the individual pieces:

    > newFmla
    y ~ log(x1) + x2 * x3
    > newFmla[[3]]
    log(x1) + x2 * x3
    > newFmla[[3]][[2]]
    log(x1)
    > newFmla[[3]][[2]][[2]]
    x1

So, if you could tell me of a general way to "walk" through a formula object, couldn't I use "gsub" or something like that to recognize each instance of "x1" and replace it with "x1c"?

I just can't figure out how to automate the checking of each possible element in a formula, to get the right combination of [[]][[]][[]]. See what I mean? I need to avoid this:

    > newFmla[[3]][[2]][[3]]
    Error in newFmla[[3]][[2]][[3]] : subscript out of bounds

pj

--
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504         Center for Research Methods
University of Kansas              University of Kansas
http://pj.freefaculty.org         http://quant.ku.edu
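On the "walk" asked about above: a formula is a nested call, so a short recursive function can visit every element without hard-coding the [[ ]] indices. What follows is only a sketch and not code from the thread. The name renameVar is hypothetical, and it assumes the formula has no empty arguments (as in x[, 1]).

```r
# Hypothetical helper (not from the thread): recursively visit every node
# of a formula and swap one symbol for another.
renameVar <- function(expr, old, new) {
  if (is.name(expr)) {
    # Leaf: a symbol such as x1. Replace it only on an exact match.
    if (identical(as.character(expr), old)) as.name(new) else expr
  } else if (is.call(expr)) {
    # Call: element 1 is the function (`+`, log, ...), the rest are arguments.
    # Note that a function name equal to `old` would be renamed too.
    out <- as.call(lapply(as.list(expr), renameVar, old = old, new = new))
    # Carry over class and environment when the node is the formula itself.
    attributes(out) <- attributes(expr)
    out
  } else {
    expr  # numeric constants and the like are returned unchanged
  }
}

f <- y ~ log(x1) + x2 * x3
g <- renameVar(f, "x1", "x1c")
h <- renameVar(y ~ x1:x2 + x10, "x1", "x1c")  # x10 must stay untouched
```

Because the comparison is on whole symbols, x1 inside log(x1) or x1:x2 is renamed while a variable such as x10 is left alone.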
R. Michael Weylandt
2012-May-29 16:31 UTC
[R] trouble automating formula edits when log or * are present; update trouble
Hi Paul,

I haven't quite thought through this yet, but might it not be easier to convert your formula to a character string and then use gsub et al. on it directly?

Something like this:

    # Using m2 as you set up in your post
    m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
    f2 <- formula(m2)
    as.formula(paste(f2[2], f2[1], gsub("x1", "x1c", as.character(f2[3]))))

It's admittedly unwieldy, but it seems pretty robust. Something like:

    changeFormula <- function(form, xIn, xOut){
        as.formula(paste(form[2], form[1], gsub(xIn, xOut, as.character(form[3]))))
    }
    changeFormula(formula(m2), "x1", "x1c")

I'm not sure if this will play nice with environments and whatnot, so you might need to change those manually.

Hope this gets you started,
Michael
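Two caveats about the text-based route are worth illustrating. A plain gsub("x1", ...) also matches inside longer names such as x10, and the rebuilt formula does not automatically keep the original environment. The sketch below is one possible hardening, not Michael's code. The name changeFormulaText is hypothetical, and it assumes variable names made only of letters and digits, since \b also treats a dot as a word boundary.

```r
# Hypothetical variant (not from the thread): text substitution with word
# boundaries, keeping the environment of the original formula.
changeFormulaText <- function(form, xIn, xOut) {
  # deparse() may split a long formula over several strings; rejoin them.
  txt <- paste(deparse(form), collapse = " ")
  # \\b anchors the match at word edges so x1 does not match inside x10.
  pattern <- paste0("\\b", xIn, "\\b")
  txt <- gsub(pattern, xOut, txt, perl = TRUE)
  as.formula(txt, env = environment(form))
}

f <- y ~ log(x1) + x10 * x3

# The unanchored pattern also rewrites x10:
naive <- gsub("x1", "x1c", paste(deparse(f), collapse = " "))

# The anchored version leaves x10 alone:
g <- changeFormulaText(f, "x1", "x1c")
```

Even with the anchors this remains string matching, so a name like x1.sq would still be altered. Working on the parsed formula avoids that class of problem.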
Gabor Grothendieck
2012-May-29 17:26 UTC
[R] trouble automating formula edits when log or * are present; update trouble
On Tue, May 29, 2012 at 11:43 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> The last few lines of the output. See how the update misses x1 inside
> log(x1) or in the interaction?
>
>> suffixX(newFmla, "x1", "c")
> y ~ log(x1) + x2 + x3 + x1c + x2:x3

Try substitute:

    > do.call("substitute", list(newFmla, setNames(list(as.name("x1c")), "x1")))
    y ~ log(x1c) + x2 * x3

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
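The substitute() call above can be wrapped in a small function and the result passed back to lm(). This is a sketch under some assumptions: replaceVar and the column x2c are hypothetical, the centering step is only an illustrative guess at what the "c" suffix stands for, and the as.formula() and environment() steps are defensive, in case the substituted object does not carry over its class or environment.

```r
# Hypothetical wrapper (not from the thread) around the substitute() idiom.
replaceVar <- function(fmla, old, new) {
  subs <- setNames(list(as.name(new)), old)
  out <- do.call("substitute", list(fmla, subs))
  # Defensive: make sure the result is a formula with the original environment.
  out <- as.formula(out)
  environment(out) <- environment(fmla)
  out
}

set.seed(1)
dat <- data.frame(x1 = rnorm(100, mean = 50), x2 = rnorm(100, mean = 50),
                  x3 = rnorm(100, mean = 50), y = rnorm(100))
dat$x2c <- dat$x2 - mean(dat$x2)   # assumed meaning of the "c" suffix

m2  <- lm(y ~ log(x1) + x2 * x3, data = dat)
f2c <- replaceVar(formula(m2), "x2", "x2c")
m2c <- lm(f2c, data = dat)         # refit with every x2 replaced by x2c
```

Unlike the update() approach, the interaction is rewritten as well, so the refitted model has an x2c:x3 coefficient.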
cberry at tajo.ucsd.edu
2012-May-29 17:48 UTC
[R] trouble automating formula edits when log or * are present; update trouble
Paul Johnson <pauljohn32 at gmail.com> writes:

> I want to take a fitted regression and replace all uses of a variable
> in a formula. For example, I'd like to take
>
> m1 <- lm(y ~ x1, data=dat)
>
> and replace x1 with something else, say x1c, so the formula would become
>
> m1 <- lm(y ~ x1c, data=dat)

So, some incantation involving substitute(), perhaps?

    > frm <- y ~ log( x ) * ( w + u )
    > do.call( substitute, list( frm, list( x = as.name("z") ) ) )
    y ~ log(z) * (w + u)

HTH,

Chuck

--
Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at ucsd edu                         UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
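A brief illustration of why the do.call() wrapper appears in both substitute() answers. substitute() does not evaluate its first argument, so passing the variable frm directly hands it the symbol frm rather than the formula stored in it. The object names direct and viaDoCall below are illustrative only.

```r
frm <- y ~ log(x) * (w + u)

# Called directly, substitute() sees the unevaluated symbol `frm`;
# nothing in it matches x, so the symbol comes back unchanged.
direct <- substitute(frm, list(x = as.name("z")))

# do.call() evaluates the argument list first, so substitute() receives
# the formula itself and can replace x throughout.
viaDoCall <- do.call(substitute, list(frm, list(x = as.name("z"))))
```

Printing direct shows frm, while viaDoCall prints as the rewritten formula.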