I want to write a function to standardize regression predictors, which will require me to do some character-string manipulation to parse the variables in a call to lm() or glm().

For example, consider the call

    lm(y ~ female + I(age^2) + female:black + (age + education)*female)

I want to be able to parse this to pick out the input variables ("female", "age", "black", "education"). Then I can transform these as appropriate (to get "z.female", "z.age", etc.), feed them back into the lm() function, and go from there.

Does anyone know an easy way to pull out the variables? I basically have to parse out the symbols "+", ":", "*", and " ", but there's also the problem of handling parentheses and the I() operator.

Thanks!
Andrew

--
Andrew Gelman
Professor, Department of Statistics
Professor, Department of Political Science
gelman at stat.columbia.edu
www.stat.columbia.edu/~gelman

Statistics department office:
  Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
  212-851-2142
Political Science department office:
  International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
  212-854-7075

Mailing address:
  1255 Amsterdam Ave, Room 1016
  Columbia University
  New York, NY 10027-5904
  212-851-2142
  (fax) 212-851-2164
Probably all.vars() could be useful in this case, e.g.,

    m1 <- lm(y ~ female + I(age^2) + female:black + (age + education)*female)
    all.vars(formula(m1))

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Andrew Gelman" <gelman at stat.columbia.edu>
To: <r-help at stat.math.ethz.ch>
Sent: Monday, May 01, 2006 12:46 PM
Subject: [R] pulling items out of a lm() call

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
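A small sketch of the all.vars() suggestion, assuming only the formula from the original question. It shows that no fitted model (and no data) is needed to extract the names, and that taking the right-hand side of the formula with `fo[[3]]` leaves out the response. The object names `fo` and `predictors` are just illustrative.

```r
# The formula alone is enough; nothing has to be fitted first.
fo <- y ~ female + I(age^2) + female:black + (age + education) * female

# Every variable name in the formula, response included.
all.vars(fo)

# For a two-sided formula, element 3 is the right-hand side,
# so this gives the predictors only.
predictors <- all.vars(fo[[3]])
predictors
# "female" "age" "black" "education"
```

Function names such as I() are dropped by all.vars(), which is why the I(age^2) term contributes just "age".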
Try this:

    # test data
    fo <- y ~ female + I(age^2) + female:black + (age + education) * female

    # create a list of form list(y = as.name("z.y"), ...) for use with substitute
    L <- sapply(all.vars(fo), function(nm) as.name(paste("z", nm, sep = ".")))
    do.call(substitute, list(fo, L))
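One possible way to finish the round trip back into lm(), sketched under assumptions: the toy data frame `dat` is made up for illustration, and plain z-scores (mean 0, sd 1) stand in for whatever standardization is actually wanted. Unlike the snippet above, only the predictors are renamed here, so the response stays `y`.

```r
set.seed(1)

# Hypothetical toy data; column names follow the example formula.
n <- 200
dat <- data.frame(
  female    = rbinom(n, 1, 0.5),
  black     = rbinom(n, 1, 0.3),
  age       = rnorm(n, 45, 12),
  education = sample(1:4, n, replace = TRUE)
)
dat$y <- 1 + 0.5 * dat$female + 0.01 * dat$age^2 + rnorm(n)

fo <- y ~ female + I(age^2) + female:black + (age + education) * female

# Predictor names only (right-hand side of the formula).
preds <- all.vars(fo[[3]])

# Named list mapping each predictor to its "z." symbol, for substitute().
L <- lapply(preds, function(nm) as.name(paste("z", nm, sep = ".")))
names(L) <- preds

# Rewrite the formula, then make sure the result is a formula object.
z.fo <- as.formula(do.call(substitute, list(fo, L)))

# Add standardized copies of the predictors (assumed: simple z-scores).
for (nm in preds) {
  dat[[paste("z", nm, sep = ".")]] <- (dat[[nm]] - mean(dat[[nm]])) / sd(dat[[nm]])
}

# Refit using the standardized inputs.
fit <- lm(z.fo, data = dat)
```

Because the substitution works on the parsed formula rather than on its text, I(age^2) becomes I(z.age^2) without any string handling of parentheses or operators.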
At which level of generality do you want this? Consider

> attr(terms(y ~ female + I(age^2) + female:black + (age +
+ education)*female),"variables")
list(y, female, I(age^2), black, age, education)
> attr(delete.response(terms(y ~ female + I(age^2) + female:black +
+ (age + education)*female)),"variables")
list(female, I(age^2), black, age, education)

This gets you some of the way. However, there are complications: You can't just remove composite terms like "I(age^2)" because it is not guaranteed that "age" is among the other terms:

> attr(terms( ~ I(speed^2)),"variables")
list(I(speed^2))

So you need some way to tease out the individual variables inside I(). Here's a first cut.

l <- attr(delete.response(terms(y ~ female + I(age^2) + female:black +
          (age + education)*female)),"variables")
getterms <- function(e) {
    if (is.name(e)) e
    else if (is.call(e)) lapply(e[-1], getterms)
}
unique(c(lapply(l[-1],getterms), recursive=TRUE))

and possibly throw in an as.character() to get a vector of strings, rather than a list of symbols.
Notice that since anything can go inside I(), you can get in trouble if parts of the expression are not intended as variables (e.g., y^lambda where lambda is a scalar). The getterms function above pragmatically assumes that at least function names need to be discarded.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
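To illustrate the caveat about scalars inside I(): the sketch below uses a made-up formula and data frame (`fo`, `dat`, and `lambda` are all hypothetical). One pragmatic filter, which is an assumption of this example rather than something proposed in the thread, is to keep only the names that are actually columns of the data.

```r
# Hypothetical formula in which lambda is a scalar, not a data column.
fo <- ~ I(x^lambda) + log(z)

# Symbol extraction cannot tell the difference: lambda is reported too.
found <- all.vars(fo)
found
# "x" "lambda" "z"

# Hypothetical data and scalar.
dat <- data.frame(x = 1:5, z = 6:10)
lambda <- 0.5

# Keep only names that exist as columns of the data frame.
inputs <- intersect(found, names(dat))
inputs
# "x" "z"
```

This filter only helps when the data frame is known at the time the formula is parsed; variables picked up from the calling environment would be dropped as well.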