Ravi Varadhan
2015-Oct-27 15:19 UTC
[R] How to get variable name while doing series of regressions in an automated manner?
Hi,

I am running a series of regressions in a loop as follows:

results <- vector("list", length(mydata$varnames))

for (i in 1:length(mydata$varnames)) {
  results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex + CMV.status, data=mydata))
}

Now, when I look at the results[[i]] objects, I cannot see the original variable names. Obviously, I only see the following:

Call:
lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
    data = mydata)

Is there a way to display the original variable names on the LHS? In addition, is there a better paradigm for running this type of series of regressions in an automated fashion?

Thank you very much,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite 1111-E
Baltimore, MD 21205
410-502-2619
Marc Schwartz
2015-Oct-27 17:07 UTC
[R] How to get variable name while doing series of regressions in an automated manner?
Ravi,

Something like this, using the 'iris' dataset, might be helpful as an example:

# Define the response variables
VarNames <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Create the formulae
> paste0("log(", VarNames, ") ~ Petal.Width + Species")
[1] "log(Sepal.Length) ~ Petal.Width + Species"
[2] "log(Sepal.Width) ~ Petal.Width + Species"
[3] "log(Petal.Length) ~ Petal.Width + Species"

# Create a list of model summary objects
# The result of paste0() will be coerced to a formula by lm()
# if a valid formula, so no need to call as.formula()
MODS <- lapply(paste0("log(", VarNames, ") ~ Petal.Width + Species"),
               function(x) summary(lm(x, data = iris)))

You can either use the original 'VarNames' vector for the source response variables, or consider:

> as.character(formula(MODS[[1]]))
[1] "~"                     "log(Sepal.Length)"
[3] "Petal.Width + Species"

> sapply(MODS, function(x) formula(x)[[2]])
[[1]]
log(Sepal.Length)

[[2]]
log(Sepal.Width)

[[3]]
log(Petal.Length)

Regards,

Marc Schwartz
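To make the trade-off in the reply above concrete, here is a minimal sketch on the same 'iris' example. It assumes you keep the lm objects rather than their summaries; the names `forms`, `fits` and `responses` are illustrative and not from the original post. It shows that the stored call only records the placeholder used inside the anonymous function, while the formula held by the fitted model still carries the real response.

```r
# Sketch only: `forms`, `fits` and `responses` are hypothetical names.
VarNames <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
forms <- paste0("log(", VarNames, ") ~ Petal.Width + Species")

# Fit each model and label the list elements by response variable
fits <- setNames(
  lapply(forms, function(f) lm(as.formula(f), data = iris)),
  VarNames
)

# The call records the expression typed inside the function, not the variable
deparse(fits[["Sepal.Width"]]$call)

# The model's formula still holds the real left-hand side
responses <- vapply(fits, function(m) deparse(formula(m)[[2]]), character(1))
responses
```

Naming the list is often enough for bookkeeping; if the printed Call: line itself has to show the variable, the call must be constructed before it is evaluated.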
Bert Gunter
2015-Oct-27 17:50 UTC
[R] How to get variable name while doing series of regressions in an automated manner?
Marc, Ravi:

I may misunderstand, but I think Marc's solution labels the list
components but not necessarily the summary() outputs. This might be
sufficient, as in:

> z <- list(y1=rnorm(10,5),y2 = rnorm(10,8),x=1:10)
>
> ##1
> results1<-lapply(z[-3],function(y)lm(log(y)~x,data=z))
> lapply(results1,summary)
$y1

Call:
lm(formula = log(y) ~ x, data = z)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared:  0.04945,   Adjusted R-squared:  -0.06937
F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369


$y2

Call:
lm(formula = log(y) ~ x, data = z)

Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared:  0.01692,   Adjusted R-squared:  -0.106
F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202


## 2

Alternatively, if you want output with the correct variable names,
bquote() can be used, as in:

> results2 <-lapply(names(z)[1:2],
+ function(nm){
+ fo <-formula(paste0("log(",nm,")~x"))
+ eval(bquote(lm(.(u),data=z),list(u=fo)))
+ })
> lapply(results2,summary)
[[1]]

Call:
lm(formula = log(y1) ~ x, data = z)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared:  0.04945,   Adjusted R-squared:  -0.06937
F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369


[[2]]

Call:
lm(formula = log(y2) ~ x, data = z)

Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared:  0.01692,   Adjusted R-squared:  -0.106
F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202


HTH or apologies if I've missed the point and broadcasted noise.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
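For comparison with the bquote() version in the message above, a different technique that is not from this thread is to build the argument list yourself and hand it to do.call(). The sketch below assumes the same toy list `z`; the seed and the name `fits` are illustrative. Because do.call() receives the formula object already evaluated, the stored call should show the actual formula, and quote(z) keeps the data argument readable as `z` instead of inlining the whole list.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = 1:10)

fits <- lapply(names(z)[1:2], function(nm) {
  # Build the formula as text, as in the message above
  fo <- as.formula(paste0("log(", nm, ") ~ x"))
  # The formula is passed by value; quote(z) passes the data by name
  do.call("lm", list(formula = fo, data = quote(z)))
})

# Inspect the stored calls
lapply(fits, function(m) m$call)
```

This avoids eval() and bquote() altogether, at the cost of having to quote any argument you want to appear by name in the call.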
Ravi Varadhan
2015-Oct-27 18:27 UTC
[R] How to get variable name while doing series of regressions in an automated manner?
Thank you very much, Marc & Bert.

Bert - I think you're correct. With Marc's solution, I am not able to get the response variable name in the call to lm(). But your solution works well.

Best regards,
Ravi
Bert Gunter
2015-Nov-01 22:06 UTC
[R] How to get variable name while doing series of regressions in an automated manner?
Ravi et al.:

My prior "solution" nagged at me, as I thought it was pretty clumsy -- I
was hoping someone would show how to fix it up. As no one did, I
finally realized how to do it myself.

Here's how to do the iteration to get the right labeling with no pasting
or formula() call, by using as.name() to substitute via bquote()
directly into the (parsed) lm() call. As one can see, it's a general
approach to this sort of thing. (It has also been offered in the past by
others, but I forgot it.)

z <- list(y1=rnorm(10,5),y2=rnorm(10,8),x=runif(10))

lapply(names(z)[-3],function(u) {
  eval(bquote(lm(log(.(y)) ~ x, data=z), list(y=as.name(u))))
})

There -- now I feel better. No need to respond.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll
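The as.name()/bquote() idea from this last message can be tried on a data frame with several predictors. The sketch below is an illustration only: it borrows the 'iris' variables from earlier in the thread, writes `.(as.name(u))` inline instead of passing a list to bquote(), and the names `fits` and `slopes` are made up for the example.

```r
VarNames <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Naming the input vector makes lapply() return a named list
fits <- lapply(setNames(VarNames, VarNames), function(u) {
  # bquote() splices the symbol for the response into the unevaluated call
  eval(bquote(lm(log(.(as.name(u))) ~ Petal.Width + Species, data = iris)))
})

# The stored call now shows the real response variable
fits$Sepal.Width$call

# Pull one coefficient out of every model, labelled by response
slopes <- sapply(fits, function(m) coef(m)[["Petal.Width"]])
slopes
```

Keeping the lm objects and calling summary() later, for example with lapply(fits, summary), leaves both the fits and the readable calls available.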