Johannes Radinger
2011-Aug-15 12:32 UTC
[R] Extracting information from lm results (multiple model runs)
Just to inform: I posted that before in R-sig-ecology but as it might be interesting also for other useRs, I post it also to the general r-user list: Hello Alexandre, thank you very much. I also found another way to extract summarizing information from lm results over e.g. 1000 repeated model runs: results2 <- t(as.data.frame(results)) summary(results2) Although some questions popped up in my analysis (procedure see below) : 1) it seems that I can't get an element with the usual command like results2$intercept ... that doesn't work here, why? That would be useful for plotting an element like R-squared etc. It can be done with e.g hist(results2[,1]) but then I've to know which column is which coefficient. 2) is there a correct way to get a "mean" p - value or any other mean of the regression coefficients? What is the statistically proper way to calculate summaries for regression coefficients of many model runs Or can I just use the mean from the summary? Is there any restriction when using a log transformation like: lm(log(Y)=log(X1)+X2())? 3) How can I check for normality of the residuals in my overall model? I know how to run lillie.test(resid(lm.model)). But how to do that in my case with 1000 model runs? Perform the test for each run and get an average or pool the residuals? 4)with single lm-runs it is possible to compare the results of two models with an anova...how can I compare the results of lets say two models with each 1000 model runs? Thank you very much /johannes -------- Original-Nachricht --------> Datum: Mon, 15 Aug 2011 08:33:52 -0300 > Von: Alexandre Camargo Martensen <acmartensen at gmail.com> > An: Johannes Radinger <JRadinger at gmx.at> > Betreff: Re: [R-sig-eco] Extracting information from lm results> Hi Johannes, > > try this > > results.df<-data.frame(results) > #results.df > > p<-t(results.df[4,]) > r<-t(results.df[5,]) > > mean(p) > mean(r) > > HTH, > > Alexandre > > > On Mon, Aug 15, 2011 at 8:07 AM, Johannes Radinger <JRadinger at gmx.at> > wrote: > > > Hello Peter, > > > > I just found a mail list correspondence from you in R-sig-eco and > > found your information and code snipped about extracting information > from > > lm results very useful. I do have a similar challenge to perform several > > lm-regression and in the end I'd like to get e.g a mean value for > R-squared > > or p or any other of the regression coefficients. > > > > So far I managed to write following code example: > > > > #Defining variables > > Y=c(15,14,23,18,19,9,19,13) > > X1=c(0.2,0.6,0.45,0.27,0.6,0.14,0.1,0.52) > > X2a=c(17,22,21,18,19,25,8,19) > > X2b=c(22,22,29,34,19,26,17,22) > > X2 <- function()runif(length(X2a), X2a, X2b) > > > > #create empty list > > mod <- list() > > #perfrom n-model runs for lm > > for(i in 1:10) { > > mod[[paste("run",i,sep="")]] <- lm(Y~X1+X2()) > > } > > > > #define coeficients and how to extract them from lm > > All_Model_runs <- function(lm){ > > out <- c(lm$coefficients[1], > > lm$coefficients[2], > > summary(lm)$coefficients[2,2], > > pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], > > summary(lm)$fstatistic[3], lower.tail = FALSE), > > summary(lm)$r.squared) > > names(out) <- > > c("intercept","slope","slope.SE","p.value","r.squared") > > return(out) > > } > > > > #create empty list > > results <- list() > > #get coeffiencts for all model runs into list: results > > for (i in 1:length(mod)) results[[names(mod)[i]]] <- > > All_Model_runs(mod[[i]]) > > > > > > Now my questions you might probably be able to answer: > > > > 1) Is that so far good what I was doing (according to your code)? > > 2) How can I get a mean/average p value for example or an average > R-squared > > over all model runs? > > 3) Is there any special function to call a mean of these coefficients? > > > > Thank you very much > > /Johannes > > > > -- > > > > _______________________________________________ > > R-sig-ecology mailing list > > R-sig-ecology at r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > >-- --
R. Michael Weylandt
2011-Aug-15 13:39 UTC
[R] Extracting information from lm results (multiple model runs)
I can't help with the questions on pooling regressions, but for the question about data-frame usage, I'll note two things. One,> x = data.frame(y = 1:3, z = 4:6) > is.data.frame(x)TRUE> is.data.frame(t(x))FALSE> is.data.frame(as.data.frame(t(x)))TRUE Two, you'll need to check that your row and column names for the transposed data frame are what you expect them to be. Once you do both,( results2 = as.data.frame(t(results)) & names ) , results2$intercept should work fine. Hope this helps, Michael Weylandt On Mon, Aug 15, 2011 at 8:32 AM, Johannes Radinger <JRadinger@gmx.at> wrote:> Just to inform: > I posted that before in R-sig-ecology but as it might be interesting also > for other useRs, I post it also to the general r-user list: > > > Hello Alexandre, > > thank you very much. I also found another way to extract summarizing > information from lm results over e.g. 1000 repeated model runs: > > results2 <- t(as.data.frame(results)) > summary(results2) > > Although some questions popped up in my analysis (procedure see below) : > > 1) it seems that I can't get an element with the usual > command like results2$intercept ... that doesn't work here, why? > That would be useful for plotting an element like R-squared etc. > It can be done with e.g hist(results2[,1]) but then I've to know which > column is which coefficient. > > 2) is there a correct way to get a "mean" p - value or any > other mean of the regression coefficients? What is the statistically proper > way to calculate summaries for regression coefficients of many model runs > Or can I just use the mean from the summary? Is there any restriction when > using a log transformation like: lm(log(Y)=log(X1)+X2())? > > 3) How can I check for normality of the residuals in my overall model? > I know how to run lillie.test(resid(lm.model)). But how to do that in my > case with 1000 model runs? Perform the test for each run and get an average > or pool the residuals? > > 4)with single lm-runs it is possible to compare the results of > two models with an anova...how can I compare the results of lets > say two models with each 1000 model runs? > > > Thank you very much > > /johannes > > > -------- Original-Nachricht -------- > > Datum: Mon, 15 Aug 2011 08:33:52 -0300 > > Von: Alexandre Camargo Martensen <acmartensen@gmail.com> > > An: Johannes Radinger <JRadinger@gmx.at> > > Betreff: Re: [R-sig-eco] Extracting information from lm results > > > Hi Johannes, > > > > try this > > > > results.df<-data.frame(results) > > #results.df > > > > p<-t(results.df[4,]) > > r<-t(results.df[5,]) > > > > mean(p) > > mean(r) > > > > HTH, > > > > Alexandre > > > > > > On Mon, Aug 15, 2011 at 8:07 AM, Johannes Radinger <JRadinger@gmx.at> > > wrote: > > > > > Hello Peter, > > > > > > I just found a mail list correspondence from you in R-sig-eco and > > > found your information and code snipped about extracting information > > from > > > lm results very useful. I do have a similar challenge to perform > several > > > lm-regression and in the end I'd like to get e.g a mean value for > > R-squared > > > or p or any other of the regression coefficients. > > > > > > So far I managed to write following code example: > > > > > > #Defining variables > > > Y=c(15,14,23,18,19,9,19,13) > > > X1=c(0.2,0.6,0.45,0.27,0.6,0.14,0.1,0.52) > > > X2a=c(17,22,21,18,19,25,8,19) > > > X2b=c(22,22,29,34,19,26,17,22) > > > X2 <- function()runif(length(X2a), X2a, X2b) > > > > > > #create empty list > > > mod <- list() > > > #perfrom n-model runs for lm > > > for(i in 1:10) { > > > mod[[paste("run",i,sep="")]] <- lm(Y~X1+X2()) > > > } > > > > > > #define coeficients and how to extract them from lm > > > All_Model_runs <- function(lm){ > > > out <- c(lm$coefficients[1], > > > lm$coefficients[2], > > > summary(lm)$coefficients[2,2], > > > pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], > > > summary(lm)$fstatistic[3], lower.tail = FALSE), > > > summary(lm)$r.squared) > > > names(out) <- > > > c("intercept","slope","slope.SE","p.value","r.squared") > > > return(out) > > > } > > > > > > #create empty list > > > results <- list() > > > #get coeffiencts for all model runs into list: results > > > for (i in 1:length(mod)) results[[names(mod)[i]]] <- > > > All_Model_runs(mod[[i]]) > > > > > > > > > Now my questions you might probably be able to answer: > > > > > > 1) Is that so far good what I was doing (according to your code)? > > > 2) How can I get a mean/average p value for example or an average > > R-squared > > > over all model runs? > > > 3) Is there any special function to call a mean of these coefficients? > > > > > > Thank you very much > > > /Johannes > > > > > > -- > > > > > > _______________________________________________ > > > R-sig-ecology mailing list > > > R-sig-ecology@r-project.org > > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > > > > > -- > > > > > -- > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
R. Michael Weylandt
2011-Aug-15 15:21 UTC
[R] Extracting information from lm results (multiple model runs)
Two ways to do it, both of which come down to forcing R to recognize "(" as a string element than an actual parenthesis: x = data.frame(1:3, 4:6) colnames(x) = c("(Intercept)","Slope") x$"(Intercept)" or x[,"(Intercept)"] # Works with or without the comma in slightly different ways -- figure out which you need Alternatively, you can change the name to just "Intercept": this isn't the most efficient way, but it does avoid the problem of having to know which column is "(Intercept)" colnames(x) <- ifelse(colnames(x) == "(Intercept)","Intercept",colnames(x)) Michael On Mon, Aug 15, 2011 at 10:15 AM, Johannes Radinger <JRadinger@gmx.at>wrote:> Hello, > > thank you...it works! > > > results.dataframe <- as.data.frame(t(as.data.frame(results.model_runs))) > > is.data.frame(results.dataframe) > [1] TRUE > > that works... > just: how to call columns in the dataframe with names like (Intercept), > rT()...there must be a special handling of names with brackets. > > /johannes > > > > -------- Original-Nachricht -------- > > Datum: Mon, 15 Aug 2011 09:39:47 -0400 > > Von: "R. Michael Weylandt" <michael.weylandt@gmail.com> > > An: Johannes Radinger <JRadinger@gmx.at> > > CC: r-help@r-project.org > > Betreff: Re: [R] Extracting information from lm results (multiple model > runs) > > > I can't help with the questions on pooling regressions, but for the > > question > > about data-frame usage, I'll note two things. > > > > One, > > > > > x = data.frame(y = 1:3, z = 4:6) > > > is.data.frame(x) > > TRUE > > > > > is.data.frame(t(x)) > > FALSE > > > > > is.data.frame(as.data.frame(t(x))) > > TRUE > > > > Two, you'll need to check that your row and column names for the > > transposed > > data frame are what you expect them to be. > > > > Once you do both,( results2 = as.data.frame(t(results)) & names ) , > > results2$intercept should work fine. > > > > Hope this helps, > > > > Michael Weylandt > > > > On Mon, Aug 15, 2011 at 8:32 AM, Johannes Radinger <JRadinger@gmx.at> > > wrote: > > > > > Just to inform: > > > I posted that before in R-sig-ecology but as it might be interesting > > also > > > for other useRs, I post it also to the general r-user list: > > > > > > > > > Hello Alexandre, > > > > > > thank you very much. I also found another way to extract summarizing > > > information from lm results over e.g. 1000 repeated model runs: > > > > > > results2 <- t(as.data.frame(results)) > > > summary(results2) > > > > > > Although some questions popped up in my analysis (procedure see below) > : > > > > > > 1) it seems that I can't get an element with the usual > > > command like results2$intercept ... that doesn't work here, why? > > > That would be useful for plotting an element like R-squared etc. > > > It can be done with e.g hist(results2[,1]) but then I've to know which > > > column is which coefficient. > > > > > > 2) is there a correct way to get a "mean" p - value or any > > > other mean of the regression coefficients? What is the statistically > > proper > > > way to calculate summaries for regression coefficients of many model > > runs > > > Or can I just use the mean from the summary? Is there any restriction > > when > > > using a log transformation like: lm(log(Y)=log(X1)+X2())? > > > > > > 3) How can I check for normality of the residuals in my overall model? > > > I know how to run lillie.test(resid(lm.model)). But how to do that in > my > > > case with 1000 model runs? Perform the test for each run and get an > > average > > > or pool the residuals? > > > > > > 4)with single lm-runs it is possible to compare the results of > > > two models with an anova...how can I compare the results of lets > > > say two models with each 1000 model runs? > > > > > > > > > Thank you very much > > > > > > /johannes > > > > > > > > > -------- Original-Nachricht -------- > > > > Datum: Mon, 15 Aug 2011 08:33:52 -0300 > > > > Von: Alexandre Camargo Martensen <acmartensen@gmail.com> > > > > An: Johannes Radinger <JRadinger@gmx.at> > > > > Betreff: Re: [R-sig-eco] Extracting information from lm results > > > > > > > Hi Johannes, > > > > > > > > try this > > > > > > > > results.df<-data.frame(results) > > > > #results.df > > > > > > > > p<-t(results.df[4,]) > > > > r<-t(results.df[5,]) > > > > > > > > mean(p) > > > > mean(r) > > > > > > > > HTH, > > > > > > > > Alexandre > > > > > > > > > > > > On Mon, Aug 15, 2011 at 8:07 AM, Johannes Radinger <JRadinger@gmx.at > > > > > > wrote: > > > > > > > > > Hello Peter, > > > > > > > > > > I just found a mail list correspondence from you in R-sig-eco and > > > > > found your information and code snipped about extracting > information > > > > from > > > > > lm results very useful. I do have a similar challenge to perform > > > several > > > > > lm-regression and in the end I'd like to get e.g a mean value for > > > > R-squared > > > > > or p or any other of the regression coefficients. > > > > > > > > > > So far I managed to write following code example: > > > > > > > > > > #Defining variables > > > > > Y=c(15,14,23,18,19,9,19,13) > > > > > X1=c(0.2,0.6,0.45,0.27,0.6,0.14,0.1,0.52) > > > > > X2a=c(17,22,21,18,19,25,8,19) > > > > > X2b=c(22,22,29,34,19,26,17,22) > > > > > X2 <- function()runif(length(X2a), X2a, X2b) > > > > > > > > > > #create empty list > > > > > mod <- list() > > > > > #perfrom n-model runs for lm > > > > > for(i in 1:10) { > > > > > mod[[paste("run",i,sep="")]] <- lm(Y~X1+X2()) > > > > > } > > > > > > > > > > #define coeficients and how to extract them from lm > > > > > All_Model_runs <- function(lm){ > > > > > out <- c(lm$coefficients[1], > > > > > lm$coefficients[2], > > > > > summary(lm)$coefficients[2,2], > > > > > pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], > > > > > summary(lm)$fstatistic[3], lower.tail = FALSE), > > > > > summary(lm)$r.squared) > > > > > names(out) <- > > > > > c("intercept","slope","slope.SE","p.value","r.squared") > > > > > return(out) > > > > > } > > > > > > > > > > #create empty list > > > > > results <- list() > > > > > #get coeffiencts for all model runs into list: results > > > > > for (i in 1:length(mod)) results[[names(mod)[i]]] <- > > > > > All_Model_runs(mod[[i]]) > > > > > > > > > > > > > > > Now my questions you might probably be able to answer: > > > > > > > > > > 1) Is that so far good what I was doing (according to your code)? > > > > > 2) How can I get a mean/average p value for example or an average > > > > R-squared > > > > > over all model runs? > > > > > 3) Is there any special function to call a mean of these > > coefficients? > > > > > > > > > > Thank you very much > > > > > /Johannes > > > > > > > > > > -- > > > > > > > > > > _______________________________________________ > > > > > R-sig-ecology mailing list > > > > > R-sig-ecology@r-project.org > > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > > > > > > > > > > > -- > > > > > > > > > > > > > > > -- > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > -- > NEU: FreePhone - 0ct/min Handyspartarif mit Geld-zurück-Garantie! > Jetzt informieren: http://www.gmx.net/de/go/freephone >[[alternative HTML version deleted]]