thr3ads.net - R help - [R] Extracting information from lm results (multiple model runs) [Aug 2011]

If this information is useful, please help other people find it:
Share via:

Johannes Radinger

2011-Aug-15 12:32 UTC

[R] Extracting information from lm results (multiple model runs)

Just to inform:
I posted that before in R-sig-ecology but as it might be interesting also for
other useRs, I post it also to the general r-user list:

Hello Alexandre,

thank you very much. I also found another way to extract summarizing information
from lm results over e.g. 1000 repeated model runs:

results2 <- t(as.data.frame(results))
summary(results2)

Although some questions popped up in my analysis (procedure see below) :

1) it seems that I can't get an element with the usual
command like results2$intercept ... that doesn't work here, why?
That would be useful for plotting an element like R-squared etc.
It can be done with e.g hist(results2[,1]) but then I've to know which
column is which coefficient.

2) is there a correct way to get a "mean" p - value or any
other mean of the regression coefficients? What is the statistically proper
way to calculate summaries for regression coefficients of many model runs Or can
I just use the mean from the summary? Is there any restriction when using a log
transformation like: lm(log(Y)=log(X1)+X2())?

3) How can I check for normality of the residuals in my overall model?
I know how to run lillie.test(resid(lm.model)). But how to do that in my case
with 1000 model runs? Perform the test for each run and get an average or pool
the residuals?

4)with single lm-runs it is possible to compare the results of
two models with an anova...how can I compare the results of lets
say two models with each 1000 model runs?

Thank you very much

/johannes

-------- Original-Nachricht --------> Datum: Mon, 15 Aug 2011 08:33:52 -0300
> Von: Alexandre Camargo Martensen <acmartensen at gmail.com>
> An: Johannes Radinger <JRadinger at gmx.at>
> Betreff: Re: [R-sig-eco] Extracting information from lm results
> Hi Johannes,
> 
> try this
> 
> results.df<-data.frame(results)
> #results.df
> 
> p<-t(results.df[4,])
> r<-t(results.df[5,])
> 
> mean(p)
> mean(r)
> 
> HTH,
> 
> Alexandre
> 
> 
> On Mon, Aug 15, 2011 at 8:07 AM, Johannes Radinger <JRadinger at
gmx.at>
> wrote:
> 
> > Hello Peter,
> >
> > I just found a mail list correspondence from you in R-sig-eco and
> > found your information and code snipped about extracting information
> from
> > lm results very useful. I do have a similar challenge to perform
several
> > lm-regression and in the end I'd like to get e.g a mean value for
> R-squared
> > or p or any other of the regression coefficients.
> >
> > So far I managed to write following code example:
> >
> > #Defining variables
> > Y=c(15,14,23,18,19,9,19,13)
> > X1=c(0.2,0.6,0.45,0.27,0.6,0.14,0.1,0.52)
> > X2a=c(17,22,21,18,19,25,8,19)
> > X2b=c(22,22,29,34,19,26,17,22)
> > X2 <- function()runif(length(X2a), X2a, X2b)
> >
> > #create empty list
> > mod <- list()
> > #perfrom n-model runs for lm
> > for(i in 1:10) {
> >        mod[[paste("run",i,sep="")]] <-
lm(Y~X1+X2())
> >        }
> >
> > #define coeficients and how to extract them from lm
> > All_Model_runs <- function(lm){
> >        out <- c(lm$coefficients[1],
> >        lm$coefficients[2],
> >        summary(lm)$coefficients[2,2],
> >        pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
> >        summary(lm)$fstatistic[3], lower.tail = FALSE),
> >        summary(lm)$r.squared)
> >        names(out) <-
> >
c("intercept","slope","slope.SE","p.value","r.squared")
> >        return(out)
> >        }
> >
> > #create empty list
> > results <- list()
> > #get coeffiencts for all model runs into list: results
> > for (i in 1:length(mod)) results[[names(mod)[i]]] <-
> > All_Model_runs(mod[[i]])
> >
> >
> > Now my questions you might probably be able to answer:
> >
> > 1) Is that so far good what I was doing (according to your code)?
> > 2) How can I get a mean/average p value for example or an average
> R-squared
> > over all model runs?
> > 3) Is there any special function to call a mean of these coefficients?
> >
> > Thank you very much
> > /Johannes
> >
> > --
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >
-- 

--

R. Michael Weylandt

2011-Aug-15 13:39 UTC

head link

[R] Extracting information from lm results (multiple model runs)

I can't help with the questions on pooling regressions, but for the question
about data-frame usage, I'll note two things.

One,
> x = data.frame(y = 1:3, z = 4:6)
> is.data.frame(x)TRUE
> is.data.frame(t(x))FALSE
> is.data.frame(as.data.frame(t(x)))TRUE

Two, you'll need to check that your row and column names for the transposed
data frame are what you expect them to be.

Once you do both,(   results2 = as.data.frame(t(results)) & names ) ,
results2$intercept should work fine.

Hope this helps,

Michael Weylandt

On Mon, Aug 15, 2011 at 8:32 AM, Johannes Radinger <JRadinger@gmx.at>
wrote:
> Just to inform:
> I posted that before in R-sig-ecology but as it might be interesting also
> for other useRs, I post it also to the general r-user list:
>
>
> Hello Alexandre,
>
> thank you very much. I also found another way to extract summarizing
> information from lm results over e.g. 1000 repeated model runs:
>
> results2 <- t(as.data.frame(results))
> summary(results2)
>
> Although some questions popped up in my analysis (procedure see below) :
>
> 1) it seems that I can't get an element with the usual
> command like results2$intercept ... that doesn't work here, why?
> That would be useful for plotting an element like R-squared etc.
> It can be done with e.g hist(results2[,1]) but then I've to know which
> column is which coefficient.
>
> 2) is there a correct way to get a "mean" p - value or any
> other mean of the regression coefficients? What is the statistically proper
> way to calculate summaries for regression coefficients of many model runs
> Or can I just use the mean from the summary? Is there any restriction when
> using a log transformation like: lm(log(Y)=log(X1)+X2())?
>
> 3) How can I check for normality of the residuals in my overall model?
> I know how to run lillie.test(resid(lm.model)). But how to do that in my
> case with 1000 model runs? Perform the test for each run and get an average
> or pool the residuals?
>
> 4)with single lm-runs it is possible to compare the results of
> two models with an anova...how can I compare the results of lets
> say two models with each 1000 model runs?
>
>
> Thank you very much
>
> /johannes
>
>
> -------- Original-Nachricht --------
> > Datum: Mon, 15 Aug 2011 08:33:52 -0300
> > Von: Alexandre Camargo Martensen <acmartensen@gmail.com>
> > An: Johannes Radinger <JRadinger@gmx.at>
> > Betreff: Re: [R-sig-eco] Extracting information from lm results
>
> > Hi Johannes,
> >
> > try this
> >
> > results.df<-data.frame(results)
> > #results.df
> >
> > p<-t(results.df[4,])
> > r<-t(results.df[5,])
> >
> > mean(p)
> > mean(r)
> >
> > HTH,
> >
> > Alexandre
> >
> >
> > On Mon, Aug 15, 2011 at 8:07 AM, Johannes Radinger
<JRadinger@gmx.at>
> > wrote:
> >
> > > Hello Peter,
> > >
> > > I just found a mail list correspondence from you in R-sig-eco and
> > > found your information and code snipped about extracting
information
> > from
> > > lm results very useful. I do have a similar challenge to perform
> several
> > > lm-regression and in the end I'd like to get e.g a mean value
for
> > R-squared
> > > or p or any other of the regression coefficients.
> > >
> > > So far I managed to write following code example:
> > >
> > > #Defining variables
> > > Y=c(15,14,23,18,19,9,19,13)
> > > X1=c(0.2,0.6,0.45,0.27,0.6,0.14,0.1,0.52)
> > > X2a=c(17,22,21,18,19,25,8,19)
> > > X2b=c(22,22,29,34,19,26,17,22)
> > > X2 <- function()runif(length(X2a), X2a, X2b)
> > >
> > > #create empty list
> > > mod <- list()
> > > #perfrom n-model runs for lm
> > > for(i in 1:10) {
> > >        mod[[paste("run",i,sep="")]] <-
lm(Y~X1+X2())
> > >        }
> > >
> > > #define coeficients and how to extract them from lm
> > > All_Model_runs <- function(lm){
> > >        out <- c(lm$coefficients[1],
> > >        lm$coefficients[2],
> > >        summary(lm)$coefficients[2,2],
> > >        pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
> > >        summary(lm)$fstatistic[3], lower.tail = FALSE),
> > >        summary(lm)$r.squared)
> > >        names(out) <-
> > >
c("intercept","slope","slope.SE","p.value","r.squared")
> > >        return(out)
> > >        }
> > >
> > > #create empty list
> > > results <- list()
> > > #get coeffiencts for all model runs into list: results
> > > for (i in 1:length(mod)) results[[names(mod)[i]]] <-
> > > All_Model_runs(mod[[i]])
> > >
> > >
> > > Now my questions you might probably be able to answer:
> > >
> > > 1) Is that so far good what I was doing (according to your code)?
> > > 2) How can I get a mean/average p value for example or an average
> > R-squared
> > > over all model runs?
> > > 3) Is there any special function to call a mean of these
coefficients?
> > >
> > > Thank you very much
> > > /Johannes
> > >
> > > --
> > >
> > > _______________________________________________
> > > R-sig-ecology mailing list
> > > R-sig-ecology@r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> > >
>
> --
>
>
>
>
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

R. Michael Weylandt

2011-Aug-15 15:21 UTC

head link

[R] Extracting information from lm results (multiple model runs)

Two ways to do it, both of which come down to forcing R to recognize
"(" as
a string element than an actual parenthesis:

x = data.frame(1:3, 4:6)
colnames(x) = c("(Intercept)","Slope")

x$"(Intercept)"

or x[,"(Intercept)"] # Works with or without the comma in slightly
different
ways -- figure out which you need

Alternatively, you can change the name to just "Intercept": this
isn't the
most efficient way, but it does avoid the problem of having to know which
column is "(Intercept)"

colnames(x) <- ifelse(colnames(x) ==
"(Intercept)","Intercept",colnames(x))

Michael

On Mon, Aug 15, 2011 at 10:15 AM, Johannes Radinger
<JRadinger@gmx.at>wrote:
> Hello,
>
> thank you...it works!
>
> > results.dataframe <-
as.data.frame(t(as.data.frame(results.model_runs)))
> > is.data.frame(results.dataframe)
> [1] TRUE
>
> that works...
> just: how to call columns in the dataframe with names like (Intercept),
> rT()...there must be a special handling of names with brackets.
>
> /johannes
>
>
>
> -------- Original-Nachricht --------
> > Datum: Mon, 15 Aug 2011 09:39:47 -0400
> > Von: "R. Michael Weylandt"
<michael.weylandt@gmail.com>
> > An: Johannes Radinger <JRadinger@gmx.at>
> > CC: r-help@r-project.org
> > Betreff: Re: [R] Extracting information from lm results (multiple
model
> runs)
>
> > I can't help with the questions on pooling regressions, but for
the
> > question
> > about data-frame usage, I'll note two things.
> >
> > One,
> >
> > > x = data.frame(y = 1:3, z = 4:6)
> > > is.data.frame(x)
> > TRUE
> >
> > > is.data.frame(t(x))
> > FALSE
> >
> > > is.data.frame(as.data.frame(t(x)))
> > TRUE
> >
> > Two, you'll need to check that your row and column names for the
> > transposed
> > data frame are what you expect them to be.
> >
> > Once you do both,(   results2 = as.data.frame(t(results)) & names
) ,
> > results2$intercept should work fine.
> >
> > Hope this helps,
> >
> > Michael Weylandt
> >
> > On Mon, Aug 15, 2011 at 8:32 AM, Johannes Radinger
<JRadinger@gmx.at>
> > wrote:
> >
> > > Just to inform:
> > > I posted that before in R-sig-ecology but as it might be
interesting
> > also
> > > for other useRs, I post it also to the general r-user list:
> > >
> > >
> > > Hello Alexandre,
> > >
> > > thank you very much. I also found another way to extract
summarizing
> > > information from lm results over e.g. 1000 repeated model runs:
> > >
> > > results2 <- t(as.data.frame(results))
> > > summary(results2)
> > >
> > > Although some questions popped up in my analysis (procedure see
below)
> :
> > >
> > > 1) it seems that I can't get an element with the usual
> > > command like results2$intercept ... that doesn't work here,
why?
> > > That would be useful for plotting an element like R-squared etc.
> > > It can be done with e.g hist(results2[,1]) but then I've to
know which
> > > column is which coefficient.
> > >
> > > 2) is there a correct way to get a "mean" p - value or
any
> > > other mean of the regression coefficients? What is the
statistically
> > proper
> > > way to calculate summaries for regression coefficients of many
model
> > runs
> > > Or can I just use the mean from the summary? Is there any
restriction
> > when
> > > using a log transformation like: lm(log(Y)=log(X1)+X2())?
> > >
> > > 3) How can I check for normality of the residuals in my overall
model?
> > > I know how to run lillie.test(resid(lm.model)). But how to do
that in
> my
> > > case with 1000 model runs? Perform the test for each run and get
an
> > average
> > > or pool the residuals?
> > >
> > > 4)with single lm-runs it is possible to compare the results of
> > > two models with an anova...how can I compare the results of lets
> > > say two models with each 1000 model runs?
> > >
> > >
> > > Thank you very much
> > >
> > > /johannes
> > >
> > >
> > > -------- Original-Nachricht --------
> > > > Datum: Mon, 15 Aug 2011 08:33:52 -0300
> > > > Von: Alexandre Camargo Martensen
<acmartensen@gmail.com>
> > > > An: Johannes Radinger <JRadinger@gmx.at>
> > > > Betreff: Re: [R-sig-eco] Extracting information from lm
results
> > >
> > > > Hi Johannes,
> > > >
> > > > try this
> > > >
> > > > results.df<-data.frame(results)
> > > > #results.df
> > > >
> > > > p<-t(results.df[4,])
> > > > r<-t(results.df[5,])
> > > >
> > > > mean(p)
> > > > mean(r)
> > > >
> > > > HTH,
> > > >
> > > > Alexandre
> > > >
> > > >
> > > > On Mon, Aug 15, 2011 at 8:07 AM, Johannes Radinger
<JRadinger@gmx.at
> >
> > > > wrote:
> > > >
> > > > > Hello Peter,
> > > > >
> > > > > I just found a mail list correspondence from you in
R-sig-eco and
> > > > > found your information and code snipped about
extracting
> information
> > > > from
> > > > > lm results very useful. I do have a similar challenge
to perform
> > > several
> > > > > lm-regression and in the end I'd like to get e.g a
mean value for
> > > > R-squared
> > > > > or p or any other of the regression coefficients.
> > > > >
> > > > > So far I managed to write following code example:
> > > > >
> > > > > #Defining variables
> > > > > Y=c(15,14,23,18,19,9,19,13)
> > > > > X1=c(0.2,0.6,0.45,0.27,0.6,0.14,0.1,0.52)
> > > > > X2a=c(17,22,21,18,19,25,8,19)
> > > > > X2b=c(22,22,29,34,19,26,17,22)
> > > > > X2 <- function()runif(length(X2a), X2a, X2b)
> > > > >
> > > > > #create empty list
> > > > > mod <- list()
> > > > > #perfrom n-model runs for lm
> > > > > for(i in 1:10) {
> > > > >        mod[[paste("run",i,sep="")]]
<- lm(Y~X1+X2())
> > > > >        }
> > > > >
> > > > > #define coeficients and how to extract them from lm
> > > > > All_Model_runs <- function(lm){
> > > > >        out <- c(lm$coefficients[1],
> > > > >        lm$coefficients[2],
> > > > >        summary(lm)$coefficients[2,2],
> > > > >        pf(summary(lm)$fstatistic[1],
summary(lm)$fstatistic[2],
> > > > >        summary(lm)$fstatistic[3], lower.tail = FALSE),
> > > > >        summary(lm)$r.squared)
> > > > >        names(out) <-
> > > > >
c("intercept","slope","slope.SE","p.value","r.squared")
> > > > >        return(out)
> > > > >        }
> > > > >
> > > > > #create empty list
> > > > > results <- list()
> > > > > #get coeffiencts for all model runs into list: results
> > > > > for (i in 1:length(mod)) results[[names(mod)[i]]] <-
> > > > > All_Model_runs(mod[[i]])
> > > > >
> > > > >
> > > > > Now my questions you might probably be able to answer:
> > > > >
> > > > > 1) Is that so far good what I was doing (according to
your code)?
> > > > > 2) How can I get a mean/average p value for example or
an average
> > > > R-squared
> > > > > over all model runs?
> > > > > 3) Is there any special function to call a mean of
these
> > coefficients?
> > > > >
> > > > > Thank you very much
> > > > > /Johannes
> > > > >
> > > > > --
> > > > >
> > > > > _______________________________________________
> > > > > R-sig-ecology mailing list
> > > > > R-sig-ecology@r-project.org
> > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> > > > >
> > >
> > > --
> > >
> > >
> > >
> > >
> > > --
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.
> > >
>
> --
> NEU: FreePhone - 0ct/min Handyspartarif mit Geld-zurück-Garantie!
> Jetzt informieren: http://www.gmx.net/de/go/freephone
>
	[[alternative HTML version deleted]]

Maybe Matching Threads

Search for more possibly parallel threads

R help - Aug 2011 - Extracting information from lm results (multiple model runs)

[R] Extracting information from lm results (multiple model runs)

[R] Extracting information from lm results (multiple model runs)

[R] Extracting information from lm results (multiple model runs)

Maybe Matching Threads