thr3ads.net - R help - [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense? [Mar 2011]

agent dunham

2011-Mar-22 16:31 UTC

[R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?

Dear all, 

I want to improve my adjusted R-squared. I've checked some established
models, and they include the same variable twice, once transformed and
once untransformed. Doing the same also improves my adjusted R-squared.

But isn't this bad for collinearity? Can I still interpret the
coefficients as usual?

             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.73140    7.22477   0.240  0.81086
v1           -0.33886    0.20321  -1.668  0.09705 .
log(v1)       2.63194    3.74556   0.703  0.48311
v2           -0.01517    0.01089  -1.394  0.16507
log(v3)      -0.45719    0.27656  -1.653  0.09995 .
factor1      -1.81517    0.62155  -2.920  0.00392 **
factor2      -1.87330    0.84375  -2.220  0.02759 *

Analysis of Variance Table

Response: height rise
           Df Sum Sq Mean Sq F value    Pr(>F)
v1          1  51.25  51.246 21.4128 6.842e-06 ***
log(v1)     1  13.62  13.617  5.6897  0.018048 *
v2          1   2.84   2.836  1.1850  0.277713
log(v3)     1   3.02   3.024  1.2638  0.262357
factor1     1  17.62  17.616  7.3608  0.007279 **
factor2     1  11.80  11.797  4.9294  0.027586 *
Residuals 190 454.71   2.393

Thanks, 
user at host.com

--
View this message in context:
http://r.789695.n4.nabble.com/lm-v1-log-v1-improve-adj-Rsq-any-sense-tp3396935p3396935.html
Sent from the R help mailing list archive at Nabble.com.

Frank Harrell

2011-Mar-22 17:20 UTC


If you care about confidence interval coverage, type I error, or predictive
accuracy, trying different models in this way is not the way to go.
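
A small simulation sketch can make the type I error point concrete. It
is only an illustration under stated assumptions: the response is pure
noise, v1 is a hypothetical positive predictor, and the sample size,
number of simulations, and set of candidate models are arbitrary
choices, not anything taken from the data in this thread. It compares
committing to one model in advance with reporting whichever candidate
has the smallest overall p-value.

```r
# Sketch: y is unrelated to v1, so every "significant" model is a
# false positive. All settings below are illustrative assumptions.
set.seed(1)

n_sims <- 1000   # number of simulated data sets
n      <- 100    # observations per data set
alpha  <- 0.05   # nominal type I error rate

# p-value of the overall F-test of a fitted lm
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

one_sim <- function() {
  v1 <- runif(n, 1, 20)   # hypothetical positive predictor
  y  <- rnorm(n)          # response independent of v1

  # Candidate specifications tried one after another
  fits <- list(
    lm(y ~ v1),
    lm(y ~ log(v1)),
    lm(y ~ v1 + log(v1))
  )
  p <- sapply(fits, overall_p)

  c(fixed  = p[1] < alpha,     # model chosen before seeing the data
    picked = min(p) < alpha)   # best-looking model reported afterwards
}

res   <- replicate(n_sims, one_sim())
type1 <- rowMeans(res)   # proportion of false positives per strategy
print(type1)
```

The "fixed" rate should sit near the nominal 5%, while the "picked"
rate can never be lower and will typically be higher, because the
minimum of several p-values is being compared with the same threshold.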

Frank


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University

Mike Marchywka

2011-Mar-23 01:55 UTC


I'm not sure how many replies you got, or whether your question was
answered, but let me see if I understand your concern.

If your data cover only a limited range of v1, over which log(v1) is well
approximated by the linear term of its Taylor expansion, then it can
indeed be hard to tell a linear dependence from a log dependence, or to
quantify a mixture of the two. If you try to find a and b to fit
y = a*f(x) + b*g(x) so as to minimize some error, you should be able to
see the issues on paper. Presumably log is not linear over a larger
range, and any error function, such as SSE, would then have a reasonably
sharply peaked minimum at some values of the two coefficients. You could
do a sensitivity analysis to check: find the second derivatives of your
error function, or just perturb the coefficients a bit. If there is some
direction in which the error does not change as a and b vary, then you
have the case you are worried about.

I'm not sure what you consider to be "usual", but when I'm doing
something like this I usually have some physical interpretation in mind.
Most uninformatively, you could interpret these coefficients as those
which minimize your error given the data you have :) What you do from
there depends on a lot of specifics. To tell whether a given function
seems appropriate for the data, it is always good to look at a plot of
the residuals. Note that the ability to find a unique set of coefficients
that minimizes a given error does not require the two terms attached to
the coefficients to be uncorrelated, only that they are not exactly
linearly dependent; indeed, polynomial fits are a common example (log
having a Taylor series just constrains a lot of coefficient
relationships, LOL).
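
The sensitivity check described above can be sketched in a few lines of
R. This is only an illustration under assumptions: the two ranges of x
are hypothetical, and both terms are centred and scaled first so that
the intercept drops out and the two coefficients are on a common
footing. For a least-squares fit of y = a*x + b*log(x), the matrix of
second derivatives of SSE with respect to (a, b) is 2 * X'X, so a very
small eigenvalue relative to the largest one marks a direction in which
the error barely changes.

```r
# Sketch: curvature of SSE in the (a, b) plane for y = a*x + b*log(x).
# The ranges of x are hypothetical and chosen only for contrast.
set.seed(42)

flatness <- function(x) {
  # Centre and scale both terms: centring removes the intercept,
  # scaling puts x and log(x) on a common footing
  X  <- scale(cbind(x = x, logx = log(x)))
  H  <- 2 * crossprod(X)                   # Hessian of SSE: 2 * X'X
  ev <- eigen(H, symmetric = TRUE)$values  # curvatures along principal axes
  c(correlation            = cor(x, log(x)),
    min_over_max_curvature = min(ev) / max(ev))
}

x_narrow <- runif(200, 10, 11)   # narrow range: log(x) is almost linear
x_wide   <- runif(200, 1, 100)   # wide range: log(x) is clearly curved

narrow <- flatness(x_narrow)
wide   <- flatness(x_wide)
print(rbind(narrow, wide))
```

A curvature ratio close to zero means that a and b can be traded off
against each other with almost no change in SSE, which is the nearly
flat direction described above. Over the wider range the ratio is much
larger, and the two coefficients are better determined.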

P-values and confidence intervals are another matter with post hoc
exploratory work, but I'll let a statistician comment on that, as well
as on the meaning of the R output. Usually the final decision on a
putative model improvement comes from your ability to infer something
about the underlying system, although you may just want a simple
empirical approximation and be more worried about meeting a given error
with a limited number of computations, etc.

Apparently you found in a retrospective literature search that everyone
else is using the log term. Sometimes you see people ask questions like,
"Given that, of 10 papers on the subject, 4 used the log term and their
authors have historically been right 50 percent of the time, while the
authors of the other 6 are right 40 percent of the time, what are the
chances that the log term should be included?" I will also avoid
commenting on this question, except to say that it illustrates one of a
number of ways people do approach these problems, and that much depends
on what you consider relevant to your situation.


