I have a linear model y ~ x1 + x2 for some data in which the coefficient for x1 is higher than I would have expected from theory (0.88, versus roughly 0.7 expected). I wondered whether this could be an artifact of x1 and x2 being correlated, even though the variance inflation factor is not very high (1.065). I used perturbation analysis to evaluate collinearity:

library(perturb)
P <- perturb(A, pvars=c("x1","x2"), prange=c(1,1))
> summary(P)
Perturb variables:
x1 normal(0,1)
x2 normal(0,1)

Impact of perturbations on coefficients:
               mean   s.d.     min     max
(Intercept) -26.067  0.270 -27.235 -25.481
x1            0.726  0.025   0.672   0.882
x2            0.060  0.011   0.037   0.082

I get a mean for x1 of 0.726, which is closer to what I expected. I am not a statistical expert, so I'd like to know whether my evaluation of the effects of collinearity is correct and, if so, what solutions there are for obtaining a reliable linear model.
Thanks,
Manuel

Some more detailed information:

> A <- lm(y ~ x1 + x2)
> summary(A)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
      Min        1Q    Median        3Q       Max
-4.221946 -0.484055 -0.004762  0.397508  2.542769

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -27.23472    0.27996 -97.282  < 2e-16 ***
x1            0.88202    0.02475  35.639  < 2e-16 ***
x2            0.08180    0.01239   6.604 2.53e-10 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.823 on 241 degrees of freedom
Multiple R-Squared: 0.8411,     Adjusted R-squared: 0.8398
F-statistic: 637.8 on 2 and 241 DF,  p-value: < 2.2e-16

> cor.test(x1, x2)

        Pearson's product-moment correlation

data:  x1 and x2
t = -3.9924, df = 242, p-value = 8.678e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3628424 -0.1269618
sample estimates:
       cor
-0.248584
Manuel,
The problem you describe does not sound like it is due to multicollinearity. I say this because your variance inflation factor is modest (1.1) and, more importantly, the correlation between your independent variables (x1 and x2) is modest, -0.25. I suspect the problem is due to one or more observations having a disproportionately large influence on your coefficients. I suggest you plot your residuals vs. predicted values. I would also do a formal analysis of the influence each observation has on the reported coefficients; you might consider computing Cook's distance for each observation.
I hope this has helped.
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
University of Maryland School of Medicine, Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street, GRECC (BT/18/GR)
Baltimore, MD 21201-1524
410-605-7119
jsorkin@grecc.umaryland.edu
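[A minimal sketch of the diagnostics John suggests, assuming the fitted model object is still called A as in the original post; the 4/n cutoff is only one common rule of thumb, not part of his reply:]

# Residuals vs. fitted values: look for structure or a few extreme points
plot(fitted(A), resid(A),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Cook's distance for each observation; large spikes flag influential points
cd <- cooks.distance(A)
plot(cd, type = "h", ylab = "Cook's distance")

# Rough rule of thumb: look more closely at observations with D > 4/n
which(cd > 4 / length(cd))

Refitting the model without the flagged observations and comparing the x1 coefficient would show directly whether a handful of points is pulling it up from ~0.7 to 0.88.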
Why not use the vif() function (from the car package) to calculate the VIF and help you assess whether collinearity is influential? I have never seen any book deal with this topic by perturbation analysis; the VIF, tolerance, and principal component analysis are the usual tools for dealing with collinearity. You can get the details from John Fox's book. Generally, calculating the correlation directly is not essential.
One more thing: if the purpose of your model is prediction rather than interpretation, collinearity does not matter much.

On Mon, 11 Apr 2005 12:22:55 +0200 (CEST) Manuel Gutierrez <manuel_gutierrez_lopez at yahoo.es> wrote:
> [original message quoted above]
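[For reference, a minimal sketch of the VIF check suggested here, again assuming the fitted lm object is called A as in the original post:]

library(car)   # John Fox's package, companion to his regression book
vif(A)         # variance inflation factors for x1 and x2
# Values near 1 indicate little collinearity; common rules of thumb only
# start to worry around 5-10, well above the 1.065 reported in the question.

With only two predictors, vif(A) will simply return 1/(1 - r^2) for both, where r is the -0.25 correlation already reported, so it should confirm that collinearity is not the issue here.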