I have a linear model y ~ x1 + x2 for some data in which the coefficient for x1 is higher than I would have expected from theory (0.88, versus roughly 0.7 expected). I wondered whether this could be an artifact of x1 and x2 being correlated, even though the variance inflation factor is not very high (1.065). I used perturbation analysis to evaluate collinearity:

library(perturb)
P <- perturb(A, pvars=c("x1","x2"), prange=c(1,1))
> summary(P)
Perturb variables:
x1 normal(0,1)
x2 normal(0,1)

Impact of perturbations on coefficients:
               mean   s.d.     min     max
(Intercept) -26.067  0.270 -27.235 -25.481
x1            0.726  0.025   0.672   0.882
x2            0.060  0.011   0.037   0.082

I get a mean for x1 of 0.726, which is closer to what I expected. I am not a statistical expert, so I'd like to know whether my evaluation of the effects of collinearity is correct and, if so, what solutions there are for obtaining a reliable linear model.
Thanks,
Manuel

Some more detailed information:

> A <- lm(y ~ x1 + x2)
> summary(A)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
      Min        1Q    Median        3Q       Max
-4.221946 -0.484055 -0.004762  0.397508  2.542769

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -27.23472    0.27996 -97.282  < 2e-16 ***
x1            0.88202    0.02475  35.639  < 2e-16 ***
x2            0.08180    0.01239   6.604 2.53e-10 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.823 on 241 degrees of freedom
Multiple R-Squared: 0.8411,     Adjusted R-squared: 0.8398
F-statistic: 637.8 on 2 and 241 DF,  p-value: < 2.2e-16

> cor.test(x1, x2)

        Pearson's product-moment correlation

data:  x1 and x2
t = -3.9924, df = 242, p-value = 8.678e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3628424 -0.1269618
sample estimates:
       cor
-0.248584
Manuel,
The problem you describe does not sound like it is due to multicollinearity. I say this because your variance inflation factor is modest (1.1) and, more importantly, the correlation between your independent variables (x1 and x2) is modest, -0.25. I suspect the problem is due to one or more observations having a disproportionately large influence on your coefficients. I suggest you plot your residuals vs. predicted values. I would also do a formal analysis of the influence each observation has on the reported coefficients; you might consider computing Cook's distance for each observation.
I hope this has helped.
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
University of Maryland School of Medicine, Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street, GRECC (BT/18/GR)
Baltimore, MD 21201-1524
410-605-7119
jsorkin@grecc.umaryland.edu
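[A minimal sketch of the diagnostics John suggests, assuming the fitted model object is still called A as in the original post; the 4/n cutoff is only one common rule of thumb, not part of his reply:]

# Residuals vs. fitted values: look for structure or a few extreme points
plot(fitted(A), resid(A),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Cook's distance for each observation; large spikes flag influential points
cd <- cooks.distance(A)
plot(cd, type = "h", ylab = "Cook's distance")

# Rough rule of thumb: look more closely at observations with D > 4/n
which(cd > 4 / length(cd))

Refitting the model without the flagged observations and comparing the x1 coefficient would show directly whether a handful of points is pulling it up from ~0.7 to 0.88.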
Why not use the vif() function (from the car package) to calculate the VIF and help you assess whether collinearity is influential? I have never seen any book deal with this topic by perturbation analysis; the VIF, tolerance, and principal component analysis are the usual tools for dealing with collinearity. You can get the details from John Fox's book. Generally, calculating the correlation directly is not essential.
One more thing: if the purpose of your model is prediction rather than interpretation, collinearity does not matter much.

On Mon, 11 Apr 2005 12:22:55 +0200 (CEST) Manuel Gutierrez <manuel_gutierrez_lopez at yahoo.es> wrote:
> [original message quoted above]
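[For reference, a minimal sketch of the VIF check suggested here, again assuming the fitted lm object is called A as in the original post:]

library(car)   # John Fox's package, companion to his regression book
vif(A)         # variance inflation factors for x1 and x2
# Values near 1 indicate little collinearity; common rules of thumb only
# start to worry around 5-10, well above the 1.065 reported in the question.

With only two predictors, vif(A) will simply return 1/(1 - r^2) for both, where r is the -0.25 correlation already reported, so it should confirm that collinearity is not the issue here.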