Hi,

I have a data set with both quantitative and categorical predictors. After scaling the response variable, I checked for multicollinearity (VIF values) among the predictors and removed the predictors that were hiding some of the other significant predictors. I'm curious to know whether predictors that are not significant in a simple 'lm' fit can still be involved in interactions. How do I take into account interactions involving the predictors that I removed solely on the basis of multicollinearity?

I'd appreciate it if someone could shed some light on this matter and on how to use R to detect interactions effectively.

Thanks

Regards
Dev

------Final 'lm model'--------------------

> logmodelfull_minus_run_hr_walk_batting <- lm(log(salary) ~ hit + rbi + walk + obp + strike.out + free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> summary(logmodelfull_minus_run_hr_walk_batting)

Call:
lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
    free.agent.eligible + free.agent.1991 + arbitr.elgible.)

Residuals:
     Min       1Q   Median       3Q      Max
-2.41786 -0.28911 -0.02814  0.31890  1.49007

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
hit                   0.004479   0.001158   3.867 0.000133 ***
rbi                   0.011102   0.002195   5.059 7.05e-07 ***
walk                  0.005421   0.002206   2.457 0.014533 *
obp                  -1.385584   0.824105  -1.681 0.093653 .
strike.out           -0.005399   0.001438  -3.755 0.000205 ***
free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.5351 on 328 degrees of freedom
Multiple R-Squared: 0.7981, Adjusted R-squared: 0.7932
F-statistic: 162.1 on 8 and 328 DF, p-value: < 2.2e-16

--------------with interactions--------------------

> summary(baseball.lgmodel_with_interactions_ALL_arbid)

Call:
lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
    free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
    hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
    rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
    strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
    strike.out * run + strike.out * hr + hit * free.agent.eligible +
    free.agent.eligible * run + hit * free.agent.1991 + strike.out *
    free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
    obp + arbitr.elgible. * run + batting * double + obp * run +
    obp * hr + walk * stolen.base + hit * arbitr.1991 + free.agent.eligible *
    double + arbitr.elgible. * double + strike.out * triple +
    triple * batting + triple * walk + triple * walk + hit *
    hr + rbi * hr + free.agent.eligible * hr + free.agent.1991 *
    hr + arbitr.elgible. * hr + hr * arbitr.1991 + hit * walk +
    free.agent.eligible * walk + walk * rbi + rbi * stolen.base +
    strike.out * stolen.base + stolen.base * batting + stolen.base *
    walk + stolen.base * rbi + stolen.base * walk + arbitr.elgible. *
    error)

Residuals:
     Min       1Q   Median       3Q      Max
-2.29352 -0.28287 -0.03748  0.29790  1.31590

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
hit                              6.927e-03  6.226e-03   1.112 0.266889
rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
run                              4.932e-02  2.905e-02   1.698 0.090682 .
hr                              -1.093e-01  7.208e-02  -1.516 0.130543
batting                         -1.814e-01  2.558e+00  -0.071 0.943522
obp                             -1.375e+00  2.253e+00  -0.610 0.542099
double                          -5.259e-02  4.489e-02  -1.172 0.242349
walk                             1.395e-02  9.757e-03   1.430 0.153808
stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
triple                          -1.367e-01  1.600e-01  -0.854 0.393807
error                           -4.097e-03  6.879e-03  -0.595 0.552007
hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
batting:double                   2.346e-01  1.609e-01   1.458 0.145884
run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.4925 on 280 degrees of freedom
Multiple R-Squared: 0.854, Adjusted R-squared: 0.8248
F-statistic: 29.24 on 56 and 280 DF, p-value: < 2.2e-16
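
For readers following the question about VIF screening and detecting interactions, here is a minimal sketch in base R. It uses simulated data, not the baseball data from the post, and every name in it (`x1`, `x2`, `g`, `vif_one`) is hypothetical. It computes a VIF by hand as 1 / (1 - R^2) and tests a block of interaction terms with a nested-model F test, rather than reading dozens of individual t tests from one large fit.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)   # x2 is built to correlate with x1
g  <- factor(rbinom(n, 1, 0.5))               # two-level categorical predictor
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 1.0 * (g == "1") +
      0.4 * x1 * (g == "1") + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2, g)

# VIF of one numeric predictor: regress it on the other predictors
# and take 1 / (1 - R^2) of that auxiliary regression.
vif_one <- function(var, data, others) {
  f <- reformulate(others, response = var)
  1 / (1 - summary(lm(f, data = data))$r.squared)
}
vif_x1 <- vif_one("x1", dat, c("x2", "g"))
vif_x2 <- vif_one("x2", dat, c("x1", "g"))

# Main-effects model versus the same model plus a block of interactions.
main_fit <- lm(y ~ x1 + x2 + g, data = dat)
int_fit  <- update(main_fit, . ~ . + x1:g + x2:g)

# One F test for the whole block of added interaction terms.
cmp     <- anova(main_fit, int_fit)
p_block <- cmp[["Pr(>F)"]][2]

print(c(vif_x1 = vif_x1, vif_x2 = vif_x2))
print(cmp)
```

A related option is `add1(main_fit, scope = ~ .^2, test = "F")`, which tries each two-way interaction one at a time. Either way, the tests are exploratory when many candidate terms are screened, so small p-values should be read with that in mind.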
If variables are collinear, then looking at interactions among them doesn't make much sense. High collinearity means that one variable is nearly a linear combination of the others. IOW, that variable is not adding much information. So, if you look at the interaction, you are ALMOST looking at a quadratic (e.g., if the collinearity involves only 2 variables, then one is very similar to the other, so X1*X2 is almost X1*X1). The output will be confusing, to say the least.

Worse, when you include collinear variables, the resulting equation is highly sensitive to small (sometimes very small) changes in the data. Belsley gives an example where changes in the third decimal place result in totally different equations.

For details see Belsley's book titled something like "collinearity and weak data in regression" (sorry, the book and my files are at the office, but this should let you find it).

HTH

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)

>>> "Devshruti Pahuja" <devshruti at hotmail.com> 06/11/04 5:35 AM >>>
> How do I take into account interactions involving the predictors that
> I removed solely on the basis of multicollinearity?

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
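
The two points in the reply about collinear predictors (the product X1*X2 behaving almost like X1*X1, and the instability of the fitted equation) can be illustrated with a small simulation. This is a sketch on made-up data, not Belsley's example, and the variable names and noise levels are assumptions chosen only to make the effect visible.

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n, mean = 5)
x2 <- x1 + rnorm(n, sd = 0.01)      # x2 is almost a copy of x1
y  <- 2 + x1 + x2 + rnorm(n)

# The interaction column x1 * x2 is nearly identical to the squared term x1^2.
r_prod_sq <- cor(x1 * x2, x1^2)

# Standard error of the x1 coefficient with and without its near-duplicate.
fit_a    <- lm(y ~ x1 + x2)
se_x1    <- summary(fit_a)$coefficients["x1", "Std. Error"]
se_alone <- summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]

# Refit after a tiny perturbation of x2 and put the two sets of
# coefficients side by side to see how much they move.
x2_b  <- x2 + rnorm(n, sd = 0.005)
fit_b <- lm(y ~ x1 + x2_b)
coef_compare <- rbind(original = coef(fit_a), perturbed = coef(fit_b))

print(r_prod_sq)
print(c(se_with_x2 = se_x1, se_x1_alone = se_alone))
print(coef_compare)
```

The individual coefficients on `x1` and `x2` are poorly determined, while their sum is typically much more stable, which is one way to see that the data carry information about the pair but not about each variable separately.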
> From: Peter Flom
>
> For details see Belsley's book titled something like "collinearity and
> weak data in regression" (sorry, the book and my files are at the
> office, but this should let you find it

I guess you're referring to: "Conditioning Diagnostics: Collinearity and Weak Data in Regression" (Wiley, 1992, rather pricey...).

Hocking has a plot that shows the effect of collinearity in a paper from the early '80s (the "picket fence"). The plot is used on the cover of his latest linear model book, also published by Wiley, now in 2nd edition. [An exercise for R newbies: Try reproducing that plot in R, probably using the scatterplot3d package.]

Best,
Andy
> > walk:stolen.base -6.789e-05 1.557e-04 > -0.436 0.663083 > > hit:arbitr.19911 -5.835e-03 7.084e-03 > -0.824 0.410808 > > free.agent.eligible1:double -1.151e-02 1.663e-02 > -0.692 0.489334 > > arbitr.elgible.1:double 2.169e-03 1.938e-02 > 0.112 0.910985 > > strike.out:triple -8.106e-04 6.023e-04 > -1.346 0.179475 > > batting:triple 5.179e-01 5.599e-01 > 0.925 0.355841 > > walk:triple 8.755e-04 9.262e-04 > 0.945 0.345349 > > hit:hr -3.320e-04 2.626e-04 > -1.264 0.207180 > > rbi:hr 4.748e-04 3.015e-04 > 1.575 0.116414 > > free.agent.eligible1:hr 1.840e-02 2.313e-02 > 0.796 0.426972 > > free.agent.19911:hr 7.216e-02 1.889e-02 > 3.819 0.000165 > *** > > arbitr.elgible.1:hr 4.111e-02 2.803e-02 > 1.467 0.143564 > > arbitr.19911:hr -2.368e-02 4.647e-02 > -0.510 0.610723 > > hit:walk 3.173e-05 7.826e-05 > 0.405 0.685442 > > free.agent.eligible1:walk -5.423e-03 4.984e-03 > -1.088 0.277472 > > rbi:walk -7.569e-05 1.313e-04 > -0.577 0.564598 > > rbi:stolen.base 3.980e-05 1.605e-04 > 0.248 0.804409 > > strike.out:stolen.base -2.611e-04 1.615e-04 > -1.617 0.107004 > > batting:stolen.base 1.552e-01 1.434e-01 > 1.082 0.280020 > > arbitr.elgible.1:error 3.930e-03 1.390e-02 > 0.283 0.777495 > > --- > > Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > > > > Residual standard error: 0.4925 on 280 degrees of freedom > > Multiple R-Squared: 0.854, Adjusted R-squared: 0.8248 > > F-statistic: 29.24 on 56 and 280 DF, p-value: < 2.2e-16 > > > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >
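One possible way to screen interactions in base R, rather than reading 56 individual t-tests, is to compare nested models with F tests. The sketch below is only an illustration under stated assumptions: the baseball data are not reproduced here, so it uses simulated data, and the names d, fa, log_salary, base_fit and full_fit are hypothetical. Note that in an R formula a * b expands to a + b + a:b, so a predictor dropped from the main-effects model re-enters as soon as it appears in a * term.

```r
# Simulated stand-in for the baseball data (hypothetical, for illustration only)
set.seed(1)
n <- 300
d <- data.frame(
  hit = rpois(n, 100),
  rbi = rpois(n, 50),
  fa  = factor(rbinom(n, 1, 0.4))   # stand-in for a 0/1 eligibility factor
)
# Response built with a true rbi:fa interaction so there is something to find
d$log_salary <- 5 + 0.004 * d$hit + 0.01 * d$rbi + 1.5 * (d$fa == "1") +
  0.01 * d$rbi * (d$fa == "1") + rnorm(n, sd = 0.5)

# Main-effects model
base_fit <- lm(log_salary ~ hit + rbi + fa, data = d)

# Try each two-way interaction one at a time, with an F test against base_fit
cand <- add1(base_fit, scope = ~ .^2, test = "F")
print(cand)

# Test all two-way interactions jointly against the main-effects model
full_fit <- update(base_fit, . ~ .^2)
cmp <- anova(base_fit, full_fit)
print(cmp)
```

With many candidate terms, the joint anova() comparison gives a single test of whether the block of interactions adds anything, while add1() shows which individual terms look most promising; both are exploratory and subject to the usual multiple-testing caveats.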