Liviu Andronic
2011-Aug-23 07:35 UTC
[R] obtaining p-values for lm.ridge() coefficients (package 'MASS')
Dear all I'm familiarising myself with Ridge Regressions in R and the following is bugging me: How does one get p-values for the coefficients obtained from MASS::lm.ridge() output (for a given lambda)? Consider the example below (adapted from PRA [1]):> require(MASS) > data(longley) > gr <- lm.ridge(Employed ~ .,longley,lambda = seq(0,0.1,0.001)) > plot(gr) > select(gr)modified HKB estimator is 0.004275 modified L-W estimator is 0.0323 smallest value of GCV at 0.003> ##let's choose 0.03 for lambda > coef(gr)[gr$lam == 0.03,]GNP.deflator GNP Unemployed Armed.Forces Population Year -1620.429355 0.021060 0.007994 -0.013146 -0.007752 -0.101880 0.869116 But how does one obtain the customary 'lm' summary information for the model above? I tried supplying the chosen lambda to Design::ols() using its 'penalty' argument, but strangely the results differ. See below.> require(Design) > Design::ols(Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population + Year,data=longley,penalty = 0.03,tol=1e-12)Linear Regression Model Design::ols(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population + Year, data = longley, penalty = 0.03, tol = 1e-12) n Model L.R. d.f. R2 Sigma 16 78.22 7.05 0.9929 0.3438 Residuals: Min 1Q Median 3Q Max -0.42480 -0.17774 -0.02169 0.16834 0.77203 Coefficients: Value Std. Error t Pr(>|t|) Intercept -1580.022688 518.501881 -3.0473 0.016006 GNP.deflator 0.023161 0.068862 0.3363 0.745322 GNP 0.008351 0.015160 0.5509 0.596839 Unemployed -0.013057 0.002651 -4.9255 0.001178 Armed.Forces -0.007691 0.002097 -3.6671 0.006406 Population -0.096757 0.144240 -0.6708 0.521355 Year 0.847932 0.269614 3.1450 0.013812 Adjusted R-Squared: 0.9866> ##the above seems more similar to a lambda of 0.032 than the one required (0.03) > coef(gr)[gr$lam %in% c(0.03, 0.032),] ##lm.ridge outputGNP.deflator GNP Unemployed Armed.Forces Population Year 0.030 -1620 0.02106 0.007994 -0.01315 -0.007752 -0.10188 0.8691 0.032 -1580 0.02316 0.008351 -0.01306 -0.007691 -0.09676 0.8479 The difference between the coefficients of MASS::lm.ridge and Design::ols is small, but they're different nonetheless and I'm wondering if I'm doing something wrong. Perhaps Design::ols uses a penalty different from the one specified? Alternatively, is there some other way to perform diagnostics on Ridge Regression models? Thank you Liviu [1] http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
Michael Friendly
2011-Aug-23 14:05 UTC
[R] obtaining p-values for lm.ridge() coefficients (package 'MASS')
On 8/23/2011 3:35 AM, Liviu Andronic wrote: [snip]> > But how does one obtain the customary 'lm' summary information for the > model above? I tried supplying the chosen lambda to Design::ols() > using its 'penalty' argument, but strangely the results differ. See > below. > >> require(Design) >> Design::ols(Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population + Year,data=longley,penalty = 0.03,tol=1e-12) >First, you should be using rms::ols, as Design is old. Second, penalty in ols() is not the same as the ridge constant in lm.ridge, but rather a penalty on the log likelihood. The documentation for ols refers to ?lrm for the description. Third, other ridge estimators (e.g., ElemStatLearn::simple.ridge also give slightly different results for the same data due to differences in the way scaling is handled. hth, -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA