Dear useRs (most likely called Frank Harrell),
After I had preached for years to my medical colleagues to be cautious with
stepwise selection procedures, they chanted back asking for an alternative
when using mixed models. There are half a dozen lasso packages around for all
types of linear models, but as far as I can see there is none for mixed models
such as lme. Even boot.stepAIC (which I considered valid, but Frank objects)
does not work with lme. I found one SPSS macro that comes close. Any ideas?
Dieter
I have been working on lasso-type model selection
using AD Model Builder's random effects module.
I came across a few questions about this topic on the list,
so I thought this would be of interest.
The advantage of AD Model Builder is that it lets you formulate
general nonlinear random effects models, which I believe is
necessary for the lasso-type approach. I think the approach would work
for any kind of RE model, such as, say, a negative binomial mixed model,
but let's keep it simple for now.
Let the fixed effect parameters be a_i.
The usual lasso penalty is then
p * sum_i abs(a_i)
with tunable penalty weight p.
Since AD Model Builder's random effects module can deal with arbitrary
nonlinear random effects models, I tried simply adding this penalty to the
negative log-likelihood. However, what happens is that in order to make the
penalty large enough to get the lasso effect, the estimation procedure simply
makes all the a_i too small and uses the RE's instead to fit the data. I
figured what was needed was a penalty term that more strongly encourages
setting most of the terms to 0 while letting a few of them stay large. Using
a square root has this property, so I tried
p * sum_i sqrt(abs(a_i))
Of course this is not differentiable at 0, so I smoothed it off a
bit there, but that is the general idea.
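Here is a small R sketch of that smoothed penalty (the original work is in AD
Model Builder, and the smoothing constant eps is my own choice; the post only
says the kink at 0 was smoothed off a bit):

pen_sqrt_abs <- function(a, p = 10, eps = 1e-4) {
  # approximately p * sum(sqrt(abs(a))), but differentiable at 0
  p * sum(sqrt(sqrt(a^2 + eps^2)))
}
curve(sqrt(sqrt(x^2 + 1e-4^2)), -2, 2)   # shape of one penalty term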
To test it I generated data of the form
Y_ij = a(1)*X_1i + u_i + e_ij
for 1 <= i <= 500, 1 <= j <= 5,
with a(1) = 2.0, std dev of u_i = 0.6, and std dev of e_ij = 1.5,
and the IV X_1i is standard normal.
For the other IVs I generated 9 more IVs X_ki for 2 <= k <= 10
with X_ki = X_1i + 0.1*V_ki,
where the V_ki's are standard normal.
The correlation matrix for them is
1 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995
0.995 1 0.99 0.99 0.989 0.988 0.99 0.989 0.99 0.99
0.995 0.99 1 0.99 0.989 0.99 0.99 0.99 0.989 0.99
0.995 0.99 0.99 1 0.99 0.989 0.99 0.99 0.99 0.99
0.995 0.989 0.989 0.99 1 0.989 0.99 0.989 0.99 0.99
0.995 0.988 0.99 0.989 0.989 1 0.99 0.99 0.989 0.99
0.995 0.99 0.99 0.99 0.99 0.99 1 0.991 0.99 0.991
0.995 0.989 0.99 0.99 0.989 0.99 0.991 1 0.988 0.99
0.995 0.99 0.989 0.99 0.99 0.989 0.99 0.988 1 0.99
0.995 0.99 0.99 0.99 0.99 0.99 0.991 0.99 0.99 1
So these are highly correlated explanatory variables.
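A minimal R sketch of this simulated data (the seed and the data frame and
column names are my own choices; the original run was done in AD Model
Builder):

set.seed(1)                      # assumed; no seed was given in the post
n_i <- 500; n_j <- 5
a1 <- 2.0; sd_u <- 0.6; sd_e <- 1.5
x1 <- rnorm(n_i)                                      # X_1i, standard normal
X  <- cbind(x1, replicate(9, x1 + 0.1 * rnorm(n_i)))  # X_ki = X_1i + 0.1*V_ki
colnames(X) <- paste0("x", 1:10)
u  <- rnorm(n_i, sd = sd_u)                           # random intercepts u_i
dat <- data.frame(id = rep(1:n_i, each = n_j),
                  X[rep(1:n_i, each = n_j), ],
                  y = rep(a1 * x1 + u, each = n_j) +
                      rnorm(n_i * n_j, sd = sd_e))    # Y_ij
round(cor(X), 3)                                      # roughly 0.99 off the diagonal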
Then I fit the model with no lasso penalty.
index name value std dev
1 a 3.1565e+00 1.2302e+00
2 a 6.7390e-02 3.8737e-01
3 a -1.9154e-01 4.0204e-01
4 a -2.7920e-01 3.9847e-01
5 a 3.0924e-02 3.9842e-01
6 a 7.6508e-02 3.8698e-01
7 a -5.3902e-01 4.1590e-01
8 a -2.4941e-01 4.0141e-01
9 a 2.5324e-01 3.8875e-01
10 a -2.9435e-01 4.1345e-01
11 sig_u 8.2845e-01 3.0015e-02
So a(1) is rather badly estimated and its estimated std dev is large.
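For comparison, the same unpenalized random-intercept model can be fit in R
(a sketch with lme4 on the simulated dat above; the fit reported here was
done in AD Model Builder, not with lme4):

library(lme4)
f0 <- lmer(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 +
             (1 | id), data = dat)   # no intercept, matching the model above
summary(f0)                          # expect large std errors for the a's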
One way to try to deal with this is ridge regression. This amounts to
adding the following quadratic penalty:
p * sum_i square(a_i)
Of course we know that this will have exactly the opposite effect to
what we want; that is, it tends to make all the estimated parameters nonzero.
With a value of p=1.0 I got the following estimates.
index name value std dev
1 a 8.7729e-01 5.9032e-01
2 a 2.9461e-01 3.2352e-01
3 a 8.3218e-02 3.3546e-01
4 a 8.3569e-03 3.3438e-01
5 a 2.7345e-01 3.2984e-01
6 a 2.5172e-01 3.2822e-01
7 a -1.9834e-01 3.4721e-01
8 a 2.7677e-02 3.3404e-01
9 a 4.0135e-01 3.2855e-01
10 a 5.7042e-03 3.4325e-01
11 sig_u 8.3188e-01 3.0163e-02
So the std devs are smaller but the results are really bad.
Now the fun part. With a penalty weight p=10 and the
sqrt(abs()) penalty I got
index name value std dev
1 a 2.0191e+00 4.0619e-02
2 a 6.7300e-06 1.2917e-03
3 a 3.7718e-06 1.2915e-03
4 a 3.0421e-06 1.2914e-03
5 a 6.0954e-06 1.2917e-03
6 a 5.9105e-06 1.2916e-03
7 a 4.2511e-07 1.2914e-03
8 a 2.4084e-06 1.2914e-03
9 a 8.3686e-06 1.2919e-03
10 a 2.7657e-06 1.2914e-03
11 sig_u 8.3226e-01 3.0119e-02
So this looks like a promising approach.
Of course the real test is whether it
works for real data. If anyone wants to cooperate
on this it could be fun.
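Here is a rough R translation of the idea for the Gaussian random-intercept
case, using dat and pen_sqrt_abs from the sketches above (the marginal
likelihood is written out group by group, so this is slow and purely
illustrative, and the smoothing constant is assumed):

library(mvtnorm)
neg_pen_loglik <- function(par, y, X, grp, p = 10, eps = 1e-4) {
  k     <- ncol(X)
  a     <- par[1:k]              # fixed effects a_1 .. a_k
  sig_u <- exp(par[k + 1])       # random-intercept sd (log scale)
  sig_e <- exp(par[k + 2])       # residual sd (log scale)
  nll <- 0
  for (g in unique(grp)) {       # exact marginal likelihood, group by group
    i <- grp == g
    n <- sum(i)
    V <- sig_u^2 * matrix(1, n, n) + sig_e^2 * diag(n)
    nll <- nll - mvtnorm::dmvnorm(y[i],
                   mean  = as.vector(X[i, , drop = FALSE] %*% a),
                   sigma = V, log = TRUE)
  }
  nll + pen_sqrt_abs(a, p, eps)  # smoothed sqrt(abs()) penalty from above
}
Xmat <- as.matrix(dat[paste0("x", 1:10)])
fit  <- optim(c(rep(0, 10), 0, 0), neg_pen_loglik, y = dat$y, X = Xmat,
              grp = dat$id, method = "BFGS", control = list(maxit = 500))
fit$par[1:10]                    # penalized fixed-effect estimates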
Dave
--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8L 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com
Dear Dieter,
There has been some effort examining Lasso-type estimators for mixed models.
It is more involved than with other types of models. We have studied Gaussian
and generalized linear mixed models including a Lasso-type penalty for the
fixed effects, and we are now able to provide two packages: lmmlasso and
glmmlasso. Both are available from R-Forge (the former even from CRAN).
Andreas Groll provides the glmmLasso package, which is available from CRAN.
Juerg Schelldorfer
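A minimal usage sketch for glmmLasso on data shaped like the simulation above
(the grouping variable must be a factor, lambda has to be tuned, e.g. by BIC
over a grid, and the call follows my reading of the package documentation):

library(glmmLasso)
dat$id <- as.factor(dat$id)      # glmmLasso expects a factor grouping variable
g1 <- glmmLasso(fix = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
                rnd = list(id = ~1),           # random intercept per id
                data = dat, lambda = 50,       # lambda is a guess; tune it
                family = gaussian(link = "identity"))
print(g1)                        # inspect the penalized fixed effects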