thr3ads.net - R help - [R] Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob() [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Saul Sternberg

2010-Dec-13 11:48 UTC

[R] Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob()

This message also reports wrong estimates produced by lmRob.fit.compute()
for nested factors when using the correct contrast matrix.

And in these respects, I have found that S-Plus behaves the same way as R.

Using the three available contrast types (sum, treatment, helmert)
with lm() or lm.fit(), but just contr.sum with rlm() and lmRob(),
and small examples, I generated contrast matrices for four models
involving nested factors with fixed effects.  For three of the models
the matrices were incorrect.

-----------------------------------------------------------------
Details
---------
For lm() and rlm() I used two data frames:

In "same.df" the nested factor, D, has the same number of levels for
each level of the nesting factor, G:
   G  D  T1
1 g1 d1 -60
2 g1 d2 -50
3 g1 d3 -40
4 g2 d1  30
5 g2 d2  50
6 g2 d3  70

In "diff.df" the nested factor, D, has a different number of levels
for
the two levels of the nesting factor, G:
   G  D  T2
1 g1 d1 -60
2 g1 d2 -50
3 g1 d3 -40
4 g2 d1  20
5 g2 d2  40
6 g2 d3  60
7 g2 d4  80

(G, D are factors; T1, T2 are numeric)

For lmRob() I expanded the data frames to 600 or 700 rows by replicating
them 100 times and adding error to the observations.

For lm() the models and commands were
(1) model.matrix(lm(T1 ~ G + D%in%G, same.df))
(2) model.matrix(lm(T1 ~ D%in%G, same.df))
(3) model.matrix(lm(T2 ~ G + D%in%G, diff.df))
(4) model.matrix(lm(T2 ~ D%in%G, diff.df))

Using (1), all three types of contrast matrix were correctly generated
Using (2), the same incorrect contrast matrix was generated for all
        three contrast types.
Using (3), an incorrect contrast matrix was generated for each of the
        three contrast types.  For contr.treatment the error was an extra
        column of zeros; for the others the error was more serious.
Using (4), the same incorrect contrast matrix was generated for all
        three contrast types.

I used only contr.sum with rlm() and lmRob().  For model (1) the programs
worked correctly, but for models (2), (3), and (4) with the formulas
above, rlm() and lmRob() both reported that x was singular.

When x was the correct contrast matrix and T was the observation vector,
rlm(x,T) worked correctly for (2), (3), and (4), as did lm.fit(x,T).
However, whereas lmRob.fit.compute(x2=NULL,y=T,x1=x) worked correctly for
(3), the estimates it produced for (2) and (4) were radically wrong
(and were the same for different random seeds and initial algorithms).

-------------------------------------------------------------------
Questions:
-------------

(1) If there is a way to use lm(), rlm(), and lmRob() in such cases
so that they generate the correct contrast matrices (and the desired
parameter estimates), what is it?

(2) If there is no way to do this, is the best alternative for the user to
create the desired model matrices "by hand" and provide them as
arguments
to lim.fit(), rlm(), and lmRob.fit.compute()?  This would also require
that lmRob.fit.compute() generate the correct estimates.

(3) If one uses lm.fit() and lmRob.fit.compute() directly in this way,
then, given that one is warned against doing so, what are the dangers?

(4) According to cran.r-project.org/web/views/Robust.html, lmRob()
"makes use of the M-S algorithm of Maronna and Yohai (2000),
automatically when there are factors among the predictors (where
S-estimators (and hence MM-estimators) based on resampling typically
badly fail)."  Is there an alternative program that uses the M-S
algorithm, if lmRob() or lmRob.fit.compute() cannot be made to work? 

----------------------------------------------------------------
R%%>sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-redhat-linux-gnu (64-bit)
----------------------------------------------------------------
I'll be very grateful for any help.
Saul Sternberg, Psychology
University of Pennsylvania

peter dalgaard

2010-Dec-13 22:16 UTC

head link

[R] Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob()

On Dec 13, 2010, at 12:48 , Saul Sternberg wrote:
> 
> This message also reports wrong estimates produced by lmRob.fit.compute()
> for nested factors when using the correct contrast matrix.
> 
> And in these respects, I have found that S-Plus behaves the same way as R.
> 
> Using the three available contrast types (sum, treatment, helmert)
> with lm() or lm.fit(), but just contr.sum with rlm() and lmRob(),
> and small examples, I generated contrast matrices for four models
> involving nested factors with fixed effects.  For three of the models
> the matrices were incorrect.
> 
You are using "correct" and "incorrect" in a very absolute
way. As far as I can tell, model.matrix is behaving according to its
conventions. These conventions may not always be what you want, or even what is
desirable for specific models, but that is not an error per se.

In particular, one feature of the R way of generating contrast matrices from
factors is that the rows of the result are the same, irrespective of which
factor levels are present in data, i.e. that the model.matrix() for a subset of
data is the corresponding subset of model.matrix() applied to all data. Breaking
that convention looks like a path to madness. (Actually, it has been broken,
historically, by the spline functions, and that did indeed lead to madness, at
least initially; google for safe.predict() if you care)

It is not unlikely that what you are really looking for is

interaction(G, D, drop=TRUE)

> -----------------------------------------------------------------
> Details
> ---------
> For lm() and rlm() I used two data frames:
> 
> In "same.df" the nested factor, D, has the same number of levels
for
> each level of the nesting factor, G:
>   G  D  T1
> 1 g1 d1 -60
> 2 g1 d2 -50
> 3 g1 d3 -40
> 4 g2 d1  30
> 5 g2 d2  50
> 6 g2 d3  70
> 
> In "diff.df" the nested factor, D, has a different number of
levels for
> the two levels of the nesting factor, G:
>   G  D  T2
> 1 g1 d1 -60
> 2 g1 d2 -50
> 3 g1 d3 -40
> 4 g2 d1  20
> 5 g2 d2  40
> 6 g2 d3  60
> 7 g2 d4  80
> 
> (G, D are factors; T1, T2 are numeric)
> 
> For lmRob() I expanded the data frames to 600 or 700 rows by replicating
> them 100 times and adding error to the observations.
> 
> For lm() the models and commands were
> (1) model.matrix(lm(T1 ~ G + D%in%G, same.df))
> (2) model.matrix(lm(T1 ~ D%in%G, same.df))
> (3) model.matrix(lm(T2 ~ G + D%in%G, diff.df))
> (4) model.matrix(lm(T2 ~ D%in%G, diff.df))
> 
> Using (1), all three types of contrast matrix were correctly generated
> Using (2), the same incorrect contrast matrix was generated for all
>        three contrast types.
> Using (3), an incorrect contrast matrix was generated for each of the
>        three contrast types.  For contr.treatment the error was an extra
>        column of zeros; for the others the error was more serious.
> Using (4), the same incorrect contrast matrix was generated for all
>        three contrast types.
> 
> I used only contr.sum with rlm() and lmRob().  For model (1) the programs
> worked correctly, but for models (2), (3), and (4) with the formulas
> above, rlm() and lmRob() both reported that x was singular.
> 
> When x was the correct contrast matrix and T was the observation vector,
> rlm(x,T) worked correctly for (2), (3), and (4), as did lm.fit(x,T).
> However, whereas lmRob.fit.compute(x2=NULL,y=T,x1=x) worked correctly for
> (3), the estimates it produced for (2) and (4) were radically wrong
> (and were the same for different random seeds and initial algorithms).
> 
> -------------------------------------------------------------------
> Questions:
> -------------
> 
> (1) If there is a way to use lm(), rlm(), and lmRob() in such cases
> so that they generate the correct contrast matrices (and the desired
> parameter estimates), what is it?
> 
> (2) If there is no way to do this, is the best alternative for the user to
> create the desired model matrices "by hand" and provide them as
arguments
> to lim.fit(), rlm(), and lmRob.fit.compute()?  This would also require
> that lmRob.fit.compute() generate the correct estimates.
> 
> (3) If one uses lm.fit() and lmRob.fit.compute() directly in this way,
> then, given that one is warned against doing so, what are the dangers?
> 
> (4) According to cran.r-project.org/web/views/Robust.html, lmRob()
> "makes use of the M-S algorithm of Maronna and Yohai (2000),
> automatically when there are factors among the predictors (where
> S-estimators (and hence MM-estimators) based on resampling typically
> badly fail)."  Is there an alternative program that uses the M-S
> algorithm, if lmRob() or lmRob.fit.compute() cannot be made to work? 
> 
> ----------------------------------------------------------------
> R%%>sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-redhat-linux-gnu (64-bit)
> ----------------------------------------------------------------
> I'll be very grateful for any help.
> Saul Sternberg, Psychology
> University of Pennsylvania
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Seemingly Similar Threads

Search for more seemingly similar threads

R help - Dec 2010 - Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob()

[R] Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob()

[R] Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob()

Seemingly Similar Threads