thr3ads.net - R help - [R] Contrasts for ordered factors [Jan 2007]

If this information is useful, please help other people find it:
Share via:

lorenz.gygax at art.admin.ch

2007-Jan-08 09:13 UTC

[R] Contrasts for ordered factors

Dear all,

I do not seem to grasp how contrasts are set for ordered factors. Perhaps
someone can elighten me?

When I work with ordered factors, I would often like to be able to reduce the
used polynomial to a simpler one (where possible). Thus, I would like to
explicetly code the polynomial but ideally, the intial model (thus, the full
polynomial) would be identical to one with an ordered factor.

Here is a toy example with an explanatory variable (EV) with three distinct
values (1 to 3) and a continuous response variable (RV):

options (contrasts= c ('contr.treatment', 'contr.poly'))
example.df <- data.frame (EV= rep (1:3, 5))
set.seed (298)
example.df$RV <- 2 * example.df$EV + rnorm (15)

I evaluate this data using either an ordered factor or a polynomial with a
linear and a quadratic term:

lm.ord <- lm (RV ~ ordered (EV), example.df)
lm.pol <- lm (RV ~ EV + I(EV^2), example.df)

I then see that the estimated coefficients differ (and in other examples that I
have come across, it is often even more extreme):

coef (lm.ord)
(Intercept) ordered(EV).L ordered(EV).Q 
  3.9497767     2.9740535    -0.1580798 
coef (lm.pol)
(Intercept)            EV       I(EV^2) 
 -0.9015283     2.8774032    -0.1936074 

but the predictions are the same (except for some rounding):

table (round (predict (lm.ord), 6) == round (predict (lm.pol), 6))
TRUE 
  15 

I thus conclude that the two models are the same and are just using a different
parametrisation. I can easily interprete the parameters of the explicit
polynomial but I started to wonder about the parametrisation of the ordered
factor. In search of an answer, I did check the contrasts:

contr.poly (levels (ordered (example.df$EV)))
                .L         .Q
[1,] -7.071068e-01  0.4082483
[2,] -9.073264e-17 -0.8164966
[3,]  7.071068e-01  0.4082483

The linear part basically seems to be -0.707, 0 (apart for numerical rounding)
and 0.707. I can understand that any even-spaced parametrisation is possible for
the linear part. But does someone know where the value of 0.707 comes from (it
seems to be the square-root of 0.5, but why?) and why the middle term is not
exactly 0?

I do not understand the quadratic part at all. Wouldn't that need the be the
linear part to the power of 2?

Thank you for your thoughts! Lorenz
- 
Lorenz Gygax
Swiss Federal Veterinary Office
Centre for proper housing of ruminants and pigs
T?nikon, CH-8356 Ettenhausen / Switzerland

Professor Brian Ripley

2007-Jan-08 10:37 UTC

head link

[R] Contrasts for ordered factors

contr.poly is using orthoogonal polynomials: look at poly() for
further information.  It seems to me that you did not realize that,
or did not realize what they are, or ... and that may be enough of
a hint for you or you may need more help, in which case please ask
again.

lorenz.gygax at art.admin.ch wrote:> Dear all,
> 
> I do not seem to grasp how contrasts are set for ordered factors. Perhaps
someone can elighten me?
> 
> When I work with ordered factors, I would often like to be able to reduce
the used polynomial to a simpler one (where possible). Thus, I would like to
explicetly code the polynomial but ideally, the intial model (thus, the full
polynomial) would be identical to one with an ordered factor.
> 
> Here is a toy example with an explanatory variable (EV) with three distinct
values (1 to 3) and a continuous response variable (RV):
> 
> options (contrasts= c ('contr.treatment', 'contr.poly'))
> example.df <- data.frame (EV= rep (1:3, 5))
> set.seed (298)
> example.df$RV <- 2 * example.df$EV + rnorm (15)
> 
> I evaluate this data using either an ordered factor or a polynomial with a
linear and a quadratic term:
> 
> lm.ord <- lm (RV ~ ordered (EV), example.df)
> lm.pol <- lm (RV ~ EV + I(EV^2), example.df)
> 
> I then see that the estimated coefficients differ (and in other examples
that I have come across, it is often even more extreme):
> 
> coef (lm.ord)
> (Intercept) ordered(EV).L ordered(EV).Q 
>   3.9497767     2.9740535    -0.1580798 
> coef (lm.pol)
> (Intercept)            EV       I(EV^2) 
>  -0.9015283     2.8774032    -0.1936074 
> 
> but the predictions are the same (except for some rounding):
> 
> table (round (predict (lm.ord), 6) == round (predict (lm.pol), 6))
> TRUE 
>   15 
> 
> I thus conclude that the two models are the same and are just using a
different parametrisation. I can easily interprete the parameters of the
explicit polynomial but I started to wonder about the parametrisation of the
ordered factor. In search of an answer, I did check the contrasts:
> 
> contr.poly (levels (ordered (example.df$EV)))
>                 .L         .Q
> [1,] -7.071068e-01  0.4082483
> [2,] -9.073264e-17 -0.8164966
> [3,]  7.071068e-01  0.4082483
> 
> The linear part basically seems to be -0.707, 0 (apart for numerical
rounding) and 0.707. I can understand that any even-spaced parametrisation is
possible for the linear part. But does someone know where the value of 0.707
comes from (it seems to be the square-root of 0.5, but why?) and why the middle
term is not exactly 0?
> 
> I do not understand the quadratic part at all. Wouldn't that need the
be the linear part to the power of 2?
> 
> Thank you for your thoughts! Lorenz
> - 
> Lorenz Gygax
> Swiss Federal Veterinary Office
> Centre for proper housing of ruminants and pigs
> T?nikon, CH-8356 Ettenhausen / Switzerland
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Peter Dalgaard

2007-Jan-08 11:08 UTC

head link

[R] Contrasts for ordered factors

lorenz.gygax at art.admin.ch wrote:> Dear all,
>
> I do not seem to grasp how contrasts are set for ordered factors. Perhaps
someone can elighten me?
>
> When I work with ordered factors, I would often like to be able to reduce
the used polynomial to a simpler one (where possible). Thus, I would like to
explicetly code the polynomial but ideally, the intial model (thus, the full
polynomial) would be identical to one with an ordered factor.
>
> Here is a toy example with an explanatory variable (EV) with three distinct
values (1 to 3) and a continuous response variable (RV):
>
> options (contrasts= c ('contr.treatment', 'contr.poly'))
> example.df <- data.frame (EV= rep (1:3, 5))
> set.seed (298)
> example.df$RV <- 2 * example.df$EV + rnorm (15)
>
> I evaluate this data using either an ordered factor or a polynomial with a
linear and a quadratic term:
>
> lm.ord <- lm (RV ~ ordered (EV), example.df)
> lm.pol <- lm (RV ~ EV + I(EV^2), example.df)
>
> I then see that the estimated coefficients differ (and in other examples
that I have come across, it is often even more extreme):
>
> coef (lm.ord)
> (Intercept) ordered(EV).L ordered(EV).Q 
>   3.9497767     2.9740535    -0.1580798 
> coef (lm.pol)
> (Intercept)            EV       I(EV^2) 
>  -0.9015283     2.8774032    -0.1936074 
>
> but the predictions are the same (except for some rounding):
>
> table (round (predict (lm.ord), 6) == round (predict (lm.pol), 6))
> TRUE 
>   15 
>
> I thus conclude that the two models are the same and are just using a
different parametrisation. I can easily interprete the parameters of the
explicit polynomial but I started to wonder about the parametrisation of the
ordered factor. In search of an answer, I did check the contrasts:
>
> contr.poly (levels (ordered (example.df$EV)))
>                 .L         .Q
> [1,] -7.071068e-01  0.4082483
> [2,] -9.073264e-17 -0.8164966
> [3,]  7.071068e-01  0.4082483
>
> The linear part basically seems to be -0.707, 0 (apart for numerical
rounding) and 0.707. I can understand that any even-spaced parametrisation is
possible for the linear part. But does someone know where the value of 0.707
comes from (it seems to be the square-root of 0.5, but why?) and why the middle
term is not exactly 0?
>
> I do not understand the quadratic part at all. Wouldn't that need the
be the linear part to the power of 2?
>
>   These are orthogonal polynomials. 

To see the main point, try
> M <- cbind(1,contr.poly (3))
> M
                  .L         .Q

[1,] 1 -7.071068e-01  0.4082483

[2,] 1 -7.850462e-17 -0.8164966

[3,] 1  7.071068e-01  0.4082483
> zapsmall(crossprod(M))
     .L .Q

   3  0  0

.L 0  1  0

.Q 0  0  1


This parametrization has better numerical properties than the
straightforward 1,x,x^2,... , especially in balanced designs.

(SOAPBOX: Some, including me, feel that  having polynomials as default
contrasts for ordered factors is a bit of a design misfeature - It was
inherited from S, but assigning equidistant numerical values to ordered
groups isn't really well-founded, and does become plainly wrong when the
levels are really something like 0, 3, 6, 12, 18, 24 months.)
 

-- 
   O__  ---- Peter Dalgaard             ?ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

Apparently Analagous Threads

Search for more maybe matching threads

R help - Jan 2007 - Contrasts for ordered factors

[R] Contrasts for ordered factors

[R] Contrasts for ordered factors

[R] Contrasts for ordered factors

Apparently Analagous Threads