thr3ads.net - R help - [R] Help with categorical predictors in regression models [Jun 2015]

Pamela Foggia

2015-Jun-19 21:32 UTC

[R] Help with categorical predictors in regression models

Hello,
In my regression models (linear and logistic) I have two predictor
variables, both categorical: DEGREE and REGION.

DEGREE is educational level, an ordinal variable with five
levels (0-LT HIGH SCHOOL, 1-HIGH SCHOOL, 2-JUNIOR COLLEGE, 3-BACHELOR,
4-GRADUATE).

REGION is the respondent's region, a nominal variable with
9 levels (1-NEW ENGLAND, 2-MIDDLE ATLANTIC, 3-E. NOR. CENTRAL, 4-W. NOR.
CENTRAL, 5-SOUTH ATLANTIC, 6-E. SOU. CENTRAL, 7-W. SOU. CENTRAL,
8-MOUNTAIN, 9-PACIFIC).

In many examples I have read that, to use these predictors correctly as
categorical variables, I first have to apply the FACTOR function, for
example in this way

fit1 <- lm(Z ~ factor(X) + factor(Y))
fit2 <- glm(W ~ factor(X) + factor(Y),
            family=binomial(link="logit"))

obtaining the following output for the logistic regression

                    coef.est coef.se
(Intercept)            1.027   0.263
factor(DEGREE)1        0.301   0.134
factor(DEGREE)2        0.340   0.211
factor(DEGREE)3        0.748   0.168
factor(DEGREE)4        1.267   0.237
...

where Z is a continuous variable and W is a binary variable. My
question is: for the ordinal variable X, would it be
more correct to use the ORDERED function rather than FACTOR? I mean an
operation like this

fit1 <- lm(Z ~ ordered(X) + factor(Y))
fit2 <- glm(W ~ ordered(X) + factor(Y),
            family=binomial(link="logit"))

where I obtain a different output like this

                    coef.est coef.se
(Intercept)            1.558   0.241
ordered(DEGREE).L      0.942   0.157
ordered(DEGREE).Q      0.215   0.160
ordered(DEGREE).C      0.118   0.111
ordered(DEGREE)^4     -0.106   0.143
...

What do the letters L, Q, C and the power ^4 (which I find in the output)
mean?

Thanks in advance

	[[alternative HTML version deleted]]

David Winsemius

2015-Jun-20 05:05 UTC

[R] Help with categorical predictors in regression models

On Jun 19, 2015, at 2:32 PM, Pamela Foggia wrote:
> Hello,
> In my regression models (linear and logistic) I have two predictor
> variables, both categorical: DEGREE and REGION.
> 
> DEGREE is educational level, an ordinal variable with five
> levels (0-LT HIGH SCHOOL, 1-HIGH SCHOOL, 2-JUNIOR COLLEGE, 3-BACHELOR,
> 4-GRADUATE).
> 
> REGION is the respondent's region, a nominal variable with
> 9 levels (1-NEW ENGLAND, 2-MIDDLE ATLANTIC, 3-E. NOR. CENTRAL, 4-W. NOR.
> CENTRAL, 5-SOUTH ATLANTIC, 6-E. SOU. CENTRAL, 7-W. SOU. CENTRAL,
> 8-MOUNTAIN, 9-PACIFIC).
> 
> In many examples I have read that, to use these predictors correctly as
> categorical variables, I first have to apply the FACTOR function,
Please do _not_ capitalize the `factor` function name. R is _not_ SAS.
> for
> example in this way
> 
> fit1 <- lm(Z ~ factor(X) + factor(Y))
> fit2 <- glm(W ~ factor(X) + factor(Y),
>             family=binomial(link="logit"))
> 
> obtaining the following output for the logistic regression
> 
>                     coef.est coef.se
> (Intercept)            1.027   0.263
> factor(DEGREE)1        0.301   0.134
> factor(DEGREE)2        0.340   0.211
> factor(DEGREE)3        0.748   0.168
> factor(DEGREE)4        1.267   0.237
> ...
> 
> where Z is a continuous variable and W is a binary variable. My
> question is: for the ordinal variable X, would it be
> more correct to use the ORDERED function rather than FACTOR?
That really would depend on the hypotheses under consideration, wouldn't it?
> I mean an
> operation like this
> 
> fit1 <- lm(Z ~ ordered(X) + factor(Y))
> fit2 <- glm(W ~ ordered(X) + factor(Y),
>             family=binomial(link="logit"))
> 
> where I obtain a different output like this
> 
>                     coef.est coef.se
> (Intercept)            1.558   0.241
> ordered(DEGREE).L      0.942   0.157
> ordered(DEGREE).Q      0.215   0.160
> ordered(DEGREE).C      0.118   0.111
> ordered(DEGREE)^4     -0.106   0.143
> ...
Clearly that output does not match the regression call.
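
To illustrate the mismatch: R builds coefficient names from the term exactly as it is written in the formula, so a call that uses factor(X) produces names starting with factor(X), not factor(DEGREE). A small sketch on made-up data (X, DEGREE and W below are simulated purely for illustration and are not the poster's data):

```r
set.seed(1)

# Hypothetical toy data: X and DEGREE hold the same values under two names
X      <- rep(0:4, times = 40)
DEGREE <- X
W      <- rbinom(length(X), 1, plogis(-0.5 + 0.3 * X))

fit_x      <- glm(W ~ factor(X),      family = binomial(link = "logit"))
fit_degree <- glm(W ~ factor(DEGREE), family = binomial(link = "logit"))

# Coefficient names follow the term label used in each formula
names(coef(fit_x))       # "(Intercept)" "factor(X)1" ... "factor(X)4"
names(coef(fit_degree))  # "(Intercept)" "factor(DEGREE)1" ... "factor(DEGREE)4"
```

So the output shown in the question presumably came from a call written in terms of DEGREE rather than X.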
> 
> What do the letters L, Q, C and the power ^4 (which I find in the output)
> mean?
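The default contrasts for an ordered factor are orthogonal polynomial
contrasts up to degree n-1 ("degree" here has nothing to do with your factor
name), where n is the number of levels of the factor. With your five levels
that gives linear (.L), quadratic (.Q), cubic (.C) and fourth-degree (^4) terms.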
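
The contrast matrix behind those coefficient names can be inspected directly. A minimal sketch using only base R, assuming a five-level factor as in the question (the object names cp and deg are just for illustration):

```r
# Polynomial contrasts for a factor with n = 5 levels: n - 1 = 4 columns
cp <- contr.poly(5)
colnames(cp)   # ".L" ".Q" ".C" "^4"

# The columns are orthonormal, so their cross-product is the identity matrix
round(crossprod(cp), 10)

# An ordered factor picks up these contrasts by default
deg <- ordered(0:4)
colnames(contrasts(deg))
```

The suffixes appended to the term name in the regression output are the column names of this matrix.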

If this doesn't make sense, then you need to do further research to improve
your understanding of polynomial contrasts. (They are messy.) You can limit the
contrasts to only a linear "degree". You can find further information
regarding polynomial contrasts at ?contr.poly and ?C

It's possible that this will be helpful:

DEGREE <- C( DEGREE, poly, 1)  # Only linear contrast
fit2 <- glm(W ~ DEGREE + factor(REGION),
family=binomial(link="logit"))

And refrain from then using `ordered` in the formula.
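
A sketch of how the three codings compare, on simulated data (the variables below are made up to stand in for the poster's DEGREE, REGION and W; the names fam, fit_factor, fit_ordered, DEGREE_lin and fit_linear are hypothetical). It assumes nothing beyond base R:

```r
set.seed(42)

# Hypothetical toy data standing in for the poster's variables
n      <- 500
DEGREE <- rep(0:4, length.out = n)
REGION <- sample(1:9, n, replace = TRUE)
W      <- rbinom(n, 1, plogis(-0.5 + 0.3 * DEGREE))

fam <- binomial(link = "logit")

# Same model under two parameterisations: treatment contrasts versus
# the full set of polynomial contrasts. The fit itself does not change.
fit_factor  <- glm(W ~ factor(DEGREE)  + factor(REGION), family = fam)
fit_ordered <- glm(W ~ ordered(DEGREE) + factor(REGION), family = fam)
all.equal(deviance(fit_factor), deviance(fit_ordered))

# Linear contrast only: a single DEGREE coefficient instead of four
DEGREE_lin <- C(DEGREE, poly, 1)
fit_linear <- glm(W ~ DEGREE_lin + factor(REGION), family = fam)
grep("DEGREE", names(coef(fit_linear)), value = TRUE)

# Likelihood-ratio test of the linear-only model against the full one
anova(fit_linear, fit_factor, test = "Chisq")
```

With all n-1 contrasts, factor() and ordered() describe the same fitted model and differ only in how the coefficients are expressed; the linear-only version is a genuinely smaller model, which is why the choice depends on the hypothesis being tested.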

-- 
David.
> 
> Thanks in advance
> 
> 	[[alternative HTML version deleted]]
This is a mailing list that requests plain text. It's not hard to do in
gmail.

Please read ....
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
