thr3ads.net - R help - [R] Dummy variables or factors? [Oct 2009]

Luciano La Sala

2009-Oct-20 20:00 UTC

[R] Dummy variables or factors?

Dear R-people, 

I am analyzing epidemiological data with GLMMs fitted using lmer (from the lme4 package). I usually
explore the assumption of linearity of continuous variables in the logit of the
outcome by creating 4 categories of the variable, performing a bivariate
logistic regression, and then plotting the coefficients of each category against
their mid points. That gives me a pretty good idea about the linearity
assumption and possible departures from it.
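
For illustration, the check described above could be sketched roughly as follows. This is a minimal sketch on simulated data, using an ordinary `glm()` rather than a mixed model; the variable names (`age`, `y`) and the choice of quartile cut points are assumptions, not part of the original post.

```r
# Simulated data only; variable names (age, y) are hypothetical.
set.seed(1)
n   <- 2000
age <- runif(n, 20, 80)
y   <- rbinom(n, 1, plogis(-3 + 0.05 * age))   # truly linear in the logit

# Cut the continuous variable into 4 categories at its quartiles.
breaks  <- quantile(age, probs = seq(0, 1, 0.25))
age_cat <- cut(age, breaks = breaks, include.lowest = TRUE)

# Logistic regression on the categorized variable
# (the first category is the reference level).
fit <- glm(y ~ age_cat, family = binomial)

# Log odds ratios relative to the reference category (0 for the reference).
log_or    <- c(0, coef(fit)[-1])
midpoints <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Roughly a straight line suggests linearity in the logit is plausible.
plot(midpoints, log_or, type = "b",
     xlab = "Category midpoint",
     ylab = "Log odds ratio vs. first category")
```

With only four points the plot is a coarse diagnostic, so it can hint at a departure from linearity but cannot describe its shape in any detail.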

I know of people who create 0,1 dummy variables in order to relax the linearity
assumption. However, I've read that dummy variables are never needed (nor
are desirable) in R! Instead, one should make use of factor variables. That is
much easier to work with than dummy variables and the model itself will create
the necessary dummy variables.
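
As a small illustration of that point, `model.matrix()` shows the indicator columns R builds from a factor under its default treatment contrasts. The `exposure` variable and its levels are made up for this sketch.

```r
# Hypothetical exposure variable with three levels.
exposure <- factor(c("low", "medium", "high", "medium", "low"),
                   levels = c("low", "medium", "high"))

# model.matrix() returns the design matrix R builds from the factor:
# an intercept plus one 0/1 indicator column per non-reference level
# (default treatment contrasts; "low" is the reference here).
X <- model.matrix(~ exposure)
print(X)
```

The same expansion happens internally when a factor appears in a model formula, which is why hand-made 0/1 columns are usually unnecessary.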

Having said that, if my data violate the linearity assumption, does using
a factor for the variable in question help overcome the lack of linearity?

Thanks in advance, 

Luciano     




David Winsemius

2009-Oct-20 22:44 UTC

On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote:
> Dear R-people,
>
> I am analyzing epidemiological data with GLMMs fitted using lmer
> (from the lme4 package). I usually explore the assumption of linearity of
> continuous variables in the logit of the outcome by creating 4 categories of
> the variable, performing a bivariate logistic regression, and then
> plotting the coefficients of each category against their mid points.
> That gives me a pretty good idea about the linearity assumption and
> possible departures from it.
>
> I know of people who create 0,1 dummy variables in order to relax
> the linearity assumption. However, I've read that dummy variables
> are never needed (nor are desirable) in R! Instead, one should make
> use of factor variables. That is much easier to work with than dummy
> variables and the model itself will create the necessary dummy
> variables.
>
> Having said that, if my data violate the linearity assumption, does
> using a factor for the variable in question help overcome the
> lack of linearity?
>

No. If done by dividing into a small number of categories after
looking at the data, it merely creates other (and probably more
severe) problems. If you are in the unusual (although desirable)
position of having a large number of events across the range of the
covariates in your data, you may be able to cut your variable into
quintiles or deciles and analyze the resulting factor, but the
preferred approach would be to fit a regression spline of sufficient
complexity.
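
One way such a spline fit might look is sketched below, using a natural cubic spline from the `splines` package in an ordinary logistic regression on simulated data. The choice of `df = 4`, the simulated curve, and the variable names are assumptions made purely for illustration; how much flexibility is "sufficient" depends on the data.

```r
library(splines)  # ships with base R

# Simulated data only: the outcome is curved on the logit scale.
set.seed(2)
n <- 2000
x <- runif(n, 0, 10)
y <- rbinom(n, 1, plogis(-2 + 0.15 * (x - 5)^2))

# Linear-in-the-logit fit versus a natural cubic spline with 4 df.
fit_linear <- glm(y ~ x, family = binomial)
fit_spline <- glm(y ~ ns(x, df = 4), family = binomial)

# Likelihood-ratio test: does the spline improve on the linear term?
# (The linear model is nested in the natural-spline model.)
print(anova(fit_linear, fit_spline, test = "Chisq"))
```

In principle the same `ns()` term can be placed in the fixed-effects part of a mixed-model formula, although that is not shown here.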
> Thanks in advance.
-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
