Jack Luo
2009-Nov-16 19:22 UTC
[R] fitting a logistic regression with mixed type of variables
Hi,

I am trying to fit a logistic regression using glm, but my explanatory variables are of mixed type: some are numeric, some are ordinal, and some are categorical.

Say x1 is numeric, x2 is ordinal, and x3 is categorical. Is the following formula OK?

    model <- glm(y~x1+x2+x3, family=binomial(link="logit"), na.action=na.pass)

Thanks,

-Jack
David Winsemius
2009-Nov-16 19:32 UTC
[R] fitting a logistic regression with mixed type of variables
On Nov 16, 2009, at 2:22 PM, Jack Luo wrote:

> Hi,
>
> I am trying to fit a logistic regression using glm, but my explanatory
> variables are of mixed type: some are numeric, some are ordinal,
> some are categorical, say
>
> If x1 is numeric, x2 is ordinal, x3 is categorical, is the following
> formula OK?

The formula's certainly "OK". What may be non-OK will be your understanding of the output. The default handling of ordinal factors is a common source of questions to R-help, so read up first.

> model <- glm(y~x1+x2+x3, family=binomial(link="logit"),
>              na.action=na.pass)

Why have you chosen that na.action option?

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
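A minimal sketch of the two points raised above, using a small made-up data frame (the values and the level names "low", "mid", "high" are assumptions for illustration, not from the original question). It shows how an ordered factor is coded by default and what the na.action choice does to the rows that reach the model.

```r
# Illustrative data only; values and level names are made up.
d <- data.frame(
  y  = c(0, 1, 0, 1, 1),
  x1 = c(1.2, NA, 0.7, 2.1, 1.5),
  x2 = factor(c("low", "mid", "high", "mid", "low"),
              levels = c("low", "mid", "high"), ordered = TRUE)
)

# In a default session, ordered factors get polynomial contrasts
# (contr.poly), so the coefficients are labelled x2.L (linear) and
# x2.Q (quadratic) rather than by level name.
options("contrasts")
contrasts(d$x2)

# na.pass keeps rows with missing values; na.omit drops them.
n_pass <- nrow(model.frame(y ~ x1 + x2, data = d, na.action = na.pass))
n_omit <- nrow(model.frame(y ~ x1 + x2, data = d, na.action = na.omit))
c(n_pass = n_pass, n_omit = n_omit)
```

With na.pass the missing values are handed on to the fitting code unchanged, which is rarely what is wanted for glm; na.omit is the usual session default.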
(Ted Harding)
2009-Nov-16 20:14 UTC
[R] fitting a logistic regression with mixed type of variables
On 16-Nov-09 19:22:10, Jack Luo wrote:

> Hi,
> I am trying to fit a logistic regression using glm, but my
> explanatory variables are of mixed type: some are numeric,
> some are ordinal, some are categorical, say
>
> If x1 is numeric, x2 is ordinal, x3 is categorical, is the
> following formula OK?
>
> model <- glm(y~x1+x2+x3, family=binomial(link="logit"),
>              na.action=na.pass)
>
> Thanks,
> -Jack

Speaking rather generally (the details will depend on the nature of your variables, and of what you want to find out), the formula itself is OK.

What *is* important is to define your variables so as to respect their nature, so that the regression can handle them appropriately.

For the quantitative variable x1, there should be no problem; you can leave it as it is (though in some applications a transform of it, such as log(x1) or sqrt(x1), may be better, of course).

For the categorical variable x3, this should be treated as a factor whose levels are the categories. If the categories are represented alphabetically in the data (e.g. the values of x3 are "A","B","C") then x3 will be converted into a factor when the data are read in. Then it is only a matter of specifying what system of contrasts you want (see below).

However, if the values of x3 are represented numerically (e.g. 1,2,3) then x3 should be explicitly converted into a factor:

    x3 <- factor(x3)

with a possible additional argument depending on whether you want to consider the levels as ordered. You should use ordered=TRUE if you want x3 to be treated as an ordered factor, ordered=FALSE if unordered:

    x3 <- factor(x3,ordered=TRUE)
    x3 <- factor(x3,ordered=FALSE)

In the case that x3 was read in as a factor in the first place, you may still want to apply the above in order to force ordering or non-ordering. Read ?factor for more detail.

Then there is the question of contrasts for x3. For unordered factors, probably "treatment contrasts" (which compare each level of the factor with a reference) may be most appropriate.
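The factor conversions described above can be sketched as follows, assuming a made-up numerically coded x3 (the name x3_raw and its values are hypothetical). One caveat for current readers: since R 4.0.0, read.table() and data.frame() no longer turn character columns into factors by default, so an explicit factor() call is needed for "A","B","C" data as well.

```r
# Hypothetical numerically coded categorical variable.
x3_raw <- c(1, 3, 2, 1, 2, 3)

# Same data, once as an unordered and once as an ordered factor.
x3_unordered <- factor(x3_raw, ordered = FALSE)
x3_ordered   <- factor(x3_raw, ordered = TRUE)

is.ordered(x3_unordered)
is.ordered(x3_ordered)

# Treatment contrasts: each level is compared with the first level,
# which acts as the reference (its row in the matrix is all zeros).
n <- nlevels(x3_unordered)
contrasts(x3_unordered) <- contr.treatment(n)
contrasts(x3_unordered)
```

A different reference level can be chosen with relevel() or with the base argument of contr.treatment().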
For ordered factors, you may want to use either Helmert contrasts or "successive difference" contrasts.

For "treatment contrasts" use

    contrasts(x3) <- contr.treatment(n)

where n is the number of levels of x3 (see ?contrasts). For Helmert contrasts, similarly

    contrasts(x3) <- contr.helmert(n)

For "successive difference" contrasts, there is a function contr.diff which, however, is not available in the standard packages. However, there is a contr.diff() in package Epi, and an implementation is also developed in the MASS book. In that case

    contrasts(x3) <- contr.diff(n)

Now for the ordinal variable, x2. Attitudes differ, in different applications, as to whether to use such a variable as if it were a numerical variable or as an ordered factor.

If it can be considered meaningful to treat the ordered values as if they were a numerical measure (i.e. the difference between x2=1 and x2=2 can be considered as effectively equivalent to the difference between x2=2 and x2=3, etc.) then it can be meaningful to simply treat x2 on the same footing as x1.

On the other hand, you may only want to go as far as treating x2 as if it were an ordered factor, in which case you can treat it on the same lines as x3 above.

However, an ordinal variable is often treated as if it were the index of a subdivision of a latent continuum. For example, a question might ask the respondent if he is "Strongly against", "Somewhat against", "Indifferent", "Somewhat in favour" or "Strongly in favour" of some proposal. This forces the respondent to decide which of these categories "best" represents their inner intensity of attitude towards the issue, which is the latent continuum.

Such things can be treated by methods which fit latent variables to ordered responses, but this goes beyond what can be represented in a simple linear model such as you give above. I prefer to leave others, who really know about such things, to advise on how to proceed in such a case!

Hoping this helps,
Ted.
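For what it is worth, here is a rough sketch of the options for x2 discussed above, on simulated data (all values, the seed, and the names x2_score, fit_num, fit_helm, fit_sdif are made up for illustration). It fits the model once with x2 as a numeric score and once as an ordered factor with Helmert contrasts. For successive differences it uses contr.sdif() from the MASS package, if installed, as one available implementation; that choice is an assumption of this sketch, not something the text above prescribes.

```r
set.seed(1)
n_obs <- 200

# Simulated predictors: numeric x1, ordinal score 1..4, categorical x3.
x1       <- rnorm(n_obs)
x2_score <- sample(1:4, n_obs, replace = TRUE)
x3       <- factor(sample(c("A", "B", "C"), n_obs, replace = TRUE),
                   levels = c("A", "B", "C"))
y <- rbinom(n_obs, 1, plogis(-0.5 + 0.8 * x1 + 0.4 * x2_score))

# (a) x2 as a numeric score: a single slope, equal spacing assumed.
fit_num <- glm(y ~ x1 + x2_score + x3, family = binomial(link = "logit"))

# (b) x2 as an ordered factor with Helmert contrasts.
x2 <- factor(x2_score, levels = 1:4, ordered = TRUE)
contrasts(x2) <- contr.helmert(nlevels(x2))
fit_helm <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))

# (c) Successive-difference contrasts via MASS::contr.sdif, if available.
if (requireNamespace("MASS", quietly = TRUE)) {
  x2s <- x2
  contrasts(x2s) <- MASS::contr.sdif(nlevels(x2s))
  fit_sdif <- glm(y ~ x1 + x2s + x3, family = binomial(link = "logit"))
}

# The numeric coding spends 1 parameter on x2, the factor coding 3.
length(coef(fit_num))
length(coef(fit_helm))
```

Changing the contrasts on x2 changes what the individual coefficients mean, but not the fitted values or the deviance, since the same model is being fitted in a different parametrisation. Comparing (a) against (b), for example with anova(fit_num, fit_helm, test = "Chisq"), is one way to judge whether the equal-spacing assumption is tenable.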
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Nov-09                                       Time: 20:14:52
------------------------------ XFMail ------------------------------