Diaz-Escamilla, Rafael E
2011-Aug-25 18:56 UTC
[R] Syntax for a three-level logistic model
Dear People at R help,

I am trying to figure out the syntax for a three-level logistic model with a single random effect (intercept).

Data Collected

My data consist of three levels: level 1 is the four settings for each student (settings nested within students), and each student is registered in one of 14 universities (students nested within universities). In more detail:

A. 2,479 students have a dichotomous outcome, engaged in risk behavior (yes or no), in EACH of 4 different settings. I am using these settings as the level-1 units nested within the students, and the students as the level-2 units. I also have three dichotomous covariates for each student: age (under 21, or 21 and older), gender (male or female), and whether the student got drunk in each of the four instances (yes or no). However, I am treating this last covariate, "drunk", as a level-1 covariate, since its value for an individual student may change from setting to setting.

B. The students, however, are also nested within 14 universities (level-3 units).

Data File

My data file consists of the following columns:

V1 is the student ID (each student ID appears four times, once for the outcome in each of the four settings). Student IDs range from 1001 to 3,479 (2,479 in total).

V2 and V3 contain the age and gender of each student (each of these values repeats 4 times, like the student ID).

V4, V5, and V6 contain the dummy coding of the 4 settings, for example:

    V1    V2 V3 V4 V5 V6
    1001  0  1  0  0  0
    1001  0  1  0  0  1
    1001  0  1  0  1  0
    1001  0  1  1  0  0
    ...
    etc.

V7 contains the values of the covariate "drunk" (whether or not the student was drunk in a setting).

V8 contains the ID of the university where the student is enrolled. Universities are coded with the numbers 11 to 24 (14 universities in total).

V9 contains the dichotomous response variable: the outcome in each setting for each student.

Model (random intercept only):

    Logit(p|G00) = P0 + P1(V4) + P2(V5) + P3(V6) + P4(V7)
        (Level 1: setting, and whether "drunk" in setting)
    P0 = B00 + B01(V2) + B02(V3)
        (Level 2: age and gender of student)
    B00 = G00
        (Level 3: university, random effect)

Questions:

a) How do I specify the level-2 covariates (V2, V3), which are nested within student (V1), in the model below? And is (1|V8) the correct way to nest students within universities for the random intercept?

    model <- lmer(V9 ~ V4 + V5 + V6 + ... + (1|V8) ???, data=datos, family = "binomial")

Or should I use glmmPQL?

b) Do I need to change any of the coding in the columns of the data file?

Any help would be greatly appreciated.

Sincerely,
Rafael Diaz
Assistant Professor
Dept of Mathematics and Statistics
California State University Sacramento
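Substituting the level-3 equation (B00 = G00) into the level-2 equation, and that into the level-1 equation, collapses the three levels into one equation in the poster's own symbols. This is only a sketch of the algebra; no terms are added or removed:

```latex
\operatorname{logit}(p \mid G_{00}) = G_{00} + B_{01}\,V_2 + B_{02}\,V_3 + P_1\,V_4 + P_2\,V_5 + P_3\,V_6 + P_4\,V_7
```

Written this way, V2 to V7 all enter as ordinary fixed-effect terms, whatever level they were measured at, and G00 is the only term described as a random effect (at the university level). As written, no term varies from student to student.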
Diaz-Escamilla, Rafael E <rdiaz <at> saclink.csus.edu> writes:

This question is really best for the r-sig-mixed-models at r-project.org mailing list: please direct any follow-ups there.

> Data Collected
>
> My data consist of three levels: level 1 is the four settings for
> each student (settings nested within students), and each student is
> registered in one of 14 universities (students nested within
> universities). In more detail:
>
> A. 2,479 students have a dichotomous outcome, engaged in risk
> behavior (yes or no), in EACH of 4 different settings. I am using
> these settings as the level-1 units nested within the students, and
> the students as the level-2 units. I also have three dichotomous
> covariates for each student: age (under 21, or 21 and older), gender
> (male or female), and whether the student got drunk in each of the
> four instances (yes or no). However, I am treating this last
> covariate, "drunk", as a level-1 covariate, since its value for an
> individual student may change from setting to setting.
>
> B. The students, however, are also nested within 14 universities
> (level-3 units).
>
> Data File
>
> My data file consists of the following columns:
>
> V1 is the student ID (each student ID appears four times, once for
> the outcome in each of the four settings). Student IDs range from
> 1001 to 3,479 (2,479 in total).
>
> V2 and V3 contain the age and gender of each student (each of these
> values repeats 4 times, like the student ID).
>
> V4, V5, and V6 contain the dummy coding of the 4 settings, for example:
>
>     V1    V2 V3 V4 V5 V6
>     1001  0  1  0  0  0
>     1001  0  1  0  0  1
>     1001  0  1  0  1  0
>     1001  0  1  1  0  0
>     ...
>     etc.

You don't need the dummy codings. It's probably better just to use a factor with the values of the four settings. R will do the work of translating the factor (with appropriate contrasts: treatment contrasts by default, although with the first level rather than the last as the baseline unless you use contr.sas()).
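A minimal base-R sketch of that advice, assuming the data frame is called datos (as in the question) and holds the dummy columns V4 to V6. The helper dummy_to_setting() and the level names "ref", "V4", "V5", "V6" are hypothetical placeholders: the all-zero row is treated as the reference setting, and the real setting names would replace these labels. The toy rows are the four example rows from the question.

```r
# Toy version of the poster's data: the four example rows for student 1001
# (only the columns needed here).
datos <- data.frame(
  V1 = rep(1001, 4),
  V4 = c(0, 0, 0, 1),
  V5 = c(0, 0, 1, 0),
  V6 = c(0, 1, 0, 0)
)

# Hypothetical helper: collapse the three 0/1 dummy columns into one factor.
# The all-zero row becomes the reference level "ref"; rename the levels to
# the real setting names.
dummy_to_setting <- function(d) {
  code <- ifelse(d$V4 == 1, "V4",
          ifelse(d$V5 == 1, "V5",
          ifelse(d$V6 == 1, "V6", "ref")))
  factor(code, levels = c("ref", "V4", "V5", "V6"))
}

datos$setting <- dummy_to_setting(datos)

# Default treatment contrasts: the first level ("ref") is the baseline,
# giving indicator columns V4, V5, V6 (the same coding as the original file).
contrasts(datos$setting)

# contr.sas() makes the last level ("V6") the baseline instead.
contrasts(datos$setting) <- contr.sas(levels(datos$setting))
contrasts(datos$setting)

# Alternatively, keep treatment contrasts but choose the baseline level.
datos$setting2 <- relevel(dummy_to_setting(datos), ref = "V6")
levels(datos$setting2)
```

In a glmer() formula, the single term setting then takes the place of V4 + V5 + V6.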
It will be easier to switch contrasts if you need to, and easier to work with the results.

> V7 contains the values of the covariate "drunk" (whether or not the
> student was drunk in a setting).
>
> V8 contains the ID of the university where the student is enrolled.
> Universities are coded with the numbers 11 to 24 (14 universities in
> total).
>
> V9 contains the dichotomous response variable: the outcome in each
> setting for each student.
>
> Model (random intercept only):
>
>     Logit(p|G00) = P0 + P1(V4) + P2(V5) + P3(V6) + P4(V7)
>         (Level 1: setting, and whether "drunk" in setting)
>     P0 = B00 + B01(V2) + B02(V3)
>         (Level 2: age and gender of student)
>     B00 = G00
>         (Level 3: university, random effect)

I believe you just need

    glmer(riskbehavior ~ age+gender+setting+drunk+(1|student)+(1|university),
          family="binomial", data=data)

The nesting of the fixed factors (age, gender, setting) doesn't have to be specified explicitly. If your nesting is implicit (i.e. all students have unique IDs, rather than being numbered 1,...,N_j within each university), then you don't have to worry about nesting either. (If your nesting is explicit, i.e. students have non-unique IDs, then you need (1|university/student) instead.)

glmmPQL is dubious (there are only 4 binomial responses per "cluster", i.e. per student), although it might be worth a try just to see. There you need to specify the random effect as something like random = ~1|university/student.

If the simple model works, I would have further questions like:

 * should there be an age:gender interaction in the model?
 * how about variation in (say) settings across universities, (setting|university) [or (1|setting:university) for a slightly more constrained model]?
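On the implicit-versus-explicit nesting point, a quick base-R check is whether any student ID occurs in more than one university. The sketch below uses a small hypothetical data frame (the columns setting, student, university and the helper ids_unique() are illustrative, not from the thread) in which students are numbered within each university, and then builds a globally unique ID with interaction(). For the poster's data, where 2,479 students carry IDs 1001 to 3,479, the check would be expected to return TRUE.

```r
# Toy long-format data (hypothetical): 2 universities, students numbered
# 1..3 *within* each university, 4 settings per student.
d <- expand.grid(setting    = factor(c("s1", "s2", "s3", "s4")),
                 student    = 1:3,
                 university = factor(c(11, 12)))

# TRUE if every student ID occurs in exactly one university, i.e. the IDs
# are globally unique and the nesting is implicit.
ids_unique <- function(student, university) {
  all(tapply(university, student, function(u) length(unique(u))) == 1)
}

ids_unique(d$student, d$university)       # FALSE: IDs repeat across universities

# Make the IDs globally unique by combining university and student.
d$student_uid <- interaction(d$university, d$student, drop = TRUE)
ids_unique(d$student_uid, d$university)   # TRUE

# Two random-effect specifications describing the same grouping structure
# (formulas only; 'risk' is a hypothetical 0/1 response column):
f_explicit <- risk ~ setting + (1 | university/student)
f_unique   <- risk ~ setting + (1 | student_uid) + (1 | university)
```

In lme4, (1 | university/student) expands to (1 | university) + (1 | university:student), and university:student identifies the same groups as student_uid, so either formula can be passed to glmer() once the IDs are sorted out.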