Jacob Wegelin
2003-Nov-05 00:16 UTC
[R] Re: [S] LME - fixed effects model and missing values
Here is an answer to a 1999 post. I didn't find a direct answer anywhere on the Web, perhaps because it is "obvious" once one sees it. Suppose you have data from a longitudinal study, where each subject was measured *up to* four times, with missing measurements, so that the data look like this:> MAT<- structure(c(23, 24, 6, 19, 16, 20, 13, 11, NA, 8, NA, 21, 19, 15,11, NA, 10, 12, 16, 30, 13, 16, NA, NA, NA, NA, 28, 6), .Dim = c(7, 4), .Dimnames = list(c("SUB1", "SUB2", "SUB3", "SUB4", "SUB5", "SUB6", "SUB7"), c("BDI0", "BDI3", "BDI6", "BDI12")))> MATBDI0 BDI3 BDI6 BDI12 SUB1 23 11 11 16 SUB2 24 NA NA NA SUB3 6 8 10 NA SUB4 19 NA 12 NA SUB5 16 21 16 NA SUB6 20 19 30 28 SUB7 13 15 13 6 Let> x<- c(0,3,6,12)Then it's a simple matter to regress each row of MAT on x, in spite of missing values. Take for instance a row that has three missing values:> lm(MAT["SUB2",] ~ x)Call: lm(formula = MAT["SUB2", ] ~ x) Coefficients: (Intercept) x 24 NA But now suppose we want to fit a linear mixed-effects model, borrowing strength from all the data in estimating each individual's intercept and slope. As in the naive lm approach, NAs are not a problem. Unlike the naive lm approach, however, we obtain values for slope and intercept even for subjects for whom we cannot get slope via the lm approach, such as SUB2. First, you must create a data frame with one row for each time point within each subject, and you need to create a column to keep track of the time variable. For instance: d<-data.frame(sub=I(rep("", length(MAT))), time=rep(NA, length(MAT)), BDI=rep(NA, length(MAT)) ) count<-0 for (i in 1:nrow(MAT)) for (j in 1:ncol(MAT)) { count<-count+1 d[count, "sub"]<- dimnames(MAT) [[1]][i] d[count, "time"]<- c(0,3,6,12) [j] d[count, "BDI"]<- MAT[i,j] } d$sub<-factor(d$sub) Then you call lme, telling it to throw away rows with NAs in them. Note that this is *not* complete case analysis. We are not throwing away subjects (cases) who happen to have a missing BDI, just the time points that have missing BDI.> library("nlme") > mymodel<-lme( BDI ~ time, random=~ 1 + time | sub, data=d, na.action=na.omit) > mymodelLinear mixed-effects model fit by REML Data: d Log-restricted-likelihood: -63.70296 Fixed: BDI ~ time (Intercept) time 16.8317095 -0.1449387 Random effects: Formula: ~1 + time | sub Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) 3.5621406 (Intr) time 0.4671455 0.886 Residual 4.1972943 Number of Observations: 21 Number of Groups: 7 We even get a slope for SUB2, for whom we have only one observation.> mymodel$coef$random$sub (Intercept) time SUB1 -0.4909769 -0.08181262 SUB2 3.0012854 0.34872811 SUB3 -4.5733486 -0.52161792 SUB4 -0.7908920 -0.13009388 SUB5 0.7516454 0.09528318 SUB6 4.4620873 0.61630713 SUB7 -2.3598006 -0.32679398 (Version: R 1.8.0 on Windows.) This responds to the following email, which may be found at http://www.biostat.wustl.edu/archives/html/s-news/1999-01/msg00122.html Jake Wegelin On Wed, 20 Jan 1999, ziv shkedy wrote:> > Dear Jose and all the other S+ users, > First I want to thank you and Prof. Brian Ripley for your helpful answers. > > I'll described the missing values problem in more details. > I'm modeling longitudinal data where each one of the subjects was measured > at few time points. > Some of the subjects dropout from the study, i.e. that the RESPONSE of these > subjects is missing. > The analysis in proc MIXED is likelihood based ignorable analysis (i.e. we > using all available cases at each time point), > in lme ,if i understood you correctly, if the response is missing then the > observation is omitted from the dataset. This is a complete case analysis. > My question is if there is a way to overcome this problem and to use all > observation at each time point (and in that way the output from proc MIXED > and lme will be the same). > > Now, I'm on my way to get the gls function. > Thank you once again for your help, Ziv. > > ========================================================> Ziv Shkedy > Biostatistics > Center for Statistics > Limburgs Universitair Centrum > Universitaire Campus, department WNI > B-3590 Diepenbeek, Belgium > Tel: +32-(0)11-26.82.57 > e-mail: ziv.shkedy at luc.ac.be > ========================================================>