Hello Running R2.9.2 on Windows XP I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. Thus, m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum) causes an error in na.fail.default, but adding na.action = "na.omit" makes a model with no errors. However, if I create that model, i.e. m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum, na.action = "na.omit") then the diagnostic plots suggested in Pinheiro & Bates produce errors; e.g. plot(m1.mod1, schoolnum~resid(.), abline = 0) gives an error "could not find function "NaAct". Searching the archives showed a similar question from 2007, but did not show any responses. Thanks for any help Peter ) Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom
Peter Flom wrote:>> I am puzzled by the performance of LME in situations where there are >> missing data. As I >> understand it, one of the strengths of this sort of model is how well it >> deals with missing >> data, yet lme requires nonmissing data.You are confusing missing data with an unbalanced design. A strengths of LME is that it copes with the latter, which aov() does not. Regards, Mark. Peter Flom-2 wrote:> > Hello > > Running R2.9.2 on Windows XP > > I am puzzled by the performance of LME in situations where there are > missing data. As I understand it, one of the strengths of this sort of > model is how well it deals with missing data, yet lme requires nonmissing > data. > > Thus, > > m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_, > data = long, > random = ~I(year-2007.5)|schoolnum) > > causes an error in na.fail.default, but adding na.action = "na.omit" makes > a model with no errors. However, if I create that model, i.e. > > m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_, > data = long, > random = ~I(year-2007.5)|schoolnum, > na.action = "na.omit") > > then the diagnostic plots suggested in Pinheiro & Bates produce errors; > e.g. > > plot(m1.mod1, schoolnum~resid(.), abline = 0) > > gives an error "could not find function "NaAct". > > > Searching the archives showed a similar question from 2007, but did not > show any responses. > > Thanks for any help > > Peter > > ) > > Peter L. Flom, PhD > Statistical Consultant > Website: www DOT peterflomconsulting DOT com > Writing; http://www.associatedcontent.com/user/582880/peter_flom.html > Twitter: @peterflom > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- View this message in context: http://www.nabble.com/Missing-data-and-LME-models-and-diagnostic-plots-tp25996584p25998546.html Sent from the R help mailing list archive at Nabble.com.
I wrote>>> I am puzzled by the performance of LME in situations where there are >>> missing data. As I >>> understand it, one of the strengths of this sort of model is how well it >>> deals with missing >>> data, yet lme requires nonmissing data. >Mark Difford replied>You are confusing missing data with an unbalanced design. A strengths of LME >is that it copes with the latter, which aov() does not. >Thanks for your reply, but I don't believe I am confused with respect to missing data in mixed models. See e.g. Hedeker and Gibbons, Longitudinal Data Analysis, which repeatedly stresses that mixed models provide good estimates if the data are missing at random. Regards Peter Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom
Kingsford Jones
2009-Oct-22 03:29 UTC
[R] Missing data and LME models and diagnostic plots
Mixed models based on likelihood methods can often handle missing observations within subjects, but they not do well with missing individual elements in the design matrices (think unit nonresponse vs item nonresponse in the survey world). Continuing with the example I recently sent to you set.seed(0112358) library(nlme) school <- factor(rep(1:20, each=4)) year <- rep(2006:2009, 20) year.c <- year - mean(year) tmt <- sample(0:1, 20, replace = TRUE)[school] math <- rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school] tmt <- factor(tmt) dfr <- data.frame(math, school, tmt, year, year.c) rm(math, school, year, year.c, tmt) # Now create missing observations, # -- note just a single obs for the first school: dfr2 <- dfr[-c(2:6, sample(75, 5)), ] table(dfr2$school) # # school: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # numobs: 1 2 4 3 4 4 4 4 4 3 3 4 4 4 3 4 4 4 4 4 f3 <- lme(math ~ year.c*tmt, data = dfr2, random=~1|school) intervals(f3) #Approximate 95% confidence intervals # # Fixed effects: # lower est. upper #(Intercept) 3.0705975 3.81227729 4.5539571 #year.c -0.3913985 0.09107008 0.5735386 #tmt1 0.3146311 1.34895504 2.3832790 #year.c:tmt1 -0.4385680 0.20748194 0.8535319 #attr(,"label") #[1] "Fixed effects:" # # Random Effects: # Level: school # lower est. upper #sd((Intercept)) 0.35068 0.7334782 1.534134 # # Within-group standard error: # lower est. upper #1.226240 1.492044 1.815464 Note that even with the missing obs, in this instance we got pretty good estimates of the variance components (parameters: within-group sd = 1.5; between-school sd = 1). However, I had to change the seed from the one I used in the last email because, as is not uncommon with unbalanced data and relatively small sample sizes, there were convergence problems. If you were using classical MoM estimators fitting a mixed model with unbalanced data would lead to very precarious inferences (not that the likelihood inference isn't without it share of pitfalls, but that's another email...) hth, Kingsford On Wed, Oct 21, 2009 at 11:13 AM, Peter Flom <peterflomconsulting at mindspring.com> wrote:> Hello > > Running R2.9.2 on Windows XP > > I am puzzled by the performance of LME in situations where there are missing data. ?As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. > > Thus, > > m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_, > ? ? ? ? ? ? ? ?data = long, > ? ? ? ? ? ? ? ?random = ~I(year-2007.5)|schoolnum) > > causes an error in na.fail.default, but adding na.action = "na.omit" makes a model with no errors. ?However, if I create that model, i.e. > > m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_, > ? ? ? ? ? ? ? data = long, > ? ? ? ? ? ? ? random = ~I(year-2007.5)|schoolnum, > ? ? ? ? ? ? ? na.action = "na.omit") > > then the diagnostic plots suggested in Pinheiro & Bates produce errors; e.g. > > plot(m1.mod1, schoolnum~resid(.), abline = 0) > > gives an error "could not find function "NaAct". > > > Searching the archives showed a similar question from 2007, but did not show any responses. > > Thanks for any help > > Peter > > ) > > Peter L. Flom, PhD > Statistical Consultant > Website: www DOT peterflomconsulting DOT com > Writing; http://www.associatedcontent.com/user/582880/peter_flom.html > Twitter: ? @peterflom > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >