Keith Chamberlain
2005-Dec-01 21:44 UTC
[R] LME & data with complicated random & correlational structures
Dear List,

This is my first post. I'm a relatively new R user trying to fit a mixed-effects model using lme() with random effects and a correlation structure, and I have looked extensively through the archives and the R help pages on lme, corClasses, etc. for clues. My programming experience is minimal (one semester of C). My mentor, who has much more programming experience but a comparable knowledge of R, has been stumped. What follows are 3 questions about an lme() model: one on the nested hierarchy, one on a strategy for a piecewise approach to the variance given that I have ~24 hours of data (sampled at 32 Hz, 1 hr per subject), and one on the corStruct, i.e. how to get rid of serial dependencies before lme().

I'm analyzing skin temperature recorded continuously at 32 Hz in the Baseline (10 min), Testing (~5 min), and Recovery (20 min) epochs of a face-recognition experiment. Stimuli are the same in Baseline and Recovery (portrait or landscape). In Testing, participants were tested on their recognition of a list of b&w portraits presented just before testing started. In some of the 'learned' portraits the eyes were masked, and in others the eyes were visible. In Testing the portraits are not masked, but the stimuli are labeled "Eyes" and "NoEyes". The data structure looks as follows:

Subj/Epoch/Stimuli/Time/Temperature

There are 8 subjects and 9 epochs, 6 of which were just "instruction" blocks and one a "Learning" period. For lme(), I figured out how to use subset to isolate just the Baseline, Learning, and Testing epochs (and to avoid epochs with only 1 stimulus level, such as "instruction"). Data within each epoch are balanced with respect to the number of trials, but not between epochs: Recovery has twice as many trials as Baseline, and Testing about half as many. Epoch durations follow roughly the same ratio, although the time per trial differs.

Stimuli are the same in Baseline and Recovery but different in Testing, although each analyzed epoch has 2 stimulus levels.
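A minimal sketch of the subsetting step described above, assuming a long-format data frame with one row per 32 Hz sample. The column names, the toy epoch list and the simulated temperatures are hypothetical stand-ins, not the poster's actual file; the point is only that `%in%` plus `droplevels()` keeps the analysed epochs and removes the unused factor levels.

```r
# Toy long-format data, one row per 32 Hz sample. The column names and values
# are hypothetical stand-ins for the Subj/Epoch/Stimuli/Time/Temperature layout.
set.seed(1)
epochs <- c("Instruction", "Baseline", "Learning", "Testing", "Recovery")
temps <- expand.grid(Subj = factor(1:8),
                     Epoch = factor(epochs, levels = epochs),
                     Sample = 1:4)
temps$Time <- (temps$Sample - 1) / 32            # seconds within the toy epoch
temps$Temperature <- rnorm(nrow(temps), mean = 33, sd = 0.2)

# Keep only the epochs to be analysed. %in% is a compact equivalent of
# Epoch == "Baseline" | Epoch == "Testing" | Epoch == "Recovery".
keep <- c("Baseline", "Testing", "Recovery")
analysed <- droplevels(subset(temps, Epoch %in% keep))

table(analysed$Epoch)   # 32 rows each; dropped epochs no longer appear as levels
```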
Time and Temperature make up the time series, and Temperature is the dependent variable (the response to Stimulus).

1- Are fixed effects and random effects mutually exclusive? That is, if I put something in my model formula as a fixed effect, does it make no sense to include it as a random effect as well? The documentation (and posts) were not really clear on that point (not that the documentation necessarily 'should' be; I just got confused).

The nested hierarchy for what actually gets analyzed looks as follows:

Subj/Epoch/Stimulus/Temperature

Reasoning: several temperature samples are recorded in each Stimulus trial, there are several stimuli in each Epoch, and all of the Epochs belong to one subject. Subject is random (theoretically) because subjects are sampled from a population. Epoch would be fixed because all participants went through the same sequence of Epochs. But Stimulus varied randomly within an Epoch, which seems inconsistent when I enter it into the lme model as both a fixed and a random effect:

Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus
Subset= Epoch=="Baseline" | Epoch=="Testing" | Epoch=="Recovery"

I'm looking to correctly allocate error terms for between-subjects (Subj) variability, and to further split the within-subject error between Epoch and Stimulus. The current model that I got to work (memory being the main obstacle) is

Temperature ~ Stimulus-1, random=Subj|Subj

which I decided to use so that the residuals have the Subject variability accounted for and removed. Would a list of random structures work better? If so, is each item in the list structured just like the random formula? I haven't seen or found any examples of a list of random/nesting structures.

2- Is it possible to take a piecewise approach to the variance using lme(), such as modeling the variability of each subject first, then fitting further-nested terms to the residuals from the previous model? If so, what caveats exist for interpreting the variances?
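The memory issues mentioned above are easier to appreciate with a rough count of the data, using the 32 Hz rate, the epoch durations and the 1 hr per subject given in the post. The memory figures rest on an illustrative assumption: that an n x n correlation matrix for one group of observations would be held densely as 8-byte doubles. That is a back-of-the-envelope bound, not a description of how nlme stores its correlation structures internally. GiB here means 2^30 bytes.

```r
# Rough data-volume arithmetic from the figures in the post.
hz <- 32                                           # samples per second
epoch_min <- c(Baseline = 10, Testing = 5, Recovery = 20)
n_per_epoch <- epoch_min * 60 * hz                 # samples per subject and epoch
n_per_subject <- 60 * 60 * hz                      # 1 hour per subject

# Memory in GiB for an n x n matrix of 8-byte doubles stored densely
# (an illustrative assumption, not a description of nlme internals).
dense_gib <- function(n) n^2 * 8 / 2^30

n_per_epoch                          # Baseline 19200, Testing 9600, Recovery 38400
round(dense_gib(n_per_epoch), 2)     # Baseline 2.75, Testing 0.69, Recovery 10.99
round(dense_gib(n_per_subject), 1)   # 98.9 for one subject's full hour
```

Even a single Recovery epoch would need roughly ten times the 1G of the largest machine mentioned, under that assumption.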
I'm not interpreting p-values at this point because of another issue: when I try to set up the correlation structure, I run out of memory fast. I've tried this on a Mac G5, an HP Pavilion dv1000 (Pentium, 2.6 GHz), and a Gateway with a 900 MHz AMD Athlon processor. Each system has 386M of memory or more; one has 1G.

3- Is there a way to get rid of the serial dependency BEFORE running the model with lme(), such as initializing a corStruct before placing it in the model? I'm working with so much data that I'm fine with doing the process piecewise. An AR process was difficult because the residuals are not the same length as the data file I started with. The serial dependencies still have to go, whether via the correlation term in lme() or some other method, because I'll soon be breaking the variance into components via spectrum().

So I might as well add a 4th question: what's the term that gets me to the data after ar() has done its work? I suspect resid() wasn't right, but perhaps it's simply expected that the data differ from their original length after an AR process.

Rgds,
KeithC.
Psych Undergrad, CU Boulder
RE McNair Scholar
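A minimal sketch of the prewhitening step behind questions 3 and 4, using base R's ar() on a simulated AR(1) series (the series length and the 0.8 coefficient are made up). As documented in ?ar, the fitted object's resid component holds the residuals, i.e. the prewhitened data, padded with NA for the first order observations so its length matches the input. This is only one way to prewhiten; it is not the poster's code.

```r
# Simulated AR(1) series standing in for one subject's temperature trace
# (the length and the 0.8 coefficient are made up for illustration).
set.seed(3)
n <- 640                                    # 20 s at 32 Hz
x <- arima.sim(model = list(ar = 0.8), n = n) + 33

# Fit an AR model of fixed order 1 (Yule-Walker, the default method).
fit <- ar(x, aic = FALSE, order.max = 1)

# The prewhitened series is in fit$resid. It has the same length as x;
# the first fit$order values are NA because they have no predecessors.
white <- fit$resid
length(white) == length(x)                  # TRUE
sum(is.na(white)) == fit$order              # TRUE

# Drop the leading NA(s) before further analysis, e.g. spectrum().
white_ok <- as.numeric(white)[-seq_len(fit$order)]
sp <- spectrum(white_ok, plot = FALSE)
```

Removing the leading NA explicitly, rather than relying on the residuals being shorter, makes the alignment between the original and prewhitened series obvious.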
Spencer Graves
2005-Dec-07 02:32 UTC
Have you received any replies to this post? I haven't seen any, so I will attempt a few comments.

First, I'm overwhelmed by the details in your discussion. I suggest that for each of your questions, you try to think of an extremely simple example that would test it, as suggested in the Posting Guide (www.R-project.org/posting-guide.html). If you have problems with that, please submit a question focused on that one thing.

Have you looked at Pinheiro and Bates (2000) Mixed-Effects Models in S and S-PLUS (Springer)? If not, I suggest you take a hard look at this book. I was unable to get anything sensible from "lme" until I started working through it. This is the primary reference for "lme", and for me at least, it was essential to understanding what I needed to do to get anything useful from "lme". You don't need to follow all the math in this book to get something useful out of it, because it includes many worked examples that should go a long way toward answering many questions you might have about "lme".

> 1- ... if I set something in my model formula as a fixed effect, then it does not make sense to set it as a random effect as well? ...
> Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus

I perceive several problems here. Have you tried the following:

Temperature ~ Stimulus-1, random=~1|Subj/Epoch

From your description, I'm guessing that Stimulus is a factor with levels like ("Baseline", "A", "B", "Recovery"). My way of thinking about these things is to try to write this as an algebraic model with parameters to estimate. "Temperature~Stimulus-1", ignoring the "random" argument for the moment, could be written as follows:

(1) Temperature = b["Baseline"]*I(Stimulus=="Baseline") + b["A"]*I(Stimulus=="A") + b["B"]*I(Stimulus=="B") + b["Recovery"]*I(Stimulus=="Recovery") + e,

where I(...) is the indicator function, equal to 1 if (...) is TRUE and 0 otherwise, e is the residual error, and the 4 "b" parameters are to be estimated by "iterative, reweighted generalized least squares" to maximize an appropriate likelihood function.

Now consider "random=~1|Subj"; ignore "Epoch" for the moment. This adds one "random coefficient" to the model for each Subj. If you have 2 subjects, the model becomes something like the following:

(2) Temperature = b["Baseline"]*I(Stimulus=="Baseline") + b["A"]*I(Stimulus=="A") + b["B"]*I(Stimulus=="B") + b["Recovery"]*I(Stimulus=="Recovery") + b.subj[1]*I(Subj==1) + b.subj[2]*I(Subj==2) + e.

However, we do NOT initially estimate the random coefficients b.subj[1] and b.subj[2]. Rather, we assume these coefficients "b.subj" are normally distributed with mean 0 and variance "var.subj", and we want to estimate "var.subj". (We may later estimate the b.subj's conditioned on the estimate of "var.subj", but that's a separate issue.) We estimate "var.subj" using "iterative, reweighted generalized least squares", which, roughly speaking, "uncorrelates" or "whitens" the correlated residuals from model (1) and then minimizes the sum of squares of those "whitened" residuals. More detail is provided in Pinheiro and Bates.

Now consider "random=~1|Subj/Epoch". This adds another random coefficient for each (Subj, Epoch) combination, which we also assume are normally distributed with mean 0 and variance "var.s.epoch". We then want to estimate the 4 fixed-effect coefficients and the two random-effect variances (along with the residual variance of e) simultaneously, using the same approach I just outlined.

> 2- Is it possible to take a piecewise approach wrt the variance using lme(), such as modeling the variability of each subject first ... . When I try to set up the correlation structure, I run out of memory fast.

You've hit on a great idea here: consider the data for each subject separately. Plot it, think about the similarities and differences, and condense the data into, e.g., 1 or a few numbers per subject/epoch combination. Then use "lme" on this condensed data set. I'd start simple and add complexity later.
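A minimal sketch of the "condense first, then lme" advice combined with the suggested random=~1|Subj/Epoch formula. Everything here is an assumption for illustration: the data are simulated, the stimulus labels (Portrait, Landscape, Eyes, NoEyes) and effect sizes are made up, and per-trial means are just one possible level of condensation. The model.matrix() call shows the indicator columns that "Stimulus - 1" produces, mirroring the I(...) terms in model (1).

```r
library(nlme)  # provides lme(), fixef(), VarCorr(); a recommended package shipped with R

# Simulated stand-in for the poster's data; all sizes, labels and effects are made up.
set.seed(2005)
raw <- expand.grid(Subj = factor(1:8),
                   Epoch = factor(c("Baseline", "Testing", "Recovery")),
                   StimIdx = 1:2, Trial = 1:5, Sample = 1:20)
# Two stimulus levels per epoch: Portrait/Landscape in Baseline and Recovery,
# Eyes/NoEyes in Testing.
raw$Stimulus <- factor(ifelse(raw$Epoch == "Testing",
                              c("Eyes", "NoEyes")[raw$StimIdx],
                              c("Portrait", "Landscape")[raw$StimIdx]))
subj_epoch <- interaction(raw$Subj, raw$Epoch)                # 24 subject-by-epoch cells
subj_eff   <- rnorm(nlevels(raw$Subj), sd = 0.5)              # between-subject effects
se_eff     <- rnorm(nlevels(subj_epoch), sd = 0.2)            # epoch-within-subject effects
stim_mean  <- c(Portrait = 33.0, Landscape = 33.1, Eyes = 33.3, NoEyes = 33.2)
raw$Temperature <- unname(stim_mean[as.character(raw$Stimulus)]) +
  subj_eff[as.integer(raw$Subj)] +
  se_eff[as.integer(subj_epoch)] +
  rnorm(nrow(raw), sd = 0.3)                                  # sample-level noise

# Condense: one mean per trial (4800 samples -> 240 trial means).
trial_means <- aggregate(Temperature ~ Subj + Epoch + Stimulus + Trial,
                         data = raw, FUN = mean)

# "Stimulus - 1" codes one indicator column per stimulus level, as in model (1).
X <- model.matrix(~ Stimulus - 1, data = trial_means)
colnames(X)

# Random intercepts for Subj and for Epoch within Subj.
fit <- lme(Temperature ~ Stimulus - 1, random = ~ 1 | Subj/Epoch,
           data = trial_means)
fixef(fit)     # one coefficient per stimulus level
VarCorr(fit)   # Subj and Subj/Epoch intercept variances plus the residual
```

Because each row of X has exactly one 1, the fixed effects play the role of the b coefficients in model (1), while VarCorr() reports estimates of var.subj and var.s.epoch.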
I have on occasion tried to fit the most complicated model I could think of, only to find that I could have gotten most of the information by condensing the data grossly and fitting much simpler models; I wasted lots of time and effort trying to work with all the data at once. I suggest you start by aggregating to the grossest level you think might answer your research questions. After you have sensible answers at that level, if you still have time and energy for this project, I might then try aggregating the data into more summaries with fewer observations in each summary.

> 3- Is there a way to get rid of the serial dependency BEFORE running the model with LME(), such as initiating a corStruct before placing it in the model?

Answer: Yes, and the primary way to do this would be to aggregate, as I just suggested. If you still want to do more with the correlation structure, study Pinheiro and Bates and try something with a moderately condensed version of what you have.

hope this helps.
spencer graves

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
spencer.graves at pdf.com
www.pdf.com
Tel: 408-938-4420
Fax: 408-280-7915
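Spencer's answer to question 3 suggests trying a correlation structure only on a moderately condensed version of the data. A minimal sketch of that idea, assuming one mean per second (a 32-fold reduction from 32 Hz) and fully simulated data; the Sec covariate, the stimulus levels "A"/"B" and all effect sizes are hypothetical. corAR1(form = ~ Sec | Subj/Epoch) requests AR(1) errors within each subject-by-epoch series, using the same grouping as the random effects, and the final anova() compares it with the same model without serial correlation.

```r
library(nlme)

# Hypothetical "moderately condensed" data: one mean per second for
# 8 subjects x 3 epochs x 60 seconds. All values are simulated.
set.seed(7)
n_sec <- 60
secs <- expand.grid(Sec = seq_len(n_sec),
                    Epoch = factor(c("Baseline", "Testing", "Recovery")),
                    Subj = factor(1:8))
n_series <- nlevels(secs$Epoch) * nlevels(secs$Subj)   # 24 subject-by-epoch series
# expand.grid varies Sec fastest, so each block of n_sec rows is one series in time order.
secs$Stimulus <- factor(ifelse((secs$Sec - 1) %/% 10 %% 2 == 0, "A", "B"))  # made-up levels
ar_noise <- as.vector(replicate(n_series,
                                arima.sim(list(ar = 0.7), n = n_sec, sd = 0.3)))
secs$Temperature <- 33 + 0.1 * (secs$Stimulus == "B") +
  rnorm(nlevels(secs$Subj), sd = 0.5)[as.integer(secs$Subj)] +   # subject effects
  rep(rnorm(n_series, sd = 0.2), each = n_sec) +                  # epoch-within-subject
  ar_noise                                                        # serially correlated error

# AR(1) errors within each Subj/Epoch series; Sec is an integer time index,
# unique within each series as corAR1() requires.
fit_ar <- lme(Temperature ~ Stimulus - 1,
              random = ~ 1 | Subj/Epoch,
              correlation = corAR1(form = ~ Sec | Subj/Epoch),
              data = secs)
phi <- coef(fit_ar$modelStruct$corStruct, unconstrained = FALSE)
phi   # estimated lag-1 autocorrelation

# Same fixed and random parts without the AR(1) term, for comparison.
fit_iid <- update(fit_ar, correlation = NULL)
anova(fit_iid, fit_ar)
```

Both fits share the same fixed effects, so comparing their REML-based AIC/BIC and likelihood ratio is legitimate here; with real data the condensation level would need to be chosen so that each series stays short enough to fit in memory.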