thr3ads.net - R help - [R] LME & data with complicated random & correlational structures [Dec 2005]

If this information is useful, please help other people find it:
Share via:

Keith Chamberlain

2005-Dec-01 21:44 UTC

[R] LME & data with complicated random & correlational structures

Dear List,

This is my first post, and I'm a relatively new R user trying to work out a
mixed effects model using lme() with random effects, and a correlation
structure, and have looked over the archives, & R help on lme, corClasses,
&
etc extensively for clues. My programming experience is minimal (1 semester
of C). My mentor, who has much more programming experience, but a comparable
level of knowledge of R, has been stumped. What follows are 3 questions
pertaining to an lme() model, one on the nested hierarcy, 1 on a strategy
for a piecewise approach to the variance given I have ~24 hours of data
(sampled at 32Hz, 1hr per subject), and one on the corStruct or how to get
rid of serial dependencies before lme().

I'm analyzing skin temperature continuously recorded at 32Hz in Baseline (10
min), Testing (~5 min), and Recovery (20 min) epochs of a face recognition
experiment. Stimuli are the same in Baseline and Recovery (portrait or
landscape), and in testing, participants were tested on their recognition of
a list b&w portraits presented just before testing started. On some of the
portraits 'learned' the eyes were masked, and in others, the eyes were
visible. In testing, the portraits have no masking but the stimuli in
testing are labeled "Eyes" and "NoEyes". The data structure
looks as
follows:

Subj/Epoch/Stimuli/Time/Temperature
There are 8 subjects

9 epochs - 6 of which were just "instruction" blocks, and one
"Learning"
period. Wrt lme(), I figured out how to use subset too isolate just the
Baseline, Learning, and Testing Epochs (and avoid epochs with only 1
stimulus level, such as "instruction"). Data within each epoch are
balanced
wrt # trials, but not between epochs. Recovery has twice as many trials as
Baseline, and Testing has about half. Time for each epoch is roughly that
ratio too, although time in each trial differs.

Stimuli are the same in Baseline & Recovery, but different in Testing,
although there are 2 levels in each used epoch.

Time & Temperature make up the time series, and Temperature is the dependent
variable too stimulus. 

1- are fixed effects and random effects discrete? That is, if I set
something in my model formula as a fixed effect, then it does not make sense
to set it as a random effect as well? The documentation (and posts) were not
really clear on that point (not that the documentation technically
'should'
be per say, just that I got confused).

The nested hierarchy for what actually gets analyzed looks as follows:
Subj/Epoch/Stimulus/Temperature

Reasoning: there are several temperature samples recorded in each trial of
Stimulus. Several stimuli in each Epoch, and all of the Epochs for one
subject. Subject is random (theoretically) because of sampling in a
population, Epoch would be fixed because all participants went through the
same sequence of Epochs, but Stimulus varied randomly within an Epoch, which
seems inconsistent when I apply it to the lme model as both a fixed and
random effect.

Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus
Subset= Epoch=="Baseline" | Epoch=="Testing" |
Epoch=="Recovery"

I'm looking to correctly allocate error terms for between subjects (Subj)
variability, and further delineate the within subject error between Epoch
and Stimulus. The current model that I got to work (memory issues namely) is
Temperature ~ Stimulus-1, random=Subj|Subj, which I decided to use to get
the residuals to have the Subject variability accounted for and subtracted.
Would a list of random structures work better? If so, is each item in the
list structured just as the random formula? I haven't actually seen/found
any examples of a list of random/nesting structures.

2- Is it possible to take a piecewise approach wrt the variance using lme(),
such as modeling the variability of each subject first, then using
further-nested terms in a model and the residuals from the previous? If so,
what caveats exist for interpreting the variances?

I'm not interpreting p-values at this point because of another issue. When I
try to set up the correlation structure, I run out of memory fast. I've
tried this on a mac G5, an HP Pavilion dv1000 (= Pentium 2.6GHz), and a
Gateway with an AMD athalon 900MHz processors. Each system has 386M memory
or more, one of which has 1G. 

3- Is there a way to get rid of the serial dependency BEFORE running the
model with LME(), such as initiating a corStruct before placing it in the
model? I'm working with so much data that I'm fine with doing the
process
piecewise. An AR process was difficult because the residuals are not the
same length as the data file that I started with. Serial dependencies still
gota go, whether via the correlation term in lme() or some other method,
because I'll soon be breaking up the variance into components via
spectrum().

So I might as well add a 4th. What's the term that gets me too data after
AR() has done it's work? I'm thinking that resid() wasn't right but
data
that the data differ from their original length prior to an AR process may
be how its done.

Rgds,
KeithC.
Psych Undergrad, CU Boulder
RE McNair Scholar

Spencer Graves

2005-Dec-07 02:32 UTC

head link

[R] LME & data with complicated random & correlational structures

Have you received any replies to this post?  I haven't seen any, so I 
will attempt a few comments.  First, I'm overwhelmed with the details in 
your discussion.  I suggest that for each of your question, you try to 
think of an extremely simple example that would test your question, as 
suggested in the Posting Guide (www.R-project.org/posting-guide.html). 
If you have problems with that, please submit a question focused on that 
one thing.

	  Have you looked at Pinheiro and Bates (2000) Mixed Effects Models in 
S and S-Plus (Springer)?  If no, I suggest you take a hard look at this 
book.  I was unable to get anything sensible from "lme" until I
started
working through this book.  This is the primary reference for "lme",
and
for me at least, it was essential to understanding what I needed to do 
to get anything useful from "lme".  You don't follow all the math
in
this book to get something useful out of it, because it includes many 
worked examples that should go a long way to answering many questions 
you might have about "lme".

	  1.  "if I set something in my model formula as a fixed effect, then 
it does not make sense to set it as a random effect as well? ...

 > Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus

	  I perceive several problems here.  Have you tried the following:

 > Temperature ~ Stimulus-1, random=~1|Subj/Epoch

 From your description, I'm guessing that Stimulus is a factor with 
levels like ("Baseline", "A", "B",
"Recovery").  My way of thinking
about these things is to try to write this as an algebraic model with 
parameters to estimate.  "Temperature~Stimulus-1", ignoring the
"random"
argument for the moment, could be written as follows:

(1) Temperature = b["Baseline"]*I(Stimulus=="Baseline") + 
b["A"]*I(Stimulus=="A") +
b["B"]*I(Stimulus=="B") +
b["Recovery"]*I(Stimulus=="Recovery"),

	  where the 4 "b" parameters are to be estimated by "iterative, 
reweighted generalized least squares" to maximize an appropriate 
likelihood function [and where I(...) = indicator function that is 1 if 
the (...) is TRUE and 0 otherwise].

	  Now consider "random=~1|Subj";  ignore "epoch" for the
moment.  This
adds one "random coefficient" to this model for each Sub.  If you have
2
subjects, then the model becomes something like the following:

(2) Temperature = b["Baseline"]*I(Stimulus=="Baseline") + 
b["A"]*I(Stimulus=="A") +
b["B"]*I(Stimulus=="B") +
b["Recovery"]*I(Stimulus=="Recovery") + b.subj[1]*I(Subj==1)
+ b.subj[2]*I(Subj==2).

	  However, we do NOT initially estimate the random coefficients 
b.subj[1] and b.subj[2].  Rather, we assume these coefficients
"b.subj"
are normally distributed with mean 0 and a variance "var.subj", and we
want to estimate "var.subj".  (We may later estimate the b.subj's 
conditioned on the estimate of "var.subj", but that's a separate
issue.)
  We estimate "var.subj" using "iterative, reweighted generalized
least
squares" that roughly speaking "uncorrelates" or
"whitens" the
correlated residuals from model (1) and then minimizes the sum of 
squares of those "whitened" residuals.  More detail is provided in 
Pinheiro and Bates.

	  Now consider "random=~1|Subj/Epoch".  This adds another random 
coefficient for each (Subject, Epoch) combination, which we also assume 
are normally distributed with mean 0 and variance "var.s.epoch".  We 
then want to estimate the 4 fixed effect coefficients and the two 
variance coefficients simultaneously, using the same approach I just 
outlined.

	> 2- Is it possible to take a piecewise approach wrt the variance using 
lme(), such as modeling the variability of each subject first ... . 
When I try to set up the correlation structure, I run out of memory fast."

	  You've hit on a great idea here:  Consider the data for each subject 
separately.  Plot it, think about the similarities and differences, and 
condense the data into, e.g., 1 or a few numbers per subject / epoch 
combination.  Then use "lme" on this condensed data set.  I'd
start
simple and add complexity later.  I have on occasion tried to fit the 
most complicated model I could think of, only to find that I could have 
gotten most of the information from condensing it grossly and fitting 
much simpler models, and I wasted lots of time and effort trying to work 
with all the data at once.  I suggest you start by aggregating it to the 
grossest level you think might answer your research questions. After you 
have sensible answers at that level, if you still have time and energy 
for this project, I then might try aggregate the data into more 
summaries with fewer observations in each summary.

	  3.  Is there a way to get rid of the serial dependency BEFORE running 
the model with LME(), such as initiating a corStruct before placing it 
in the model?

	  Answer:  Yes, and the primary way to do this would be to aggregate, 
like I just suggested.  If you still want to do more with the 
correlation structure, study Pinheiro and Bates and try something with a 
moderately condensed version of what you have.

	  hope this helps.
	  spencer graves

Keith Chamberlain wrote:
> Dear List,
> 
> This is my first post, and I'm a relatively new R user trying to work
out a
> mixed effects model using lme() with random effects, and a correlation
> structure, and have looked over the archives, & R help on lme,
corClasses, &
> etc extensively for clues. My programming experience is minimal (1 semester
> of C). My mentor, who has much more programming experience, but a
comparable
> level of knowledge of R, has been stumped. What follows are 3 questions
> pertaining to an lme() model, one on the nested hierarcy, 1 on a strategy
> for a piecewise approach to the variance given I have ~24 hours of data
> (sampled at 32Hz, 1hr per subject), and one on the corStruct or how to get
> rid of serial dependencies before lme().
> 
> I'm analyzing skin temperature continuously recorded at 32Hz in
Baseline (10
> min), Testing (~5 min), and Recovery (20 min) epochs of a face recognition
> experiment. Stimuli are the same in Baseline and Recovery (portrait or
> landscape), and in testing, participants were tested on their recognition
of
> a list b&w portraits presented just before testing started. On some of
the
> portraits 'learned' the eyes were masked, and in others, the eyes
were
> visible. In testing, the portraits have no masking but the stimuli in
> testing are labeled "Eyes" and "NoEyes". The data
structure looks as
> follows:
> 
> Subj/Epoch/Stimuli/Time/Temperature
> There are 8 subjects
> 
> 9 epochs - 6 of which were just "instruction" blocks, and one
"Learning"
> period. Wrt lme(), I figured out how to use subset too isolate just the
> Baseline, Learning, and Testing Epochs (and avoid epochs with only 1
> stimulus level, such as "instruction"). Data within each epoch
are balanced
> wrt # trials, but not between epochs. Recovery has twice as many trials as
> Baseline, and Testing has about half. Time for each epoch is roughly that
> ratio too, although time in each trial differs.
> 
> Stimuli are the same in Baseline & Recovery, but different in Testing,
> although there are 2 levels in each used epoch.
> 
> Time & Temperature make up the time series, and Temperature is the
dependent
> variable too stimulus. 
> 
> 1- are fixed effects and random effects discrete? That is, if I set
> something in my model formula as a fixed effect, then it does not make
sense
> to set it as a random effect as well? The documentation (and posts) were
not
> really clear on that point (not that the documentation technically
'should'
> be per say, just that I got confused).
> 
> The nested hierarchy for what actually gets analyzed looks as follows:
> Subj/Epoch/Stimulus/Temperature
> 
> Reasoning: there are several temperature samples recorded in each trial of
> Stimulus. Several stimuli in each Epoch, and all of the Epochs for one
> subject. Subject is random (theoretically) because of sampling in a
> population, Epoch would be fixed because all participants went through the
> same sequence of Epochs, but Stimulus varied randomly within an Epoch,
which
> seems inconsistent when I apply it to the lme model as both a fixed and
> random effect.
> 
> Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus
> Subset= Epoch=="Baseline" | Epoch=="Testing" |
Epoch=="Recovery"
> 
> I'm looking to correctly allocate error terms for between subjects
(Subj)
> variability, and further delineate the within subject error between Epoch
> and Stimulus. The current model that I got to work (memory issues namely)
is
> Temperature ~ Stimulus-1, random=Subj|Subj, which I decided to use to get
> the residuals to have the Subject variability accounted for and subtracted.
> Would a list of random structures work better? If so, is each item in the
> list structured just as the random formula? I haven't actually
seen/found
> any examples of a list of random/nesting structures.
> 
> 2- Is it possible to take a piecewise approach wrt the variance using
lme(),
> such as modeling the variability of each subject first, then using
> further-nested terms in a model and the residuals from the previous? If so,
> what caveats exist for interpreting the variances?
> 
> I'm not interpreting p-values at this point because of another issue.
When I
> try to set up the correlation structure, I run out of memory fast. I've
> tried this on a mac G5, an HP Pavilion dv1000 (= Pentium 2.6GHz), and a
> Gateway with an AMD athalon 900MHz processors. Each system has 386M memory
> or more, one of which has 1G. 
> 
> 3- Is there a way to get rid of the serial dependency BEFORE running the
> model with LME(), such as initiating a corStruct before placing it in the
> model? I'm working with so much data that I'm fine with doing the
process
> piecewise. An AR process was difficult because the residuals are not the
> same length as the data file that I started with. Serial dependencies still
> gota go, whether via the correlation term in lme() or some other method,
> because I'll soon be breaking up the variance into components via
> spectrum().
> 
> So I might as well add a 4th. What's the term that gets me too data
after
> AR() has done it's work? I'm thinking that resid() wasn't right
but data
> that the data differ from their original length prior to an AR process may
> be how its done.
> 
> Rgds,
> KeithC.
> Psych Undergrad, CU Boulder
> RE McNair Scholar
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915

Reasonably Related Threads

Search for more possibly parallel threads

R help - Dec 2005 - LME & data with complicated random & correlational structures

[R] LME & data with complicated random & correlational structures

[R] LME & data with complicated random & correlational structures

Reasonably Related Threads