Paolo Canal
2015-Sep-23 10:46 UTC
[R] Appropriate specification of random effects structure for EEG/ERP data: including Channels or not?
Dear r-help list,

I work with EEG/ERP data, and this is the first time I am using linear mixed models (LMM) to analyze my data (using lme4). The experimental design is 2 x 2: one manipulated factor is agreement, the other is noun (agreement is within subjects and within items; noun is within subjects and between items). The data matrix is 31 subjects * 160 items * 33 channels.

In ERP research, the distribution of the EEG amplitude differences (in a time window of interest) is important: we care about knowing whether a negative difference occurs at Parietal or Frontal electrodes. At the same time, information from a single channel is often too noisy, so channels are organized into topographic factors for evaluating differences in distribution. In the present case I have assigned each channel to one of three levels of two factors, i.e., Longitude (Anterior, Central, Parietal) and Medial (Left, Midline, Right): for instance, one channel is Anterior and Left. With traditional ANOVAs, channels from the same level of the topographic factors are averaged before variance is evaluated, which also has the benefit of reducing the noise picked up by the electrodes.

I am having trouble deciding on the random structure of my model. Very few examples of LMM applied to ERP data exist (e.g., Newman, Tremblay, Nichols, Neville & Ullman, 2012), and little detail is provided about the treatment of channel. I feel it is a tricky term, but a very important one for optimizing fit. Newman et al. say "data from each electrode within an ROI were treated as repeated measures of that ROI". In Newman et al., the ROIs are the 9 regions deriving from Longitude x Medial (Anterior-Left, Anterior-Midline, Anterior-Right, Central-Left ... and so on), so in a way they treated each ROI separately and not according to the relevant dimensions of Longitude and Medial.
We used the following specifications in lmer:

    fixed effects:  µV ~ Agreement * Noun * Longitude * Medial * (cov1 + cov2 + cov3 + cov4)
    random effects: (1 + Agreement * Noun | subject) + (1 + Agreement | item) + (1 | longitude:medial:channel)

(The terms in parentheses in the fixed effects are a series of individual covariates, most of which are continuous variables.)

What I care about most is the last term, (1|longitude:medial:channel). I chose this specification because I thought that allowing each channel to have a different intercept in the random structure would affect the estimation of the topographic fixed effects (Longitude and Medial) in which channel is nested. Unfortunately, a reviewer commented that since "channel is not included in the fixed effects I would probably leave that out". But each channel is a repeated measure of the EEG amplitude inside the two topographic factors, and random terms do not have to appear in the fixed structure; otherwise we would also have to include subjects and items in the fixed effects structure. So I feel that including channels as a random effect is correct, and that nesting them in longitude:medial allows us to relax the assumption that the effect in the EEG always has the same longitude:medial distribution. But I might be wrong.

I thus tested differences in fit (ML) with anova() between the model with (1|longitude:medial:channel), the same model without the term, and a third model with the simpler term (1|longitude:medial).

Fullmod vs Nochannel:

            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    modnoch 119 969479 970653 -484621   969241
    fullmod 120 968972 970156 -484366   968732 508.73      1  < 2.2e-16 ***

The difference in fit is remarkable (no variance components with estimates close to zero; no correlation parameters with values close to ±1).
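The columns of this table are related by simple arithmetic, which gives a quick sanity check on the comparison: for an ML fit, AIC = deviance + 2 * Df, and the likelihood-ratio Chisq is the drop in deviance between the two models, referred to a chi-square distribution with Chi Df degrees of freedom. The following is a minimal sketch in plain Python, assuming only the rounded values printed above; the function and variable names are illustrative and not part of lme4.

```python
from math import erfc, sqrt

# Rounded values as printed in the anova() table above.
models = {
    "modnoch": {"df": 119, "aic": 969479, "deviance": 969241},
    "fullmod": {"df": 120, "aic": 968972, "deviance": 968732},
}


def aic_from_deviance(deviance, df):
    """AIC of an ML fit: deviance (-2 * logLik) plus 2 per estimated parameter."""
    return deviance + 2 * df


def chisq_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with exactly 1 df."""
    return erfc(sqrt(stat / 2.0))


# Recompute AIC for both models from the printed deviance and Df.
aic_check = {
    name: aic_from_deviance(m["deviance"], m["df"]) for name, m in models.items()
}

# Likelihood-ratio statistic: deviance of the smaller model minus the larger one.
chisq = models["modnoch"]["deviance"] - models["fullmod"]["deviance"]
chi_df = models["fullmod"]["df"] - models["modnoch"]["df"]
p_value = chisq_1df_pvalue(chisq)

print(aic_check)
print(chisq, chi_df, p_value < 2.2e-16)
```

The recomputed AIC values match the printed ones, and the recomputed statistic (509) differs slightly from the printed 508.73, presumably because the printed deviances are rounded to whole numbers.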
Fullmod vs SimplerMod:

               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
    fullmod    120 968972 970156 -484366   968732
    simplermod 120 969481 970665 -484621   969241     0      0          1

Here the number of parameters to estimate in fullmod and simplermod is the same, but the improvement in fit is substantial (BIC is lower by 509 for fullmod). So I guess that, although the chi-square test is not significant, we do have a strong increase in fit.

As I understand it, a model with a better fit will yield more accurate estimates, so I would be inclined to keep the fullmod random structure. But perhaps I am missing something or doing something wrong. Which is the correct random structure to use?

Feedback is very much appreciated. I often find answers on the list, and this is the first time I have posted a question.

Thanks,
Paolo