Darren Weber
2006-Mar-18 03:11 UTC
[R] Time-Series, multiple measurements, ANOVA model over time points, analysis advice
Hi,

I have some general questions about statistical analysis for a research dataset and a request for advice on using R and associated packages for a valid analysis of these data. I can only pose the problem as how to run multiple ANOVA tests on time-series data, with reasonable control of the family-wise error rate. If we run the analysis on many small sections of a long time series, the family-wise Type I error rate is a concern. Is it important to consider the temporal dependence in the time-series data? There are papers on randomization tests for the sort of time-series data we have (e.g., Blair and Karniski, 1993), but these papers often report only t-test comparisons, not F-tests with 2+ factors.

BACKGROUND:

We have a dataset from a human neuroimaging experiment. Subjects view a screen, with a fixation point at the center. A cue arrow appears, directing their attention to the lower left or right visual field (cue left or cue right; this is one factor in our analysis). After 1 sec, a stimulus (S) appears in the lower left or right visual quadrant and subjects have to respond to it only if it appears in the cued location. This sequence of events was repeated hundreds of times. Each trial comprised a cue followed 1 sec later by a stimulus (cue - S), with a longer gap between trials (about 2 sec).

The brain activity was measured with magnetoencephalography (MEG), with a very high sample rate (1200 Hz). The activity from 275 MEG sensors was segmented precisely in relation to the onset of the cue. Each of these segments is known as an 'event-related field' or ERF, and the segments for every cue left or cue right trial were averaged across trials (to improve the signal-to-noise ratio of the event-related activity). We have data that are averaged ERFs over several hundred trials. The averaged ERFs for cue left and cue right were used to estimate the brain source activity (the details are not relevant here).
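The segment-and-average step described above can be illustrated with a minimal R sketch. Everything in it is simulated: the single sensor, the cue spacing, and the 5 Hz evoked waveform are assumptions made for illustration, not the study's data or pipeline. Only the 1200 Hz sampling rate and the -300 to 1400 ms window are taken from the post.

```r
set.seed(3)
fs <- 1200                       # sampling rate in Hz (from the post)
signal <- rnorm(fs * 60)         # one minute of simulated noise, one sensor

# Hypothetical cue onsets (sample indices), about 3 s apart
cue_onsets <- seq(from = fs * 1, by = fs * 3, length.out = 19)

# Epoch window: sample offsets covering -300 ms to +1400 ms around the cue
win <- seq(round(-0.3 * fs), round(1.4 * fs))

# Assumed evoked response: a 5 Hz sinusoid that starts at cue onset
evoked <- ifelse(win >= 0, sin(2 * pi * 5 * win / fs), 0)
for (onset in cue_onsets) {
  signal[onset + win] <- signal[onset + win] + evoked
}

# Cut one epoch per cue (samples x trials), then average across trials
epochs <- sapply(cue_onsets, function(onset) signal[onset + win])
erf <- rowMeans(epochs)

# Averaging n trials shrinks the noise standard deviation by about sqrt(n),
# so the average tracks the evoked waveform far better than a single trial
cat("correlation, single trial vs evoked:", cor(epochs[, 1], evoked), "\n")
cat("correlation, averaged ERF vs evoked:", cor(erf, evoked), "\n")
```

In the real experiment there would be one such average per subject, sensor, and cue condition; the sketch shows a single condition only.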
A small example dataset and R scripts are available via
ftp://ftp.mrsc.ucsf.edu/pub/dweber/cortical_timeSeries.tar

These example data are from one brain region of interest (ROI), the middle frontal gyrus (MFG). We have estimated activity in this brain region for the left and right cerebral hemispheres (this is one factor in the analysis). These data cover a short period prior to the cue (-300 ms) and a longer period after the cue (1400 ms; the S appeared at 1000 ms). There are 8 subjects in this dataset. Each subject has an ERF

ANALYSIS to DATE:

For each time bin of about 20 ms duration, from -300 to 1400 ms, we need to evaluate the ANOVA model,

MFGactivity = CUE + HEMI + error

where the CUE (left, right) and HEMI (left, right) interactions are important. These two factors are within-subjects factors (i.e., repeated measures for each subject). In classical terms, this is a split-plot design. The data frame and aov model are specified in the R scripts of the download.

Given time bins of about 20 ms and a time series from -300 to 1400 ms at small increments of 1-2 ms, we have a lot of analyses in just one brain region. How can we do this analysis and minimize family-wise error rates? Is it possible to run a permutation analysis for an ANOVA model?

R scripts in the download:

source("Rscript_HiN_cortical_roi_analysis_aov_specifics.R")

This will run ANOVA on several time bins of the data. The time bins are defined in Rscript_HiN_cortical_roi_analysis_setup.R, which is sourced by the script above.

Reference
Blair RC, Karniski W. 1993. An alternative method for significance testing of waveform difference potentials. Psychophysiology 30:518-524.
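One possible way to combine a within-subjects ANOVA with permutation-based control of the family-wise error rate is sketched below in base R. It is not the code from the download, which is not shown in the post. The data are simulated noise, and the array layout, the 85 bins, and all object names are assumptions. The sketch relies on the fact that in a 2 x 2 fully within-subjects design each effect has 1 degree of freedom, so its F statistic equals the squared one-sample t statistic of a per-subject contrast. Permutation is done by sign-flipping each subject's contrast over the whole time course, which keeps the temporal dependence intact and assumes the contrast is symmetric about zero under the null. The maximum |t| over bins supplies the family-wise reference distribution, in the spirit of the randomization tests cited above.

```r
set.seed(1)
n_subj <- 8                      # from the post
n_bins <- 85                     # assumed: 20 ms bins spanning -300 to 1400 ms

# Placeholder data: subject x cue x hemi x bin array of ROI activity
act <- array(rnorm(n_subj * 2 * 2 * n_bins), dim = c(n_subj, 2, 2, n_bins))

# Repeated-measures ANOVA for one bin, both factors within subjects
d1 <- expand.grid(subject = factor(seq_len(n_subj)),
                  cue = c("left", "right"), hemi = c("left", "right"))
d1$activity <- as.vector(act[, , , 1])   # same ordering as expand.grid
fit1 <- aov(activity ~ cue * hemi + Error(subject / (cue * hemi)), data = d1)
print(summary(fit1))

# Per-subject CUE x HEMI interaction contrast in every bin (subject x bin)
inter <- act[, 1, 1, ] - act[, 1, 2, ] - act[, 2, 1, ] + act[, 2, 2, ]

# One-sample t per bin; its square is the 1-df interaction F of the aov model
col_t <- function(m) colMeans(m) / (apply(m, 2, sd) / sqrt(nrow(m)))
t_obs <- col_t(inter)

# All 2^8 = 256 sign flips; each flip applies to a subject's whole time course
flips <- as.matrix(expand.grid(rep(list(c(-1, 1)), n_subj)))
max_abs_t <- apply(flips, 1, function(s) max(abs(col_t(inter * s))))

# Family-wise adjusted p-value for each bin
p_fwe <- vapply(abs(t_obs), function(t) mean(max_abs_t >= t), numeric(1))
print(which(p_fwe < 0.05))       # bins that survive correction
```

With 8 subjects there are only 256 sign patterns, so the smallest attainable adjusted p-value is 2/256. The main effects could be handled the same way with the corresponding per-subject contrasts. Factors with more than two levels would need a different permutation scheme.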
Spencer Graves
2006-Mar-23 17:10 UTC
[R] Time-Series, multiple measurements, ANOVA model over time points, analysis advice
If this were my problem, I might start by considering each stimulus-response pair as one observation, and I'd break the MEG into separate time series, each starting roughly 1 second before the stimulus and ending roughly 1 second after. If you've averaged many of these, I'm guessing you must have done something like this already. From plots of the averages plus autocorrelation and partial autocorrelation functions of the individual series, I'd then try to develop a parsimonious model for the changes. By fitting this model to each response, I could hopefully condense the data from a few thousand observations in each series to a small number of parameters. Then you could try to model the differences in the estimated parameters.

Are you familiar with Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer)? This book and the companion nlme package have facilities for handling the kinds of models you describe. However, I doubt the software will handle the volume of data you have. You may have to condense the data, e.g., by averaging sequences of roughly 100 observations each, matched somehow to the stimulus and response events, etc.

Hope this helps.
Spencer Graves

Darren Weber wrote:
> Hi,
>
> I have some general questions about statistical analysis for a research
> dataset and a request for advice on using R and associated packages for a
> valid analysis of this data. I can only pose the problem as how to run
> multiple ANOVA tests on time series data, with reasonable controls of the
> family-wise error rate. If we run analysis at many small sections of a long
> time-series, the Type-I family-wise error rate is a concern. Is it
> important to consider the temporal dependence in the time-series data?
> There are papers on randomization tests for the sort of time-series data we
> have (eg, Blair and Karniski, 1993), but these papers often report only
> t-test comparisons, not F-tests with 2+ factors.
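The autocorrelation checks and the block averaging suggested in the reply can be sketched in base R as follows. The series is a simulated AR(1) process standing in for one averaged time course. The AR coefficient, the block size of 100 samples, and all object names are assumptions for illustration; only the 1200 Hz rate and the -300 to 1400 ms window come from the thread.

```r
set.seed(2)
fs <- 1200                                   # sampling rate in Hz (from the post)
times <- seq(-0.3, 1.4, by = 1 / fs)         # seconds relative to cue onset

# Placeholder series with strong temporal dependence (AR(1), coefficient 0.9)
x <- as.numeric(arima.sim(list(ar = 0.9), n = length(times)))

# Autocorrelation and partial autocorrelation, computed without plotting
a <- acf(x, lag.max = 50, plot = FALSE)
p <- pacf(x, lag.max = 50, plot = FALSE)
cat("lag-1 autocorrelation:", a$acf[2], "\n")
cat("lag-1 partial autocorrelation:", p$acf[1], "\n")

# Condense the series by averaging consecutive blocks of samples
block <- 100                                 # about 83 ms per block at 1200 Hz
grp <- (seq_along(x) - 1) %/% block          # block index for each sample
x_avg <- tapply(x, grp, mean)                # one mean per block
t_avg <- tapply(times, grp, mean)            # block-center times

print(data.frame(time = round(as.numeric(t_avg), 3),
                 mean_activity = as.numeric(x_avg)))
```

The last block is shorter than the others whenever the series length is not a multiple of the block size. The condensed values, one set per subject and condition, are small enough to pass to a mixed-effects fit such as those in the nlme package mentioned above.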