Jones, Kristopher@DWR
2014-Sep-16  20:17 UTC
[R] Changepoint analysis--is it possible to attribute changepoints to explanatory variables?
Hello,

I would like to evaluate the relationship between flows and phytoplankton abundance (or Chlorophyll a concentrations) using a changepoint analysis. Specifically, I have two study questions:

Study Question 1: Are there certain flow thresholds that result in spikes in phytoplankton abundance?
Study Question 2: Is the duration of certain flows important for phytoplankton abundance (e.g., would a certain flow value need to be reached for 1 day, 1 week, etc. to create a spike in phytoplankton abundance)?

Many of the examples I've seen online only look for change points in a time series. I have not seen any examples that look at whether changes in the mean or variance can be attributed to a particular factor (e.g., changes in abundance relative to an environmental factor).

Question #1: Is it possible to attribute changes in the mean or variance of a time series (e.g., of phytoplankton abundance) to a particular environmental variable (e.g., flows)? If so, can you provide guidance on how to do that in R (or refer me to a good example)?

Question #2: Is it possible to take Question #1 a step further by adding a time component (as described in Study Question 2, above)? If so, can you provide guidance on how to do that in R (or refer me to a good example)?

One resource on changepoint analyses (using the changepoint package) that I have been trying to model my work after (at least the R code) is by Killick and Eckley (Lancaster University).
http://www.lancs.ac.uk/~killick/Pub/KillickEckley2011.pdf

Their descriptions and the accompanying code were really helpful (although their questions were not similar to mine, as described above). In reviewing this document and other descriptions online, I've noticed that data for changepoint analyses need to be in a time series. My data are set up with columns of sampling date, Chlorophyll a concentration, and stage (a surrogate for flow). In reviewing the online help for changepoint, I realized that my data would likely not be considered a 'time series', as the sampling did not occur at uniform time intervals.

Question #3: Do data for changepoint analyses in R need to be at uniform time intervals? If so, is there an appropriate way to transform my data (which were not collected at uniform time intervals) to make them work in changepoint?

Question #4: Do the data in the time series need to be transformed (e.g., Chlorophyll a and Stage)?

Hopefully, I've laid out my questions in a way that makes sense. Any help you can provide would be much appreciated. I've been trying to read up on this for a while, and have tried to narrow my questions down to those with which I am still struggling.

Thanks in advance for your help.

Kris
Bert Gunter
2014-Sep-16  22:14 UTC
[R] Changepoint analysis--is it possible to attribute changepoints to explanatory variables?
This is primarily a statistical issue and is off-topic here. I would strongly suggest that you consult a local statistical expert.

The answer is almost certainly yes: this is regression (perhaps quantile regression) in which the error structure is not iid (the response is an autocorrelated time series), and there is probably inertia in the system, too. So it may be complicated. That's why you need to spend time with someone who knows how to handle this. Econometricians tend to do this sort of thing, I believe.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
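For readers who want to see what the "regression with non-iid errors" framing from the reply might look like in R, here is a minimal sketch. It is not the method recommended by anyone in this thread, only one possible reading of it. It makes several assumptions: the data are simulated; the column names (day, stage, log_chl) are hypothetical; chlorophyll is modelled on the log scale; the flow effect is a "hinge" that switches on above an unknown stage threshold; and the errors follow a continuous-time AR(1) process, fitted with corCAR1 from nlme, which accepts irregular sampling times. The threshold is chosen by profiling the log-likelihood over a grid.

```r
library(nlme)  # recommended package shipped with R; provides gls() and corCAR1()

set.seed(42)
n   <- 120
day <- sort(sample(1:730, n))   # irregular, unique sampling days (hypothetical)
stage <- 2 + sin(2 * pi * day / 365) + rnorm(n, sd = 0.3)

# Simulate autocorrelated errors whose correlation decays with the gap in days.
phi <- 0.9                      # assumed correlation between errors one day apart
err <- numeric(n)
err[1] <- rnorm(1, sd = 0.2)
for (i in 2:n) {
  rho    <- phi^(day[i] - day[i - 1])
  err[i] <- rho * err[i - 1] + sqrt(1 - rho^2) * rnorm(1, sd = 0.2)
}

# Assumed "truth" for the demo: log-chlorophyll rises once stage exceeds 2.5.
true_thr <- 2.5
log_chl  <- 1 + 1.5 * pmax(stage - true_thr, 0) + err
dat <- data.frame(day = day, stage = stage, log_chl = log_chl)

# Fit a hinge regression at a candidate threshold with CAR(1) errors.
# method = "ML" so that log-likelihoods are comparable across thresholds.
fit_at <- function(thr) {
  dat$excess <- pmax(dat$stage - thr, 0)   # local copy of dat with the hinge term
  gls(log_chl ~ excess, data = dat,
      correlation = corCAR1(form = ~ day), method = "ML")
}

# Profile the log-likelihood over a grid of candidate thresholds.
rng  <- unname(quantile(dat$stage, c(0.1, 0.9)))
grid <- seq(rng[1], rng[2], length.out = 25)
prof <- sapply(grid, function(thr) {
  fit <- tryCatch(fit_at(thr), error = function(e) NULL)  # skip failed fits
  if (is.null(fit)) NA_real_ else as.numeric(logLik(fit))
})

best_thr <- grid[which.max(prof)]   # which.max() ignores NA entries
best_fit <- fit_at(best_thr)

print(best_thr)        # estimated stage threshold
print(coef(best_fit))  # intercept and slope above the threshold
```

Because the threshold is picked from the same data, any standard errors read off best_fit are conditional on that choice and will tend to be too optimistic. A duration effect (Study Question 2) could in principle be explored the same way by replacing stage with a summary of stage over the preceding k days, but that is a further assumption and not something this sketch tests.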