Catarina Serra Gonçalves
2019-Apr-30 14:57 UTC
[R] Time series (trend over time) for irregular sampling dates and multiple sites
I have a dataset of marine debris items (number of items standardized per unit effort: Items/(number of volunteers*Hours*Lenght)) taken from 2 main locations (WA and Queensland) in Australia (8 sub-sites in total: 4 in WA and 4 in Queensland) at irregular sampling intervals over a period of 15 years.

I want to test whether the amount of debris at these locations has changed over the years and, more specifically, whether it changed after the implementation of a mitigation strategy (in 2013).

Here's the head of the data: <https://i.stack.imgur.com/VNIpb.png>

Description of each of the variables in the data frame:

eventid = each sampling (clean-up) event
Location = Queensland and New South Wales
Sites = all the 9 sampling beaches
Date = specific dates of the clean-up events (day-month-year)
Date1 = specific dates of the clean-up events (day-month-year) in POSIXct format
Year = year of the sampling event (2004 to 2018)
Month = month of the sampling event (Jan to Dec)
nMonth = number assigned to the month of the sampling event (1 to 12)
Day = day of sampling (1 to 31)
Days = days since the first clean-up date (just another way of using the dates)
MARPOL = before and after implementation (factor with 2 levels)
DaysC = days between sampling events at the same site, i.e. number of days since the previous clean-up event
DaysI = days since the intervention; all dates before implementation are zero, and afterwards we count the number of days since the implementation date (1 Jan 2013)
DaysIa = same as DaysI, but with negative values (days) before the intervention instead of zero
Items = number of fishing and shipping items counted in each clean-up event
Hours = hours spent by all volunteers together at each clean-up event
Lenght = length of beach sampled by all volunteers together at each clean-up event
volunteers = all volunteers at each clean-up event
HoursVolunteer = hours spent by each volunteer at each clean-up event (Hours/volunteers)
Ieffort = the items standardized by effort (hours, volunteers and length)
GrossWeight and GrossTotal are not relevant

------------------------------
Problems:

My data has a few problems: (1) I think I will need to account for the effects of seasonal variation (monthly) and (2) of possible spatial correlation (the probability of finding an item is higher after finding one, since they can come from the same ship). (3) How do I handle the fact that the measurements were not taken at regular intervals?

I was trying to use GAMs to analyse the data and see the trends over time. The model I came up with is the following:

    m4 <- gamm(Ieffort ~ s(DaysIa) + MARPOL + s(nMonth, bs = "ps", k = 12), random = list(Site = ~1, Location = ~1), data = d)

Thank you in advance.

Catarina Serra Gonçalves
PhD candidate
Adrift Lab <https://adriftlab.org>
University of Tasmania <http://www.utas.edu.au/> | Institute for Marine and Antarctic Studies <http://www.imas.utas.edu.au/>
Launceston, TAS | Australia
Personal website <https://catarinasg.wixsite.com/acserra> | E-mail <acserra at utas.edu.au> | Twitter <https://twitter.com/CatarinaSerraG>
Research Gate <https://www.researchgate.net/profile/Catarina_Serra_Goncalves> | Google Scholar <https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
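The derived variables described in the post can be sketched in base R as follows. This is only an illustration under stated assumptions: the four records are invented, the column names (including the spelling `Lenght`) follow the post, and a clean-up on 1 Jan 2013 itself is treated as "after", which the post does not specify.

```r
# Hypothetical raw records; column names follow the post, values are invented.
d <- data.frame(
  Site       = c("A", "A", "B", "B"),
  Date1      = as.Date(c("2012-06-15", "2013-03-01", "2012-12-31", "2014-01-01")),
  Items      = c(40, 12, 30, 9),
  volunteers = c(4, 3, 5, 3),
  Hours      = c(2, 2, 3, 1.5),
  Lenght     = c(100, 100, 200, 150)   # spelling as in the original data frame
)

intervention <- as.Date("2013-01-01")

# Items standardised by effort, as defined in the post
d$Ieffort <- d$Items / (d$volunteers * d$Hours * d$Lenght)

# Signed days relative to the intervention (negative before, positive after)
d$DaysIa <- as.numeric(d$Date1 - intervention)

# Days since the intervention, held at zero before it
d$DaysI <- pmax(d$DaysIa, 0)

# Before/after factor (assumption: the intervention day counts as "after")
d$MARPOL <- factor(ifelse(d$Date1 < intervention, "before", "after"),
                   levels = c("before", "after"))

# Days since the previous clean-up at the same site (NA for a site's first visit)
d <- d[order(d$Site, d$Date1), ]
d$DaysC <- ave(as.numeric(d$Date1), d$Site,
               FUN = function(x) c(NA, diff(x)))

d[, c("Site", "Date1", "Ieffort", "DaysIa", "DaysI", "MARPOL", "DaysC")]
```

Building the variables from `Date1` in one place keeps `DaysI`, `DaysIa` and `MARPOL` consistent with each other if the intervention date is ever changed.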
Bert Gunter
2019-Apr-30 15:28 UTC
[R] Time series (trend over time) for irregular sampling dates and multiple sites
I have no expertise here, but I suggest that you check out the SpatioTemporal task view on CRAN (or possibly others, like Environmetrics). You might also want to move this to the R-sig-Geo list, where you are more likely to find relevant expertise.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
Abs Spurdle
2019-Apr-30 21:58 UTC
[R] Time series (trend over time) for irregular sampling dates and multiple sites
> My data has a few problems: (1) I think I will need to fix the effects of
> seasonal variation (Monthly) and (2) of possible spatial correlation
> (probability of finding an item is higher after finding one since they can
> come from the same ship). (3) How do I handle the fact that the
> measurements were not taken at a regular interval?

Can I ask two questions:
(1) Is the data autocorrelated (or "seasonal") over time?
If not, then this problem is a lot simpler.
(2) Can you expand on the following statement?
"possible spatial correlation (probability of finding an item is higher after finding one since they can come from the same ship)"
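One rough way to look at question (1) is to check whether successive residuals within a site are correlated. The sketch below uses simulated data, not the poster's data, and the names `sim`, `res_pairs` and `lag1` are made up for illustration. It ignores the unequal gaps between visits, so it is only a first look; a continuous-time structure such as `nlme::corCAR1`, passed through the `correlation` argument of `gamm`, may be a more appropriate way to model irregular spacing.

```r
set.seed(1)

# Simulated stand-in data: 4 sites, 30 irregularly spaced visits each
n_site <- 4
n_visit <- 30
sim <- do.call(rbind, lapply(seq_len(n_site), function(s) {
  days <- sort(sample(0:5000, n_visit))      # irregular sampling days
  data.frame(Site = paste0("S", s), Days = days,
             y = rnorm(n_visit))             # independent noise, no trend
}))

# Residuals from a simple per-site mean model
fit <- lm(y ~ Site, data = sim)
sim$res <- resid(fit)

# Pair each residual with the previous one at the same site
res_pairs <- do.call(rbind, lapply(split(sim, sim$Site), function(g) {
  g <- g[order(g$Days), ]
  data.frame(prev = head(g$res, -1), curr = tail(g$res, -1),
             gap = diff(g$Days))             # days between the paired visits
}))

# Lag-1 correlation of successive residuals across all sites
lag1 <- cor(res_pairs$prev, res_pairs$curr)
lag1
```

The `gap` column is kept so that the residual pairs can also be inspected against the time between visits, for example by comparing short and long gaps.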
Abs Spurdle
2019-May-01 01:19 UTC
[R] Time series (trend over time) for irregular sampling dates and multiple sites
I just had a closer look at your example.
You've tried to model nMonth (presumably in {1, 2, ..., 12}), but is there a long-term trend over Year?

Also, I'm not an expert on mgcv, but I was wondering if you want bs="cp" rather than bs="ps"?

When you say "measurements were not taken at a regular interval", are you referring to the variable DaysIa? If so, my previous question about autocorrelation applies to this variable.
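The point about a cyclic basis can be illustrated with a small simulation. This sketch assumes the mgcv package is installed and uses invented data. It fits the cyclic cubic basis (`bs = "cc"`); `bs = "cp"` is the cyclic P-spline counterpart mentioned above. The two knots at 0.5 and 12.5 are an assumption about where the year should wrap, chosen so that December and January remain one month apart.

```r
library(mgcv)
set.seed(42)

# Simulated monthly signal with a smooth seasonal cycle (not the real data)
nMonth <- rep(1:12, times = 25)
sim <- data.frame(
  nMonth = nMonth,
  y = sin(2 * pi * nMonth / 12) + rnorm(length(nMonth), sd = 0.3)
)

# Cyclic cubic spline: the smooth is forced to join up at the boundary knots
m_cc <- gam(y ~ s(nMonth, bs = "cc", k = 12),
            knots = list(nMonth = c(0.5, 12.5)),
            data = sim, method = "REML")

# Predictions at the two ends of the cycle
ends <- as.numeric(predict(m_cc, newdata = data.frame(nMonth = c(0.5, 12.5))))
ends
```

With a non-cyclic basis such as `bs = "ps"`, nothing ties the fitted value at the end of December to the one at the start of January, so the seasonal curve can jump at the turn of the year.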
Abs Spurdle
2019-May-01 04:06 UTC
[R] Time series (trend over time) for irregular sampling dates and multiple sites
This is possibly off topic now...
However, given that it involves mgcv, I think that it's relevant to R.

> to test if there is a change over the years on the amount of debris in
> these locations and more specifically a change after the implementation of
> a mitigation strategy

> My debris items per effort (Ieffort) are fishing and shipping related
> items that can be due to an intentional discharge or an accidental
> discharge. It is very common to find a great amount of these items
> together in the beach (from where we collected these data (beach
> clean-ups)), possibly having origin from the same ship. I was thinking
> that this can be a problem but still don't know how to overcome or if it
> makes sense to include in the model.

I could be wrong on this.
If your goal is simply to determine whether the MARPOL term is significant or not (or how strong the effect is), I don't think the above issue is important.
However, you could do a separate spatial analysis, which could be very interesting...

> This does not apply along the different years.

Are you sure (there's no long-term effect)?

Note that you could combine Year and nMonth into one variable, say t.
However, if I understand your variables correctly, this would be correlated with DaysIa.
So, if you try to fit a model with both Year and DaysIa, then Year is less likely to be significant, and you probably don't need both.

Note that another approach is to regard month as a categorical variable.

Also, note that it may be worthwhile testing for interactions between MARPOL and Location or Site.
If you want to be fancy, you could test for interactions between MARPOL and your time variables.
It's possible that there are higher-order interactions; however, these sorts of models are difficult for most people to interpret, so they are probably a bad idea.
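Two of the suggestions above can be sketched in base R on simulated data. Everything here is assumed for illustration: the dates, the response `y` and its effect sizes are invented, and plain `lm` stands in for the mixed model in the original post, so the site-level random effects are ignored.

```r
set.seed(7)

# Simulated stand-in for the survey: random dates from 2004 onwards
n <- 200
Date1 <- as.Date("2004-01-01") + sort(sample(0:5400, n))
sim <- data.frame(
  Date1    = Date1,
  Year     = as.integer(format(Date1, "%Y")),
  nMonth   = as.integer(format(Date1, "%m")),
  Location = factor(sample(c("WA", "Queensland"), n, replace = TRUE))
)
sim$DaysIa <- as.numeric(sim$Date1 - as.Date("2013-01-01"))
sim$MARPOL <- factor(ifelse(sim$DaysIa < 0, "before", "after"),
                     levels = c("before", "after"))

# Year and month combined into one decimal-year variable t
sim$t <- sim$Year + (sim$nMonth - 1) / 12
r <- cor(sim$t, sim$DaysIa)   # t and DaysIa carry nearly the same information
r

# Invented response whose MARPOL effect differs by Location
sim$y <- 1 - 0.5 * (sim$MARPOL == "after") * (sim$Location == "WA") +
  rnorm(n, sd = 0.3)

# Month as a categorical term; compare models with and without the interaction
m_add <- lm(y ~ MARPOL + Location + factor(nMonth), data = sim)
m_int <- lm(y ~ MARPOL * Location + factor(nMonth), data = sim)
cmp <- anova(m_add, m_int)    # F test for the MARPOL:Location term
cmp
```

Because `t` and `DaysIa` are almost perfectly correlated, including both in one model adds little; the nested-model comparison shows one simple way to test a single interaction before considering anything of higher order.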