Dear R users,

I have a data frame with daily snow cumulates (these quantities are known as "hn" and are expressed in cm), from the 1st of December to the 30th of April, for more than twenty years.

I need to find the days on which the sum over a given short interval (I might choose two consecutive days, three consecutive days or something like that) reaches or exceeds a threshold (it might be 80 cm or 100 cm).

I am trying with rle, but I am really struggling to find an efficient algorithm. Could somebody give me some hints?

Thank you for your attention and your help
Stefano


init_day <- as.POSIXct("2018-02-01", format="%Y-%m-%d", tz="Etc/GMT-1")
fin_day <- as.POSIXct("2018-02-20", format="%Y-%m-%d", tz="Etc/GMT-1")
mydf <- data.frame(data_POSIX=seq(init_day, fin_day, by="1 day"))
mydf$hn <- c(30, 0, 10, 50, NA, 40, 70, 0, 0, 0, NA, 10, 50, 30, 30, 10, 0, 0, 90, 0)

- if I choose a threshold of 100 cm in two days, I should get the 6th of February;
- if I choose a threshold of 80 cm in two days, I should get the 6th and the 13th of February, but not the 19th of February, because this is a single day;
- if I choose a threshold of 100 cm in four days, I should get the 12th of February.

(oo)
--oOO--( )--OOo--------------------------------------
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Office: +39 071 806 7743
E-mail: stefano.sofia at regione.marche.it
---Oo---------oO----------------------------------------

________________________________

IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Regione Marche may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient of it. If you have received this message in error, please forward it to the sender and delete it completely from your computer system. Pursuant to Art. 6 of DGR no. 1394/2008, please note that, in case of necessity and urgency, the reply to this e-mail message may be viewed by persons other than the addressee.

--
This message was scanned by Libraesva ESG and is believed to be clean.
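A minimal brute-force sketch of the three rules listed above, useful as a slow reference to check faster approaches against. It assumes (an interpretation, not something stated in the thread) that a window counts only if every day in it has a recorded, non-zero hn, which is what excludes 0 + 90, and it uses >= because the 13th (50 + 30 = 80 cm) is expected at the 80 cm threshold. The function name snowy_windows is hypothetical.

```r
# Hypothetical reference implementation (brute force, O(n * k)).
# Assumption: a window qualifies only if every day in it has a recorded,
# non-zero hn and its total is >= threshold.
snowy_windows <- function(hn, k, threshold) {
  if (length(hn) < k) return(integer(0))
  starts <- seq_len(length(hn) - k + 1)
  ok <- vapply(starts, function(i) {
    w <- hn[i:(i + k - 1)]                  # k consecutive days from day i
    !anyNA(w) && all(w > 0) && sum(w) >= threshold
  }, logical(1))
  starts[ok]                                # first day of each qualifying window
}

hn <- c(30, 0, 10, 50, NA, 40, 70, 0, 0, 0, NA, 10, 50, 30, 30, 10, 0, 0, 90, 0)
snowy_windows(hn, 2, 100)   # returns 6
snowy_windows(hn, 2, 80)    # returns 6 13
snowy_windows(hn, 4, 100)   # returns 12 13
```

The result is a vector of row numbers, so mydf[snowy_windows(mydf$hn, 2, 80), ] gives the matching dates. For four days it returns both the 12th and the 13th because those two windows overlap; reducing overlapping windows to a single event is a separate step.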
Hi

I would use the ?embed function.

nr <- which(rowSums(embed(mydf$hn, 2)) >= 80)
mydf[nr, ]

But it feels strange to me that the pair 40, 50 should be accepted while 0, 90 should not: both give more than 80 cm of cumulative snow over two consecutive days. And how does 1, 80 differ from 0, 81?

Basically, with your constraint a zero means NA, so if I change all zeroes to NA (e.g. mydf$hn[which(mydf$hn == 0)] <- NA) I get quite close:

> nr <- which(rowSums(embed(mydf$hn, 2)) >= 80)
> mydf[nr, ]
   data_POSIX hn
6  2018-02-06 40
13 2018-02-13 50
> nr <- which(rowSums(embed(mydf$hn, 4)) >= 100)
> mydf[nr, ]
   data_POSIX hn
12 2018-02-12 10
13 2018-02-13 50
> nr <- which(rowSums(embed(mydf$hn, 2)) >= 100)
> mydf[nr, ]
  data_POSIX hn
6 2018-02-06 40

Anyway, you need to polish the result a bit, as over four consecutive days both the 12th and the 13th give more than 100 cm.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Stefano Sofia
> Sent: Friday, July 30, 2021 9:24 AM
> To: r-help mailing list <r-help at r-project.org>
> Subject: [R] Cumulates of snowfall within a given interval
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
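The "polishing" Petr mentions can be sketched with rle(), which Stefano was already trying. This assumes that overlapping qualifying windows belong to one snowfall event and that only the first day of each run should be reported; event_starts and hn0 are hypothetical names, and the zero-to-NA step follows Petr's suggestion.

```r
# Hypothetical sketch: Petr's zero-to-NA idea plus embed(), then rle() to
# report only the first day of each run of overlapping qualifying windows.
hn <- c(30, 0, 10, 50, NA, 40, 70, 0, 0, 0, NA, 10, 50, 30, 30, 10, 0, 0, 90, 0)
hn0 <- hn
hn0[which(hn0 == 0)] <- NA                 # "zero means NA"

event_starts <- function(x, k, threshold) {
  sums <- rowSums(embed(x, k))             # row i = x[i] + ... + x[i + k - 1]
  hit <- !is.na(sums) & sums >= threshold  # windows containing NA never qualify
  r <- rle(hit)                            # runs of consecutive TRUE/FALSE
  ends <- cumsum(r$lengths)                # last window of each run
  begins <- ends - r$lengths + 1           # first window of each run
  begins[r$values]                         # first day of each qualifying run
}

event_starts(hn0, 4, 100)   # returns 12
event_starts(hn0, 2, 80)    # returns 6 13
event_starts(hn0, 2, 100)   # returns 6
```

Each run of TRUE values in hit is treated as one event and reported by its first window, so the overlapping 12th and 13th collapse to the 12th, as in the expected answer. Reporting an event by its largest window instead would be an equally reasonable choice.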
Hi Stefano,

Try using rollsum from the zoo package:

library(zoo)
rollsum_index <- function(x, window, val)
  return(which(rollsum(x, window) >= val))
rollsum_index(mydf$hn, 2, 80)
[1]  6 13 18 19

Jim
> x <- c(1, 2, 3)          # a vector of numbers, such as snowfall
> (cx <- cumsum(x))        # a vector of cumulative sums
[1] 1 3 6
> i <- 1                   # the starting point
> j <- 2                   # the ending point
> cx[j] - cx[i-1]          # sum of x[i] + ... + x[j]?
numeric(0)
> cx <- c(0, cx)           # Oops, we need this step (cx[0] is empty in R).
> cx[j+1] - cx[i]
[1] 3

So, using c(0, cumsum(x)), you spend O(#x) time and O(#x) space and get a data structure that answers any (i, j) -> x[i] + ... + x[j] query in constant time.

Let's now suppose you fix delta = 3 (days) and some threshold:

delta <- 3
threshold <- 80              # e.g. 80 cm
indices <- (delta + 1):length(cx)
# each index returned is the first day of a qualifying delta-day window
which(cx[indices] - cx[indices - delta] > threshold)
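Applied to the posted data, the prefix-sum idea needs one adjustment: cumsum() turns every value after the first NA into NA. A minimal sketch, assuming that missing days should never count toward an event: replace NA with 0 in the running total, and keep a second running count of snowy (non-NA, non-zero) days so that a window qualifies only when all delta days had snow. The function name prefix_windows is hypothetical, and >= is used instead of > so that the 13th (exactly 80 cm) is returned.

```r
# Hypothetical sketch: prefix sums c(0, cumsum(...)) with NA handling and a
# running count of snowy days, so each window is checked in constant time.
hn <- c(30, 0, 10, 50, NA, 40, 70, 0, 0, 0, NA, 10, 50, 30, 30, 10, 0, 0, 90, 0)

prefix_windows <- function(x, delta, threshold) {
  if (length(x) < delta) return(integer(0))
  snowy <- !is.na(x) & x > 0                  # recorded, non-zero hn
  cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))  # cx[k + 1] = x[1] + ... + x[k]
  cs <- c(0, cumsum(snowy))                   # snowy days up to each point
  indices <- (delta + 1):length(cx)
  win_sum   <- cx[indices] - cx[indices - delta]  # total over the window
  win_snowy <- cs[indices] - cs[indices - delta]  # snowy days in the window
  # position p in `indices` is the window starting on day p
  which(win_snowy == delta & win_sum >= threshold)
}

prefix_windows(hn, 2, 100)   # returns 6
prefix_windows(hn, 2, 80)    # returns 6 13
prefix_windows(hn, 4, 100)   # returns 12 13
```

Both running totals are built once in O(n). The snowy-day count matters: in the four-day case, the window starting on the 3rd (10 + 50 + NA + 40) would otherwise add up to 100 cm once NA is treated as 0.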