jcrosbie
2015-May-06 21:55 UTC
[R] How to finding a given length of runs in a series of data?
I'm trying to study times in which flow was operating at a given level or greater. To do so, I have created a way to see how long the series has operated at a high level. But for some reason the code is calculating the runs one hour too long. Any ideas why?

Code:

Date <- format(seq(as.POSIXct("2014-01-01 01:00"), as.POSIXct("2015-01-01 00:00"), by = "hour"),
               "%Y-%m-%d %H:%M", usetz = FALSE)
Flow <- runif(8760, 0, 2300)

IsHigh <- function(x) {
  if (x < 1600) return(0)
  if (1600 <= x) return(1)
}

isHighFlow = unlist(lapply(Flow, IsHigh))

df = data.frame(Date, Flow, isHighFlow)

temp <- df %>%
  mutate(highFlowInterval = cumsum(isHighFlow == 0)) %>%
  group_by(highFlowInterval) %>%
  summarise(hoursHighFlow = n(),
            minDate = min(as.character(Date)),
            maxDate = max(as.character(Date)))

# Then join the two tables together.
temp2 <- sqldf("SELECT *
                FROM temp LEFT JOIN df
                ON df.Date BETWEEN temp.minDate AND temp.maxDate")

--
View this message in context: http://r.789695.n4.nabble.com/How-to-finding-a-given-length-of-runs-in-a-series-of-data-tp4706915.html
Sent from the R help mailing list archive at Nabble.com.
Adams, Jean
2015-May-07 10:32 UTC
[R] How to finding a given length of runs in a series of data?
Two libraries are needed to run the code you submitted:

library(dplyr)
library(sqldf)

Your IsHigh() function and its use can be replaced by a single line of code:

isHighFlow <- as.numeric(Flow >= 1600)

You are getting the additional hour by using cumsum(). One date element, which you seem to characterize as zero hours, returns a one in cumsum(), two return two, etc.

cumsum(c(1, 0, 1, 1, 0, 1, 1, 1, 0))

If everything is off by one hour, just subtract 1. Problem solved.

Jean
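To make the off-by-one concrete, here is a minimal base-R sketch on a short made-up series rather than the random data from the post. The flow values and the names rowsPerGroup, highHoursPerGroup, and highRunLengths are illustrative assumptions, not part of the original code. It suggests that each group formed by cumsum(isHighFlow == 0) begins with the low-flow hour that opened it, so counting rows with n() includes that hour. The first group is the exception when the series starts at high flow, which is why a blanket "subtract 1" may not be right for every row.

```r
# Short made-up flow series (hypothetical values, not from the thread).
Flow <- c(1700, 1800, 500, 1650, 1900, 2000, 300, 400, 1750)

# Vectorised indicator, as suggested in the reply: 1 = high flow, 0 = low flow.
isHighFlow <- as.numeric(Flow >= 1600)

# Grouping key from the original post: it steps up at every low-flow hour,
# so each new group starts with a low-flow row.
highFlowInterval <- cumsum(isHighFlow == 0)

# Rows per group, which is what n() counts in the dplyr pipeline.
rowsPerGroup <- as.vector(table(highFlowInterval))

# High-flow hours per group: sum the indicator instead of counting rows.
highHoursPerGroup <- as.vector(tapply(isHighFlow, highFlowInterval, sum))

# Cross-check with rle(), which reports run lengths directly.
runs <- rle(isHighFlow)
highRunLengths <- runs$lengths[runs$values == 1]

print(rowsPerGroup)       # 2 4 1 2
print(highHoursPerGroup)  # 2 3 0 1
print(highRunLengths)     # 2 3 1
```

If this reading is right, replacing n() with sum(isHighFlow) inside summarise() should report high-flow hours directly, with no correction afterwards. Groups with zero high-flow hours (consecutive low hours) can then be filtered out if only the runs are of interest.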