jcrosbie
2015-May-06 21:55 UTC
[R] How to finding a given length of runs in a series of data?
I'm trying to study times in which flow was operating at a given level or
greater. To do so I have created a way to see how long the series has
operated at a high level. But for some reason the code is calculating the
runs one hour too long. Any ideas on why?
Code:

# Hourly timestamps, stored as character strings
Date <- format(seq(as.POSIXct("2014-01-01 01:00"),
                   as.POSIXct("2015-01-01 00:00"), by = "hour"),
               "%Y-%m-%d %H:%M", usetz = FALSE)
Flow <- runif(8760, 0, 2300)

# Flag flows of 1600 or more as high (1), otherwise low (0)
IsHigh <- function(x) {
  if (x < 1600) return(0)
  if (1600 <= x) return(1)
}

isHighFlow = unlist(lapply(Flow, IsHigh))

df = data.frame(Date, Flow, isHighFlow)

# Start a new interval at every low-flow hour, then summarise each interval
temp <- df %>%
  mutate(highFlowInterval = cumsum(isHighFlow == 0)) %>%
  group_by(highFlowInterval) %>%
  summarise(hoursHighFlow = n(),
            minDate = min(as.character(Date)),
            maxDate = max(as.character(Date)))

# Then join the two tables together.
temp2 <- sqldf("SELECT *
                FROM temp LEFT JOIN df
                ON df.Date BETWEEN temp.minDate AND temp.maxDate")
--
View this message in context:
http://r.789695.n4.nabble.com/How-to-finding-a-given-length-of-runs-in-a-series-of-data-tp4706915.html
Sent from the R help mailing list archive at Nabble.com.
Adams, Jean
2015-May-07 10:32 UTC
[R] How to finding a given length of runs in a series of data?
Two libraries are needed to run the code you submitted:

library(dplyr)
library(sqldf)

Your IsHigh() function and its use can be replaced by a single line of code:

isHighFlow <- as.numeric(Flow >= 1600)

You are getting the additional hour from the way cumsum() is used.
cumsum(isHighFlow == 0) steps up by one at every low-flow hour, so each
interval begins with the low-flow hour that opened it, and n() counts that
hour along with the high-flow hours that follow. You can see how the
counter steps with

cumsum(c(1, 0, 1, 1, 0, 1, 1, 1, 0))

If everything is off by one hour, just subtract 1. Problem solved.

Jean
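
To make the off-by-one concrete, here is a minimal base R sketch on a made-up nine-hour indicator vector (the values are illustrative, not from the original data). It compares the number of rows per cumsum() group with the number of high-flow hours in each group, and then shows rle() as one possible alternative that reports run lengths directly. The variable names rowsPerGroup, highPerGroup, highRunLengths, highStarts and highEnds are hypothetical.

```r
# Toy indicator: 1 = high-flow hour, 0 = low-flow hour (made-up values)
isHighFlow <- c(1, 1, 0, 1, 1, 1, 0, 0, 1)

# Grouping from the original post: the counter steps on every low-flow hour,
# so every group after the first starts with a low-flow row
grp <- cumsum(isHighFlow == 0)

# Rows per group (what n() counts) versus high-flow hours per group
rowsPerGroup <- as.vector(table(grp))
highPerGroup <- as.vector(tapply(isHighFlow, grp, sum))

# Alternative: rle() gives the length of each run of identical values
runs <- rle(isHighFlow)
isHighRun <- runs$values == 1
highRunLengths <- runs$lengths[isHighRun]

# Positions (row indices) where each high-flow run starts and ends
runEnds <- cumsum(runs$lengths)
runStarts <- runEnds - runs$lengths + 1
highStarts <- runStarts[isHighRun]
highEnds <- runEnds[isHighRun]
```

In this toy case the first group has no leading low-flow row, so subtracting 1 from every count would undercount a series that starts at high flow. Summing the indicator per group, or using rle(), avoids that special case; highStarts and highEnds could then be used to index Date for the start and end of each run.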