Rahul Sharm
2017-Dec-30 18:34 UTC
[R] 10 Minute Time Window Aggregate By Multiple Grouping Variables
Hello All, I have 1M rows of time stamped information about delivery trucks and their trip related information from simulation .Detailed info the column names and attributes for clarity 1. id: alphanumeric factor/character 2. datetime.of.trip.start: POSIXct yyyy-mm-dd hh:mm:ss 3. datetime.of.trip.end: POSIXct yyyy-mm-dd hh:mm:ss 4. trip.distance: numeric, miles 5. trip.fuel.consumed: numeric, gallons Close to 2500 trucks , simulated for a year with varying trip behavior. This is what I am trying to accomplish 1. First, I want to create 10 minute intervals from a chosen start date and end date from simulation, ex: 2011-03-30 00:00:00 to 2012-04-01 00:00:00 { approximately 8760 hours * six, 10 minute blocks per hour = looking at 52K unique timestamps }. If it works, will look into 15 minute, hourly and so on. * This will be representative of a start.time.index * I want to recreate the same time window with same frequency but for a column named end.time.index with time increment of 10 minutes 2. Go to the raw data, inspect the " datetime.of.trip.start " and "datetime.of.trip.end", get the time span, match it with the derived 10 min time time intervals (start and end), equally distribute the columns 4,5 (the numeric variables) among the 10 minute indices. Single row example - datetime.of.trip.start: 2017-01-01 00:00:00 - datetime.of.end.start: 2017-01-01 01:00:00 - trip.distance = 60 miles - trip.fuel.consumed = 6 gallons Interpretation of raw data, on January 1, 2017 from 00 am/midnight to 1 am, the delivery truck traveled 60 miles using 6 gallons of fuel 3. I want to be able to do this over entire raw data, get the aggregate sum of the variables of interest (distance and fuel for example) by the 10.min start and end time intervals across all ids as long as there is the 10min start/end time overlap derived from the raw data 4. Repeat the same with some grouping criteria from DateTime or suitable index, example day of week or by truckid. From index 1 (00:00:00 to 00:10:00 ) to index 6(00:00:50 to 01:00:00), 60 miles and 6 gallons are equally distributed. So far, I have created the 10 min HH:MM ids and exploring the foverlaps in data.table, period.apply from xts, converting the time indices into numerics and explore ddply %>% mutate %>% group by options. I am trying to get a more streamlined version that would work as a time series object which would help me get the descriptive statistics (aggregate by grouping criteria for sums and means) and do some plotting and time series analysis (RMA, smoothing, mixed regression models). I got stuck and totally lost when working with zoo and xts, I was creating the index and working on period.apply but was not sure how to work around with 2 POSIXct column variables (start and end). I have the desired index and the cleaned data, I am looking for a more elegant data.table solution to complete the 10min aggregate. Thanks! # simple illustration of the daa id<-c("A1","A1","A1","A1","A1","A1") start.time.index=as.POSIXct (c("2017-01-01 00:00:00","2017-01-01 00:10:00","2017-01-01 00:20:00","2017-01-01 00:30:00","2017-01-01 00:40:00","2017-01-01 00:50:00"), format="%Y-%m-%d %H:%M:%S") end.time.index=as.POSIXct (c("2017-01-01 00:10:00","2017-01-01 00:20:00","2017-01-01 00:30:00","2017-01-01 00:40:00","2017-01-01 00:50:00","2017-01-01 01:00:00"), format="%Y-%m-%d %H:%M:%S") trip.miles=c(10,10,10,10,10,10) trip.fuel=c(1,1,1,1,1,1) # total of 6 gallons in 1 hour equally divided into 6 bins of 10 minute window. desired.data<-c(id,start.time.index,end.time.index,trip.miles,end.time.index) [[alternative HTML version deleted]]