Mark Noworolski
2017-May-16 04:48 UTC
[R] Aggregate counts of observations with times surrounding a time?
I have a data frame that has a set of observed dwell times at a set of locations. The metadata for the locations includes things that have varying degrees of specificity. I'm interested in tracking the number of people present at a given time in a given store, type of store, or zip code. Here's an example of some sample data (here st=start_time, and et=end_time): data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59)) st et store zip store_id 1 1483360938 1483362938 gap 94000 50 2 1483362638 1483364638 gap 94000 51 3 1483364338 1483366338 gap 94100 52 4 1483366038 1483368038 gap 94100 53 5 1483367738 1483369738 gap 94200 54 6 1483369438 1483371438 starbucks 94000 55 7 1483371138 1483373138 starbucks 94000 56 8 1483372838 1483374838 starbucks 94100 57 9 1483374538 1483376538 starbucks 94100 58 10 1483376238 1483378238 starbucks 94200 59 I'd like to be able to: a) create aggretages of the number of people present in each store_id at a given time b) create aggregates of the number of people present - grouped by zip or store I expect to be rolling up to hour or half hour buckets, but I don't think I should have to decide this up front and be able to do something clever to be able to use ggplot + some other library to plot the time evolution of this information, rolled up the way I want. Any clever solutions? I've trolled stackoverflow and this email list.. to no avail - but I'm willing to acknowledge I may have missed something. [[alternative HTML version deleted]]
Jim Lemon
2017-May-16 09:43 UTC
[R] Aggregate counts of observations with times surrounding a time?
Hi Mark, I think you might want something like this: mndf<-data.frame(st=seq(1483360938,by=1700,length=10), et=seq(1483362938,by=1700,length=10), store=c(rep("gap",5),rep("starbucks",5)), zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200), store_id=seq(50,59)) # orders the times and calculates number of simultaneous presences count_simult<-function(x) { nrows<-dim(x)[1] timeorder<-order(unlist(mndf[1:nrows,c("st","et")])) interval_counts<-data.frame(time=c(x$st,x$et)[timeorder], startfin=rep(c("st","et"),each=5)[timeorder],count=rep(NA,10)) interval_counts[1,"count"]<-1 for(i in 2:(nrows*2)) { interval_counts[i,"count"]<- interval_counts[i-1,"count"]+ ifelse(interval_counts[i,"startfin"]=="st",1,-1) } return(interval_counts) } gap_counts<-count_simult(mndf[1:5,]) plot(gap_counts$time,gap_counts$count,type="l") starbucks_counts<-count_simult(mndf[6:10,]) plot(starbucks_counts$time,gap_counts$count,type="l") There are a lot of ways to plot the counts by time. If you have any preferences, let me know. Jim On Tue, May 16, 2017 at 2:48 PM, Mark Noworolski <jmarkn at gmail.com> wrote:> I have a data frame that has a set of observed dwell times at a set of > locations. The metadata for the locations includes things that have varying > degrees of specificity. I'm interested in tracking the number of people > present at a given time in a given store, type of store, or zip code. > > Here's an example of some sample data (here st=start_time, and et=end_time): > data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59)) > st et store zip store_id > 1 1483360938 1483362938 gap 94000 50 > 2 1483362638 1483364638 gap 94000 51 > 3 1483364338 1483366338 gap 94100 52 > 4 1483366038 1483368038 gap 94100 53 > 5 1483367738 1483369738 gap 94200 54 > 6 1483369438 1483371438 starbucks 94000 55 > 7 1483371138 1483373138 starbucks 94000 56 > 8 1483372838 1483374838 starbucks 94100 57 > 9 1483374538 1483376538 starbucks 94100 58 > 10 1483376238 1483378238 starbucks 94200 59 > > I'd like to be able to: > a) create aggretages of the number of people present in each store_id at a > given time > b) create aggregates of the number of people present - grouped by zip or > store > > I expect to be rolling up to hour or half hour buckets, but I don't think I > should have to decide this up front and be able to do something clever to > be able to use ggplot + some other library to plot the time evolution of > this information, rolled up the way I want. > > Any clever solutions? I've trolled stackoverflow and this email list.. to > no avail - but I'm willing to acknowledge I may have missed something. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Jim Lemon
2017-May-16 22:11 UTC
[R] Aggregate counts of observations with times surrounding a time?
Hi again, Here is a version cleaned up a bit. Too tired to do it last night. mndf<-data.frame(st=seq(1483360938,by=1700,length=10), et=seq(1483362938,by=1700,length=10), store=c(rep("gap",5),rep("starbucks",5)), zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200), store_id=seq(50,59)) # orders the times and calculates number of simultaneous presences count_simult<-function(x) { nrows<-dim(x)[1] timeorder<-order(unlist(mndf[1:nrows,c("st","et")])) # initialize result data frame - first time always has a value of 1 interval_counts<-data.frame(time=c(x$st,x$et)[timeorder], startfin=rep(c("st","et"),each=5)[timeorder], count=c(1,rep(0,nrows-1))) for(i in 2:(nrows*2)) { interval_counts[i,"count"]<- interval_counts[i-1,"count"]+ ifelse(interval_counts[i,"startfin"]=="st",1,-1) } return(interval_counts) } gap_counts<-count_simult(mndf[mndf$store=="gap",]) plot(gap_counts$time,gap_counts$count,type="l") starbucks_counts<-count_simult(mndf[mndf$store=="starbucks",]) plot(starbucks_counts$time,gap_counts$count,type="l") Jim On Tue, May 16, 2017 at 7:43 PM, Jim Lemon <drjimlemon at gmail.com> wrote:> Hi Mark, > I think you might want something like this: > > mndf<-data.frame(st=seq(1483360938,by=1700,length=10), > et=seq(1483362938,by=1700,length=10), > store=c(rep("gap",5),rep("starbucks",5)), > zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200), > store_id=seq(50,59)) > # orders the times and calculates number of simultaneous presences > count_simult<-function(x) { > nrows<-dim(x)[1] > timeorder<-order(unlist(mndf[1:nrows,c("st","et")])) > interval_counts<-data.frame(time=c(x$st,x$et)[timeorder], > startfin=rep(c("st","et"),each=5)[timeorder],count=rep(NA,10)) > interval_counts[1,"count"]<-1 > for(i in 2:(nrows*2)) { > interval_counts[i,"count"]<- > interval_counts[i-1,"count"]+ > ifelse(interval_counts[i,"startfin"]=="st",1,-1) > } > return(interval_counts) > } > gap_counts<-count_simult(mndf[1:5,]) > plot(gap_counts$time,gap_counts$count,type="l") > starbucks_counts<-count_simult(mndf[6:10,]) > plot(starbucks_counts$time,gap_counts$count,type="l") > > There are a lot of ways to plot the counts by time. If you have any > preferences, let me know. > > Jim > > > On Tue, May 16, 2017 at 2:48 PM, Mark Noworolski <jmarkn at gmail.com> wrote: >> I have a data frame that has a set of observed dwell times at a set of >> locations. The metadata for the locations includes things that have varying >> degrees of specificity. I'm interested in tracking the number of people >> present at a given time in a given store, type of store, or zip code. >> >> Here's an example of some sample data (here st=start_time, and et=end_time): >> data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59)) >> st et store zip store_id >> 1 1483360938 1483362938 gap 94000 50 >> 2 1483362638 1483364638 gap 94000 51 >> 3 1483364338 1483366338 gap 94100 52 >> 4 1483366038 1483368038 gap 94100 53 >> 5 1483367738 1483369738 gap 94200 54 >> 6 1483369438 1483371438 starbucks 94000 55 >> 7 1483371138 1483373138 starbucks 94000 56 >> 8 1483372838 1483374838 starbucks 94100 57 >> 9 1483374538 1483376538 starbucks 94100 58 >> 10 1483376238 1483378238 starbucks 94200 59 >> >> I'd like to be able to: >> a) create aggretages of the number of people present in each store_id at a >> given time >> b) create aggregates of the number of people present - grouped by zip or >> store >> >> I expect to be rolling up to hour or half hour buckets, but I don't think I >> should have to decide this up front and be able to do something clever to >> be able to use ggplot + some other library to plot the time evolution of >> this information, rolled up the way I want. >> >> Any clever solutions? I've trolled stackoverflow and this email list.. to >> no avail - but I'm willing to acknowledge I may have missed something. >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.