thr3ads.net - R help - [R] Aggregate counts of observations with times surrounding a time? [May 2017]

If this information is useful, please help other people find it:
Share via:

Mark Noworolski

2017-May-16 04:48 UTC

[R] Aggregate counts of observations with times surrounding a time?

I have a data frame that has a set of observed dwell times at a set of
locations. The metadata for the locations includes things that have varying
degrees of specificity. I'm interested in tracking the number of people
present at a given time in a given store, type of store, or zip code.

Here's an example of some sample data (here st=start_time, and et=end_time):
data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59))
           st         et     store   zip store_id
1  1483360938 1483362938       gap 94000       50
2  1483362638 1483364638       gap 94000       51
3  1483364338 1483366338       gap 94100       52
4  1483366038 1483368038       gap 94100       53
5  1483367738 1483369738       gap 94200       54
6  1483369438 1483371438 starbucks 94000       55
7  1483371138 1483373138 starbucks 94000       56
8  1483372838 1483374838 starbucks 94100       57
9  1483374538 1483376538 starbucks 94100       58
10 1483376238 1483378238 starbucks 94200       59

I'd like to be able to:
a) create aggretages of the number of people present in each store_id at a
given time
b) create aggregates of the number of people present - grouped by zip or
store

I expect to be rolling up to hour or half hour buckets, but I don't think I
should have to decide this up front and be able to do something clever to
be able to use ggplot + some other library to plot the time evolution of
this information, rolled up the way I want.

Any clever solutions? I've trolled stackoverflow and this email list.. to
no avail - but I'm willing to acknowledge I may have missed something.

	[[alternative HTML version deleted]]

Jim Lemon

2017-May-16 09:43 UTC

head link

[R] Aggregate counts of observations with times surrounding a time?

Hi Mark,
I think you might want something like this:

mndf<-data.frame(st=seq(1483360938,by=1700,length=10),
 et=seq(1483362938,by=1700,length=10),
 store=c(rep("gap",5),rep("starbucks",5)),
 zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),
 store_id=seq(50,59))
# orders the times and calculates number of simultaneous presences
count_simult<-function(x) {
 nrows<-dim(x)[1]
 timeorder<-order(unlist(mndf[1:nrows,c("st","et")]))
 interval_counts<-data.frame(time=c(x$st,x$et)[timeorder],
 
startfin=rep(c("st","et"),each=5)[timeorder],count=rep(NA,10))
 interval_counts[1,"count"]<-1
 for(i in 2:(nrows*2)) {
  interval_counts[i,"count"]<-
   interval_counts[i-1,"count"]+
   ifelse(interval_counts[i,"startfin"]=="st",1,-1)
 }
 return(interval_counts)
}
gap_counts<-count_simult(mndf[1:5,])
plot(gap_counts$time,gap_counts$count,type="l")
starbucks_counts<-count_simult(mndf[6:10,])
plot(starbucks_counts$time,gap_counts$count,type="l")

There are a lot of ways to plot the counts by time. If you have any
preferences, let me know.

Jim


On Tue, May 16, 2017 at 2:48 PM, Mark Noworolski <jmarkn at gmail.com>
wrote:> I have a data frame that has a set of observed dwell times at a set of
> locations. The metadata for the locations includes things that have varying
> degrees of specificity. I'm interested in tracking the number of people
> present at a given time in a given store, type of store, or zip code.
>
> Here's an example of some sample data (here st=start_time, and
et=end_time):
>
data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59))
>            st         et     store   zip store_id
> 1  1483360938 1483362938       gap 94000       50
> 2  1483362638 1483364638       gap 94000       51
> 3  1483364338 1483366338       gap 94100       52
> 4  1483366038 1483368038       gap 94100       53
> 5  1483367738 1483369738       gap 94200       54
> 6  1483369438 1483371438 starbucks 94000       55
> 7  1483371138 1483373138 starbucks 94000       56
> 8  1483372838 1483374838 starbucks 94100       57
> 9  1483374538 1483376538 starbucks 94100       58
> 10 1483376238 1483378238 starbucks 94200       59
>
> I'd like to be able to:
> a) create aggretages of the number of people present in each store_id at a
> given time
> b) create aggregates of the number of people present - grouped by zip or
> store
>
> I expect to be rolling up to hour or half hour buckets, but I don't
think I
> should have to decide this up front and be able to do something clever to
> be able to use ggplot + some other library to plot the time evolution of
> this information, rolled up the way I want.
>
> Any clever solutions? I've trolled stackoverflow and this email list..
to
> no avail - but I'm willing to acknowledge I may have missed something.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Jim Lemon

2017-May-16 22:11 UTC

head link

[R] Aggregate counts of observations with times surrounding a time?

Hi again,
Here is a version cleaned up a bit. Too tired to do it last night.

mndf<-data.frame(st=seq(1483360938,by=1700,length=10),
 et=seq(1483362938,by=1700,length=10),
 store=c(rep("gap",5),rep("starbucks",5)),
 zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),
 store_id=seq(50,59))
# orders the times and calculates number of simultaneous presences
count_simult<-function(x) {
 nrows<-dim(x)[1]
 timeorder<-order(unlist(mndf[1:nrows,c("st","et")]))
 # initialize result data frame - first time always has a value of 1
 interval_counts<-data.frame(time=c(x$st,x$et)[timeorder],
  startfin=rep(c("st","et"),each=5)[timeorder],
  count=c(1,rep(0,nrows-1)))
 for(i in 2:(nrows*2)) {
  interval_counts[i,"count"]<-
   interval_counts[i-1,"count"]+
   ifelse(interval_counts[i,"startfin"]=="st",1,-1)
 }
 return(interval_counts)
}
gap_counts<-count_simult(mndf[mndf$store=="gap",])
plot(gap_counts$time,gap_counts$count,type="l")
starbucks_counts<-count_simult(mndf[mndf$store=="starbucks",])
plot(starbucks_counts$time,gap_counts$count,type="l")

Jim


On Tue, May 16, 2017 at 7:43 PM, Jim Lemon <drjimlemon at gmail.com>
wrote:> Hi Mark,
> I think you might want something like this:
>
> mndf<-data.frame(st=seq(1483360938,by=1700,length=10),
>  et=seq(1483362938,by=1700,length=10),
>  store=c(rep("gap",5),rep("starbucks",5)),
>  zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),
>  store_id=seq(50,59))
> # orders the times and calculates number of simultaneous presences
> count_simult<-function(x) {
>  nrows<-dim(x)[1]
> 
timeorder<-order(unlist(mndf[1:nrows,c("st","et")]))
>  interval_counts<-data.frame(time=c(x$st,x$et)[timeorder],
>  
startfin=rep(c("st","et"),each=5)[timeorder],count=rep(NA,10))
>  interval_counts[1,"count"]<-1
>  for(i in 2:(nrows*2)) {
>   interval_counts[i,"count"]<-
>    interval_counts[i-1,"count"]+
>    ifelse(interval_counts[i,"startfin"]=="st",1,-1)
>  }
>  return(interval_counts)
> }
> gap_counts<-count_simult(mndf[1:5,])
> plot(gap_counts$time,gap_counts$count,type="l")
> starbucks_counts<-count_simult(mndf[6:10,])
> plot(starbucks_counts$time,gap_counts$count,type="l")
>
> There are a lot of ways to plot the counts by time. If you have any
> preferences, let me know.
>
> Jim
>
>
> On Tue, May 16, 2017 at 2:48 PM, Mark Noworolski <jmarkn at
gmail.com> wrote:
>> I have a data frame that has a set of observed dwell times at a set of
>> locations. The metadata for the locations includes things that have
varying
>> degrees of specificity. I'm interested in tracking the number of
people
>> present at a given time in a given store, type of store, or zip code.
>>
>> Here's an example of some sample data (here st=start_time, and
et=end_time):
>>
data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59))
>>            st         et     store   zip store_id
>> 1  1483360938 1483362938       gap 94000       50
>> 2  1483362638 1483364638       gap 94000       51
>> 3  1483364338 1483366338       gap 94100       52
>> 4  1483366038 1483368038       gap 94100       53
>> 5  1483367738 1483369738       gap 94200       54
>> 6  1483369438 1483371438 starbucks 94000       55
>> 7  1483371138 1483373138 starbucks 94000       56
>> 8  1483372838 1483374838 starbucks 94100       57
>> 9  1483374538 1483376538 starbucks 94100       58
>> 10 1483376238 1483378238 starbucks 94200       59
>>
>> I'd like to be able to:
>> a) create aggretages of the number of people present in each store_id
at a
>> given time
>> b) create aggregates of the number of people present - grouped by zip
or
>> store
>>
>> I expect to be rolling up to hour or half hour buckets, but I don't
think I
>> should have to decide this up front and be able to do something clever
to
>> be able to use ggplot + some other library to plot the time evolution
of
>> this information, rolled up the way I want.
>>
>> Any clever solutions? I've trolled stackoverflow and this email
list.. to
>> no avail - but I'm willing to acknowledge I may have missed
something.
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

R help - May 2017 - Aggregate counts of observations with times surrounding a time?

[R] Aggregate counts of observations with times surrounding a time?

[R] Aggregate counts of observations with times surrounding a time?

[R] Aggregate counts of observations with times surrounding a time?