I have a large dataset that includes subjects (ID), dates and events that need to be counted. Not every date includes an event, and I need to count only one event per 30 days, per subject. In essence, I need to create a 30-day "blackout" period for each subject during which an event cannot be "counted". The reason is that a rule has been set up whereby a subject can only be "counted" once per 30-day period (the 30-day window includes the day on which the event of interest is counted).

The solution should count only the following events per subject (per the 30-day blackout rule):

ID     Date
auto1  1/1/2010
auto2  2/12/2010
auto2  4/21/2011
auto3  3/1/2010
auto3  5/3/2010

I have created a multistep process to do this, but it is extremely clumsy (detailed below). I have to believe that one of you has a much more elegant solution. Thank you all in advance for any help!

## example data
data1 <- structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
    4L, 4L, 4L, 4L, 4L), .Label = c("", "auto1", "auto2", "auto3"),
    class = "factor"), Date = structure(c(14610, 14610, 14627, 14680,
    14652, 14660, 14725, 15085, 15086, 14642, 14669, 14732, 14747, 14749),
    class = "Date"), event = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L,
    1L, 0L, 1L)), .Names = c("ID", "Date", "event"), class = "data.frame",
    row.names = c(NA, 14L))
## remove non-events
data2 <- data1[data1$event == 1, ]
library(doBy)
## create a table of first events (earliest event date per ID)
step1 <- summaryBy(Date ~ ID, data = data2, FUN = min)
step1$Date30 <- step1$Date.min + 30
step2 <- merge(data2, step1, by.x = "ID", by.y = "ID")
## use ifelse() to zero out any events inside the window that shouldn't be counted
step2$event <- ifelse(as.numeric(step2$Date) >= step2$Date.min &
                      as.numeric(step2$Date) <= step2$Date30, 0, step2$event)
## basically repeat the steps above until I get an error (no more events)
data3 <- step2[step2$event == 1, ]
data3 <- data3[, 1:3]
step3 <- summaryBy(Date ~ ID, data = data3, FUN = min)
step3$Date30 <- step3$Date.min + 30
step4 <- merge(data3, step3, by.x = "ID", by.y = "ID")
step4$event <- ifelse(as.numeric(step4$Date) >= step4$Date.min &
                      as.numeric(step4$Date) <= step4$Date30, 0, step4$event)
## then I rbind the "keepers"
## in this case, steps 1 and 3 above
final <- rbind(step1, step3)
## then reformat
final <- final[, 1:2]
final$Date.min <- as.Date(final$Date.min, origin = "1970-01-01")
## again, extremely clumsy, but it works... HELP! :)
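The repeat-until-empty loop above can be collapsed into a single pass over each subject's sorted event dates. Below is a minimal base-R sketch of that idea. It assumes that an event counts when it falls 30 or more days after that subject's last counted event, so the counted day plus the following 29 days are blacked out. The helper name `blackout_keep` is hypothetical; it is not part of doBy or any other package.

```r
## Example data (copied from the question)
data1 <- structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
    4L, 4L, 4L, 4L, 4L), .Label = c("", "auto1", "auto2", "auto3"),
    class = "factor"), Date = structure(c(14610, 14610, 14627, 14680,
    14652, 14660, 14725, 15085, 15086, 14642, 14669, 14732, 14747, 14749),
    class = "Date"), event = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L,
    1L, 0L, 1L)), .Names = c("ID", "Date", "event"), class = "data.frame",
    row.names = c(NA, 14L))

## Hypothetical helper: given one subject's event dates sorted ascending,
## return TRUE for each event that is counted under a rolling blackout.
## Assumption: an event counts if it is at least `window` days after the
## last counted event.
blackout_keep <- function(dates, window = 30) {
  keep <- logical(length(dates))
  last <- -Inf                      # nothing counted yet
  for (i in seq_along(dates)) {
    d <- as.numeric(dates[i])       # days since 1970-01-01
    if (d - last >= window) {
      keep[i] <- TRUE
      last <- d                     # a new blackout window starts here
    }
  }
  keep
}

## Keep real events only, then apply the rule separately within each subject
## (drop = TRUE skips the unused "" level of ID)
ev <- data1[data1$event == 1L, ]
kept <- lapply(split(ev, ev$ID, drop = TRUE), function(df) {
  df <- df[order(df$Date), ]
  df[blackout_keep(df$Date), c("ID", "Date")]
})
final <- do.call(rbind, kept)
rownames(final) <- NULL
final
```

On the example data, this should return the five ID/date pairs listed in the question. The original `ifelse()` step also zeroes out an event exactly 30 days after `Date.min`. If that stricter reading is the intended rule, calling `blackout_keep(df$Date, window = 31)` reproduces it.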
Dennis Murphy
2011-Nov-19 04:25 UTC
[R] counting events by subject with "black out" windows
Hi:

Here's a quick-and-dirty (Q & D) solution that could be improved. It uses the plyr package. Starting from your data1 data frame:

library('plyr')
dseq <- seq(as.Date('2010-01-01'), as.Date('2011-06-05'), by = '30 days')
# Use the cut() function to create a factor whose levels are demarcated
# by the dates in dseq.
# See ?cut for labeling options.
data1[['tf']] <- cut(data1$Date, dseq)
ddply(subset(data1, event == 1L), .(tf), summarise, Date.min = min(Date))

          tf   Date.min
1 2010-01-01 2010-01-01
2 2010-01-31 2010-02-12
3 2010-05-01 2010-05-03
4 2011-03-27 2011-04-21

The value of tf is the left endpoint of the time interval. This isn't your desired output in two respects:

(1) summarise won't carry along extra variables, so ID gets dropped;
(2) you have 2010-03-01 as the first date of a 30-day period, but under the way I defined the 30-day intervals, Mar. 1 is the last day of an interval (2010-01-31 through 2010-03-01). That's why it's not included: 2010-02-12 precedes it.

You can always change the definitions. If you group by months instead, you get the output you expected.

Hope this is enough to get you started.

Dennis

On Fri, Nov 18, 2011 at 3:22 PM, Chris Conner <connerpharmd at yahoo.com> wrote:
> I have a large dataset that includes subjects (ID), dates and events that need to be counted.
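Dennis's cut() idea can be taken one step further by grouping on ID as well as on the bin. The base-R sketch below does this without plyr. The helper `first_per_bin` and the two test dates in `x` are hypothetical. The sketch keeps the earliest event per subject per bin and compares fixed bins with a rolling blackout. On this example data, both the 30-day bins and calendar-month bins happen to give the five rows the question asks for. However, fixed bins are not the same rule as a rolling 30-day blackout: two events one day apart that straddle a bin boundary are both counted.

```r
## Example data (copied from the question)
data1 <- structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
    4L, 4L, 4L, 4L, 4L), .Label = c("", "auto1", "auto2", "auto3"),
    class = "factor"), Date = structure(c(14610, 14610, 14627, 14680,
    14652, 14660, 14725, 15085, 15086, 14642, 14669, 14732, 14747, 14749),
    class = "Date"), event = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L,
    1L, 0L, 1L)), .Names = c("ID", "Date", "event"), class = "data.frame",
    row.names = c(NA, 14L))

ev <- data1[data1$event == 1L, ]
ev <- ev[order(ev$ID, ev$Date), ]   # sort by subject, then by date

## Hypothetical helper: after the sort above, the first row in each
## (ID, bin) cell is that subject's earliest event in the bin
first_per_bin <- function(ev, bins) {
  ev[!duplicated(data.frame(ID = ev$ID, bin = bins)), c("ID", "Date")]
}

## Fixed 30-day bins, as in the reply above, but grouped by ID too
dseq <- seq(as.Date("2010-01-01"), as.Date("2011-06-05"), by = "30 days")
fixed30 <- first_per_bin(ev, cut(ev$Date, dseq))
## Calendar-month bins
fixed_month <- first_per_bin(ev, cut(ev$Date, "month"))

## Fixed bins versus a rolling blackout, for two hypothetical events one day
## apart that fall on either side of the 2010-01-31 break
x <- as.Date(c("2010-01-30", "2010-01-31"))
n_fixed <- length(unique(cut(x, dseq)))          # two bins, so two counts
## With only two events, the rolling rule counts the second one only if it
## is 30 or more days after the first
n_rolling <- 1 + sum(as.numeric(diff(x)) >= 30)
```

Here `fixed30` and `fixed_month` agree with the desired output, while `n_fixed` is 2 and `n_rolling` is 1. For a strict "once per 30 days since the last counted event" rule, a sequential per-subject pass is the safer choice.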