Gavin Rudge
2013-Aug-15 13:22 UTC
[R] Create new records based on event dates in a data frame
One of those simple tasks, but I can't get to first base with it. I've
got a data set of observations of subjects over a 10 year period beginning on
1st April 2001 and ending on 31st March 2011. One of may variables is a score
based on an intervention on a given date. Before the intervention there is
baseline score, on the day of intervention the score changes as a result. If
there is no observation the baseline score remains constant over the entire
period.
Now I have a data.frame with subject ids, the baseline score with a baseline
date of 01/04/2001, a baseline score , and the date of any subsequent
observations and the scores resulting from them.
Here is a rough approximation with just three subjects, one with two
interventions one with one, and one with none. My actual data set has about
30,000 observations most of them with one or two interventions.
case=c("a","a","a","b","b","c")
obsdate<-c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
score=c(60,72,85,72,79,65)
df<-as.data.frame(cbind(case,obsdate,score))
df$obsdate<-as.Date(df$obsdate,format="%d/%m/%Y")
df
Now the data set I am trying to obtain for my analysis will consist of exposure
periods for each subject, with the start and end date and the score during the
period of exposure. So each subject will have at least one exposure period
beginning on the start date and a score. In those cases where there has been an
intervention (most of them) the next exposure period will start on the day of
intervention, and the earlier period will end the day before. If there are no
subsequent interventions between the start of one and the end of the study
period, 31/03/2011, then the last exposure period is censored at this date.
Where there is no intervention at all, (case 'c' is an example) the
exposure period is the duration of the study, from 01/04/2001 to 31/03/2011.
So for the above example my resulting data frame should look like this:
exp_case=c("a","a","a","b","b","c")
date_begin=c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
date_end=c("19/05/2001","07/10/2010","31/03/2011","09/11/2005","31/03/2011","31/03/2011")
exp_score=c(60,72,85,72,79,65)
expdata<-as.data.frame(cbind(exp_case,date_begin,date_end,exp_score))
expdata$date_begin<-as.Date(expdata$date_begin,format="%d/%m/%Y")
expdata$date_end<-as.Date(expdata$date_end,format="%d/%m/%Y")
expdata
Sorry about the clunky way I've handled the dates, this is the only way I
know how to do this.
All assistance gratefully received
GavinR
Hi,One way would be:
df<-
data.frame(case,obsdate=as.Date(obsdate,format="%d/%m/%Y"),score,stringsAsFactors=FALSE)?
#using as.data.frame(cbind(...? should be avoided
df$date_end<-as.Date(unlist(lapply(with(df,tapply(obsdate,case,FUN=function(x)
x-1)),function(x)
c(x[-1],as.Date("31/03/2011",format="%d/%m/%Y")))),origin="1970-01-01")
?df1<- df[,c(1,2,4,3)]
?df1
#? case??? obsdate?? date_end score
#1??? a 2001-04-01 2007-05-19??? 60
#2??? a 2007-05-20 2010-10-07??? 72
#3??? a 2010-10-08 2011-03-31??? 85
#4??? b 2001-04-01 2005-11-09??? 72
#5??? b 2005-11-10 2011-03-31??? 79
#6??? c 2001-04-01 2011-03-31??? 65
A.K.
----- Original Message -----
From: Gavin Rudge <g.rudge at bham.ac.uk>
To: "'r-help at r-project.org'" <r-help at
r-project.org>
Cc:
Sent: Thursday, August 15, 2013 9:22 AM
Subject: [R] Create new records based on event dates in a data frame
One of those simple tasks, but I can't get to first base with it.? I've
got a data set of observations of subjects over a 10 year period beginning on
1st April 2001 and ending on 31st March 2011.? One of may variables is a score
based on an intervention on a given date. Before the intervention there is
baseline score, on the day of intervention the score changes as a result.? If
there is no observation the baseline score remains constant over the entire
period.
Now I have a data.frame with subject ids, the baseline score with a baseline
date of 01/04/2001, a baseline score , and the date of any subsequent
observations and the scores resulting from them.
Here is a rough approximation with just three subjects, one with two
interventions one with one, and one with none.? My actual data set has about
30,000 observations most of them with one or two interventions.
case=c("a","a","a","b","b","c")
obsdate<-c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
score=c(60,72,85,72,79,65)
df<-as.data.frame(cbind(case,obsdate,score))
df$obsdate<-as.Date(df$obsdate,format="%d/%m/%Y")
df
Now the data set I am trying to obtain for my analysis will consist of exposure
periods for each subject, with the start and end date and the score during the
period of exposure. So each subject will have at least one exposure period
beginning on the start date and a score.? In those cases where there has been an
intervention (most of them) the next exposure period will start on the day of
intervention, and the earlier period will end the day before. If there are no
subsequent interventions between the start of one and the end of the study
period, 31/03/2011, then the last exposure period is censored at this date.
Where there is no intervention at all, (case 'c' is an example) the
exposure period is the duration of the study, from 01/04/2001 to 31/03/2011.
So for the above example my resulting data frame should look like this:
exp_case=c("a","a","a","b","b","c")
date_begin=c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
date_end=c("19/05/2001","07/10/2010","31/03/2011","09/11/2005","31/03/2011","31/03/2011")
exp_score=c(60,72,85,72,79,65)
expdata<-as.data.frame(cbind(exp_case,date_begin,date_end,exp_score))
expdata$date_begin<-as.Date(expdata$date_begin,format="%d/%m/%Y")
expdata$date_end<-as.Date(expdata$date_end,format="%d/%m/%Y")
expdata
Sorry about the clunky way I've handled the dates, this is the only way I
know how to do this.
All assistance gratefully received
GavinR
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Have used the suggested solution (the second method, as it seems a bit leaner) on my real data, and it works perfectly. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/Create-new-records-based-on-event-dates-in-a-data-frame-tp4673839p4674082.html Sent from the R help mailing list archive at Nabble.com.