Katschke, Adrian R
2009-Jun-03 17:07 UTC
[R] Create a time interval from a single time variable
I am trying to set up a data set for a survival analysis with time-varying
covariates. The data is already in a long format, but does not have a variable
to signify the stopping point for the interval. The variable DaysEnrolled is the
variable I would like to use to form this interval. This is what I have now:
     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
This is what I would like to have:
     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA
I am not sure how to put this in a function. I thought of using embed() in
tapply().
astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x){
                ifelse(length(x) == 1,
                       embed(x,1),
                       ifelse(length(x) > 1,
                              embed(x,2), NA))})
This doesn't do what I thought it would. I know that I could write a double
loop to look at each subject and the differing number of observations for each
subject, but would like to avoid that if at all possible.
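For reference, a minimal sketch of how the embed() idea could be made to work. The data frame `d` and the helper `next_visit()` are hypothetical names used only for illustration. Three things trip up the attempt above: ifelse() returns a result the same length as its test (here a single value), so only one number per subject comes back; embed(x, 2) puts the later value in column 1 and stops with an error when a subject has fewer than two visits; and tapply() returns one list element per ID rather than one value per row. The sketch swaps tapply() for ave(), which returns its results in the original row order, and assumes visits are already sorted by DaysEnrolled within each ID.

```r
# Hypothetical two-subject example (a subset of the columns in sample1),
# sorted by DaysEnrolled within each ID.
d <- data.frame(
  ID           = c(rep(71622L, 9), 1436L),
  DaysEnrolled = c(0L, 28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L)
)

# Hypothetical helper: the next visit day for each visit of one subject.
# embed(x, 2) is an (n - 1) x 2 matrix with rows (x[t], x[t - 1]), so its
# first column is x[2:n]. embed() stops with an error when length(x) < 2,
# hence the explicit if () (ifelse() would return only one value).
next_visit <- function(x) {
  if (length(x) < 2) return(NA)
  c(embed(x, 2)[, 1], NA)   # the last visit has no stop time
}

# ave() applies next_visit() within each ID and returns one value per row,
# in the original row order, unlike tapply(), which returns a list by ID.
d$Start <- d$DaysEnrolled
d$Stop  <- ave(d$DaysEnrolled, d$ID, FUN = next_visit)
print(d)
```

Because ave() writes each subject's results back into that subject's rows, no loop over subjects and no bookkeeping for the differing number of visits is needed.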
Sample of 2 subjects:
sample1 <-
structure(list(ID = c(71622L, 71622L, 71622L, 71622L, 71622L,
71622L, 71622L, 71622L, 71622L, 1436L), Age = c(0.008, 0.085,
0.123, 0.277, 0.441, 0.517, 0.594, 0.715, 0.791, 6.968), DaysEnrolled = c(0L,
28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L), HAZ = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), WAZ = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Food = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), onARV = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HIVStatus = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"HIV exposed, status indeterminate", "HIV infected", "HIV negative"),
class = "factor"), LTFUp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA),
Start = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ID",
"Age", "DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food", "onARV",
"HIVStatus", "LTFUp", "Start", "Stop"),
row.names = c(NA, 10L), class = "data.frame")
Adrian Katschke
Biostatistician
IU Department of Medicine
Division of Biostatistics
akatschk@iupui.edu
317-278-6665
Katschke, Adrian R
2009-Jun-03 18:15 UTC
[R] Create a time interval from a single time variable
I am trying to set up a data set for a survival analysis with time-varying
covariates. The data is already in a long format, but does not have a variable
to signify the stopping point for the interval. The variable DaysEnrolled is the
variable I would like to use to form this interval. This is what I have now:
     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
This is what I would like to have:
     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA
I am not sure how to put this in a function. I thought of using embed() in
tapply().
astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x){
                ifelse(length(x) == 1,
                       embed(x,1),
                       ifelse(length(x) > 1,
                              embed(x,2), NA))})
This doesn't do what I thought it would. I know that I could write a double
loop to look at each subject and the differing number of observations for each
subject, but would like to avoid that if at all possible.
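A shorter variant skips embed() altogether: within each subject, drop the first visit day and append NA, so each row's stop time is the next visit day. This is a minimal sketch on hypothetical data, and `shift_up()` is a hypothetical helper name. Because ave() matches rows to subjects by group rather than by position, the rows do not need to be contiguous by ID; it does assume visits are already in time order within each subject (sorting with order(ID, DaysEnrolled) first would ensure that).

```r
# Hypothetical data: subject 1436 sits between visits of subject 71622.
# Visits are assumed to be in time order within each subject.
d <- data.frame(
  ID           = c(71622L, 71622L, 1436L, 71622L, 71622L),
  DaysEnrolled = c(0L, 28L, 0L, 42L, 98L)
)

# Within one subject: shift the visit days up by one and pad with NA,
# so each row's Stop is the next visit day (NA for the last visit).
shift_up <- function(x) c(x[-1], NA)

d$Start <- d$DaysEnrolled
d$Stop  <- ave(d$DaysEnrolled, d$ID, FUN = shift_up)
print(d)
```

A subject with a single visit simply gets NA, since x[-1] is then empty; no special case is needed.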
Sample of 2 subjects:
sample1 <-
structure(list(ID = c(71622L, 71622L, 71622L, 71622L, 71622L,
71622L, 71622L, 71622L, 71622L, 1436L), Age = c(0.008, 0.085,
0.123, 0.277, 0.441, 0.517, 0.594, 0.715, 0.791, 6.968), DaysEnrolled = c(0L,
28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L), HAZ = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), WAZ = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Food = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), onARV = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HIVStatus = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"HIV exposed, status indeterminate", "HIV infected", "HIV negative"),
class = "factor"), LTFUp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA),
Start = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ID",
"Age", "DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food", "onARV",
"HIVStatus", "LTFUp", "Start", "Stop"),
row.names = c(NA, 10L), class = "data.frame")
Adrian Katschke
Biostatistician
IU Department of Medicine
Division of Biostatistics
akatschk at iupui.edu
317-278-6665
Terry Therneau
2009-Jun-04 13:13 UTC
[R] Create a time interval from a single time variable
-- begin included message --
I am trying to set up a data set for a survival analysis with time-varying
covariates. The data is already in a long format, but does not have a variable
to signify the stopping point for the interval. The variable DaysEnrolled is the
variable I would like to use to form this interval. This is what I have now:
...
---- end inclusion

I would have expected a dozen solutions from the list - data manipulation
problems usually get a large following.

It can be done in 4 lines, assuming that the parent data set is sorted by
subject and time within subject.

newdata <- olddata                     # setup: start from a copy of the data
newdata$start <- olddata$DaysEnrolled  # start time = the current variable
temp <- olddata$DaysEnrolled[-1]       # shift column up by one position
temp[diff(olddata$ID) != 0] <- NA      # NA for last line of each subject
newdata$stop <- c(temp, NA)            # add the NA for the last subject

I will leave it to others to compress this into a 1-line application of one
of the apply functions. (Unreadable perhaps, but definitely more elegant :-)

Terry T.
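One possible take on the compression suggested above uses split(), lapply() and unsplit(): split DaysEnrolled by subject, shift each piece up by one position, and put the pieces back in row order. This is a minimal sketch, with `olddata` standing in for a two-column subset of the asker's sample1, checked against the four-line version. Unlike that version it does not need the rows grouped by subject, only visits in time order within each subject.

```r
# Hypothetical stand-in for the asker's sample1 (only the columns used here).
olddata <- data.frame(
  ID           = c(rep(71622L, 9), 1436L),
  DaysEnrolled = c(0L, 28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L)
)

# Four-line version from the message above (rows sorted by ID, then time).
newdata <- olddata
newdata$start <- olddata$DaysEnrolled
temp <- olddata$DaysEnrolled[-1]
temp[diff(olddata$ID) != 0] <- NA
newdata$stop <- c(temp, NA)

# Single-expression version: split by subject, shift each piece up by one
# position (NA for the last visit), then unsplit() back into row order.
stop1 <- unsplit(lapply(split(olddata$DaysEnrolled, olddata$ID),
                        function(x) c(x[-1], NA)),
                 olddata$ID)

stopifnot(identical(stop1, newdata$stop))
print(stop1)
```

Whether this counts as more elegant is a matter of taste; the diff() version is arguably easier to read, but it silently gives wrong stop times if the data are not sorted by subject.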