thr3ads.net - R help - [R] Create a time interval from a single time variable [Jun 2009]


Katschke, Adrian R

2009-Jun-03 17:07 UTC

[R] Create a time interval from a single time variable

I am trying to set up a data set for a survival analysis with time-varying
covariates. The data is already in long format, but there is no variable
marking the stopping point of each interval. I would like to use the variable
DaysEnrolled to form these intervals. This is what I have now:

     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0

This is what I would like to have:

     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA

I am not sure how to put this in a function. I thought of using embed() in
tapply().

astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x){
    ifelse(length(x) == 1,
           embed(x, 1),
           ifelse(length(x) > 1,
                  embed(x, 2), NA))})

This doesn't do what I thought it would. I know that I could write a double
loop to look at each subject and the differing number of observations for each
subject, but would like to avoid that if at all possible.
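A minimal sketch of why the embed() attempt misbehaves, plus one loop-free alternative, assuming the rows are already in time order within each ID (as in the sample). embed(x, 2) returns a matrix with one row fewer than x, so its length never matches a subject's rows; padding the shifted vector with a trailing NA does match. split()/unsplit() then put the pieces back in the original row order, whereas tapply() followed by unlist() would return the groups in sorted-ID order (here ID 1436 would come first). The data frame `d` is a cut-down stand-in using values from sample1, and `next_time` is a hypothetical helper name.

```r
# Cut-down stand-in for sample1 (values taken from it): rows in time order within ID
d <- data.frame(ID = c(71622L, 71622L, 71622L, 1436L),
                DaysEnrolled = c(0L, 28L, 42L, 0L))

# embed(x, 2) has one row fewer than x: column 1 is x[2:n], column 2 is x[1:(n-1)]
print(embed(c(0, 28, 42), 2))

# Hypothetical helper: the next visit time within a subject, NA after the last visit
next_time <- function(x) c(x[-1], NA)

# split() by ID, shift each piece, then unsplit() back into the original row order
d$Start <- d$DaysEnrolled
d$Stop  <- unsplit(lapply(split(d$DaysEnrolled, d$ID), next_time), d$ID)
print(d)
```

The same two assignments should apply to the full sample1 by replacing `d` with `sample1`.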


Sample of 2 subjects:
sample1 <-
structure(list(ID = c(71622L, 71622L, 71622L, 71622L, 71622L,
71622L, 71622L, 71622L, 71622L, 1436L), Age = c(0.008, 0.085,
0.123, 0.277, 0.441, 0.517, 0.594, 0.715, 0.791, 6.968), DaysEnrolled = c(0L,
28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L), HAZ = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), WAZ = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
    WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Food = c(NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), onARV = c(0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HIVStatus = structure(c(2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
    "HIV exposed, status indeterminate", "HIV infected", "HIV negative"),
    class = "factor"), LTFUp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA),
    Start = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ID", "Age",
"DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food", "onARV", "HIVStatus",
"LTFUp", "Start", "Stop"), row.names = c(NA, 10L), class = "data.frame")


Adrian Katschke
Biostatistician
IU Department of Medicine
Division of Biostatistics
akatschk@iupui.edu
317-278-6665



Katschke, Adrian R

2009-Jun-03 18:15 UTC


[R] Create a time interval from a single time variable

I am trying to set up a data set for a survival analysis with time-varying
covariates. The data is already in long format, but there is no variable
marking the stopping point of each interval. I would like to use the variable
DaysEnrolled to form these intervals. This is what I have now:

     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0

This is what I would like to have:

     ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA

I am not sure how to put this in a function. I thought of using embed() in
tapply().

astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x){
    ifelse(length(x) == 1,
           embed(x, 1),
           ifelse(length(x) > 1,
                  embed(x, 2), NA))})

This doesn't do what I thought it would. I know that I could write a double
loop to look at each subject and the differing number of observations for each
subject, but would like to avoid that if at all possible.


Sample of 2 subjects:
sample1 <-
structure(list(ID = c(71622L, 71622L, 71622L, 71622L, 71622L,
71622L, 71622L, 71622L, 71622L, 1436L), Age = c(0.008, 0.085,
0.123, 0.277, 0.441, 0.517, 0.594, 0.715, 0.791, 6.968), DaysEnrolled = c(0L,
28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L), HAZ = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), WAZ = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
    WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Food = c(NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), onARV = c(0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HIVStatus = structure(c(2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
    "HIV exposed, status indeterminate", "HIV infected", "HIV negative"),
    class = "factor"), LTFUp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA),
    Start = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ID", "Age",
"DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food", "onARV", "HIVStatus",
"LTFUp", "Start", "Stop"), row.names = c(NA, 10L), class = "data.frame")


Adrian Katschke
Biostatistician
IU Department of Medicine
Division of Biostatistics
akatschk at iupui.edu
317-278-6665

Terry Therneau

2009-Jun-04 13:13 UTC


[R] Create a time interval from a single time variable

-- begin included message --
I am trying to set up a data set for a survival analysis with time-varying  
covariates. The data is already in a long format, but does not have a variable 
to signify the stopping point for the interval. The variable DaysEnrolled is the
variable I would like to use to form this interval. This is what I have now:

...
---- end inclusion

 I would have expected a dozen solutions from the list - data manipulation 
problems usually get a large following.  It can be done in 4 lines, assuming 
that the parent data set is sorted by subject and time within subject.
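A minimal sketch of meeting that assumption with order() before running the recipe. Strictly, the shift only needs each subject's rows to be contiguous and in time order (the diff() test only looks for a change of ID between adjacent rows); sorting by ID and then DaysEnrolled guarantees both. The data frame here is a cut-down stand-in using values from sample1, named `olddata` to match the recipe.

```r
# Cut-down stand-in for sample1 (values taken from it)
olddata <- data.frame(ID = c(71622L, 71622L, 71622L, 1436L),
                      DaysEnrolled = c(0L, 28L, 42L, 0L))

# Sort by subject, then by time within subject; row names keep the old positions
olddata <- olddata[order(olddata$ID, olddata$DaysEnrolled), ]
print(olddata)
```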
 
newdata <- olddata                      # copy; olddata sorted by ID, then time
newdata$start <- olddata$DaysEnrolled   # start time = the current variable
temp <- olddata$DaysEnrolled[-1]        # shift column up by one position
temp[diff(olddata$ID) != 0] <- NA       # NA for last line of each subject
newdata$stop <- c(temp, NA)             # add the NA for the last subject

   I will leave it to others to compress this into a 1-line application of one 
of the apply functions.  (Unreadable perhaps, but definitely more elegant :-)
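One possible compression along those lines, sketched here and not taken from the thread, uses ave(), which internally splits its first argument by the grouping variable, applies FUN to each piece with lapply(), and writes the results back in the original row order. It therefore needs time order within each ID but no global sort by ID. The data frame is a cut-down stand-in using values from sample1.

```r
# Cut-down stand-in for sample1 (values taken from it): rows in time order within ID
sample1 <- data.frame(ID = c(71622L, 71622L, 71622L, 1436L),
                      DaysEnrolled = c(0L, 28L, 42L, 0L))

sample1$Start <- sample1$DaysEnrolled
# ave() splits DaysEnrolled by ID, applies FUN to each piece, and reassembles in
# the original row order; FUN shifts each piece up by one and pads with NA
sample1$Stop <- ave(sample1$DaysEnrolled, sample1$ID, FUN = function(x) c(x[-1], NA))
print(sample1)
```

FUN must be passed by name, because ave() collects the grouping variables through `...`.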
   
   	Terry T.
