I thought that that was a typo from the OP, as it disagrees with his
example. But the labels are arbitrary, so in fact cut() will do it
whichever way he meant.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Sep 26, 2016 at 11:37 AM, Ista Zahn <istazahn at gmail.com> wrote:
> On Mon, Sep 26, 2016 at 1:59 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> This seems like a job for cut() .
>
> I thought that at first too, but the middle group shouldn't be .87 but rather
>
> "exposure" = "2007-01-01" - "fini"
>
> so I think cut alone won't do it.
>
> Best,
> Ista
>>
>> (I made DT a data frame to avoid loading the data table package. But I
>> assume it would work with a data table too. Check this, though!)
>>
>> > DT <- within(DT, exposure <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")), labels = c(1,.87,.5)))
>>
>> > DT
>>   id       fini group exposure
>> 1  2 2005-04-20     A        1
>> 2  2 2005-04-20     A        1
>> 3  2 2005-04-20     A        1
>> 4  5 2006-02-19     B     0.87
>> 5  5 2006-02-19     B     0.87
>> 6  7 2006-10-08     A      0.5
>> 7  7 2006-10-08     A      0.5
>>
>> (but note that exposure is a factor, not numeric)
>>
>> Cheers,
>> Bert
>>
>> On Mon, Sep 26, 2016 at 10:05 AM, Ista Zahn <istazahn at gmail.com> wrote:
>>> Hi Frank,
>>>
>>> lapply(DT) iterates over each column. That doesn't seem to be what you want.
>>>
>>> There are probably better ways, but here is one approach.
>>>
>>> DT[, exposure := vector(mode = "numeric", length = .N)]
>>> DT[fini < as.Date("2006-01-01"), exposure := 1]
>>> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>>>    exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
>>> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>>>
>>> Best,
>>> Ista
>>>
>>> On Mon, Sep 26, 2016 at 11:28 AM, Frank S. <f_j_rod at hotmail.com> wrote:
>>>> Dear all,
>>>>
>>>> I have an R data table like this:
>>>>
>>>> DT <- data.table(
>>>>   id = rep(c(2, 5, 7), c(3, 2, 2)),
>>>>   fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
>>>>   group = rep(c("A", "B", "A"), c(3, 2, 2)) )
>>>>
>>>> I want to construct a new variable "exposure" defined as follows:
>>>>
>>>> 1) If "fini" earlier than 2006-01-01 --> "exposure" = 1
>>>> 2) If "fini" in [2006-01-01, 2006-06-30] --> "exposure" = "2007-01-01" - "fini"
>>>> 3) If "fini" in [2006-07-01, 2006-12-31] --> "exposure" = 0.5
>>>>
>>>> So the desired output would be the following data table:
>>>>
>>>>    id       fini exposure group
>>>> 1:  2 2005-04-20     1.00     A
>>>> 2:  2 2005-04-20     1.00     A
>>>> 3:  2 2005-04-20     1.00     A
>>>> 4:  5 2006-02-19     0.87     B
>>>> 5:  5 2006-02-19     0.87     B
>>>> 6:  7 2006-10-08     0.50     A
>>>> 7:  7 2006-10-08     0.50     A
>>>>
>>>> I have tried:
>>>>
>>>> DT <- DT[ , list(id, fini, exposure = 0, group)]
>>>> DT.new <- lapply(DT, function(exposure){
>>>>   exposure[fini < as.Date("2006-01-01")] <- 1 # 1st case
>>>>   exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <- difftime(as.Date("2007-01-01"), fini, units="days")/365.25 # 2nd case
>>>>   exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5 # 3rd case
>>>>   exposure # return value
>>>> })
>>>>
>>>> But I get an error message.
>>>>
>>>> Thanks for any help!!
>>>>
>>>> Frank S.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
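
A minimal sketch of the two caveats that come with the cut() suggestion in this thread, using only base R. First, cut() returns a factor, so the labels have to be converted back through as.character() to get numbers. Second, the Date method of cut() defaults to right = FALSE, so the bins are left-closed [a, b). The break dates 2006-07-01 and 2007-01-01 below are an assumption chosen to match the OP's closed ranges; they are not the breaks posted in the thread.

```r
# Toy data mirroring the OP's example, as a plain data frame (no packages).
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# Assumed breaks: with left-closed bins [a, b), these cover
# [2006-01-01, 2006-06-30] and [2006-07-01, 2006-12-31] exactly.
brks <- as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01"))

f <- cut(DT$fini, brks, labels = c(1, 0.87, 0.5))

# as.numeric(f) alone would give the level codes 1, 2, 3,
# so convert via the labels instead.
DT$exposure <- as.numeric(as.character(f))

# Boundary check: 2006-06-30 lands in the middle bin with these breaks.
edge <- cut(as.Date("2006-06-30"), brks, labels = c("a", "b", "c"))
```

With fixed labels the middle group is still a constant (0.87 here), which is the limitation Ista points out: cut() on its own cannot produce a value that depends on each row's date.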
On Mon, Sep 26, 2016 at 2:48 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> I thought that that was a typo from the OP, as it disagrees with his
> example. But the labels are arbitrary, so in fact cut() will do it
> whichever way he meant.

I don't see how cut will do it, at least not conveniently. Consider
this slightly altered example:

library(data.table)
DT <- data.table(
  id = rep(c(2, 5, 7), c(3, 2, 2)),
  fini = rep(as.Date(c('2005-04-20',
                       '2006-02-19',
                       '2006-06-29',
                       '2006-10-08')),
             c(3, 1, 1, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2)) )

DT[, exposure := vector(mode = "numeric", length = .N)]
DT[fini < as.Date("2006-01-01"), exposure := 1]
DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
   exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
DT[fini >= as.Date("2006-07-01"), exposure := 0.5]

DT

##    id       fini group  exposure
## 1:  2 2005-04-20     A 1.0000000
## 2:  2 2005-04-20     A 1.0000000
## 3:  2 2005-04-20     A 1.0000000
## 4:  5 2006-02-19     B 0.8651608
## 5:  5 2006-06-29     B 0.5092402
## 6:  7 2006-10-08     A 0.5000000
## 7:  7 2006-10-08     A 0.5000000

Best,
Ista
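
The three-branch rule discussed above can also be sketched without data.table, as a plain function over a Date vector. This is only an illustration: the helper name exposure_from_fini is hypothetical and does not come from the thread, and returning NA for dates outside all three ranges is an assumption. It also shows one likely cause of trouble in the OP's attempt: the right-hand side of the second assignment has to be subset to the same rows as the left-hand side.

```r
# Hypothetical helper (not from the thread): piecewise exposure for a Date vector.
exposure_from_fini <- function(fini) {
  stopifnot(inherits(fini, "Date"))
  out <- rep(NA_real_, length(fini))  # assumption: NA when no rule applies

  # which() turns each condition into row positions and drops any NA dates.
  early <- which(fini < as.Date("2006-01-01"))
  mid   <- which(fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"))
  late  <- which(fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31"))

  out[early] <- 1
  # Use fini[mid], not fini, so both sides of the assignment have equal length.
  out[mid] <- as.numeric(difftime(as.Date("2007-01-01"), fini[mid],
                                  units = "days")) / 365.25
  out[late] <- 0.5
  out
}

fini <- as.Date(c("2005-04-20", "2006-02-19", "2006-06-29",
                  "2006-10-08", "2007-03-01"))
exposure <- exposure_from_fini(fini)
round(exposure, 4)
```

The middle values are day counts divided by 365.25: 316 days for 2006-02-19 and 186 days for 2006-06-29, which agrees with the 0.8651608 and 0.5092402 printed in the message above.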
Ista:

Aha -- now I see the point. My bad. You are right. I was careless.

However, cut() with ifelse() might simplify the code a bit and/or make
it more readable. To be clear, this is just a matter of taste; e.g.
using your data and a data frame instead of a data table:

> DT <- within(DT, exposure <- {
    f <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")),
             labels = letters[1:3])
    ifelse(f == "a", 1,
      ifelse(f == "c", .5,
        difftime(as.Date("2007-01-01"), fini, units="days")/365.25))
  } )

> DT
  id       fini group  exposure f
1  2 2005-04-20     A 1.0000000 a
2  2 2005-04-20     A 1.0000000 a
3  2 2005-04-20     A 1.0000000 a
4  5 2006-02-19     B 0.8651608 b
5  5 2006-06-29     B 0.5092402 b
6  7 2006-10-08     A 0.5000000 c
7  7 2006-10-08     A 0.5000000 c

Bert Gunter
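
In the output above, the helper factor f ends up as an extra column because within() keeps every object created in its expression. A possible variant, sketched here under assumptions rather than taken from the thread, removes the helper with rm() before within() returns. The break dates 2006-07-01 and 2007-01-01 are assumed so that the left-closed bins of cut() on Dates match the OP's ranges, and as.numeric() is added to drop the difftime class.

```r
# Ista's altered example as a plain data frame (base R only).
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-06-29", "2006-10-08")),
              c(3, 1, 1, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

DT <- within(DT, {
  # Assumed breaks; cut() on Dates uses left-closed bins [a, b) by default.
  f <- cut(fini,
           as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01")),
           labels = letters[1:3])
  exposure <- ifelse(f == "a", 1,
              ifelse(f == "c", 0.5,
                     as.numeric(difftime(as.Date("2007-01-01"), fini,
                                         units = "days")) / 365.25))
  rm(f)  # remove the helper so it is not added as a column
})

names(DT)
```

Whether this reads better than the three data.table assignments is, as noted above, a matter of taste.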