This seems like a job for cut(). (I made DT a data frame to avoid loading the data.table package, but I assume it would work with a data table too. Check this, though!)

> DT <- within(DT, exposure <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")), labels = c(1, .87, .5)))

> DT
  id       fini group exposure
1  2 2005-04-20     A        1
2  2 2005-04-20     A        1
3  2 2005-04-20     A        1
4  5 2006-02-19     B     0.87
5  5 2006-02-19     B     0.87
6  7 2006-10-08     A      0.5
7  7 2006-10-08     A      0.5

(but note that exposure is a factor, not numeric)

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Sep 26, 2016 at 10:05 AM, Ista Zahn <istazahn at gmail.com> wrote:
> Hi Frank,
>
> lapply(DT) iterates over each column. That doesn't seem to be what you want.
>
> There are probably better ways, but here is one approach.
>
> DT[, exposure := vector(mode = "numeric", length = .N)]
> DT[fini < as.Date("2006-01-01"), exposure := 1]
> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>    exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>
> Best,
> Ista
>
> On Mon, Sep 26, 2016 at 11:28 AM, Frank S. <f_j_rod at hotmail.com> wrote:
>> Dear all,
>>
>> I have an R data table like this:
>>
>> DT <- data.table(
>>   id = rep(c(2, 5, 7), c(3, 2, 2)),
>>   fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
>>   group = rep(c("A", "B", "A"), c(3, 2, 2)) )
>>
>> I want to construct a new variable "exposure" defined as follows:
>>
>> 1) If "fini" is earlier than 2006-01-01 --> "exposure" = 1
>> 2) If "fini" is in [2006-01-01, 2006-06-30] --> "exposure" = "2007-01-01" - "fini"
>> 3) If "fini" is in [2006-07-01, 2006-12-31] --> "exposure" = 0.5
>>
>> So the desired output would be the following data table:
>>
>>    id       fini exposure group
>> 1:  2 2005-04-20     1.00     A
>> 2:  2 2005-04-20     1.00     A
>> 3:  2 2005-04-20     1.00     A
>> 4:  5 2006-02-19     0.87     B
>> 5:  5 2006-02-19     0.87     B
>> 6:  7 2006-10-08     0.50     A
>> 7:  7 2006-10-08     0.50     A
>>
>> I have tried:
>>
>> DT <- DT[ , list(id, fini, exposure = 0, group)]
>> DT.new <- lapply(DT, function(exposure){
>>   exposure[fini < as.Date("2006-01-01")] <- 1 # 1st case
>>   exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <- difftime(as.Date("2007-01-01"), fini, units="days")/365.25 # 2nd case
>>   exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5 # 3rd case
>>   exposure # return value
>> })
>>
>> But I get an error message.
>>
>> Thanks for any help!!
>>
>> Frank S.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
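A quick arithmetic check on the expected output: if the day count in rule 2 is converted to years by dividing by 365.25 (as in Ista's difftime() line; the rule as written only says "2007-01-01" - "fini"), the 0.87 shown for 2006-02-19 comes out directly. A minimal sketch:

```r
# Date minus Date gives a difftime in days; strip the class to get a plain number
days <- as.numeric(as.Date("2007-01-01") - as.Date("2006-02-19"))
days                      # 316

# Converted to years, assuming a 365.25-day year as in Ista's code
round(days / 365.25, 2)   # 0.87
```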
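One caveat on the cut() call above: for Date input, cut() defaults to right = FALSE, so each interval is closed on the left and open on the right. With the breaks as given, 2006-06-30 lands in the third interval (0.5) and anything from 2006-12-21 onward becomes NA. Below is a sketch with breaks aligned to Frank's stated intervals, assuming nothing earlier than 2000-01-01 needs handling, and with the factor turned back into numbers:

```r
# Frank's example data as a plain data frame (as Bert did)
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# Intervals are [a, b): 2006-07-01 and 2007-01-01 as breaks put
# 2006-06-30 in the middle interval and 2006-12-31 in the last one.
# Dates before 2000-01-01 would come out NA (assumed not to occur).
brks <- as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01"))
f <- cut(DT$fini, breaks = brks, labels = c(1, 0.87, 0.5))

# cut() returns a factor; convert via the labels, since
# as.numeric(f) alone would give the level codes 1, 2, 3
DT$exposure <- as.numeric(as.character(f))
DT

# Boundary dates from the rules now fall in the intended intervals
edge <- cut(as.Date(c("2006-06-30", "2006-12-31")), breaks = brks, labels = c(1, 0.87, 0.5))
as.character(edge)   # "0.87" "0.5"

# With the original breaks, the same two dates give "0.5" and NA
old <- as.Date(c("2000-01-01", "2006-01-01", "2006-06-30", "2006-12-21"))
as.character(cut(as.Date(c("2006-06-30", "2006-12-31")), breaks = old, labels = c(1, 0.87, 0.5)))
```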
On Mon, Sep 26, 2016 at 1:59 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> This seems like a job for cut().

I thought that at first too, but the middle group shouldn't be .87 but rather

"exposure" = "2007-01-01" - "fini"

so I think cut() alone won't do it.

Best,
Ista
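If the middle case is meant literally as a per-row computation, Ista's three data.table assignments translate directly to base R on a data frame. A sketch, assuming the date thresholds from Frank's rules and a 365.25-day year; the variable names start_mid, start_last and end_year are illustrative only. Note that difftime(...)/365.25 is still a difftime labelled "days", so as.numeric() is used to store a plain number:

```r
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

start_mid  <- as.Date("2006-01-01")  # first day covered by rule 2
start_last <- as.Date("2006-07-01")  # first day covered by rule 3
end_year   <- as.Date("2007-01-01")  # reference date in rule 2

exposure <- rep(NA_real_, nrow(DT))   # NA for dates no rule covers
exposure[DT$fini < start_mid] <- 1    # rule 1

# rule 2: fini in [2006-01-01, 2006-06-30], value depends on each row's date
mid <- DT$fini >= start_mid & DT$fini < start_last
exposure[mid] <- as.numeric(difftime(end_year, DT$fini[mid], units = "days")) / 365.25

# rule 3: fini in [2006-07-01, 2006-12-31]
exposure[DT$fini >= start_last & DT$fini < end_year] <- 0.5

DT$exposure <- exposure
DT
```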
I thought that that was a typo from the OP, as it disagrees with his example. But the labels are arbitrary, so in fact cut() will do it whichever way he meant.

-- Bert

On Mon, Sep 26, 2016 at 11:37 AM, Ista Zahn <istazahn at gmail.com> wrote:
> I thought that at first too, but the middle group shouldn't be .87 but rather
>
> "exposure" = "2007-01-01" - "fini"
>
> so I think cut() alone won't do it.
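Because cut() attaches one label per interval, a value that changes from row to row within the middle interval can't be written into the labels themselves. One way to still use cut() for the classification step, sketched here under the assumption of the same breaks and a 365.25-day year (not necessarily what either poster had in mind), is labels = FALSE, which returns the interval number, followed by a per-row choice of value:

```r
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

brks <- as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01"))

# labels = FALSE makes cut() return integer interval codes (1, 2, 3)
# instead of a factor; dates outside the breaks get NA
k <- cut(DT$fini, breaks = brks, labels = FALSE)

# Rule 2 value for every row; it is only used where k == 2
years_left <- as.numeric(as.Date("2007-01-01") - DT$fini) / 365.25

# Constants for intervals 1 and 3, per-row value for interval 2
DT$exposure <- ifelse(k == 2, years_left, c(1, NA, 0.5)[k])
DT
```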