Thank you Jeff and all,

Within a given time period (say 700 days from the start day), I am
expecting measurements to be taken at each time interval. In this case "0"
means measurement taken, "1" means not taken (stopped or opted out), and
"-1" means do not consider that time period for that individual. This will
be compared with the actual measurements taken (observed - expected)
within each time interval.

On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> # read.table is NOT part of the data.table package
> #library(data.table)
> DFM <- read.table( text=
> 'obs start end
> 1 2/1/2015 1/1/2017
> 2 4/11/2010 1/1/2011
> 3 1/4/2006 5/3/2007
> 4 10/1/2007 1/1/2008
> 5 6/1/2011 1/1/2012
> 6 10/5/2004 12/1/2004
> ',header = TRUE, stringsAsFactors = FALSE)
> # cleaner way to compute D
> DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" )
> DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" )
> DFM$D <- as.numeric( DFM$end - DFM$start, units="days" )
> # categorize your data into groups
> DFM$bin <- cut( DFM$D
>               , breaks=c( seq( 0, 500, 100 ), Inf )
>               , right=FALSE # do not include the right edge
>               , ordered_result = TRUE
>               )
> # brute force method you should have been able to figure out to show us some work
> DFM$t1 <- ifelse( DFM$D < 100, 1, 0 )
> DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1, 0 ) )
> DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1, 0 ) )
> DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1, 0 ) )
> DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1, 0 ) )
> # brute force method with ordered factor
> DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 )
> DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" < DFM$bin, 0, -1 ) )
> DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" < DFM$bin, 0, -1 ) )
> DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" < DFM$bin, 0, -1 ) )
> DFM$tf5 <- ifelse( "[400,500)" == DFM$bin, 1, ifelse( "[400,500)" < DFM$bin, 0, -1 ) )
> # less obvious approach using the fact that factors are integers
> # and using the outer function to find all combinations of elements of two vectors
> # and the sign function
> DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin )
>                                     , 1:5
>                                     , FUN = function(x,y) {
>                                         z <- sign(y-x)+1L
>                                         ifelse( 2 == z, -1L, z )
>                                       }
>                                     )
>
> # my result, provided using dput for precise representation
> DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710,
> 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167,
> 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700,
> 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L,
> 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)",
> "[400,500)", "[500,Inf)"), class = c("ordered", "factor")), t1 = c(0,
> 0, 0, 1, 0, 1), t2 = c(0, 0, 0, -1, 0, -1), t3 = c(0, 1, 0, -1,
> 1, -1), t4 = c(0, -1, 0, -1, -1, -1), t5 = c(0, -1, 1, -1, -1,
> -1), tf1 = c(0, 0, 0, 1, 0, 1), tf2 = c(0, 0, 0, -1, 0, -1),
> tf3 = c(0, 1, 0, -1, 1, -1), tf4 = c(0, -1, 0, -1, -1, -1
> ), tf5 = c(0, -1, 1, -1, -1, -1), tm1 = c(0, 0, 0, 1, 0,
> 1), tm2 = c(0, 0, 0, -1, 0, -1), tm3 = c(0, 1, 0, -1, 1,
> -1), tm4 = c(0, -1, 0, -1, -1, -1), tm5 = c(0, -1, 1, -1,
> -1, -1)), row.names = c(NA, -6L), .Names = c("obs", "start",
> "end", "D", "bin", "t1", "t2", "t3", "t4", "t5", "tf1", "tf2",
> "tf3", "tf4", "tf5", "tm1", "tm2", "tm3", "tm4", "tm5"), class =
> "data.frame")
>
> You did not address Bert's request for some context, but I am curious how
> he or Peter would have approached this problem, so I encourage you to
> provide some insight on the list as to why you are doing this.
>
> On Sat, 3 Jun 2017, Val wrote:
>
>> Thank you all for the useful suggestions. I did some of my homework.
>>
>> library(data.table)
>> DFM <- read.table(header=TRUE, text='obs start end
>> 1 2/1/2015 1/1/2017
>> 2 4/11/2010 1/1/2011
>> 3 1/4/2006 5/3/2007
>> 4 10/1/2007 1/1/2008
>> 5 6/1/2011 1/1/2012
>> 6 10/5/2004 12/1/2004',stringsAsFactors = FALSE)
>> DFM
>>
>> DFM$D = as.numeric(difftime(as.Date(DFM$end,format="%m/%d/%Y"),
>>                             as.Date(DFM$start,format="%m/%d/%Y"), units = "days"))
>> DFM
>>
>> Output:
>>   obs     start       end   D
>> 1   1  2/1/2015  1/1/2017 700
>> 2   2 4/11/2010  1/1/2011 265
>> 3   3  1/4/2006  5/3/2007 484
>> 4   4 10/1/2007  1/1/2008  92
>> 5   5  6/1/2011  1/1/2012 214
>> 6   6 10/5/2004 12/1/2004  57
>>
>> My problem is how to get the other new variables:
>>
>> obs, start, end, D, t1, t2, t3, t4, t5
>> 1, 2/1/2015, 1/1/2017, 700, 0, 0, 0, 0, 0
>> 2, 4/11/2010, 1/1/2011, 265, 0, 0, 1, -1, -1
>> 3, 1/4/2006, 5/3/2007, 484, 0, 0, 0, 0, 1
>> 4, 10/1/2007, 1/1/2008, 92, 1, -1, -1, -1, -1
>> 5, 6/1/2011, 1/1/2012, 214, 0, 0, 1, -1, -1
>> 6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1
>>
>> Thank you again.
>>
>> On Sat, Jun 3, 2017 at 12:13 AM, Bert Gunter <bgunter.4567 at gmail.com>
>> wrote:
>>
>>> It is difficult to provide useful help, because you have failed to
>>> read and follow the posting guide. In particular:
>>>
>>> 1. Plain text, not HTML.
>>> 2. Use dput() or provide code to create your example. Text printouts
>>> such as that which you gave require some work to wrangle into an
>>> example that we can test.
>>>
>>> Specifically:
>>>
>>> 3. Have you gone through any R tutorials? It sure doesn't look like
>>> it. We do expect some effort to learn R before posting.
>>>
>>> 4. What is the format of your date columns? character, factors,
>>> POSIX,...? See ?date-time for details. Note particularly the
>>> "difftime" link to obtain intervals.
>>>
>>> 5. ?ifelse for vectorized conditionals.
>>>
>>> Also, you might want to explain the context of what you are trying to
>>> do. I strongly suspect you shouldn't be doing it at all, but that is
>>> just a guess.
>>>
>>> Be sure to cc your reply to the list, not just to me.
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>> On Fri, Jun 2, 2017 at 8:49 PM, Val <valkremk at gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a data set with a time interval, and depending on the interval I
>>>> want to create 5 more variables. Sample data below:
>>>>
>>>> obs, Start, End
>>>> 1, 2/1/2015, 1/1/2017
>>>> 2, 4/11/2010, 1/1/2011
>>>> 3, 1/4/2006, 5/3/2007
>>>> 4, 10/1/2007, 1/1/2008
>>>> 5, 6/1/2011, 1/1/2012
>>>> 6, 10/15/2004, 12/1/2004
>>>>
>>>> First, I want to get the interval between the start and end dates
>>>> (End - Start).
>>>>
>>>> obs, Start, End, datediff
>>>> 1, 2/1/2015, 1/1/2017, 700
>>>> 2, 4/11/2010, 1/1/2011, 265
>>>> 3, 1/4/2006, 5/3/2007, 484
>>>> 4, 10/1/2007, 1/1/2008, 92
>>>> 5, 6/1/2011, 1/1/2012, 214
>>>> 6, 10/15/2004, 12/1/2004, 47
>>>>
>>>> Second, I want to create 5 more variables t1, t2, t3, t4 and t5.
>>>> The value of each variable is defined as follows:
>>>> if datediff < 100 then t1=1, t2=t3=t4=t5=-1,
>>>> if datediff >= 100 and < 200 then t1=0, t2=1, t3=t4=t5=-1,
>>>> if datediff >= 200 and < 300 then t1=0, t2=0, t3=1, t4=t5=-1,
>>>> if datediff >= 300 and < 400 then t1=0, t2=0, t3=0, t4=1, t5=-1,
>>>> if datediff >= 400 and < 500 then t1=0, t2=0, t3=0, t4=0, t5=1,
>>>> if datediff >= 500 then t1=0, t2=0, t3=0, t4=0, t5=0
>>>>
>>>> The complete output looks as follows:
>>>> obs, start, end, datediff, t1, t2, t3, t4, t5
>>>> 1, 2/1/2015, 1/1/2017, 700, 0, 0, 0, 0, 0
>>>> 2, 4/11/2010, 1/1/2011, 265, 0, 0, 1, -1, -1
>>>> 3, 1/4/2006, 5/3/2007, 484, 0, 0, 0, 0, 1
>>>> 4, 10/1/2007, 1/1/2008, 92, 1, -1, -1, -1, -1
>>>> 5, 6/1/2011, 1/1/2012, 214, 0, 0, 1, -1, -1
>>>> 6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1
>>>>
>>>> Thank you.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN: <jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
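The rule table in Val's original post (quoted above) can also be written as one small function. This is only a sketch, not anything posted in the thread: the helper name `interval_flags` and its arguments `width` and `k` are hypothetical, and it assumes non-negative interval lengths and equal-width bins. It uses integer division instead of `cut()` to find the bin, then `outer()` to compare each bin number with each column number.

```r
# Hypothetical helper (not from the thread): encode Val's rule table.
# D:     interval lengths in days (assumed >= 0)
# width: bin width in days; k: number of flag columns (t1..tk)
interval_flags <- function(D, width = 100, k = 5) {
  # bin 1..k for D in [0, k * width), bin k + 1 for anything longer
  bin <- pmin(D %/% width + 1, k + 1)
  flags <- outer(bin, seq_len(k), function(b, j) {
    # 1 in the bin where the interval ends, 0 before it, -1 after it
    ifelse(j == b, 1, ifelse(j < b, 0, -1))
  })
  colnames(flags) <- paste0("t", seq_len(k))
  flags
}

# Three of the rows from the thread, using the same date format
start <- as.Date(c("2/1/2015", "4/11/2010", "1/4/2006"), format = "%m/%d/%Y")
end   <- as.Date(c("1/1/2017", "1/1/2011", "5/3/2007"), format = "%m/%d/%Y")
D <- as.numeric(end - start, units = "days")
interval_flags(D)
```

Changing `width` or `k` would adapt the sketch to a different follow-up schedule; unequal bin widths would need `cut()` or `findInterval()` as in Jeff's version.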
Since the number of choices is small (6), how about this?

Starting with Jeff's initial DFM:

DFM <- structure(list(obs = 1:6, start = structure(c(16467, 14710, 13152,
13787, 15126, 12696), class = "Date"), end = structure(c(17167,
14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700,
265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L,
1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)",
"[400,500)", "[500,Inf)"), class = c("ordered", "factor"))), .Names = c("obs",
"start", "end", "D", "bin"), row.names = c(NA, -6L), class = "data.frame")

Construct a matrix of the six alternatives:

tvals <- c(1, -1, -1, -1, -1, 0, 1, -1, -1, -1, 0, 0, 1, -1, -1, 0, 0,
     0, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
tmat <- matrix(tvals, 6, 5, byrow=TRUE)
colnames(tmat) <- paste0("t", 1:5)
tmat
#      t1 t2 t3 t4 t5
# [1,]  1 -1 -1 -1 -1
# [2,]  0  1 -1 -1 -1
# [3,]  0  0  1 -1 -1
# [4,]  0  0  0  1 -1
# [5,]  0  0  0  0  1
# [6,]  0  0  0  0  0

idx <- as.numeric(DFM$bin)
(DFM <- data.frame(DFM, tmat[idx, ]))
#   obs      start        end   D       bin t1 t2 t3 t4 t5
# 1   1 2015-02-01 2017-01-01 700 [500,Inf)  0  0  0  0  0
# 2   2 2010-04-11 2011-01-01 265 [200,300)  0  0  1 -1 -1
# 3   3 2006-01-04 2007-05-03 484 [400,500)  0  0  0  0  1
# 4   4 2007-10-01 2008-01-01  92   [0,100)  1 -1 -1 -1 -1
# 5   5 2011-06-01 2012-01-01 214 [200,300)  0  0  1 -1 -1
# 6   6 2004-10-05 2004-12-01  57   [0,100)  1 -1 -1 -1 -1

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Val
Sent: Sunday, June 4, 2017 11:31 AM
To: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Cc: r-help at R-project.org
Subject: Re: [R] New var

Thank you Jeff and all,

Within a given time period (say 700 days from the start day), I am
expecting measurements to be taken at each time interval. In this case "0"
means measurement taken, "1" means not taken (stopped or opted out), and
"-1" means do not consider that time period for that individual.
This will be compared with the actual measurements taken (observed - expected)
within each time interval.
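The six-row lookup matrix in David's message is typed out by hand. As a sketch only (this construction is not from the thread, and the variable names `k`, `r` and `cc` are illustrative), the same matrix can be generated from the row and column indices, which may help if the number of intervals changes:

```r
k <- 5                                  # number of flag columns (t1..t5)
r  <- row(matrix(0, k + 1, k))          # row index of every cell
cc <- col(matrix(0, k + 1, k))          # column index of every cell

# 1 on the diagonal, -1 above it, 0 below it (logicals coerce to 0/1)
tmat <- (r == cc) - (cc > r)
colnames(tmat) <- paste0("t", seq_len(k))

# Look up rows by bin number, using the same breaks as Jeff's cut() call
bin <- cut(c(700, 265, 92), breaks = c(seq(0, 500, 100), Inf),
           right = FALSE, ordered_result = TRUE)
tmat[as.integer(bin), ]
```

Row i of `tmat` holds the flags for bin i, so indexing by `as.integer(bin)` does the same lookup as `tmat[idx, ]` in David's code.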
> On Jun 4, 2017, at 1:36 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>
> Since the number of choices is small (6), how about this?