Muhuri, Pradip (SAMHSA/CBHSQ)
2014-Nov-10 16:10 UTC
[R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Hello, The range() with complete.cases() removes NA's for the date variables that are read from a data frame. However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate(). The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated. Thanks, Pradip Muhuri ################# cut and pasted from the R console #################### id mrjdate cocdate inhdate haldate oiddate 1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18 2 2 <NA> <NA> <NA> <NA> <NA> 3 3 2009-10-24 <NA> 2011-10-13 <NA> 2011-10-13 4 4 2007-10-10 <NA> <NA> <NA> 2007-10-10 5 5 2006-09-01 2005-08-10 <NA> <NA> 2006-09-01 6 6 2007-09-04 2011-10-05 <NA> <NA> 2011-10-05 7 7 2005-10-25 <NA> <NA> 2011-11-04 2011-11-04> > # range of dates > > range(data2$mrjdate[complete.cases(data2$mrjdate)])[1] "2004-11-04" "2009-10-24"> range(data2$cocdate[complete.cases(data2$cocdate)])[1] "2005-08-10" "2011-10-05"> range(data2$inhdate[complete.cases(data2$inhdate)])[1] "2005-07-07" "2011-10-13"> range(data2$haldate[complete.cases(data2$haldate)])[1] "2007-11-07" "2011-11-04"> range(data2$oiddate[complete.cases(data2$oiddate)])[1] NA "2011-11-04" ################ reproducible code ############################# library(dplyr) library(lubridate) library(zoo) # data object - description of the temp <- "id mrjdate cocdate inhdate haldate 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2 NA NA NA NA 3 2009-10-24 NA 2011-10-13 NA 4 2007-10-10 NA NA NA 5 2006-09-01 2005-08-10 NA NA 6 2007-09-04 2011-10-05 NA NA 7 2005-10-25 NA NA 2011-11-04" # read the data object data1 <- read.table(textConnection(temp), colClasses=c("character", "Date", "Date", "Date", "Date"), header=TRUE, as.is=TRUE ) # create a new column data2 <- data1 %>% rowwise() %>% mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')) # print records print (data2) # range of dates range(data2$mrjdate[complete.cases(data2$mrjdate)]) range(data2$cocdate[complete.cases(data2$cocdate)]) range(data2$inhdate[complete.cases(data2$inhdate)]) range(data2$haldate[complete.cases(data2$haldate)]) range(data2$oiddate[complete.cases(data2$oiddate)]) Pradip K. Muhuri, PhD SAMHSA/CBHSQ 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 [[alternative HTML version deleted]]
arun
2014-Nov-10 16:30 UTC
[R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Try range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)]) #[1] "2006-09-01" "2011-11-04" If you look at the `dput` output, it is `Inf` for oiddate dput(data2$oiddate) structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date") A.K. On Monday, November 10, 2014 11:15 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote: Hello, The range() with complete.cases() removes NA's for the date variables that are read from a data frame. However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate(). The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated. Thanks, Pradip Muhuri ################# cut and pasted from the R console #################### id mrjdate cocdate inhdate haldate oiddate 1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18 2 2 <NA> <NA> <NA> <NA> <NA> 3 3 2009-10-24 <NA> 2011-10-13 <NA> 2011-10-13 4 4 2007-10-10 <NA> <NA> <NA> 2007-10-10 5 5 2006-09-01 2005-08-10 <NA> <NA> 2006-09-01 6 6 2007-09-04 2011-10-05 <NA> <NA> 2011-10-05 7 7 2005-10-25 <NA> <NA> 2011-11-04 2011-11-04> > # range of dates > > range(data2$mrjdate[complete.cases(data2$mrjdate)])[1] "2004-11-04" "2009-10-24"> range(data2$cocdate[complete.cases(data2$cocdate)])[1] "2005-08-10" "2011-10-05"> range(data2$inhdate[complete.cases(data2$inhdate)])[1] "2005-07-07" "2011-10-13"> range(data2$haldate[complete.cases(data2$haldate)])[1] "2007-11-07" "2011-11-04"> range(data2$oiddate[complete.cases(data2$oiddate)])[1] NA "2011-11-04" ################ reproducible code ############################# library(dplyr) library(lubridate) library(zoo) # data object - description of the temp <- "id mrjdate cocdate inhdate haldate 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2 NA NA NA NA 3 2009-10-24 NA 2011-10-13 NA 4 2007-10-10 NA NA NA 5 2006-09-01 2005-08-10 NA NA 6 2007-09-04 2011-10-05 NA NA 7 2005-10-25 NA NA 2011-11-04" # read the data object data1 <- read.table(textConnection(temp), colClasses=c("character", "Date", "Date", "Date", "Date"), header=TRUE, as.is=TRUE ) # create a new column data2 <- data1 %>% rowwise() %>% mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')) # print records print (data2) # range of dates range(data2$mrjdate[complete.cases(data2$mrjdate)]) range(data2$cocdate[complete.cases(data2$cocdate)]) range(data2$inhdate[complete.cases(data2$inhdate)]) range(data2$haldate[complete.cases(data2$haldate)]) range(data2$oiddate[complete.cases(data2$oiddate)]) Pradip K. Muhuri, PhD SAMHSA/CBHSQ 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Mark Sharp
2014-Nov-10 17:23 UTC
[R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Pradip, For some reason mutate is not setting the is.NA value for the new column. Note the output below using your data structures.> ## It looks at first as if the second element of both columns are NA. > data2$mrjdate[2][1] NA> data2$oiddate[2][1] NA> ## for convenience > mrj <- data2$mrjdate[2] > oid <- data2$oiddate[2] > mode(mrj)[1] "numeric"> mode(oid)[1] "numeric"> str(mrj)Date[1:1], format: NA> str(oid)Date[1:1], format: NA> class(mrj)[1] "Date"> class(oid)[1] "Date"> ## But note: > identical(mrj, oid)[1] FALSE> all.equal(mrj, oid)[1] "'is.NA' value mismatch: 0 in current 1 in target" ## functioning code data2$mrjdate[2] data2$oiddate[2] mrj <- data2$mrjdate[2] oid <- data2$oiddate[2] mode(mrj) mode(oid) str(mrj) str(oid) class(mrj) class(oid) # But note: identical(mrj, oid) all.equal(mrj, oid) ## This ugly solution does not have the problem.> data3 <- data1 > data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {+ if (all(is.na(unlist(data1[row, -1])))) { + max_d <- NA + } else { + max_d <- max(unlist(data1[row, -1]), na.rm = TRUE) + } + max_d}), + origin = "1970-01-01")> > range(data3$mrjdate[complete.cases(data3$mrjdate)])[1] "2004-11-04" "2009-10-24"> range(data3$cocdate[complete.cases(data3$cocdate)])[1] "2005-08-10" "2011-10-05"> range(data3$inhdate[complete.cases(data3$inhdate)])[1] "2005-07-07" "2011-10-13"> range(data3$haldate[complete.cases(data3$haldate)])[1] "2007-11-07" "2011-11-04"> range(data3$oiddate[complete.cases(data3$oiddate)])[1] "2006-09-01" "2011-11-04">Working code below. data3 <- data1 data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) { if (all(is.na(unlist(data1[row, -1])))) { max_d <- NA } else { max_d <- max(unlist(data1[row, -1]), na.rm = TRUE) } max_d}), origin = "1970-01-01") range(data3$mrjdate[complete.cases(data3$mrjdate)]) range(data3$cocdate[complete.cases(data3$cocdate)]) range(data3$inhdate[complete.cases(data3$inhdate)]) range(data3$haldate[complete.cases(data3$haldate)]) range(data3$oiddate[complete.cases(data3$oiddate)]) On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:> Hello, > > The range() with complete.cases() removes NA's for the date variables that are read from a data frame. However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate(). The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated. > > Thanks, > > Pradip Muhuri > > > ################# cut and pasted from the R console #################### > > id mrjdate cocdate inhdate haldate oiddate > 1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18 > 2 2 <NA> <NA> <NA> <NA> <NA> > 3 3 2009-10-24 <NA> 2011-10-13 <NA> 2011-10-13 > 4 4 2007-10-10 <NA> <NA> <NA> 2007-10-10 > 5 5 2006-09-01 2005-08-10 <NA> <NA> 2006-09-01 > 6 6 2007-09-04 2011-10-05 <NA> <NA> 2011-10-05 > 7 7 2005-10-25 <NA> <NA> 2011-11-04 2011-11-04 >> >> # range of dates >> >> range(data2$mrjdate[complete.cases(data2$mrjdate)]) > [1] "2004-11-04" "2009-10-24" >> range(data2$cocdate[complete.cases(data2$cocdate)]) > [1] "2005-08-10" "2011-10-05" >> range(data2$inhdate[complete.cases(data2$inhdate)]) > [1] "2005-07-07" "2011-10-13" >> range(data2$haldate[complete.cases(data2$haldate)]) > [1] "2007-11-07" "2011-11-04" >> range(data2$oiddate[complete.cases(data2$oiddate)]) > [1] NA "2011-11-04" > > > ################ reproducible code ############################# > > library(dplyr) > library(lubridate) > library(zoo) > # data object - description of the > > temp <- "id mrjdate cocdate inhdate haldate > 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 > 2 NA NA NA NA > 3 2009-10-24 NA 2011-10-13 NA > 4 2007-10-10 NA NA NA > 5 2006-09-01 2005-08-10 NA NA > 6 2007-09-04 2011-10-05 NA NA > 7 2005-10-25 NA NA 2011-11-04" > > # read the data object > > data1 <- read.table(textConnection(temp), > colClasses=c("character", "Date", "Date", "Date", "Date"), > header=TRUE, as.is=TRUE > ) > > > # create a new column > > data2 <- data1 %>% > rowwise() %>% > mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, > na.rm=TRUE), origin='1970-01-01')) > > # print records > > print (data2) > > # range of dates > > range(data2$mrjdate[complete.cases(data2$mrjdate)]) > range(data2$cocdate[complete.cases(data2$cocdate)]) > range(data2$inhdate[complete.cases(data2$inhdate)]) > range(data2$haldate[complete.cases(data2$haldate)]) > range(data2$oiddate[complete.cases(data2$oiddate)]) > > > > > > Pradip K. Muhuri, PhD > SAMHSA/CBHSQ > 1 Choke Cherry Road, Room 2-1071 > Rockville, MD 20857 > Tel: 240-276-1070 > Fax: 240-276-1260 > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.NOTICE: This E-Mail (including attachments) is confidential and may be legally privileged. It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521. If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited. Please reply to the sender that you have received this message in error, then delete it.