I am trying to clean up some dates and I am clearly doing something wrong. I have laid out an example that seems to show what is happening with the "real" data. The coding is lousy but it looks like it should have worked.

Can anyone suggest a) why I am getting that NA appearing after the strptime() command and b) why the NA is disappearing in the sort()? It happens with na.rm=TRUE and na.rm=FALSE
-------------------------------------------------
aa <- data.frame( c("12/05/2001", " ", "30/02/1995", NA, "14/02/2007", "M" ) )
names(aa) <- "times"
aa[is.na(aa)] <- "M"
aa[aa==" "] <- "M"
bb <- unlist(subset(aa, aa[,1] !="M"))
dates <- strptime(bb, "%d/%m/%Y")
dates
sort(dates)
--------------------------------------------------

Session Info
R version 2.4.1 (2006-12-18)
i386-pc-mingw32

locale:
LC_COLLATE=English_Canada.1252;
LC_CTYPE=English_Canada.1252;
LC_MONETARY=English_Canada.1252;
LC_NUMERIC=C;LC_TIME=English_Canada.1252

attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"

other attached packages:
  gdata   Hmisc
"2.3.1" "3.3-2"

(Yes I know I'm out of date but I don't like upgrading just as I am finishing a project)

Thanks
Perhaps you want one of these:

> sort(as.Date(aa$times, "%d/%m/%Y"))
[1] "1995-03-02" "2001-05-12" "2007-02-14"
> sort(as.Date(aa$times, "%d/%m/%Y"), na.last = TRUE)
[1] "1995-03-02" "2001-05-12" "2007-02-14" NA NA
[6] NA

On 6/7/07, John Kane <jrkrideau at yahoo.ca> wrote:
> Can anyone suggest a) why I am getting that NA
> appearing after the strptime() command and b) why the
> NA is disappearing in the sort()? It happens with
> na.rm=TRUE and na.rm=FALSE
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi John,

a) The NA appears because '30/02/1995' is not a valid date.

> strptime('30/02/1995' , "%d/%m/%Y")
[1] NA

b) dates has the classes shown below, so sort() dispatches to sort.POSIXlt, which in turn sets na.last to NA by default. With na.last = NA, the NAs are removed from the result. ?order details how NAs are handled in ordering data via na.last.

> class(dates)
[1] "POSIXt" "POSIXlt"
> methods(sort)
[1] sort.default sort.POSIXlt
> sort.POSIXlt
function (x, decreasing = FALSE, na.last = NA, ...)
x[order(as.POSIXct(x), na.last = na.last, decreasing = decreasing)]
<environment: namespace:base>

After resetting the Feb. date the code works.

HTH,
-jason

----- Original Message -----
From: "John Kane" <jrkrideau at yahoo.ca>
To: "R R-help" <r-help at stat.math.ethz.ch>
Sent: Thursday, June 07, 2007 2:17 PM
Subject: [R] character to time problem

> Can anyone suggest a) why I am getting that NA
> appearing after the strptime() command and b) why the
> NA is disappearing in the sort()? It happens with
> na.rm=TRUE and na.rm=FALSE
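
For anyone landing on this thread later, here is a minimal sketch of both points, assuming a current R rather than the 2.4.1 in the original post. The vector x and the names bad, sorted_drop and sorted_keep are made up for illustration, and tz = "UTC" is only there so the result does not depend on the local time zone.

```r
# Illustrative input (not the poster's full data): one impossible date.
x <- c("12/05/2001", "30/02/1995", "14/02/2007")

# strptime() returns NA for a string that does not describe a real date.
dates <- strptime(x, "%d/%m/%Y", tz = "UTC")

# Locate the offending strings before sorting, so they can be fixed by hand.
bad <- x[is.na(dates)]

# Convert to POSIXct and sort. The default na.last = NA drops NAs,
# while na.last = TRUE keeps them at the end of the result.
ct <- as.POSIXct(dates)
sorted_drop <- sort(ct)
sorted_keep <- sort(ct, na.last = TRUE)
```

Checking is.na() right after parsing is probably the easiest way to catch entries like the February one, since sort() will otherwise drop them without any message.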