Michael Chirico
2016-Jan-27 03:12 UTC
[Rd] Suggestions for improvement as regards `as` methods, and a call for consistency in `as.Date` methods
Good evening all,

This topic is covered at greater length in my related Stack Overflow question:
http://stackoverflow.com/questions/34647674/why-do-as-methods-remove-vector-names-and-is-there-a-way-around-it

Two issues remain despite the abundant insight received at SO:

1) _Why_ do as methods remove their arguments' names attribute?

This fact is mentioned briefly in a select few of the related help files, namely ?as.vector ("removes *all* attributes including names for results of atomic mode"), ?as.double ("strips attributes including names.") and ?as.character ("strips attributes including names"). However:

(a) none of these references gives a satisfactory explanation of the reasoning behind this (I can only think of speed); and
(b) it would be much more digestible to users if this information (even copy-pasting the same blurb) were placed in all of the as reference files (e.g., ?as, ?as.numeric, ?as.Date, ?as.POSIXct, etc.).

Personally, I think that unless there's a substantial efficiency cost to doing so, the default should in fact be to retain names (if not other attributes).
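For concreteness, here is a minimal sketch of the behaviour described in 1) and two ways around it. The helper as_keep_names is hypothetical (it is not part of base R); the storage.mode<- route is the one that ?as.double itself points to for changing type without stripping attributes.

```r
# A named numeric vector to coerce
x <- c(a = 1, b = 2)

# as.character() strips the names attribute
stripped <- as.character(x)
names(stripped)   # NULL

# Hypothetical helper (an assumption for illustration, not a base R function):
# coerce with any as.* function, then put the original names back
as_keep_names <- function(x, as_fun, ...) {
  stats::setNames(as_fun(x, ...), names(x))
}
y <- as_keep_names(x, as.character)

# Alternative: storage.mode<- changes the type and keeps attributes
z <- x
storage.mode(z) <- "character"
```

Both y and z keep the names a and b; which route is preferable presumably depends on whether the target is a basic storage type (where storage.mode<- applies) or a class such as Date (where only the wrapper applies).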
2) All as.Date methods should behave consistently as regards attribute retention

As explained in the referenced SO topic, the following should all give the same result (as they would for similar examples involving other as methods), but don't:

    datesc <- c(ind = "2015-07-04", nyd = "2016-01-01")
    datesn <- c(ind = 16620, nyd = 16801)
    datesp <- structure(c(1435982400, 1451624400),
                        .Names = c("ind", "nyd"),
                        class = c("POSIXct", "POSIXt"), tzone = "")
    datesl <- structure(list(sec = c(0, 0), min = c(0L, 0L), hour = c(0L, 0L),
                             mday = c(4L, 1L), mon = c(6L, 0L),
                             year = structure(115:116, .Names = c("ind", "nyd")),
                             wday = c(6L, 5L), yday = c(184L, 0L),
                             isdst = c(1L, 0L), zone = c("EDT", "EST"),
                             gmtoff = c(NA_integer_, NA_integer_)),
                        .Names = c("sec", "min", "hour", "mday", "mon", "year",
                                   "wday", "yday", "isdst", "zone", "gmtoff"),
                        class = c("POSIXlt", "POSIXt"))

Retain names:

    as.Date.numeric(datesn)
    #          ind          nyd
    # "2015-07-04" "2016-01-01"
    as.Date.POSIXct(datesp)
    #          ind          nyd
    # "2015-07-04" "2016-01-01"

Destroy names:

    as.Date.POSIXlt(datesl)
    # [1] "2015-07-04" "2016-01-01"
    as.Date.character(datesc)
    # [1] "2015-07-04" "2016-01-01"

(Unconfirmed, but I assume given a glance at the code that all of as.Date.date, as.Date.dates, as.Date.ts, as.Date.yearmon, and as.Date.yearqtr will also strip the names.)

Regardless of the default behavior as regards keeping/destroying names/other attributes, it would seem that for the sake of consistency the above should be unified. Barring an overhaul of all as methods to retain names, this would mean the following changes (for example):

    as.Date.numeric <- function (x, origin, ...)
    {
        if (missing(origin))
            origin <- "1970-01-01"
        if (identical(origin, "0000-00-00"))
            origin <- as.Date("0000-01-01", ...) - 1
        # changed: drop names so the result matches the other methods
        setNames(as.Date(origin, ...) + x, NULL)
    }

    as.Date.POSIXct <- function (x, tz = "UTC", ...)
    {
        if (tz == "UTC") {
            z <- floor(unclass(x)/86400)
            attr(z, "tzone") <- NULL
            # changed: drop names so the result matches the other methods
            attr(z, "names") <- NULL
            structure(z, class = "Date")
        }
        else as.Date(as.POSIXlt(x, tz = tz))
    }

Thank you in advance for your consideration and thank you as always for your time on this project.

Michael Chirico
PhD Candidate in Economics
University of Pennsylvania
3718 Locust Walk
Room 160 McNeil Building
Philadelphia, PA 19104
United States of America