Martin Maechler
2022-Oct-07 12:52 UTC
[Rd] A potential POSIXlt->Date bug introduced in r-devel
>>>>> Martin Maechler >>>>> on Thu, 6 Oct 2022 10:15:29 +0200 writes:>>>>> Davis Vaughan >>>>> on Wed, 5 Oct 2022 17:04:11 -0400 writes:>> Hi all, >> I think I have discovered a bug in the conversion from POSIXlt to Date that >> has been introduced in r-devel. >> It affects lubridate, but surprisingly didn't cause test failures there. >> Instead it caused test failures in users of lubridate, like slider, arrow, >> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and >> at least in slider I have been asked by CRAN to correct this issue before >> 2022-10-16. >> In r-devel we get the following: >> ``` >> data <- list( >> sec = 0, >> min = 0L, >> hour = 0L, >> mday = 31L, >> mon = c(0L, NA, 2L), >> year = 113L, >> wday = 4L, >> yday = 30L, >> isdst = 0L >> ) >> x <- .POSIXlt(xx = data, tz = "UTC") >> x >> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" >> # Looks right >> as.POSIXct(x) >> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" >> # Weird, where is the `NA`? >> as.Date(x) >> #> [1] "2013-01-31" "1970-01-01" "2013-03-31" >> ``` > I agree that the above is wrong, i.e., a bug in current R-devel. >> The POSIXlt object is length 3, but is only partially filled out. >> The other elements are all recycled to length 3 upon >> conversion to POSIXct or Date. >> But when converting to Date, we lose the `NA` value. I think the >> `as.Date()` conversion seems inconsistent with the `as.POSIXct()` >> conversion. > Yes. There was another very much relatd conversation here on R-devel, > initiated by Suharto Anggono just a few days ago. > This subject, i.e., "partially filled out" POSIXlt objects, was > one of the topics, too. > See my reply there, notably at the end: > https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html > I do mention that "recycling" of partially filled POSIXlt > objects has only partially been implemented in R more generally > and was actually asking for comments and further discussion. >> It looks like this comes up because the conversion to Date now defaults to >> using `sec` if any of the date-like fields are `NA_INTEGER`, > yes, because only that allows to also deal with +/- Inf etc, > as was recently added as new feature, see the NEWS of R 4.2.0 > ? Not strictly fixing a bug, format()ing and print()ing of > non-finite Date and POSIXt values NaN and +/-Inf no longer show > as NA but the respective string, e.g., Inf, for consistency with > numeric vector's behaviour, fulfilling the wish of PR#18308. > i.e., see also R's bugzilla > https://bugs.r-project.org/show_bug.cgi?id=18308 > which actually *also* mentioned an NA problem in Date/Time objects. >> but this means the `NA` in the `mon` field is ignored. > which I agree is bogous and we'll fix. > Still, I did not get any feedback on asking about documentation > etc on POSIXlt objects ... and I *had* mentioned I agreed that > the current partial implementation of "partially filled" i.e. recycling of > POSIXlt components should probably be made part of the > "definition" of POSIXlt. > Have I overlooked an existing definition / contract about these? > Martin I'm still waiting for comments. Note that "partially filled" POSIXlt do not work correctly in any version of R. I mentioned that even length(.) may easily fail; but there is much more. While I can relatively easily fix Davis' case above, the following example behaves wrongly in current and previous released versions of R and in R-devel: dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi, # "out of range", non-finite, fractions min = 45L, hour = c(21L, 3L, NA, 4L), mday = 6L, mon = c(11L, NA, 3L), year = 116L, wday = 2L, yday = 340L, isdst = 1L)) Of course that's constructed to be particularly unpleasant. You can try some of the following "checks" in your version(s) of R to see that some of the things are misbehaving with it in all (*) versions of R. -- *) so I claim boldly dct <- as.POSIXct(dlt) (n <- length(dct)) dD <- as.Date(dlt) dDc <- as.Date(dct) dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy): data.frame(unclass(dltN)) .POSIXltNormalize <- function(x) { stopifnot(is.numeric(s <- x$sec)) x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds : ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already) x$sec[ifin] <- s[ifin] %% 60 x } dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference stopifnot(all.equal(dlt2, dltN), identical(as.POSIXct(dlt2), as.POSIXct(dltN))) ## First show (in a way it also works for older R), then check : oldR <- getRversion() < "4.2.2" print(width = 101, data.frame(dlt, dltN, asCT = dct, asDateCT = dDc, asDate = if(oldR) rep_len(dD, n) else dD , na = is.na(dlt), fin = if(oldR) rep_len(NA, n) else is.finite(dlt)) ) -- Martin M?chler ETH Zurich and R Core team
Davis Vaughan
2022-Oct-07 15:00 UTC
[Rd] A potential POSIXlt->Date bug introduced in r-devel
Martin, FWIW, I scoured the docs using GitHub's new code search preview but can't seem to find any reference to the fact that POSIXlt fields are internally recycled (even though lubridate seems to have been relying on this for quite some time). -Davis On Fri, Oct 7, 2022 at 8:52 AM Martin Maechler <maechler at stat.math.ethz.ch> wrote:> >>>>> Martin Maechler > >>>>> on Thu, 6 Oct 2022 10:15:29 +0200 writes: > > >>>>> Davis Vaughan > >>>>> on Wed, 5 Oct 2022 17:04:11 -0400 writes: > > >> Hi all, > > >> I think I have discovered a bug in the conversion from POSIXlt to > Date that > >> has been introduced in r-devel. > > >> It affects lubridate, but surprisingly didn't cause test failures > there. > >> Instead it caused test failures in users of lubridate, like slider, > arrow, > >> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), > and > >> at least in slider I have been asked by CRAN to correct this issue > before > >> 2022-10-16. > > >> In r-devel we get the following: > > >> ``` > >> data <- list( > >> sec = 0, > >> min = 0L, > >> hour = 0L, > >> mday = 31L, > >> mon = c(0L, NA, 2L), > >> year = 113L, > >> wday = 4L, > >> yday = 30L, > >> isdst = 0L > >> ) > > >> x <- .POSIXlt(xx = data, tz = "UTC") > >> x > >> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" > > >> # Looks right > >> as.POSIXct(x) > >> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" > > > > >> # Weird, where is the `NA`? > >> as.Date(x) > >> #> [1] "2013-01-31" "1970-01-01" "2013-03-31" > >> ``` > > > I agree that the above is wrong, i.e., a bug in current R-devel. > > >> The POSIXlt object is length 3, but is only partially filled out. > > >> The other elements are all recycled to length 3 upon > >> conversion to POSIXct or Date. > > >> But when converting to Date, we lose the `NA` value. I think the > >> `as.Date()` conversion seems inconsistent with the `as.POSIXct()` > >> conversion. > > > Yes. There was another very much relatd conversation here on > R-devel, > > initiated by Suharto Anggono just a few days ago. > > > This subject, i.e., "partially filled out" POSIXlt objects, was > > one of the topics, too. > > > See my reply there, notably at the end: > > > https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html > > > I do mention that "recycling" of partially filled POSIXlt > > objects has only partially been implemented in R more generally > > and was actually asking for comments and further discussion. > > > >> It looks like this comes up because the conversion to Date now > defaults to > >> using `sec` if any of the date-like fields are `NA_INTEGER`, > > > yes, because only that allows to also deal with +/- Inf etc, > > as was recently added as new feature, see the NEWS of R 4.2.0 > > > ? Not strictly fixing a bug, format()ing and print()ing of > > non-finite Date and POSIXt values NaN and +/-Inf no longer show > > as NA but the respective string, e.g., Inf, for consistency with > > numeric vector's behaviour, fulfilling the wish of PR#18308. > > > i.e., see also R's bugzilla > > https://bugs.r-project.org/show_bug.cgi?id=18308 > > > which actually *also* mentioned an NA problem in Date/Time objects. > > > >> but this means the `NA` in the `mon` field is ignored. > > > which I agree is bogous and we'll fix. > > > Still, I did not get any feedback on asking about documentation > > etc on POSIXlt objects ... and I *had* mentioned I agreed that > > the current partial implementation of "partially filled" i.e. > recycling of > > POSIXlt components should probably be made part of the > > "definition" of POSIXlt. > > > Have I overlooked an existing definition / contract about these? > > > Martin > > I'm still waiting for comments. > > Note that "partially filled" POSIXlt do not work correctly in > any version of R. I mentioned that even length(.) may easily > fail; but there is much more. > > While I can relatively easily fix Davis' case above, > the following example behaves wrongly in current and previous > released versions of R and in R-devel: > > dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi, > # "out of range", non-finite, > fractions > min = 45L, hour = c(21L, 3L, NA, 4L), > mday = 6L, mon = c(11L, NA, 3L), > year = 116L, wday = 2L, yday = 340L, isdst = 1L)) > > > Of course that's constructed to be particularly unpleasant. > You can try some of the following "checks" in your version(s) of > R to see that some of the things are misbehaving with it in all (*) > versions of R. > -- > *) so I claim boldly > > > dct <- as.POSIXct(dlt) > (n <- length(dct)) > dD <- as.Date(dlt) > dDc <- as.Date(dct) > dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy): > data.frame(unclass(dltN)) > .POSIXltNormalize <- function(x) { > stopifnot(is.numeric(s <- x$sec)) > x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds : > ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already) > x$sec[ifin] <- s[ifin] %% 60 > x > } > dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept > all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference > stopifnot(all.equal(dlt2, dltN), > identical(as.POSIXct(dlt2), as.POSIXct(dltN))) > ## First show (in a way it also works for older R), then check : > oldR <- getRversion() < "4.2.2" > print(width = 101, > data.frame(dlt, dltN, asCT = dct, asDateCT = dDc, > asDate = if(oldR) rep_len(dD, n) else dD , > na = is.na(dlt), > fin = if(oldR) rep_len(NA, n) else is.finite(dlt)) > ) > > > > > -- > Martin M?chler > ETH Zurich and R Core team >[[alternative HTML version deleted]]