Martin Maechler
2022-Oct-06 08:15 UTC
[Rd] A potential POSIXlt->Date bug introduced in r-devel
>>>>> Davis Vaughan >>>>> on Wed, 5 Oct 2022 17:04:11 -0400 writes:> Hi all, > I think I have discovered a bug in the conversion from POSIXlt to Date that > has been introduced in r-devel. > It affects lubridate, but surprisingly didn't cause test failures there. > Instead it caused test failures in users of lubridate, like slider, arrow, > and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and > at least in slider I have been asked by CRAN to correct this issue before > 2022-10-16. > In r-devel we get the following: > ``` > data <- list( > sec = 0, > min = 0L, > hour = 0L, > mday = 31L, > mon = c(0L, NA, 2L), > year = 113L, > wday = 4L, > yday = 30L, > isdst = 0L > ) > x <- .POSIXlt(xx = data, tz = "UTC") > x > #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" > # Looks right > as.POSIXct(x) > #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" > # Weird, where is the `NA`? > as.Date(x) > #> [1] "2013-01-31" "1970-01-01" "2013-03-31" > ``` I agree that the above is wrong, i.e., a bug in current R-devel. > The POSIXlt object is length 3, but is only partially filled out. > The other elements are all recycled to length 3 upon > conversion to POSIXct or Date. > But when converting to Date, we lose the `NA` value. I think the > `as.Date()` conversion seems inconsistent with the `as.POSIXct()` > conversion. Yes. There was another very much relatd conversation here on R-devel, initiated by Suharto Anggono just a few days ago. This subject, i.e., "partially filled out" POSIXlt objects, was one of the topics, too. See my reply there, notably at the end: https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html I do mention that "recycling" of partially filled POSIXlt objects has only partially been implemented in R more generally and was actually asking for comments and further discussion. > It looks like this comes up because the conversion to Date now defaults to > using `sec` if any of the date-like fields are `NA_INTEGER`, yes, because only that allows to also deal with +/- Inf etc, as was recently added as new feature, see the NEWS of R 4.2.0 ? Not strictly fixing a bug, format()ing and print()ing of non-finite Date and POSIXt values NaN and +/-Inf no longer show as NA but the respective string, e.g., Inf, for consistency with numeric vector's behaviour, fulfilling the wish of PR#18308. i.e., see also R's bugzilla https://bugs.r-project.org/show_bug.cgi?id=18308 which actually *also* mentioned an NA problem in Date/Time objects. > but this means the `NA` in the `mon` field is ignored. which I agree is bogous and we'll fix. Still, I did not get any feedback on asking about documentation etc on POSIXlt objects ... and I *had* mentioned I agreed that the current partial implementation of "partially filled" i.e. recycling of POSIXlt components should probably be made part of the "definition" of POSIXlt. Have I overlooked an existing definition / contract about these? Martin -- Martin M?chler ETH Zurich and R Core team
Berwin A Turlach
2022-Oct-06 08:41 UTC
[Rd] A potential POSIXlt->Date bug introduced in r-devel
G'day all, On Thu, 6 Oct 2022 10:15:29 +0200 Martin Maechler <maechler at stat.math.ethz.ch> wrote:> >>>>> Davis Vaughan > >>>>> on Wed, 5 Oct 2022 17:04:11 -0400 writes:> > # Weird, where is the `NA`? > > as.Date(x) > > #> [1] "2013-01-31" "1970-01-01" "2013-03-31" > > ``` > > I agree that the above is wrong, i.e., a bug in current R-devel.I have no intention of hijacking this thread, but I wonder whether this is a good opportunity to mention that the 32 bit build of R-devel falls over on my machine since 25 September. It fails one of the regression tests in reg-tests-1d.R. The final lines of reg-tests-1d.Rout.fail are:> tools::Rd2txt(rd, out <- textConnection(NULL, "w"), fragment = TRUE) > stopifnot(any(as.character(rd) != "\n"),+ identical(textConnectionValue(out)[2L], "LaTeX")); close(out)> ## empty output in R <= 4.2.x > > > ## as.POSIXlt(<very large Date>) gave integer overflow > stopifnot(as.POSIXlt(.Date(2^31 + 10))$year == 5879680L)Error: as.POSIXlt(.Date(2^31 + 10))$year == 5879680L is not TRUE Execution halted I should have reported this earlier, but somehow did not find the time to do so. So I thought I mention it here. :) Cheers, Berwin
Martin Maechler
2022-Oct-07 12:52 UTC
[Rd] A potential POSIXlt->Date bug introduced in r-devel
>>>>> Martin Maechler >>>>> on Thu, 6 Oct 2022 10:15:29 +0200 writes:>>>>> Davis Vaughan >>>>> on Wed, 5 Oct 2022 17:04:11 -0400 writes:>> Hi all, >> I think I have discovered a bug in the conversion from POSIXlt to Date that >> has been introduced in r-devel. >> It affects lubridate, but surprisingly didn't cause test failures there. >> Instead it caused test failures in users of lubridate, like slider, arrow, >> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and >> at least in slider I have been asked by CRAN to correct this issue before >> 2022-10-16. >> In r-devel we get the following: >> ``` >> data <- list( >> sec = 0, >> min = 0L, >> hour = 0L, >> mday = 31L, >> mon = c(0L, NA, 2L), >> year = 113L, >> wday = 4L, >> yday = 30L, >> isdst = 0L >> ) >> x <- .POSIXlt(xx = data, tz = "UTC") >> x >> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" >> # Looks right >> as.POSIXct(x) >> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" >> # Weird, where is the `NA`? >> as.Date(x) >> #> [1] "2013-01-31" "1970-01-01" "2013-03-31" >> ``` > I agree that the above is wrong, i.e., a bug in current R-devel. >> The POSIXlt object is length 3, but is only partially filled out. >> The other elements are all recycled to length 3 upon >> conversion to POSIXct or Date. >> But when converting to Date, we lose the `NA` value. I think the >> `as.Date()` conversion seems inconsistent with the `as.POSIXct()` >> conversion. > Yes. There was another very much relatd conversation here on R-devel, > initiated by Suharto Anggono just a few days ago. > This subject, i.e., "partially filled out" POSIXlt objects, was > one of the topics, too. > See my reply there, notably at the end: > https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html > I do mention that "recycling" of partially filled POSIXlt > objects has only partially been implemented in R more generally > and was actually asking for comments and further discussion. >> It looks like this comes up because the conversion to Date now defaults to >> using `sec` if any of the date-like fields are `NA_INTEGER`, > yes, because only that allows to also deal with +/- Inf etc, > as was recently added as new feature, see the NEWS of R 4.2.0 > ? Not strictly fixing a bug, format()ing and print()ing of > non-finite Date and POSIXt values NaN and +/-Inf no longer show > as NA but the respective string, e.g., Inf, for consistency with > numeric vector's behaviour, fulfilling the wish of PR#18308. > i.e., see also R's bugzilla > https://bugs.r-project.org/show_bug.cgi?id=18308 > which actually *also* mentioned an NA problem in Date/Time objects. >> but this means the `NA` in the `mon` field is ignored. > which I agree is bogous and we'll fix. > Still, I did not get any feedback on asking about documentation > etc on POSIXlt objects ... and I *had* mentioned I agreed that > the current partial implementation of "partially filled" i.e. recycling of > POSIXlt components should probably be made part of the > "definition" of POSIXlt. > Have I overlooked an existing definition / contract about these? > Martin I'm still waiting for comments. Note that "partially filled" POSIXlt do not work correctly in any version of R. I mentioned that even length(.) may easily fail; but there is much more. While I can relatively easily fix Davis' case above, the following example behaves wrongly in current and previous released versions of R and in R-devel: dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi, # "out of range", non-finite, fractions min = 45L, hour = c(21L, 3L, NA, 4L), mday = 6L, mon = c(11L, NA, 3L), year = 116L, wday = 2L, yday = 340L, isdst = 1L)) Of course that's constructed to be particularly unpleasant. You can try some of the following "checks" in your version(s) of R to see that some of the things are misbehaving with it in all (*) versions of R. -- *) so I claim boldly dct <- as.POSIXct(dlt) (n <- length(dct)) dD <- as.Date(dlt) dDc <- as.Date(dct) dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy): data.frame(unclass(dltN)) .POSIXltNormalize <- function(x) { stopifnot(is.numeric(s <- x$sec)) x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds : ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already) x$sec[ifin] <- s[ifin] %% 60 x } dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference stopifnot(all.equal(dlt2, dltN), identical(as.POSIXct(dlt2), as.POSIXct(dltN))) ## First show (in a way it also works for older R), then check : oldR <- getRversion() < "4.2.2" print(width = 101, data.frame(dlt, dltN, asCT = dct, asDateCT = dDc, asDate = if(oldR) rep_len(dD, n) else dD , na = is.na(dlt), fin = if(oldR) rep_len(NA, n) else is.finite(dlt)) ) -- Martin M?chler ETH Zurich and R Core team