Andy Teucher
2023-Aug-11 23:07 UTC
[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
I understand that `as.character.POSIXt()` had an overhaul in R 4.3
(https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b),
and I have come across a new behaviour and I wonder if it is unintended?
When calling `as.character.POSIXt()` on a vector that contains elements where
the time component is midnight (00:00:00), it drops the time component of that
element in the resulting character vector. Previously the time component was
retained:
In R 4.2.3:
```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"
(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
(tc <- as.character(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```
In R 4.3.1:
```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"
(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
(tc <- as.character(t))
#> [1] "1975-01-01" "1975-01-01 15:27:00"
```
This has consequences when round-tripping from POSIXt -> character ->
POSIXt, since `as.POSIXct.character()` drops the time component from the entire
vector if any element does not have a time component:
In R 4.2.3:
```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"
(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
(tc <- as.character(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
as.POSIXct(tc)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```
In R 4.3.1:
```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"
(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
(tc <- as.character(t))
#> [1] "1975-01-01" "1975-01-01 15:27:00"
as.POSIXct(tc)
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```
`format.POSIXt()` retains its old behaviour in R 4.3:
```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"
(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
(tf <- format(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
as.POSIXct(tf)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```
```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"
(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
(tf <- format(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
as.POSIXct(tf)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```
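Since `format()` is unaffected, one way to keep the round trip stable is to format and parse with the same explicit format string. The following is only a minimal sketch: the time zone is pinned to "UTC" so the result does not depend on the local setting, and the names `fmt` and `t2` are illustrative, not part of the original report.

```r
# Pin the time zone so the example does not depend on the local setting.
t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# With an explicit format string, format() writes the time part for
# every element, including the one at midnight.
fmt <- "%Y-%m-%d %H:%M:%S"
(tf <- format(t, format = fmt))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

# Parse back with the same format string and time zone.
t2 <- as.POSIXct(tf, format = fmt, tz = "UTC")
all(t2 == t)
#> [1] TRUE
```

Fractional seconds would not survive this particular format string; `%OS` with a digit count would be needed for that.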
And finally, the behaviour of `as.POSIXct.character()` has not changed (it
previously did, and still does, drop the time component from all elements when
any element has no time):
```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"
as.POSIXct(c("1975-01-01", "1975-01-01 15:27:00"))
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```
```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"
as.POSIXct(c("1975-01-01", "1975-01-01 15:27:00"))
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```
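The whole-vector loss of the time component can be reproduced by passing the candidate formats by hand. This sketch assumes, as the `as.POSIXlt()` help page describes, that a single format is chosen for the entire vector; the variable names `full` and `date_only` are illustrative, and "UTC" is used only to make the output independent of the local time zone.

```r
x <- c("1975-01-01", "1975-01-01 15:27:00")

# The full date-time format fails for the date-only element ...
full <- as.POSIXlt(x, tz = "UTC", format = "%Y-%m-%d %H:%M:%OS")
is.na(full)
#> [1]  TRUE FALSE

# ... while the date-only format parses both strings, because trailing
# text after the matched part is ignored. The time of day is lost.
date_only <- as.POSIXlt(x, tz = "UTC", format = "%Y-%m-%d")
format(date_only, "%Y-%m-%d %H:%M:%S")
#> [1] "1975-01-01 00:00:00" "1975-01-01 00:00:00"
```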
I don't know if this is a bug/regression in `as.character.POSIXt()`, or intended
behaviour. If it is intended, I think it would benefit from some more
comprehensive documentation.
Thanks very much,
Andy Teucher
Martin Maechler
2023-Aug-14 08:52 UTC
[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
>>>>> Andy Teucher
>>>>>     on Fri, 11 Aug 2023 16:07:36 -0700 writes:

> I understand that `as.character.POSIXt()` had an overhaul in R 4.3
> (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b),
> and I have come across a new behaviour and I wonder if it is unintended?

Well, as the NEWS entry says (partly visible in the url above -- which
only shows one part of the several changes for R 4.3):

    • as.character(<POSIXt>) now behaves more in line with the methods
      for atomic vectors such as numbers, and is no longer influenced by
      options(). Ditto for as.character(<Date>). The as.character()
      method gets arguments digits and OutDec with defaults _not_
      depending on options(). Use of as.character(*, format = .) now
      warns.

It was "inconsistent" to have as.character(.) basically use format(.)
for these datetime objects.
as.character(x) for basic R types such as numbers, strings, logicals,...
fulfills the important property

    as.character(x)[j] === as.character(x[j])

whereas that is very much different for format() where indeed, the
formatting of x[1] may quite a bit depend on the other x[j]'s values:

    > as.character(c(1, pi, pi/2^20))
    [1] "1"                    "3.14159265358979"     "2.99605622633914e-06"
    > format(c(1, pi, pi/2^20))
    [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
    > format(c(1, pi))
    [1] "1.000000" "3.141593"
    > format(c(1, 10))
    [1] " 1" "10"

> When calling `as.character.POSIXt()` on a vector that contains elements where
> the time component is midnight (00:00:00), it drops the time component of that
> element in the resulting character vector. Previously the time component was
> retained:
> In R 4.2.3:
> ```
> R.version$version.string
> #> [1] "R version 4.2.3 (2023-03-15)"
> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> (tc <- as.character(t))
> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
> ```
> In R 4.3.1:
> ```
> R.version$version.string
> #> [1] "R version 4.3.1 (2023-06-16)"
> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> (tc <- as.character(t))
> #> [1] "1975-01-01" "1975-01-01 15:27:00"
> ```

You should have used format() here or at least should do so now.

> This has consequences when round-tripping from POSIXt ->
> character -> POSIXt,

Well, I'd argue that such a "round trip" is not a "good idea" anyway,
as there are quite a few platform (local timezone for one) issues, and
precision is lost, notably for POSIXlt which may be more precise than
you typically get, etc.

> since `as.POSIXct.character()` drops the time component from the entire
> vector if any element does not have a time component:

Well, there *is* no as.POSIXct.character() {but we understand what you mean}:
If you look at the help page you'd see that there's
as.POSIXlt.character() {which is called from as.POSIXct.default()}
with a 3rd argument 'format' and a 4th argument 'tryFormats'
{and a lot more information -- the whole topic is far from trivial}.

Now, indirectly you would want R to be "smart", i.e. the
as.POSIXlt.character() method should "guess better" about what the
user wants. ...
... and I agree that is not an unreasonable expectation, e.g., for your
example of wanting

    c("1975-01-01", "1975-01-01 15:27:00")

to "work".

as.POSIXlt.character() is well documented to be trying all of the
`tryFormats` in order, until it finds one that works for all vector
components (or fail / use NA if none works); and here it's only a
format which drops the time that works for all (i.e. both, in the
example).

{ Even though its behavior is well documented, one could even argue
  that by default you'd want a warning in such a case where "so much"
  is lost. I think however that introducing such a warning may trip
  too much current code relying on it .. also, the extra *checking*
  may be somewhat costly .. (?) ....
  anyway that's an interesting side topic }

Instead what you want here is for each string (element of the
character vector) to try the `tryFormats` and use the best available
one *individually* {smart R users ==> "think lapply(.)"}:

Currently, this would be "something like"

    unlist(lapply(x, as.POSIXlt))

well, and then you need to jump through a hoop additionally.
If you want POSIXct, like this:

    .POSIXct(unlist(lapply( * , as.POSIXct)))

For your example

    ch <- c("1975-01-01", "1975-01-01 15:27:00")
    > str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
     POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"

---

After all that, yes, I agree that we should consider making this much
easier. E.g., by adding an optional argument to as.POSIXlt.character(),
say `each` with default FALSE, such that as.POSIXlt(*, each=TRUE)
{and also as.POSIXct(*, each=TRUE)} would follow the above strategy.

Martin

--
Martin Maechler
ETH Zurich and R Core Team
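
The per-element strategy described in the message above can be wrapped in a small helper. This is only a sketch: `as_POSIXct_each()` is a hypothetical name and not part of base R, and the `each` argument mentioned above is a proposal, not an existing option. The example pins the time zone to "UTC" so the output does not depend on the local setting.

```r
# Hypothetical helper (not in base R): parse every string on its own, so
# a date-only element cannot force the date-only format onto the others.
# Like as.POSIXct(), it stops with an error on a string that matches none
# of the default formats.
as_POSIXct_each <- function(x, tz = "") {
  secs <- vapply(x, function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  .POSIXct(secs, tz = tz)
}

ch <- c("1975-01-01", "1975-01-01 15:27:00")
res <- as_POSIXct_each(ch, tz = "UTC")
format(res, "%Y-%m-%d %H:%M:%S")
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

Calling `as.POSIXct()` once per element is slower than the vectorised call, so this is better suited to short vectors or one-off clean-up than to large data.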