thr3ads.net - R devel - [Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time [Aug 2023]


Andy Teucher

2023-Aug-11 23:07 UTC

[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

I understand that `as.character.POSIXt()` had an overhaul in R 4.3
(https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b),
and I have come across a new behaviour and I wonder if it is unintended?

When calling `as.character.POSIXt()` on a vector that contains elements where
the time component is midnight (00:00:00), it drops the time component of that
element in the resulting character vector. Previously the time component was
retained:

In R 4.2.3:

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

In R 4.3.1:

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01" "1975-01-01 15:27:00"
```

This has consequences when round-tripping from POSIXt -> character ->
POSIXt, since `as.POSIXct.character()` drops the time component from the entire
vector if any element does not have a time component:

In R 4.2.3:

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

as.POSIXct(tc)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```

In R 4.3.1:

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01" "1975-01-01 15:27:00"

as.POSIXct(tc)
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```

`format.POSIXt()` retains its old behaviour in R 4.3:

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tf <- format(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

as.POSIXct(tf)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tf <- format(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

as.POSIXct(tf)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```

And finally, the behaviour of `as.POSIXct.character()` has not changed (it
previously did, and still does, drop the time component from all elements when
any element has no time):

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

as.POSIXct(c("1975-01-01", "1975-01-01 15:27:00"))
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

as.POSIXct(c("1975-01-01", "1975-01-01 15:27:00"))
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```

I don't know if this is a bug/regression in `as.character.POSIXt()`, or intended
behaviour. If it is intended, I think it would benefit from some more
comprehensive documentation.

Thanks very much,
Andy Teucher

Martin Maechler

2023-Aug-14 08:52 UTC


[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

>>>>> Andy Teucher 
>>>>>     on Fri, 11 Aug 2023 16:07:36 -0700 writes:
    > I understand that `as.character.POSIXt()` had an overhaul in R 4.3
    > (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b),
    > and I have come across a new behaviour and I wonder if it is unintended?

Well, as the NEWS entry says
 (partly visible in the url above -- which only shows one part of
  the several changes for R 4.3) :

    • as.character(<POSIXt>) now behaves more in line with the methods
      for atomic vectors such as numbers, and is no longer influenced
      by options().  Ditto for as.character(<Date>).  The
      as.character() method gets arguments digits and OutDec with
      defaults _not_ depending on options().  Use of as.character(*,
      format = .) now warns.

It was "inconsistent" to have  as.character(.) basically use format(.)
for these datetime objects.
as.character(x) for basic R types such as numbers, strings, logicals,... 
fulfills the important property

	 as.character(x)[j] === as.character(x[j])

whereas that is very much different for format() where indeed,
the formatting  of  x[1]  may quite a bit depend on the other
x[j]'s values:
> as.character(c(1, pi, pi/2^20))
[1] "1"    "3.14159265358979"  "2.99605622633914e-06"
> format(c(1, pi, pi/2^20))
[1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
> format(c(1, pi))
[1] "1.000000" "3.141593"
> format(c(1, 10))
[1] " 1" "10"
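
The same distinction can be checked for date-times. This is a minimal sketch, not part of the original messages: the vector `x`, the name `elementwise` and the fixed `tz = "UTC"` are assumptions chosen for illustration. It shows that `format()` of one element depends on the rest of the vector, and tests whether `as.character()` satisfies the elementwise property on whichever R version runs it.

```r
# Assumed example data; the time zone is fixed so results do not depend on the local one.
x <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# format() picks one common format for the whole vector ...
format(x)[1]   # "1975-01-01 00:00:00"
# ... so formatting the midnight element on its own gives a different string.
format(x[1])   # "1975-01-01"

# Elementwise property: convert each element separately and compare.
# Going by this thread, the result is expected to be TRUE in R >= 4.3 and
# FALSE in R 4.2.x, where as.character() still delegated to format().
elementwise <- vapply(seq_along(x), function(j) as.character(x[j]), character(1))
identical(as.character(x), elementwise)
```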

    > When calling `as.character.POSIXt()` on a vector that contains elements
    > where the time component is midnight (00:00:00), it drops the time component of
    > that element in the resulting character vector. Previously the time component
    > was retained:

    > In R 4.2.3:

    > ```
    > R.version$version.string
    > #> [1] "R version 4.2.3 (2023-03-15)"

    > (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
    > #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

    > (tc <- as.character(t))
    > #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
    > ```

    > In R 4.3.1:

    > ```
    > R.version$version.string
    > #> [1] "R version 4.3.1 (2023-06-16)"

    > (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
    > #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

    > (tc <- as.character(t))
    > #> [1] "1975-01-01" "1975-01-01 15:27:00"
    > ```

You should have used format()  here  or at least should do so now.
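
As a hedged illustration of that advice, here is a minimal sketch of a character round trip that uses `format()` with an explicit format string and a fixed time zone. The names `fmt`, `tc` and `t2` and the choice of `tz = "UTC"` are assumptions for this example only.

```r
# Fix the time zone explicitly so the example does not depend on the local one.
t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# An explicit format string keeps "00:00:00" for the midnight element.
fmt <- "%Y-%m-%d %H:%M:%S"
tc <- format(t, fmt)
tc   # "1975-01-01 00:00:00" "1975-01-01 15:27:00"

# Parse back with the same format and time zone, then compare the underlying seconds.
t2 <- as.POSIXct(tc, format = fmt, tz = "UTC")
identical(as.numeric(t), as.numeric(t2))   # TRUE
```

This format string only carries whole seconds, so fractional seconds would not survive the trip.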

    > This has consequences when round-tripping from POSIXt ->
    > character -> POSIXt,

Well, I'd argue that such a "round trip" is not a "good idea"
anyway, as there are quite a few platform (local timezone for
one) issues, and precision is lost, notably for POSIXlt which
may be more precise than you typically get, etc. 

    > since `as.POSIXct.character()` drops the time component from the entire
    > vector if any element does not have a time component:

Well, there *is* no as.POSIXct.character()  {but we understand what you mean}: 
If you look at the help page you'd see that there's 
as.POSIXlt.character()
{which is called from as.POSIXct.default()}
with a 3rd argument 'format' and a 4th argument 'tryFormats'
{and a lot more information -- the whole topic is far from trivial}.

Now, indirectly you would want R to be "smart", i.e. the
as.POSIXlt.character() method "guess better" about what the
user wants. ...
... and I agree that is not an unreasonable expectation, e.g.,
for your example of wanting 

    c("1975-01-01", "1975-01-01 15:27:00")

to  "work".

as.POSIXlt.character() is well documented to be trying all of
the `tryFormats` in order, until it finds one that works for all
vector components (or fail / use NA if none works);
and here it's only a format which drops the time that works for
all (i.e. both, in the example).
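
The effect of that search can be sketched with `strptime()` directly. This mimics the documented behaviour rather than the actual implementation; `try_formats` copies the default `tryFormats` listed in `?as.POSIXlt`, and `ok` and `first_common` are names made up here.

```r
# Default tryFormats of as.POSIXlt.character(), copied from its help page.
try_formats <- c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
                 "%Y-%m-%d %H:%M",     "%Y/%m/%d %H:%M",
                 "%Y-%m-%d",           "%Y/%m/%d")
ch <- c("1975-01-01", "1975-01-01 15:27:00")

# One column per format, one row per string: TRUE where strptime() can parse it.
ok <- vapply(try_formats,
             function(f) !is.na(strptime(ch, f, tz = "UTC")),
             logical(length(ch)))

# First format (in order) that parses every element.
first_common <- try_formats[apply(ok, 2, all)][1]
first_common   # "%Y-%m-%d", which carries no time of day
```

The second string also parses with the date-time formats, but the first one does not, so the only format common to both is the date-only one.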

{ Even though its behavior is well documented,
  one could even argue that by default you'd want a warning in
  such a case where "so much" is lost.
  I think however that introducing such a warning  may trip too
  much current code relying .. also, the extra *checking* maybe
  somewhat costly .. (?)  .... anyway that's an interesting side topic
}

Instead what you want here is for each string (element of the
character vector) to try the `tryFormats` and use the best
available *individually*  {smart R users ==> "think lapply(.)"} :
Currently, this would be  "something like"  unlist(lapply(x, as.POSIXlt))
well, and then you need to jump through a hoop additionally.
If you want POSIXct,  like this :

   .POSIXct(unlist(lapply( * , as.POSIXct)))

For your example

> ch <- c("1975-01-01", "1975-01-01 15:27:00")
> str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
 POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
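
The same idea can be wrapped in a small helper. This is a sketch only: `parse_each()` is a hypothetical name, not an existing R function, and the `tz` argument is an assumption added here because `unlist()` drops attributes such as the time zone.

```r
# Hypothetical helper: parse each string on its own, so every element
# gets the first of the default tryFormats that fits that element.
parse_each <- function(x, tz = "") {
  secs <- vapply(x, function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  # Rebuild a POSIXct vector from seconds since the epoch, keeping the time zone.
  .POSIXct(secs, tz = tz)
}

ch <- c("1975-01-01", "1975-01-01 15:27:00")
res <- parse_each(ch, tz = "UTC")
format(res, "%Y-%m-%d %H:%M:%S")
# "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

A string that matches none of the default formats makes `as.POSIXct()` raise an error, so this sketch stops at the first such element instead of returning `NA`.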

---

After all that, yes, I agree that we should consider making
this much easier. E.g.,  by adding an optional argument to
as.POSIXlt.character()   say, `each` with default FALSE such
that as.POSIXlt(*,  each=TRUE)
{and also as.POSIXct(*,  each=TRUE) } would follow the above
strategy.

?

Martin

--
Martin Maechler
ETH Zurich   and   R Core Team
