thr3ads.net - R devel - [Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time [Aug 2023]


Tim Taylor

2023-Aug-14 11:26 UTC

[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

Martin,

Thank you. Everything you have written is helpful and I admit I am likely guilty
of using as.character() instead of format() in the past.

Ignoring the above though, one thing I'm still unclear on is the special
handling of zero (or rather non-zero time) seconds in the method. Is the
motivation that as.character() outputs the minimum necessary information? It is
clearly a very deliberate choice but the reasoning is still going a little over
my head.

Best

Tim
> On 14 Aug 2023, at 09:52, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> 
>> 
>>>>>> Andy Teucher 
>>>>>>    on Fri, 11 Aug 2023 16:07:36 -0700 writes:
> 
>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3
>> (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b),
>> and I have come across a new behaviour and I wonder if it is unintended?
> 
> Well, as the NEWS entry says
> (partly visible in the url above -- which only shows one part of
>  the several changes for R 4.3) :
> 
>    • as.character(<POSIXt>) now behaves more in line with the methods
>      for atomic vectors such as numbers, and is no longer influenced
>      by options().  Ditto for as.character(<Date>).  The
>      as.character() method gets arguments digits and OutDec with
>      defaults _not_ depending on options().  Use of as.character(*,
>      format = .) now warns.
> 
> It was "inconsistent" to have  as.character(.) basically use format(.) for
> these datetime objects.
> as.character(x) for basic R types such as numbers, strings, logicals,... 
> fulfills the important property
> 
>     as.character(x)[j] === as.character(x[j])
> 
> whereas that is very much different for format() where indeed,
> the formatting  of  x[1]  may quite a bit depend on the other
> x[j]'s values:
> 
>> as.character(c(1, pi, pi/2^20))
> [1] "1"                    "3.14159265358979"     "2.99605622633914e-06"
> 
>> format(c(1, pi, pi/2^20))
> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
>> format(c(1, pi))
> [1] "1.000000" "3.141593"
>> format(c(1, 10))
> [1] " 1" "10"
>> 
> 
> 
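The property quoted above can also be checked programmatically. The following is a minimal sketch for the numeric example only; the variable names are illustrative and not from the original message.

```r
# Check whether converting the whole vector and then subsetting gives the
# same strings as subsetting first and converting each element alone.
x <- c(1, pi, pi / 2^20)

# as.character(): each element is converted independently of its neighbours.
whole_chr <- as.character(x)
each_chr  <- vapply(seq_along(x), function(j) as.character(x[j]), character(1))
identical(whole_chr, each_chr)   # TRUE

# format(): one common format is chosen for the whole vector, so the string
# for an element depends on the other elements.
whole_fmt <- format(x)
each_fmt  <- vapply(seq_along(x), function(j) format(x[j]), character(1))
identical(whole_fmt, each_fmt)   # FALSE
```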
>> When calling `as.character.POSIXt()` on a vector that contains elements
>> where the time component is midnight (00:00:00), it drops the time component of
>> that element in the resulting character vector. Previously the time component
>> was retained:
> 
>> In R 4.2.3:
> 
>> ```
>> R.version$version.string
>> #> [1] "R version 4.2.3 (2023-03-15)"
> 
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> 
>> (tc <- as.character(t))
>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
>> ```
> 
>> In R 4.3.1:
> 
>> ```
>> R.version$version.string
>> #> [1] "R version 4.3.1 (2023-06-16)"
> 
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> 
>> (tc <- as.character(t))
>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
>> ```
> 
> You should have used format()  here  or at least should do so now.
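As a sketch of the format() advice: passing an explicit format string keeps the time component for every element, whatever the other elements contain. The time zone "UTC" and the variable names here are assumptions chosen to make the example reproducible; the original example used the poster's local time zone.

```r
# Two date-times, one of them exactly at midnight.
dt <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# An explicit format string always writes hours, minutes and seconds.
tc <- format(dt, format = "%Y-%m-%d %H:%M:%S")
tc
# [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

# The same format keeps the time even when the midnight element stands alone.
format(dt[1], format = "%Y-%m-%d %H:%M:%S")
# [1] "1975-01-01 00:00:00"
```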
> 
>> This has consequences when round-tripping from POSIXt ->
>> character -> POSIXt,
> 
> Well, I'd argue that such a "round trip" is not a "good idea"
> anyway, as there are quite a few platform (local timezone for
> one) issues, and precision is lost, notably for POSIXlt which
> may be more precise than you typically get, etc. 
> 
>> since `as.POSIXct.character()` drops the time component from the entire
>> vector if any element does not have a time component:
> 
> Well, there *is* no as.POSIXct.character()  {but we understand what you mean}:
> If you look at the help page you'd see that there's as.POSIXlt.character()
> {which is called from as.POSIXct.default()}
> with a 3rd argument 'format' and a 4th argument 'tryFormats'
> {and a lot more information -- the whole topic is far from trivial}.
> 
> Now, indirectly you would want R to be "smart", i.e. the
> as.POSIXlt.character() method "guess better" about what the
> user wants. ...
> ... and I agree that is not an unreasonable expectation, e.g.,
> for your example of wanting 
> 
>    c("1975-01-01", "1975-01-01 15:27:00")
> 
> to  "work".
> 
> as.POSIXlt.character() is well documented to be trying all of
> the `tryFormats` in order, until it finds one that works for all
> vector components (or fail / use NA if none works);
> and here it's only a format which drops the time that works for
> all (i.e. both, in the example).
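The behaviour described above can be reproduced with a short sketch. It tests only two of the default `tryFormats` and uses "UTC" as an assumed time zone; `try_formats`, `works_for_all` and `parsed` are illustrative names.

```r
ch <- c("1975-01-01", "1975-01-01 15:27:00")

# Which format parses *every* element?  strptime() returns NA for an element
# that does not match the format and ignores trailing characters after a match.
try_formats <- c("%Y-%m-%d %H:%M:%OS", "%Y-%m-%d")
works_for_all <- vapply(try_formats,
                        function(f) all(!is.na(strptime(ch, f, tz = "UTC"))),
                        logical(1))
works_for_all   # FALSE for the date-time format, TRUE for the date-only format

# Hence the conversion uses the date-only format for both elements and the
# time of the second element is lost.
parsed <- as.POSIXct(ch, tz = "UTC")
format(parsed, "%H:%M:%S")
# [1] "00:00:00" "00:00:00"
```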
> 
> { Even though its behavior is well documented,
>  one could even argue that by default you'd want a warning in
>  such a case where "so much" is lost.
>  I think however that introducing such a warning  may trip too
>  much current code relying .. also, the extra *checking* may be
>  somewhat costly .. (?)  .... anyway that's an interesting side topic
> }
> 
> Instead what you want here is for each string (element of the
> character vector) to try the `tryFormats` and use the best
> available *individually*  {smart R users ==> "think lapply(.)"} :
> Currently, this would be  "something like"  unlist(lapply(x, as.POSIXlt))
> well, and then you need to jump through a hoop additionally.
> If you want POSIXct,  like this :
> 
>   .POSIXct(unlist(lapply( * , as.POSIXct)))
> 
> For your example
> 
>  ch <- c("1975-01-01", "1975-01-01 15:27:00")
> 
>> str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
> POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
> 
> ---
> 
> After all that, yes, I agree that we should consider making
> this much easier. E.g.,  by adding an optional argument to
> as.POSIXlt.character()   say, `each` with default FALSE such
> that as.POSIXlt(*,  each=TRUE)
> {and also as.POSIXct(*,  each=TRUE) } would follow the above
> strategy.
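Until something like the suggested `each` argument exists, the per-element strategy can be wrapped in a small user-level function. This is only a sketch: `parse_each` is a hypothetical helper, not part of base R, and the default time zone "UTC" is an assumption made for reproducibility.

```r
# Hypothetical helper: parse each string on its own, so that every element is
# matched against the default tryFormats individually.
parse_each <- function(x, tz = "UTC") {
  # Convert one string at a time and keep the seconds since the epoch.
  secs <- vapply(x, function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  # Rebuild a POSIXct vector from the plain numbers.
  .POSIXct(secs, tz = tz)
}

ch  <- c("1975-01-01", "1975-01-01 15:27:00")
res <- parse_each(ch)
format(res, "%Y-%m-%d %H:%M:%S")
# [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

Parsing element by element is slower than one vectorised call, so a helper like this is probably best kept for input known to mix date-only and date-time strings.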
> 
> Martin
> 
> --
> Martin Maechler
> ETH Zurich   and   R Core Team
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Martin Maechler

2023-Aug-15 07:58 UTC


[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

>>>>> Tim Taylor 
>>>>>     on Mon, 14 Aug 2023 12:26:51 +0100 writes:
    > Martin,
    > Thank you. Everything you have written is helpful and I admit I am
    > likely guilty of using as.character() instead of format() in the past.

    > Ignoring the above though, one thing I'm still unclear on is the
    > special handling of zero (or rather non-zero time) seconds in the method. Is the
    > motivation that as.character() outputs the minimum necessary information? It is
    > clearly a very deliberate choice but the reasoning is still going a little over
    > my head.

    > Best
    > Tim

Hmm, I really don't understand what you don't understand.
Here's some annotated R code exemplifying that indeed now,
    as.character(x)[j] === as.character(x[j])
but previously that was not fulfilled  {when  as.character() was
the same as format() for POSIXct or POSIXlt}:

##-----------------------------------------------------------------------------
x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0)
str(t0) #  POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
t0    #  "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
t0[1] #  "1975-01-01 CET" <-- yes, *no* 00:00:00   in any version of R

## In R <= 4.2.x  as.character() was using format() for POSIX{ct,lt} :
as.character(t0)    # "1975-01-01 00:00:00" "1975-01-01 15:27:00" << for R <= 4.2.x
as.character(t0)    # "1975-01-01"          "1975-01-01 15:27:00" << for R >= 4.3.0
as.character(t0[1]) # "1975-01-01"  {in all versions of R}
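
The same comparison can be written as explicit checks. This sketch fixes the time zone to "UTC" (an assumption; the annotated code above ran in CET) and contrasts format(), whose result for the midnight element depends on the rest of the vector, with as.character().

```r
t0 <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# format() picks one format for the whole vector ...
format(t0)[1]   # "1975-01-01 00:00:00"
# ... and a different one when the midnight element is formatted alone.
format(t0[1])   # "1975-01-01"

# According to the message above this is TRUE for R >= 4.3.0 and
# FALSE for R <= 4.2.x, where as.character() used format().
identical(as.character(t0)[1], as.character(t0[1]))
```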


Note that indeed   as.character()  does drop redundant trailing 0s :

  > as.character(c(0.5, 0.75, pi))
  [1] "0.5"              "0.75"             "3.14159265358979"

whereas format() does not (ensuring resulting strings of the same nchar(.)):

  > format(      c(0.5, 0.75, pi))
  [1] "0.500000" "0.750000" "3.141593"



