Tim Taylor
2023-Aug-14 11:26 UTC
[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
Martin,

Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.

Ignoring the above though, one thing I'm still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.

Best

Tim

> On 14 Aug 2023, at 09:52, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
>>>>>>> Andy Teucher
>>>>>>>     on Fri, 11 Aug 2023 16:07:36 -0700 writes:
>
>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I have come across a new behaviour and I wonder if it is unintended?
>
> Well, as the NEWS entry says
> (partly visible in the url above -- which only shows one part of
> the several changes for R 4.3) :
>
>   • as.character(<POSIXt>) now behaves more in line with the methods
>     for atomic vectors such as numbers, and is no longer influenced
>     by options().  Ditto for as.character(<Date>).  The
>     as.character() method gets arguments digits and OutDec with
>     defaults _not_ depending on options().  Use of as.character(*,
>     format = .) now warns.
>
> It was "inconsistent" to have as.character(.) basically use format(.) for
> these datetime objects.
> as.character(x) for basic R types such as numbers, strings, logicals,...
> fulfills the important property
>
>     as.character(x)[j] === as.character(x[j])
>
> whereas that is very much different for format() where indeed,
> the formatting of x[1] may quite a bit depend on the other
> x[j]'s values:
>
>   > as.character(c(1, pi, pi/2^20))
>   [1] "1" "3.14159265358979" "2.99605622633914e-06"
>
>   > format(c(1, pi, pi/2^20))
>   [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
>   > format(c(1, pi))
>   [1] "1.000000" "3.141593"
>   > format(c(1, 10))
>   [1] " 1" "10"
>
>> When calling `as.character.POSIXt()` on a vector that contains elements where the time component is midnight (00:00:00), it drops the time component of that element in the resulting character vector. Previously the time component was retained:
>
>> In R 4.2.3:
>
>> ```
>> R.version$version.string
>> #> [1] "R version 4.2.3 (2023-03-15)"
>>
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
>>
>> (tc <- as.character(t))
>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
>> ```
>
>> In R 4.3.1:
>
>> ```
>> R.version$version.string
>> #> [1] "R version 4.3.1 (2023-06-16)"
>>
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
>>
>> (tc <- as.character(t))
>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
>> ```
>
> You should have used format() here or at least should do so now.
>
>> This has consequences when round-tripping from POSIXt ->
>> character -> POSIXt,
>
> Well, I'd argue that such a "round trip" is not a "good idea"
> anyway, as there are quite a few platform (local timezone for
> one) issues, and precision is lost, notably for POSIXlt which
> may be more precise than you typically get, etc.
>
>> since `as.POSIXct.character()` drops the time component from the entire vector if any element does not have a time component:
>
> Well, there *is* no as.POSIXct.character() {but we understand what you mean}:
> If you look at the help page you'd see that there's as.POSIXlt.character()
> {which is called from as.POSIXct.default()}
> with a 3rd argument 'format' and a 4th argument 'tryFormats'
> {and a lot more information -- the whole topic is far from trivial}.
>
> Now, indirectly you would want R to be "smart", i.e. the
> as.POSIXlt.character() method to "guess better" about what the
> user wants. ...
> ... and I agree that is not an unreasonable expectation, e.g.,
> for your example of wanting
>
>     c("1975-01-01", "1975-01-01 15:27:00")
>
> to "work".
>
> as.POSIXlt.character() is well documented to be trying all of
> the `tryFormats` in order, until it finds one that works for all
> vector components (or fail / use NA if none works);
> and here it's only a format which drops the time that works for
> all (i.e. both, in the example).
>
> { Even though its behavior is well documented,
>   one could even argue that by default you'd want a warning in
>   such a case where "so much" is lost.
>   I think however that introducing such a warning may trip too
>   much current code relying .. also, the extra *checking* may be
>   somewhat costly .. (?) .... anyway that's an interesting side topic
> }
>
> Instead what you want here is for each string (element of the
> character vector) to try the `tryFormats` and use the best
> available *individually* {smart R users ==> "think lapply(.)"} :
> Currently, this would be "something like" unlist(lapply(x, as.POSIXlt))
> well, and then you need to jump through a hoop additionally.
> If you want POSIXct, like this :
>
>     .POSIXct(unlist(lapply( * , as.POSIXct)))
>
> For your example
>
>   ch <- c("1975-01-01", "1975-01-01 15:27:00")
>
>   > str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
>    POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
>
> ---
>
> After all that, yes, I agree that we should consider making
> this much easier. E.g., by adding an optional argument to
> as.POSIXlt.character() say, `each` with default FALSE such
> that as.POSIXlt(*, each=TRUE)
> {and also as.POSIXct(*, each=TRUE) } would follow the above
> strategy.
>
> Martin
>
> --
> Martin Maechler
> ETH Zurich  and  R Core Team
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
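
[Editorial sketch] The per-element strategy described in the quoted message ("think lapply(.)") can be wrapped in a small helper. This is only an illustration under stated assumptions: `parse_each` is a hypothetical name, not a base R function and not the proposed `each = TRUE` argument; the time zone is fixed to "UTC" to sidestep the local-timezone issues mentioned in the thread; and only whole-second timestamps are considered.

```r
# Hypothetical helper (not part of base R): parse every string on its own,
# so that one date-only element does not strip the time from the others.
parse_each <- function(x, tz = "UTC") {
  # as.POSIXct() is called once per element, so the tryFormats search
  # happens individually for each string.
  secs <- vapply(
    x,
    function(s) as.numeric(as.POSIXct(s, tz = tz)),
    numeric(1),
    USE.NAMES = FALSE
  )
  # Rebuild a POSIXct vector from seconds since the epoch.
  .POSIXct(secs, tz = tz)
}

ch  <- c("1975-01-01", "1975-01-01 15:27:00")
res <- parse_each(ch)

# An explicit format string shows the time for every element.
format(res, "%Y-%m-%d %H:%M:%S")
```

Going through `vapply()` with `numeric(1)` rather than `unlist(lapply(.))` is a design choice of this sketch: it fails early if any element does not yield exactly one number.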
Martin Maechler
2023-Aug-15 07:58 UTC
[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
>>>>> Tim Taylor
>>>>>     on Mon, 14 Aug 2023 12:26:51 +0100 writes:

    > Martin,
    > Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.

    > Ignoring the above though, one thing I'm still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.

    > Best
    > Tim

Hmm, I really don't understand what you don't understand.

Here's some annotated R code exemplifying that indeed now,

    as.character(x)[j] === as.character(x[j])

but previously that was not fulfilled
{when as.character() was the same as format() for POSIXct or POSIXlt}:

##-----------------------------------------------------------------------------
x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0)
str(t0) # POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
t0      # "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
t0[1]   # "1975-01-01 CET"  <-- yes, *no* 00:00:00 in any version of R

## In R <= 4.2.x as.character() was using format() for POSIX{ct,lt} :
as.character(t0)    # "1975-01-01 00:00:00" "1975-01-01 15:27:00"  << for R <= 4.2.x
as.character(t0)    # "1975-01-01" "1975-01-01 15:27:00"           << for R >= 4.3.0
as.character(t0[1]) # "1975-01-01"  {in all versions of R}

Note that indeed as.character() does drop redundant trailing 0s :

> as.character(c(0.5, 0.75, pi))
[1] "0.5" "0.75" "3.14159265358979"

whereas format() does not (ensuring resulting strings of the same nchar(.)):

> format( c(0.5, 0.75, pi))
[1] "0.500000" "0.750000" "3.141593"
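
[Editorial sketch] The advice in this thread is to use format() rather than as.character() when a fixed textual layout is needed. A minimal illustration follows, assuming whole-second timestamps and an explicit "UTC" time zone (both are assumptions of this sketch; the thread itself cautions that string round trips lose precision and depend on the time zone). The variable name `fmt` is illustrative only.

```r
# One explicit layout, used for both writing and reading.
fmt <- "%Y-%m-%d %H:%M:%S"

t0 <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# With an explicit format string the midnight element keeps its time part.
s <- format(t0, fmt)
s

# Each element is formatted on its own, so the string for element 1 does
# not depend on the rest of the vector.
same_elementwise <- identical(format(t0, fmt)[1], format(t0[1], fmt))

# Reading back with the same format restores the original instants
# (whole seconds only, same time zone on both sides).
back <- as.POSIXct(s, tz = "UTC", format = fmt)
round_trip_ok <- identical(as.numeric(back), as.numeric(t0))
```

Passing `format =` on the way back in avoids the `tryFormats` search altogether, so a date-only match can no longer be chosen for the whole vector.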