Suharto Anggono Suharto Anggono
2022-Oct-14 16:21 UTC
[Rd] Bug with `[<-.POSIXlt` on specific OSes
I think '[.POSIXlt' and '[<-.POSIXlt' don't need to normalize out-of-range values. I think they just make same length for all components, to ensure correct extraction or replacement for arbitrary index.

I have a thought of adding an optional argument for 'as.POSIXlt' applied to "POSIXlt" object. Possible name:
normalize
adjust
fixup

To allow recycling only without changing content, instead of TRUE or FALSE, maybe choice, like
fixup = c("none", "balance", "normalize")
, where "normalize" implies "balance", or
adjust = c("none", "length", "content", "value")
, where "content" and "value" are synonymous.

By the way, Inf in 'sec' component is out-of-range!

For 'gmtoff', NA or 0 should be put for unknown. A known 'gmtoff' may be positive, negative, or zero. The documentation says

‘gmtoff’ (Optional.) The offset in seconds from GMT: positive values are East of the meridian. Usually ‘NA’ if unknown, but ‘0’ could mean unknown.

dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
                                        # "out of range", non-finite, fractions
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon  = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))

as.POSIXct(dlt)[1] is NA on Linux with timezone without DST. For example, after
Sys.setenv(TZ = "EST")

----------------
>>>>> Martin Maechler
>>>>>     on Wed, 12 Oct 2022 10:17:28 +0200 writes:
>>>>> Kurt Hornik
>>>>>     on Tue, 11 Oct 2022 16:44:13 +0200 writes:
>>>>> Davis Vaughan writes:

    >>> I've got a bit more information about this one. It seems like it
    >>> (only? not sure) appears when `TZ = "UTC"`, which is why I didn't see
    >>> it before on my Mac, which defaults to `TZ = ""`. I think this is at
    >>> least explainable by the fact that those "optional" fields aren't
    >>> technically needed when the time zone is UTC.

    >> Exactly.  Debugging `[<-.POSIXlt` with

    >>   x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
    >>   Sys.setenv(TZ = "UTC")
    >>   x[1] <- NA

    >> shows we get into

    >>   value <- unclass(as.POSIXlt(value))
    >>   if (ici) {
    >>       for (n in names(x)) names(x[[n]]) <- nms
    >>   }
    >>   for (n in names(x)) x[[n]][i] <- value[[n]]

    >> where

    >>   Browse[2]> names(value)
    >>   [1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"

    >>   Browse[2]> names(x)
    >>   [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"   "wday"   "yday"
    >>   [9] "isdst"  "zone"   "gmtoff"

    >> Without having looked at the code, the docs say

    >> ‘zone’ (Optional.) The abbreviation for the time zone in force at
    >> that time: ‘""’ if unknown (but ‘""’ might also be used for
    >> UTC).

    >> ‘gmtoff’ (Optional.) The offset in seconds from GMT: positive
    >> values are East of the meridian.  Usually ‘NA’ if unknown,
    >> but ‘0’ could mean unknown.

    >> so perhaps we should fill with the values for the unknown case?

    >> -k

    > Well,

    > I think you both know I'm in the midst of dealing with these
    > issues, to fix both
    >   [.POSIXlt  and
    >   [<-.POSIXlt

    > Yes, one needs a way to not only "fill" the partially filled
    > entries but also to *normalize* out-of-range values
    > (say negative seconds, minutes > 60, etc)

    > All this is available in our C code, but not on the R level,
    > so yesterday, I wrote a C function to be called via .Internal(.)
    > from a new R function that provides this.

    > Provisionally called
    >   balancePOSIXlt()

    > because it both balances the 9 to 11 list-components of POSIXlt
    > and it also puts all numbers of (sec, min, hour, mday, mon)
    > into a correct range (and also computes correct wday and yday numbers).

    > but I'm happy for proposals of better names.

    > I had contemplated  validatePOSIXlt() as alternative, but then
    > dismissed that as in some sense we now do agree that
    > "imbalanced" POSIXlt's are not really invalid ..

    > .. and yes, to Davis:  Even though I've spent so many hours with
    > POSIXlt, POSIXct and Date during the last week, I'm still
    > surprised more often than I like by the effects of timezone
    > settings there.

    > Martin

I have committed the new R and C code now, defining  balancePOSIXlt(),
to get feedback from the community.

I've extended the documentation in  help(DateTimeClasses),
and notably factored out the description
of  POSIXlt  mentioning the  "ragged" and "out-of-range" cases.

This needs more testing and experiments, and I have not
announced it in NEWS yet.

Planned next is to use it in  [.POSIXlt and [<-.POSIXlt
so they will work correctly.

But please share your thoughts, propositions, ...

Martin

[snip]
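
The "same length for all components" idea from the top of this message can be sketched in plain R. This is only an illustration under stated assumptions: fill_lt() is a hypothetical helper name, not a function in base R and not the code that was committed. It recycles every component of a "POSIXlt" object to the longest component's length and deliberately leaves out-of-range values untouched.

```r
# The ragged, partly out-of-range object from the message above.
dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10, -Inf, NA)) + pi,
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))

# fill_lt() is a HYPOTHETICAL helper (assumption, not base R):
# recycle all components to a common length, without normalizing any values.
fill_lt <- function(x) {
    stopifnot(inherits(x, "POSIXlt"))
    att <- attributes(x)                  # names, class, possibly "tzone"
    comps <- unclass(x)
    n <- max(lengths(comps))              # length of the longest component
    comps <- lapply(comps, rep_len, length.out = n)
    attributes(comps) <- att              # restore class and other attributes
    comps
}

filled <- fill_lt(dlt)
lengths(unclass(dlt))     # ragged: components differ in length
lengths(unclass(filled))  # every component now has the same length
```

After such a fill, indexing each component with the same index i refers to the same element of the date-time vector, which is what extraction and replacement need. Whether the recycled wday, yday and isdst values are still meaningful is a separate question.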

>>>>> Suharto Anggono Suharto Anggono via R-devel
>>>>>     on Fri, 14 Oct 2022 16:21:14 +0000 (UTC) writes:

    > I think '[.POSIXlt' and '[<-.POSIXlt' don't need to
    > normalize out-of-range values. I think they just make same
    > length for all components, to ensure correct extraction or
    > replacement for arbitrary index.

Yes, you are right; this is definitely correct... and would be more
efficient. At the moment, we were mostly focused on *correct*
behaviour in the case of "ragged" and/or out-of-range POSIXlt objects.

    > I have a thought of adding an optional argument for 'as.POSIXlt' applied to "POSIXlt" object. Possible name:
    > normalize
    > adjust
    > fixup

    > To allow recycling only without changing content, instead of TRUE or FALSE, maybe choice, like
    > fixup = c("none", "balance", "normalize")
    > , where "normalize" implies "balance", or
    > adjust = c("none", "length", "content", "value")
    > , where "content" and "value" are synonymous.

Such an optional argument for as.POSIXlt() would be a possibility
and could replace the new and for now still somewhat experimental
balancePOSIXlt().

+: One advantage of (one of the above proposals) would be that
   it does not take up a new function name.

-: OTOH, it may be overdoing the semantics
     as.POSIXlt(<POSIXlt>, <some> = <other>)
   and it may be harder to understand by non-sophisticated R users,
   because as.POSIXlt() is a generic with several methods, and these
   extra arguments would probably only apply to the
   as.POSIXlt.default() method and there *only* for the case where
   the argument inherits from "POSIXlt" .. and all that being
   somewhat subtle to see for Joe Average UseR

I agree that it will make sense to get an R-level version, either
using new arguments in as.POSIXlt() or (still my preference) in
balancePOSIXlt() to allow to "only fill all components".

HOWEVER note that the "filling" (by recycling) and no extra checking
will often lead to internally inconsistent lt objects. E.g.,
Daylight saving time (isdst = 1 or not) can only be known
when the day (and hour) is known and that can be shifted by
out-of-range sec/min/hour ..
((and of course for 1 hour per year, a time hour=2 will *need*
  specification of isdst in order to know which of the
  2:<min>:<sec> is meant))

Also $wday and $yday (which are described as read-only) can only be
checked after validation or "in-ranging" of the sec/min/hour/mday/mon
components, so their simple recycling will typically be incorrect.

That's why I had opted to *mainly* do full "balancing" (in my sense),
i.e., filling and "in-ranging" simultaneously.

    > By the way, Inf in 'sec' component is out-of-range!

Yes, the non-finite "values" {+/-Inf, NaN, NA} are all "special",
and we had decided to allow them for compatibility with classes
"Date" and "POSIXct".

BTW, a few days ago, I updated the help("DateTimeClasses") page in
R-devel to document a bit more, notably that "ragged" and
out-of-range POSIXlt may exist...
see (the always +- current R-devel Help pages at)
https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html

    > For 'gmtoff', NA or 0 should be put for unknown. A known 'gmtoff' may be positive, negative, or zero. The documentation says

    > ‘gmtoff’ (Optional.) The offset in seconds from GMT:
    > positive values are East of the meridian. Usually ‘NA’ if
    > unknown, but ‘0’ could mean unknown.

    > dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
    >                                         # "out of range", non-finite, fractions
    >                      min = 45L, hour = c(21L, 3L, NA, 4L),
    >                      mday = 6L, mon  = c(11L, NA, 3L),
    >                      year = 116L, wday = 2L, yday = 340L, isdst = 1L))

    > as.POSIXct(dlt)[1] is NA on Linux with timezone without DST. For example, after
    > Sys.setenv(TZ = "EST")

Hmm... I needed time to look at the above.
Indeed, one gets NA (and has in previous versions of R) in such a
case. After applying balancePOSIXlt(), one no longer gets NA.

Are you proposing that we should do that (or possibly simple
recycling) in as.POSIXct.POSIXlt() ?

Martin

    > ----------------
    >>>>>> Martin Maechler
    >>>>>>     on Wed, 12 Oct 2022 10:17:28 +0200 writes:
    >>>>>> Kurt Hornik
    >>>>>>     on Tue, 11 Oct 2022 16:44:13 +0200 writes:
    >>>>>> Davis Vaughan writes:

[.............]
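
The point made in this reply, that $wday and $yday can only be trusted after the other components have been brought into range, can be illustrated with a small sketch. It uses a made-up single time point (not taken from the thread) in UTC, so no DST question arises. It normalizes by a round trip through "POSIXct", which is a stand-in technique and not a description of what balancePOSIXlt() does internally. The balancePOSIXlt() call is guarded, on the assumption that it is only available in R versions that contain the new function.

```r
# 2021-12-31 23:59:70 UTC: 'sec' is out of range; wday = 5 (Friday) and
# yday = 364 describe the date *before* the extra seconds are carried over.
x <- .POSIXlt(list(sec = 70, min = 59L, hour = 23L, mday = 31L, mon = 11L,
                   year = 121L, wday = 5L, yday = 364L, isdst = 0L),
              tz = "UTC")

# Stand-in normalization: convert to POSIXct and back (not balancePOSIXlt()).
rt <- as.POSIXlt(as.POSIXct(x), tz = "UTC")
format(rt, "%Y-%m-%d %H:%M:%S")
unlist(unclass(rt)[c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday")])

# Only run where the function from this thread exists (assumption: newer R).
if (exists("balancePOSIXlt")) {
    b <- unclass(balancePOSIXlt(x))
    print(unlist(b[c("mday", "mon", "year", "wday", "yday")]))
}
```

The extra ten seconds push the time to 2022-01-01 00:00:10, so the normalized object has wday = 6 (Saturday) and yday = 0. Recycling the original wday = 5 and yday = 364 without "in-ranging" would have kept values that no longer match the date.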