Gabe Becker
2018-Jun-11 21:59 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
Emil et al.,

On Mon, Jun 11, 2018 at 1:08 AM, Emil Bode <emil.bode at dans.knaw.nl> wrote:

> I don't think there's much wrong with is.na(as_date(Inf,
> origin='1970-01-01'))==FALSE, as there still is some "non-NA-ness" about
> the value (as difftime shows), but the output when printing is
> confusing. The way cat is treating it is clearer: it does print Inf.
>
> So would this be a solution?
>
>     format.Date <- function (x, ...)
>     {
>       xx <- format(as.POSIXlt(x), ...)
>       names(xx) <- names(x)
>       xx[is.na(xx) & !is.na(x)] <-
>         paste('Invalid date:', as.numeric(x[is.na(xx) & !is.na(x)]))
>       xx
>     }
>
> Which causes this behaviour, which I think is clearer:
>
>     environment(print.Date) <- .GlobalEnv
>     x <- as_date(Inf, origin='1970-01-01')
>     print(x)
>     # [1] "Invalid date: Inf"

In my opinion, it's either invalid or it isn't. If it's actually invalid,
as_date (and the equivalent core function, which is what is actually
relevant on this list) should fail, because it's an invalid date. If it
*isn't* invalid, having the print method tell users it is seems
problematic.

And I think people seem to be leaning towards it not being invalid. A bit
surprising to me, as my personal first thought was that infinite dates
don't make any sense, but I don't really have a horse in this race and so
defer to the cooler heads that are saying an infinite date perhaps should
not be disallowed explicitly. If it's not, though, it's not invalid and we
shouldn't confuse users by saying it is, imho.

Best,
~G

> Best regards,
> Emil Bode
>
> On 09/06/2018, 13:52, "R-devel on behalf of Joris Meys"
> <r-devel-bounces at r-project.org on behalf of jorismeys at gmail.com> wrote:
>
> > And now I've seen I copied the wrong part of ?is.na
> >
> >     The default method for is.na applied to an atomic vector returns a
> >     logical vector of the same length as its argument x, containing
> >     TRUE for those elements marked NA or, for numeric or complex
> >     vectors, NaN, and FALSE otherwise.
> >
> > Key point being "atomic vector" here.
> >
> > On Sat, Jun 9, 2018 at 1:41 PM, Joris Meys <jorismeys at gmail.com> wrote:
> >
> > > Hi Werner,
> > >
> > > on ?is.na it says:
> > >
> > >     The default method for anyNA handles atomic vectors without a
> > >     class and NULL.
> > >
> > > I hear you, and it is confusing to say the least. Looking deeper, the
> > > culprit seems to be in the conversion of a Date to POSIXlt prior to
> > > the formatting:
> > >
> > >     > x <- as.Date(Inf,origin = '1970-01-01')
> > >     > is.na(as.POSIXlt(x))
> > >     [1] TRUE
> > >
> > > Given this implicit conversion, I'd argue that as.Date should really
> > > return NA as well when passed an infinite value. The other option is
> > > to provide an is.na method for the Date class, which is (given that
> > > is.na is an internal generic) rather trivial:
> > >
> > >     > is.na.Date <- function(x) is.na(as.POSIXlt(x))
> > >     > is.na(x)
> > >     [1] TRUE
> > >
> > > This might be a workaround for your current problem without needing
> > > changes to R itself. But this will give a "wrong" answer in the sense
> > > that this still works:
> > >
> > >     > Sys.Date() - x
> > >     Time difference of -Inf days
> > >
> > > I personally would go for NA as the "correct" date for an infinite
> > > value, but given that this will have implications in other areas,
> > > there is a possibility of breaking code and it should be investigated
> > > a bit further imho.
> > >
> > > Cheers
> > > Joris
> > >
> > > On Fri, Jun 8, 2018 at 11:21 PM, Werner Grundlingh
> > > <wgrundlingh at gmail.com> wrote:
> > >
> > > > Indeed. as_date is from lubridate, but the same holds for as.Date.
> > > >
> > > > The output and its interpretation should be consistent, otherwise
> > > > it leads to confusion when programming. I understand that the
> > > > difference exists after asking a question on Stack Overflow:
> > > >   https://stackoverflow.com/q/50766089/914686
> > > > This understanding is never mentioned in the documentation - that
> > > > an Inf date is actually represented as NA:
> > > >   https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/as.Date
> > > > So I'm of the impression that the display should be fixed as a
> > > > first option (thereby providing clarity/transparency in terms of
> > > > back-end and output), or the documentation amended (to highlight
> > > > this) as a second option.

--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
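The distinction running through this exchange (an infinite Date is not NA as far as is.na() is concerned, whatever its printed form suggests) can be checked without relying on how a particular R version prints such a value. The following is only a minimal sketch; is_nonfinite_date is a hypothetical helper name, not a function from base R or lubridate.

```r
# Dates built from numeric values; the origin is given explicitly so that
# this also runs on R versions where as.Date(<numeric>) requires it.
x <- as.Date(c(NA, 0, Inf, -Inf), origin = "1970-01-01")

# is.na() only flags the genuinely missing element.
na_flags <- is.na(x)

# Hypothetical helper (an assumption for illustration): TRUE for dates
# that are not NA but whose underlying number is +Inf or -Inf.
is_nonfinite_date <- function(d) {
  stopifnot(inherits(d, "Date"))
  !is.na(d) & is.infinite(unclass(d))
}
inf_flags <- is_nonfinite_date(x)

# Arithmetic on an infinite date still yields a (non-NA) difference,
# as noted in the thread.
gap <- as.Date("2018-06-11") - x[3]
```

Testing the underlying number rather than the formatted string keeps the check independent of the Date-to-POSIXlt conversion that the thread identifies as the source of the confusing output.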
Emil Bode
2018-Jun-12 12:00 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
I agree that calling it invalid is a bit confusing, but I'm not sure what
the wording should be, as the problem is that the conversion to POSIXlt is
failing. The best solution would be to extend the whole POSIXlt class, but
that's too much work.

I've done some experiments, and it also seems that the Date class can store
larger values than POSIXlt:

> as.Date(8e9, origin='1970-01-01')==as.Date(9e9, origin='1970-01-01')
[1] FALSE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))==as.POSIXlt(as.Date(9e9, origin='1970-01-01'))
[1] TRUE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))
[1] "-5877641-06-23 UTC"
# Same for 9e9
> as.Date(8e9, origin='1970-01-01')>Sys.Date()
[1] TRUE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))>as.POSIXlt(Sys.Date())
[1] FALSE

So the situation as I see it now:

* Having an infinite date may convey some information, so we shouldn't
  prohibit it anyway
* Idem for very large values (positive or negative)
* But we should warn users that their dates may not be neatly
  representable, that there is no way to use the default print
* So for values where the POSIXlt print fails, I think it's best to print
  the numerical value, along with some text warning the user

So I've adapted the format function a bit more, with behaviour below. The
details can be adapted of course, but I feel it's best to print some
variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable, and
further leave is.na() as it is.

    format.Date <- function (x, ...)
    {
      xx <- format(as.POSIXlt(x), ...)
      names(xx) <- names(x)
      if(any(!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896))) {
        xx[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)] <-
          paste('Date with numerical value',as.numeric(x[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)]))
        warning('Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.')
      }
      xx
    }

With the following results:

> environment(print.Date) <- .GlobalEnv
> as.Date(Inf, origin='1970-01-01')
[1] "Date with numerical value Inf"
Warning message:
In format.Date(x) :
  Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.
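The formatting idea in this message can also be tried out without replacing format.Date globally or re-pointing the environment of print.Date. The sketch below wraps the same range check in a standalone function. The name format_date_safe is hypothetical, and the bounds -719162 and 2932896 are simply the ones used in the message, which associates them with 0001-01-01 and 9999-12-31.

```r
# Hypothetical standalone formatter (an assumption for illustration, not
# part of base R): in-range dates are formatted as usual, out-of-range
# dates are shown by their underlying numeric value, NA stays NA.
format_date_safe <- function(x, ...) {
  stopifnot(inherits(x, "Date"))
  n <- unclass(x)                       # days since 1970-01-01
  lower <- -719162                      # bound taken from the message
  upper <- 2932896                      # bound taken from the message
  out_of_range <- !is.na(n) & (n < lower | n > upper)
  in_range <- !is.na(n) & !out_of_range

  out <- rep(NA_character_, length(x))
  out[in_range] <- format(x[in_range], ...)
  out[out_of_range] <- paste("Date with numerical value", n[out_of_range])
  names(out) <- names(x)
  out
}

ordinary <- format_date_safe(as.Date("2018-06-12"))
infinite <- format_date_safe(as.Date(Inf, origin = "1970-01-01"))
missing_date <- format_date_safe(as.Date(NA))
```

Unlike the proposal in the message, this sketch does not emit a warning and computes the range condition once rather than three times; whether a warning belongs in a format method is one of the details the message leaves open.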
Martin Maechler
2018-Jun-12 16:28 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
>>>>> Emil Bode
>>>>>     on Tue, 12 Jun 2018 12:00:42 +0000 writes:

> I agree that calling it invalid is a bit confusing, but I'm not sure what
> the wording should be, as the problem is that the conversion to POSIXlt
> is failing.
> The best solution would be to extend the whole POSIXlt class, but that's
> too much work.
> I've done some experiments, and it also seems that the Date class can
> store larger values than POSIXlt:
>
> > as.Date(8e9, origin='1970-01-01')==as.Date(9e9, origin='1970-01-01')
> [1] FALSE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))==as.POSIXlt(as.Date(9e9, origin='1970-01-01'))
> [1] TRUE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))
> [1] "-5877641-06-23 UTC"
> # Same for 9e9
> > as.Date(8e9, origin='1970-01-01')>Sys.Date()
> [1] TRUE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))>as.POSIXlt(Sys.Date())
> [1] FALSE
>
> So the situation as I see it now:
>
> * Having an infinite date may convey some information, so
>   we shouldn't prohibit it anyway
> * Idem for very large values (positive or negative)

Indeed -- good that you found that you don't have to go all the way to
Inf ... and that is typical (and the reason why one has to solve the
problem anyway, and why Inf is not really a special case in that sense,
but nicely in another sense)!

> * But we should warn users that their dates may not be neatly
>   representable, that there is no way to use the default print
> * So for values where the POSIXlt print fails, I think it's best to
>   print the numerical value, along with some text warning the user
>
> So I've adapted the format function a bit more, with behaviour below.
> The details can be adapted of course, but I feel it's best to print some
> variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable,
> and further leave is.na() as it is.
>
>     format.Date <- function (x, ...)
>     {
>       xx <- format(as.POSIXlt(x), ...)
>       names(xx) <- names(x)
>       if(any(!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896))) {
>         xx[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)] <-
>           paste('Date with numerical value',as.numeric(x[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)]))
>         warning('Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.')
>       }
>       xx
>     }
>
> With the following results:
>
> > environment(print.Date) <- .GlobalEnv
> > as.Date(Inf, origin='1970-01-01')
> [1] "Date with numerical value Inf"
> Warning message:
> In format.Date(x) :
>   Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.

This looks somewhat reasonable as a workaround for you and for now.
However, I'd propose another route to go for "the next version of R":

When I consider

  > str(unclass(as.POSIXlt.Date(Sys.time() + 1e50)))
  List of 9
   $ sec  : num 0
   $ min  : int 0
   $ hour : int 0
   $ mday : int 23
   $ mon  : int 5
   $ year : int -5879541
   $ wday : int 2
   $ yday : int 173
   $ isdst: int 0
   - attr(*, "tzone")= chr "UTC"
  >

we see the integer overflow (to negative here) and that all components
but 'sec' (because it allows fractions!) are integer.

I think we should allow 'year' to be "double" instead, and so it could
also be +Inf or -Inf and we'd nicely cover the conversions from and to
'numeric' -- which is really used internally for dates and date-times in
POSIXct.

Martin
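The type-level point behind the suggestion of a double-valued 'year' can be illustrated with base R alone: an integer cannot hold Inf, whereas the double underlying a Date can. This is only a sketch of that point, not of how the POSIXlt conversion is or would be implemented, and the variable names are purely illustrative.

```r
# A Date is a double underneath, so it can carry an infinite value.
d <- as.Date(Inf, origin = "1970-01-01")
date_storage <- typeof(unclass(d))

# Integers have a finite range and no representation of Inf:
# coercing Inf yields NA (with a warning, suppressed here).
year_as_integer <- suppressWarnings(as.integer(Inf))

# A double keeps the infinite value intact.
year_as_double <- as.double(Inf)

# The largest value an R integer can hold.
largest_integer <- .Machine$integer.max
```

Any component stored as an integer therefore has to map an infinite or very large input to something else, which is consistent with the negative 'year' in the str() output above.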