Martin Maechler
2018-Jun-12 16:28 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
>>>>> Emil Bode >>>>> on Tue, 12 Jun 2018 12:00:42 +0000 writes:

> I agree that calling it invalid is a bit confusing, but I'm not sure what the wording should be, as the problem is that the conversion to POSIXlt is failing.
> The best solution would be to extend the whole POSIXlt-class, but that's too much work.
> I've done some experiments, and it also seems that the Date class can store larger values than POSIXlt:
>
> > as.Date(8e9, origin='1970-01-01')==as.Date(9e9, origin='1970-01-01')
> [1] FALSE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))==as.POSIXlt(as.Date(9e9, origin='1970-01-01'))
> [1] TRUE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))
> [1] "-5877641-06-23 UTC"
> # Same for 9e9
> > as.Date(8e9, origin='1970-01-01')>Sys.Date()
> [1] TRUE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))>as.POSIXlt(Sys.Date())
> [1] FALSE
>
> So the situation as I see it now:
>
> * Having an infinite date may convey some information, so
>   we shouldn't prohibit it anyway
> * Idem for very large values (positive or negative)

Indeed -- good that you found you don't have to go all the way to Inf ... and that is typical (and the reason why one has to solve the problem anyway, and why Inf is not really a special case in that sense, but nicely in another sense)!

> * But we should warn users that their dates may not be neatly representable, that there is no way to use the default-print
> * So for values where the POSIXlt-print fails, I think it's best to print the numerical value, along with some text warning the user

> So I've adapted the format-function a bit more, with behaviour below.
> The details can be adapted of course, but I feel it's best to print some variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable, and further leave is.na()
>
> format.Date <- function (x, ...)
> {
>   xx <- format(as.POSIXlt(x), ...)
>   names(xx) <- names(x)
>   if(any(!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896))) {
>     xx[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)] <-
>       paste('Date with numerical value',as.numeric(x[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)]))
>     warning('Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.')
>   }
>   xx
> }
>
> With the following results:
>
> > environment(print.Date) <- .GlobalEnv
> > as.Date(Inf, origin='1970-01-01')
> [1] "Date with numerical value Inf"
> Warning message:
> In format.Date(x) :
>   Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.
>

This looks somewhat reasonable as a workaround for you and for now.
However, I'd propose another route to go for "the next version of R":

When I consider

> str(unclass(as.POSIXlt.Date(Sys.time() + 1e50)))
List of 9
 $ sec  : num 0
 $ min  : int 0
 $ hour : int 0
 $ mday : int 23
 $ mon  : int 5
 $ year : int -5879541
 $ wday : int 2
 $ yday : int 173
 $ isdst: int 0
 - attr(*, "tzone")= chr "UTC"
>

we see the integer overflow (to negative here) and that all components but 'sec' (because it allows fractions!) are integer.

I think we should allow 'year' to be "double" instead, and so it could also be +Inf or -Inf and we'd nicely cover the conversions from and to 'numeric' -- which is really used internally for dates and date-times in POSIXct.

Martin

>
> From: Gabe Becker <becker.gabe at gene.com>
> Date: Monday, 11 June 2018 at 23:59
> To: Emil Bode <emil.bode at dans.knaw.nl>
> Cc: Joris Meys <jorismeys at gmail.com>, Werner Grundlingh <wgrundlingh at gmail.com>, "macqueen1 at llnl.gov" <macqueen1 at llnl.gov>, r-devel <r-devel at r-project.org>
> Subject: Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()
>
> format.Date <- function (x, ...)
> {
>   xx <- format(as.POSIXlt(x), ...)
>   names(xx) <- names(x)
>   xx[is.na(xx) & !is.na(x)] <- paste('Invalid date:', as.numeric(x[is.na(xx) & !is.na(x)]))
>   xx
> }
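Emil's format.Date hard-codes -719162 and 2932896 as the day counts of 0001-01-01 and 9999-12-31. Here is a minimal sketch of the same range test as a standalone predicate, with the bounds derived from as.Date() instead of typed in. The name date_in_safe_range() is hypothetical and not part of base R:

```r
# Hypothetical helper (not in base R): TRUE for Date values inside the range
# that Emil's format.Date treats as printable, 0001-01-01 to 9999-12-31.
date_in_safe_range <- function(x) {
  lo <- as.numeric(as.Date("0001-01-01"))  # -719162 days since 1970-01-01
  hi <- as.numeric(as.Date("9999-12-31"))  #  2932896 days since 1970-01-01
  v <- as.numeric(x)                       # underlying day count, may be Inf
  !is.na(v) & v >= lo & v <= hi            # NA inputs give FALSE, not NA
}

d <- as.Date(c(0, 8e9, Inf, NA), origin = "1970-01-01")
date_in_safe_range(d)  # TRUE FALSE FALSE FALSE
```

The check only reads the stored day count. It therefore does not depend on as.POSIXlt() succeeding, which is the conversion that breaks for these values.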
Joris Meys
2018-Jun-12 18:47 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
On Tue, Jun 12, 2018 at 6:28 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

>
> I think we should allow 'year' to be "double" instead, and so it
> could also be +Inf or -Inf and we'd nicely cover
> the conversions from and to 'numeric' -- which is really used
> internally for dates and date-times in POSIXct.
>
> Martin
>

That would be perfect and tackles both consistency with other formats and the confusing print() output. I'm all for it.

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
Greg Minshall
2018-Jun-12 23:23 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
Martin, et al.,

> I think we should allow 'year' to be "double" instead, and so it
> could also be +Inf or -Inf and we'd nicely cover
> the conversions from and to 'numeric' -- which is really used
> internally for dates and date-times in POSIXct.

storing years as a double makes me worry slightly about
----
> year <- 1e50
> (year+1)-year
[1] 0
----
which is not how one thinks of years (or integers) as behaving.

cheers, Greg

ps -- sorry for the ">" overloading!
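Greg's example can be put in numbers. An IEEE 754 double stores every integer exactly up to 2^53 (about 9.0e15), so neighbouring years only start to collapse far beyond any calendar use. By contrast, an integer component can hold at most .Machine$integer.max. A short check using only base R arithmetic:

```r
# Every integer up to 2^53 is exactly representable as an IEEE 754 double.
limit <- 2^53

(limit - 1) + 1 == limit   # TRUE: arithmetic is still exact at the limit
(limit + 1) - limit        # 0: limit + 1 rounds back to limit, as with 1e50
(2018 + 1) - 2018          # 1: realistic years behave like integers

# For comparison, the ceiling of an R integer (what 'year' is stored as now).
.Machine$integer.max       # 2147483647
```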
Gabe Becker
2018-Jun-13 16:20 UTC
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
Greg,

I see what you mean, but on the other hand, that's not how we think about real numbers working either, and doubles have that behavior generally. It might be possible to put checks in (with a potentially non-trivial overhead cost) to disallow that kind of thing, but again R (and everyone else, I think?) doesn't do so for regular doubles.

Also, I would expect the year 1e50 and the "year" Inf to be functionally equivalent in meaning (and largely meaningless) in context.

Best,
~G

-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
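Gabe's claim that a year of 1e50 and a year of Inf behave alike can be checked directly. The last lines return to the thread's subject: an infinite Date is not NA. This is a small sketch in base R. It does not assert how d prints, because that output is what the thread proposes to change and may differ between R versions:

```r
huge <- 1e50
(huge + 1) == huge     # TRUE: adding one year is lost at this magnitude
(Inf + 1) == Inf       # TRUE: the same holds for Inf

# An infinite Date is a value, not a missing one.
d <- as.Date(Inf, origin = "1970-01-01")
is.na(d)               # FALSE
is.finite(unclass(d))  # FALSE: the stored day count is Inf
```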