Richard White
2019-Mar-05 07:31 UTC
[Rd] as.Date(Inf) displays as 'NA' but is actually 'Inf'
Hi,

I think I've discovered a bug in base R.

Basically, when using 'Inf' as a 'Date', it is visually displayed as 'NA', but R still treats it as 'Inf'. So it is very confusing to work with, and can easily lead to errors:

# Visually displays as NA
> as.Date(Inf, origin="2018-01-01")
[1] NA

# Visually displays as NA
> str(as.Date(Inf, origin="2018-01-01"))
Date[1:1], format: NA

# Is NOT NA
> is.na(as.Date(Inf, origin="2018-01-01"))
[1] FALSE

# Is still Inf
> is.infinite(as.Date(Inf, origin="2018-01-01"))
[1] TRUE

This gets really problematic when you are collapsing dates over groups and you want to find the first date of a group, because min() returns Inf if there is no data:

# Visually displays as NA
> as.Date(min(), origin="2018-01-01")
[1] NA
Warning message: In min() : no non-missing arguments to min; returning Inf

# Visually displays as NA
> str(as.Date(min(), origin="2018-01-01"))
Date[1:1], format: NA
Warning message: In min() : no non-missing arguments to min; returning Inf

# Is not NA
> is.na(as.Date(min(), origin="2018-01-01"))
[1] FALSE
Warning message: In min() : no non-missing arguments to min; returning Inf

# This is bad!
> as.Date(min(), origin="2018-01-01") > "2018-01-01"
[1] TRUE
Warning message: In min() : no non-missing arguments to min; returning Inf

Here is my sessionInfo():

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8
[6] LC_MESSAGES=C LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 yaml_2.1.19

> Sys.getlocale()
[1] "LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C"
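The "first date of a group" pitfall described in this post can be avoided in user code by never letting min() see an empty vector. The following is only a sketch under that assumption; `first_date` is a hypothetical helper name (not part of base R), and the sample data are made up for illustration.

```r
# Hypothetical helper: earliest date in x, or a genuine NA Date
# when x is empty or contains only NA values.
first_date <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) {
    return(as.Date(NA))  # a true missing value, so is.na() is TRUE
  }
  min(x)
}

# Illustrative data: group "c" is a factor level with no observations.
dates <- as.Date(c("2018-03-01", "2018-01-15", "2018-02-10"))
group <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))

# split() keeps the Date class and yields an empty Date vector for "c".
first <- lapply(split(dates, group), first_date)

is.na(first$c)        # TRUE: the empty group is really NA, not Inf
is.infinite(first$c)  # FALSE
```

With this guard, what is printed as NA for the empty group also answers TRUE to is.na(), so later comparisons against real dates yield NA instead of a silent TRUE.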
Gabriel Becker
2019-Mar-05 22:33 UTC
[Rd] as.Date(Inf) displays as 'NA' but is actually 'Inf'
Richard,

Well, others may chime in here, but from a mathematical point of view, the concept of "infinite days from right now" is well-defined, so it may be a "valid" date in that sense, but what day and month it will be (year will be Inf) are indeterminate/not well defined. Those are rightfully NA, it seems?

I mean you could disallow dates to take Inf at all, ever. I don't feel strongly one way or the other about that, personally. That said, if Inf dates are allowed, it's not clear to me that displaying the "formatted" date string as NA, even if the value isn't, is wrong, given that it can't be determined what that "date" is. It could be displayed differently, I suppose, but all the ones I can think of off the top of my head would be problematic and probably break lots of formatted-dates parsing code out there in the wild (and in R, I would guess). Things like displaying "Inf-NA-NA", or just "Inf". Neither of those is going to handle a read-write round-trip well, I think.

So my personal don't-really-have-a-hat-in-the-ring opinion would be to either leave it as is, or force as.Date(Inf, bla) to actually be NA.

Best,
~G
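The second option mentioned above, making an infinite date an actual NA, can be approximated in user code without any change to base R. A minimal sketch, assuming the caller is happy to lose the distinction between "infinite" and "missing"; `inf_to_na` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: replace infinite Date values with a true NA,
# so that the printed value and is.na() agree.
inf_to_na <- function(x) {
  stopifnot(inherits(x, "Date"))
  x[is.infinite(x)] <- NA
  x
}

d <- c(as.Date("2018-01-01"), as.Date(Inf, origin = "2018-01-01"))
cleaned <- inf_to_na(d)

is.na(cleaned)  # FALSE TRUE
```

Finite dates pass through unchanged; only the elements for which is.infinite() is TRUE are replaced.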
William Dunlap
2019-Mar-05 22:49 UTC
[Rd] as.Date(Inf) displays as 'NA' but is actually 'Inf'
format.Date runs into trouble long before Inf:

> as.Date("2018-03-05") + c(2147466052, 2147466053)
[1] "5881580-07-11" "-5877641-06-23"

Bill Dunlap
TIBCO Software
wdunlap tibco.com
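The two offsets in the example above sit on either side of a round number: added to the day count of 2018-03-05, the first gives exactly the largest 32-bit integer and the second exceeds it by one. The sketch below only checks that arithmetic; that an integer conversion inside the formatting code is the cause of the wrap-around is an assumption here, not something stated in the thread.

```r
# Days since 1970-01-01 for the base date used in the example.
base_day <- as.numeric(as.Date("2018-03-05"))

# The two offsets from the example.
offsets <- c(2147466052, 2147466053)
day_counts <- base_day + offsets

# Compare against the largest value a 32-bit signed integer can hold.
.Machine$integer.max               # 2147483647
day_counts <= .Machine$integer.max # TRUE FALSE
```

The underlying Date values are ordinary doubles and remain correct; only the formatted string goes wrong once the day count no longer fits.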
Richard White
2019-Mar-06 05:54 UTC
[Rd] as.Date(Inf) displays as 'NA' but is actually 'Inf'
Hi Gabriel,

The point is that it *visually* displays as NA, but is.na() still responds with FALSE. When I (and I am sure many people) see an NA, we then use is.na(). If we see Inf displayed, we then use is.infinite(). With as.Date() this breaks down.

I'm not arguing that as.Date(Inf) should be coerced to NA. I'm arguing that as.Date(Inf) should be *visually* displayed as Inf (i.e. the truth!). I doubt this would break any existing code, because as.Date(Inf) acts as Inf in every way possible, except when you look at the output printed on the screen.

William - The other Date bugs don't visually display false information about the variable's contents. They might give wrong output, but the output displayed is what exists inside the variable. If we can't trust the R console to display the truth, then we are in a lot of trouble.

> a <- as.Date(Inf, origin="2018-01-01")
> a
[1] NA
> is.na(a)
[1] FALSE

Richard
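Until the printed form and the stored value agree, a user-level display function can show what is actually inside a Date vector. This is only a sketch of the behaviour argued for above, not a proposed patch to format.Date; `show_date` is a hypothetical name.

```r
# Hypothetical helper: format finite dates as usual, but show
# non-finite values (Inf, -Inf, NaN) by their numeric name and
# leave genuine NA as NA.
show_date <- function(x) {
  stopifnot(inherits(x, "Date"))
  days <- unclass(x)                    # underlying day counts
  out <- rep(NA_character_, length(x))
  ok <- is.finite(days)
  out[ok] <- format(x[ok])              # ordinary dates
  out[!ok] <- as.character(days[!ok])   # "Inf", "-Inf", "NaN", or NA
  out
}

x <- c(as.Date("2018-01-01"),
       as.Date(Inf, origin = "2018-01-01"),
       as.Date(NA))
show_date(x)  # "2018-01-01" "Inf" NA
```

Because the result is a plain character vector, it is meant for inspection only and does not round-trip through as.Date().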