I have a data frame with dates as integers:

> summary(persons[, c("foddat", "doddat")])
     foddat             doddat
 Min.   :16790000   Min.   :18000000
 1st Qu.:18760904   1st Qu.:18810924
 Median :19030426   Median :19091227
 Mean   :18946659   Mean   :19027233
 3rd Qu.:19220911   3rd Qu.:19310526
 Max.   :19660124   Max.   :19691228
 NA's   :624        NA's   :207570

After converting the dates to Date format ('as.Date') I get:

> summary(per[, c("foddat", "doddat")])
     foddat               doddat
 Min.   :1679-07-01   Min.   :1800-01-26
 1st Qu.:1876-09-04   1st Qu.:1881-09-24
 Median :1903-04-26   Median :1909-12-27
 Mean   :1895-02-04   Mean   :1903-02-22
 3rd Qu.:1922-09-10   3rd Qu.:1931-05-26
 Max.   :1966-01-24   Max.   :1969-12-28

My question is: Why are the numbers of missing values not printed in the second case? 'is.na' gives the correct (same) numbers.

Can I somehow force 'summary' to print NA's? I found no clues in the documentation.

> sessionInfo()
R version 3.2.3 Patched (2016-01-19 r69960)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

Göran Broström
> On Feb 8, 2016, at 11:26 AM, Göran Broström <goran.brostrom at umu.se> wrote:
>
> My question is: Why are the numbers of missing values not printed in the second case? 'is.na' gives the correct (same) numbers.
>
> Can I somehow force 'summary' to print NA's? I found no clues in the documentation.

Hi,

Two things:

1. We are going to need to see the exact call to as.Date() that you used. as.Date() will take a numeric vector as input, but the presumption is that the number represents the number of days since an origin, which needs to be specified explicitly. If you coerced the numeric vector to character first, presuming a "%Y%m%d" format, then you need to be cautious about how that is done and about the result.

2. Your second call is to a data frame called 'per', which may or may not have the same content as 'persons' in your first call.

If I do the following, taking some of your numeric values from above:

x <- c(18000000, 18810924, 19091227, 19027233, 19310526, 19691228, NA)

DF <- data.frame(x)

> summary(DF)
       x
 Min.   :18000000
 1st Qu.:18865001
 Median :19059230
 Mean   :18988523
 3rd Qu.:19255701
 Max.   :19691228
 NA's   :1

> as.character(DF$x)
[1] "1.8e+07"  "18810924" "19091227" "19027233" "19310526" "19691228"
[7] NA

DF$x.Date <- as.Date(as.character(DF$x), format = "%Y%m%d")

> DF
         x     x.Date
1 18000000       <NA>
2 18810924 1881-09-24
3 19091227 1909-12-27
4 19027233       <NA>
5 19310526 1931-05-26
6 19691228 1969-12-28
7       NA       <NA>

> summary(DF)
       x                x.Date
 Min.   :18000000   Min.   :1881-09-24
 1st Qu.:18865001   1st Qu.:1902-12-04
 Median :19059230   Median :1920-09-10
 Mean   :18988523   Mean   :1923-04-12
 3rd Qu.:19255701   3rd Qu.:1941-01-17
 Max.   :19691228   Max.   :1969-12-28
 NA's   :1          NA's   :3

So summary does support the reporting of NA's for Dates, using summary.Date().

Regards,

Marc Schwartz
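The as.character() output above shows 18000000 turning into "1.8e+07", which cannot match a "%Y%m%d" format. One possible way around that, sketched here under the assumption that the integer-coded dates are stored as doubles, is to format them with sprintf() before parsing. The names x_chr and x_date are illustrative only and do not come from the thread.

```r
# A few of the values used above, plus a missing one
x <- c(18000000, 18810924, 19091227, NA)

# as.character() can print round doubles in scientific notation
as.character(x)

# Fixed-point formatting keeps all eight digits; NA is kept as NA
x_chr <- ifelse(is.na(x), NA_character_, sprintf("%.0f", x))

# Strings that are not valid calendar dates (month 00, day 00) parse to NA
x_date <- as.Date(x_chr, format = "%Y%m%d")
x_date

# Number of values that did not become a valid Date
sum(is.na(x_date))
```

Values such as 18000000 still end up as NA after this step, because month 00 and day 00 are not valid under "%Y%m%d"; how to treat such partially known dates is a separate decision.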
Thanks Marc, but see below!

On 2016-02-08 19:26, Marc Schwartz wrote:
> DF$x.Date <- as.Date(as.character(DF$x), format = "%Y%m%d")
>
>> summary(DF)
>        x                x.Date
>  Min.   :18000000   Min.   :1881-09-24
>  1st Qu.:18865001   1st Qu.:1902-12-04
>  Median :19059230   Median :1920-09-10
>  Mean   :18988523   Mean   :1923-04-12
>  3rd Qu.:19255701   3rd Qu.:1941-01-17
>  Max.   :19691228   Max.   :1969-12-28
>  NA's   :1          NA's   :3

But:

> summary(DF[, "x.Date", drop = FALSE])
     x.Date
 Min.   :1881-09-24
 1st Qu.:1902-12-04
 Median :1920-09-10
 Mean   :1923-04-12
 3rd Qu.:1941-01-17
 Max.   :1969-12-28

No NA's. But again:

> summary(DF[, "x.Date"])
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"1881-09-24" "1902-12-04" "1920-09-10" "1923-04-12" "1941-01-17" "1969-12-28"
        NA's
         "3"

> So summary does support the reporting of NA's for Dates, using summary.Date().

Not always, as it seems. Strange. (The 'persons' vs. 'per' is a red herring.)

Göran Broström
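Whatever summary() prints for a data frame that holds only Date columns, the missing-value counts can be obtained directly with is.na(), as the original question already notes. The following is a minimal sketch of that workaround, not an explanation of the summary() behaviour; the name na_counts is illustrative.

```r
# Small example frame with one numeric and one Date column
x <- c(18810924, 19091227, 19310526, NA)
DF <- data.frame(x = x)
DF$x.Date <- as.Date(sprintf("%.0f", DF$x), format = "%Y%m%d")

# Count missing values column by column, independent of summary()
na_counts <- sapply(DF, function(col) sum(is.na(col)))
na_counts

# The same idea restricted to the Date columns only
is_date <- sapply(DF, inherits, what = "Date")
na_counts[is_date]
```

Printing na_counts next to the summary() output gives the missing-value information even in the cases where the "NA's" row is absent.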