diverses at univecom.ch
2009-Dec-22 19:40 UTC
[Rd] as.Date function yields inconsistent results (PR#14166)
Full_Name: Mario Luoni
Version: 2.10.0
OS: Windows XP HE SP3
Submission from: (NULL) (217.194.59.134)

This piece of code:

zzz1 <- as.POSIXct("1999-03-18", tz="CET")
zzz2 <- as.POSIXlt("1999-03-18", tz="CET")
zzz1 == zzz2
as.Date(zzz1)
as.Date(zzz2)

yields TRUE for "zzz1==zzz2", but the two dates returned by as.Date are different:

> as.Date(zzz1)
[1] "1999-03-17"
> as.Date(zzz2)
[1] "1999-03-18"

To me this looks like a bug. It could be a problem with timezones, but I couldn't find documentation that would explain that behaviour.

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils
[8] methods   base

other attached packages:
[1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_1.59-1   Hmisc_3.7-0
[5] survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.1  grid_2.10.0     lattice_0.17-26 svMisc_0.9-56   tools_2.10.0
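A minimal sketch of where the two results diverge, assuming the method signatures of R releases from this period: the POSIXct method of as.Date() takes a tz argument that defaults to "UTC", while the POSIXlt method builds the date from the stored year, month and day fields. Passing tz explicitly, as below, makes the POSIXct conversion independent of that default.

```r
# Reproduce the report with the same two objects
zzz1 <- as.POSIXct("1999-03-18", tz = "CET")
zzz2 <- as.POSIXlt("1999-03-18", tz = "CET")

# Both represent the same instant, so the comparison is TRUE
print(zzz1 == zzz2)

# Midnight CET (UTC+1 in March 1999) is 23:00 UTC on the previous day,
# so the UTC calendar date is one day earlier than the CET one
print(as.Date(zzz1, tz = "UTC"))   # "1999-03-17"
print(as.Date(zzz1, tz = "CET"))   # "1999-03-18"

# The POSIXlt method reads the broken-down fields, which already hold
# the CET wall-clock date
print(as.Date(zzz2))               # "1999-03-18"
```

Read this way, the two calls in the report answer different questions: the first reports the calendar date in UTC, the second the calendar date on the CET wall clock.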
Tony Plate
2009-Dec-28 09:15 UTC
I think you're right that this is a timezone issue -- it seems to be a consequence of the behavior described in the article by Gabor Grothendieck and Thomas Petzoldt: "R help desk: Date and time classes in R." R News, 4(1):29-32, June 2004. http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf (there is a reference at http://wiki.r-project.org/rwiki/doku.php?id=guides:times-dates)

BEGIN QUOTE from p31 http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf
Regarding POSIX classes, the user should be aware of the following:
...
* POSIXlt. The tzone attribute on POSIXlt times are ignored so it is safer to use POSIXct than POSIXlt when performing arithmetic or other manipulations that may depend on time zones.
END QUOTE

This ignoring of the tzone attribute appears to apply to as.Date() conversions too.

The behavior is especially interesting because arithmetic converts POSIXlt objects to POSIXct objects, and the arithmetic operation preserves the tzone attribute. So, if you want as.Date() to pay attention to the tzone attribute of a POSIXlt object, you might be able to just add 0 to it (or just use as.POSIXct() on it). (Though I don't know whether there are other pitfalls along this path to catch the unwary.)
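The add-0 / as.POSIXct() route described above can be sketched as follows. It assumes only that arithmetic on a POSIXlt object returns a POSIXct object, as the class() output in the session below shows. After the conversion all objects go through the same as.Date() method, so they agree; the tz argument of that method still decides which calendar date is reported.

```r
zzz1 <- as.POSIXct("1999-03-18", tz = "CET")
zzz2 <- as.POSIXlt("1999-03-18", tz = "CET")

# Adding 0 (or calling as.POSIXct) turns the POSIXlt object into POSIXct
via.plus <- zzz2 + 0
via.ct   <- as.POSIXct(zzz2)
print(inherits(via.plus, "POSIXct"))         # TRUE

# Now all three go through the POSIXct method, so the results agree
print(as.Date(via.plus) == as.Date(zzz1))    # TRUE
print(as.Date(via.ct) == as.Date(zzz1))      # TRUE

# Which calendar date they agree on still depends on the tz argument
print(as.Date(via.plus, tz = "UTC"))         # "1999-03-17"
print(as.Date(via.plus, tz = "CET"))         # "1999-03-18"
```

So this route makes the two conversions consistent with each other, but under a "UTC" default the shared answer is the UTC date, 1999-03-17, rather than the CET date.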
The following demonstrates various aspects of the behavior:

> d.ct.CET <- as.POSIXct("1999-03-18", tz="CET")
> d.lt.CET <- as.POSIXlt("1999-03-18", tz="CET")
> d.ct.UTC <- as.POSIXct("1999-03-18", tz="UTC")
> d.lt.UTC <- as.POSIXlt("1999-03-18", tz="UTC")
> d.ct.EST <- as.POSIXct("1999-03-18", tz="EST")
> d.lt.EST <- as.POSIXlt("1999-03-18", tz="EST")
> # ii is hours to catch the date changes in the time-zones here
> ii <- c(0,1,19,20,23,24)*3600
> # all columns of x1 are as.Date() of POSIXct objects
> # each column follows its tzone attribute because as.Date() pays
> # attention to tzone attr of POSIXct object, and arithmetic converts
> # POSIXlt object to POSIXct object preserving the tzone attr
> (x1 <- data.frame(hour=ii,
+     ct.UTC=as.Date(d.ct.UTC + ii), lt.UTC=as.Date(d.lt.UTC + ii),
+     ct.CET=as.Date(d.ct.CET + ii), lt.CET=as.Date(d.lt.CET + ii),
+     ct.EST=as.Date(d.ct.EST + ii), lt.EST=as.Date(d.lt.EST + ii)))
   hour     ct.UTC     lt.UTC     ct.CET     lt.CET     ct.EST     lt.EST
1     0 1999-03-18 1999-03-18 1999-03-17 1999-03-17 1999-03-18 1999-03-18
2  3600 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-18
3 68400 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-19 1999-03-19
4 72000 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-19 1999-03-19
5 82800 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-19 1999-03-19
6 86400 1999-03-19 1999-03-19 1999-03-18 1999-03-18 1999-03-19 1999-03-19
> all.equal(x1$lt.UTC, x1$ct.UTC)
[1] TRUE
> all.equal(x1$lt.CET, x1$ct.CET)
[1] TRUE
> all.equal(x1$lt.EST, x1$ct.EST)
[1] TRUE
> class(d.lt.EST)
[1] "POSIXt"  "POSIXlt"
> class(d.lt.EST + ii)
[1] "POSIXt"  "POSIXct"
> as.POSIXlt(d.lt.EST + ii)
[1] "1999-03-18 00:00:00 EST" "1999-03-18 01:00:00 EST"
[3] "1999-03-18 19:00:00 EST" "1999-03-18 20:00:00 EST"
[5] "1999-03-18 23:00:00 EST" "1999-03-19 00:00:00 EST"
>
> # the lt.* columns of x2 are as.Date() of POSIXlt objects, and these
> # are all the same as the ct.UTC column because as.Date() ignores the
> # tzone attribute of POSIXlt objects
> (x2 <- data.frame(hour=ii,
+     ct.UTC=as.Date(d.ct.UTC + ii), lt.UTC=as.Date(as.POSIXlt(d.lt.UTC + ii)),
+     ct.CET=as.Date(d.ct.CET + ii), lt.CET=as.Date(as.POSIXlt(d.lt.CET + ii)),
+     ct.EST=as.Date(d.ct.EST + ii), lt.EST=as.Date(as.POSIXlt(d.lt.EST + ii))))
   hour     ct.UTC     lt.UTC     ct.CET     lt.CET     ct.EST     lt.EST
1     0 1999-03-18 1999-03-18 1999-03-17 1999-03-18 1999-03-18 1999-03-18
2  3600 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-18
3 68400 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-19 1999-03-18
4 72000 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-19 1999-03-18
5 82800 1999-03-18 1999-03-18 1999-03-18 1999-03-18 1999-03-19 1999-03-18
6 86400 1999-03-19 1999-03-19 1999-03-18 1999-03-19 1999-03-19 1999-03-19
> all.equal(x2$lt.UTC, x2$ct.UTC)
[1] TRUE
> all.equal(x2$lt.CET, x2$ct.UTC)
[1] TRUE
> all.equal(x2$lt.EST, x2$ct.UTC)
[1] TRUE
>
> sessionInfo()
R version 2.10.1 (2009-12-14)
i486-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1
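To check which zone each column of x1 reflects, here is a small sketch built around a hypothetical helper, local_date(), which is not part of base R. It relies on format() using the tzone attribute of a POSIXct object when no tz is given, and it passes tz = "UTC" to as.Date() explicitly rather than relying on a default. If the utc.* columns match the ct.CET and ct.EST columns of x1 above, that suggests those dates are UTC calendar dates, while the loc.* columns give the date on the local wall clock in each zone.

```r
# Hypothetical helper (not part of base R): calendar date of each instant
# in the object's own time zone. format() on a POSIXct object uses its
# tzone attribute when no tz argument is supplied.
local_date <- function(x) as.Date(format(x, "%Y-%m-%d"))

ii <- c(0, 1, 19, 20, 23, 24) * 3600   # same hour offsets as in the session
d.ct.CET <- as.POSIXct("1999-03-18", tz = "CET")
d.ct.EST <- as.POSIXct("1999-03-18", tz = "EST")

# utc.*: calendar date in UTC; loc.*: calendar date on the local clock
print(data.frame(hour    = ii / 3600,
                 utc.CET = as.Date(d.ct.CET + ii, tz = "UTC"),
                 loc.CET = local_date(d.ct.CET + ii),
                 utc.EST = as.Date(d.ct.EST + ii, tz = "UTC"),
                 loc.EST = local_date(d.ct.EST + ii)))
```

In this sketch both loc.* columns stay on 1999-03-18 until hour 24, since every series starts at local midnight; only the UTC view shifts the date.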