Tobias Gauster
2012-Dec-13 21:43 UTC
[R] duplicated.data.frame() and POSIXct with DST shift
Hi,
I found that the duplicated method for data.frames gives
"false positives" when a column of class POSIXct spans a clock
shift from DST to standard time.
time <- as.POSIXct("2012-10-28 02:00",
tz="Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <- data.frame(time, text="foo")
duplicated(df)
[1] FALSE TRUE
This is because the timezone is lost after calling paste():
do.call(paste, c(df, sep = "\r"))
[1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
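One possible workaround, sketched here under the assumption that comparing the underlying seconds since the epoch is acceptable: replace every POSIXct column with its numeric value before calling duplicated(). The names `is_time` and `key` are illustrative only and not part of the original post. How the ambiguous local time 02:00 is resolved can differ between platforms, but the two instants are one hour apart in either case.

```r
# Same construction as in the post: two instants one hour apart that
# print with the same wall-clock time across the DST change.
time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60 * 60)
df <- data.frame(time, text = "foo")

# Find the POSIXct columns and swap them for their numeric values
# (seconds since 1970-01-01 UTC), which stay distinct.
is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
key <- df
key[is_time] <- lapply(df[is_time], as.numeric)

duplicated(key)
# [1] FALSE FALSE
```

The original data frame is left untouched; only the temporary `key` copy is used for the comparison, so `df[!duplicated(key), ]` would keep the POSIXct column as it was.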
I can't really figure out if this behavior is desired or not. If so, a short
warning in ?duplicated could be helpful. It is mentioned how
duplicated.data.frame() works, but I didn't find a hint to properly handle
POSIXct-objects.
My particular problem was to cast a data.frame like this one with cast() (which
calls reshape1(), which calls duplicated()):
df2 <- data.frame(time, time1=as.numeric(time),
lab=rep(1:3, each=2), value=101:106,
text=rep(c("foo", "bar"), each=3))
library(reshape)
Using the column of class POSIXct as a variable in the formula gives:
cast(lab*time~text, data=df2, value="value")
Aggregation requires fun.aggregate: length used as default
lab time bar foo
1 1 2012-10-28 02:00:00 0 2
2 2 2012-10-28 02:00:00 1 1
3 3 2012-10-28 02:00:00 2 0
Converting to numeric, casting and converting back works as expected, although
the timezone is not visible, because print.data.frame() calls format.POSIXct()
with usetz = FALSE:
y <- cast(lab*time1~text, data=df2, value="value")
y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)
Can anyone suggest a more elegant solution?
Best,
Tobias
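The conversion back from numeric in the workaround above adds the seconds to "1970-01-01 01:00" parsed in the session's time zone, so it only gives the right instants when that zone is one hour ahead of UTC. A minimal sketch of a variant that does not depend on the session time zone, assuming the values are seconds since the epoch and that "Europe/Vienna" is the zone wanted for display (the names `secs` and `restored` are illustrative):

```r
time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60 * 60)

# Numeric representation, as used for the time1 column in the post.
secs <- as.numeric(time)

# Convert back with an explicit origin and an explicit time zone, so the
# result does not depend on the time zone of the R session.
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")

# The instants survive the round trip unchanged.
all.equal(as.numeric(restored), secs)
# [1] TRUE
```

Printing with `format(restored, usetz = TRUE)` shows the zone abbreviation next to each value.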
David Winsemius
2012-Dec-14 01:01 UTC
[R] duplicated.data.frame() and POSIXct with DST shift
On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

> Hi,
>
> I found that the duplicated method for data.frames gives
> "false positives" when a column of class POSIXct spans a clock
> shift from DST to standard time.
>
> time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
> time
> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>
> df <- data.frame(time, text="foo")
> duplicated(df)
> [1] FALSE TRUE

In this instance

> This is because the timezone is lost after calling paste():
> do.call(paste, c(df, sep = "\r"))

I suspect the problem arises when 'paste' coerces to character:

as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think that as.character might get missed since the 'paste' operation is done internally.

as.character(time, usetz=TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

--
David.

> [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
>
> I can't really figure out if this behavior is desired or not. If so,
> a short warning in ?duplicated could be helpful. It is mentioned how
> duplicated.data.frame() works, but I didn't find a hint to properly
> handle POSIXct-objects.

There is no duplicated.POSIXct method

David Winsemius, MD
Alameda, CA, USA
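The point about coercion to character can be illustrated with a small sketch. It uses format() rather than as.character(), on the assumption that format.POSIXct() is the more stable place to pass usetz; the names `with_zone` and `with_offset` are illustrative only.

```r
time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60 * 60)

# Appending the zone abbreviation keeps the two strings apart.
with_zone <- format(time, usetz = TRUE)
duplicated(with_zone)
# [1] FALSE FALSE

# A numeric UTC offset ("%z") is an alternative to the abbreviation.
with_offset <- format(time, "%Y-%m-%d %H:%M:%S %z")
duplicated(with_offset)
# [1] FALSE FALSE
```

Either character vector could stand in for the POSIXct column when building a key for duplicated() on a data frame.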