Agustin Lobo
2011-May-18 05:53 UTC
[R] Date_Time detected as Duplicated (but they are not!)
I have a problem with duplicated date_time stamps that I do not see as duplicated.

I read a file with observations taken every 30 minutes:

> aur2009=read.csv(paste(datadir,"AUR_ECPP_2009.csv",sep="/"),sep=";",stringsAsFactors=F)
> aur2009[1:3,1:5]
      Date.Time E_filled E_filled_flag LE_filled LE_filled_flag
1 1/1/2009 0:00        0           NaN      5.86            NaN
2 1/1/2009 0:30        0           NaN      5.05            NaN
3 1/1/2009 1:00        0           NaN      5.56            NaN

> delme = strptime(aur2009[,1], "%m/%d/%Y %H:%M")
> aur2009[,1]=as.POSIXct(delme)
            Date.Time E_filled E_filled_flag LE_filled LE_filled_flag
1 2009-01-01 00:00:00        0           NaN      5.86            NaN
2 2009-01-01 00:30:00        0           NaN      5.05            NaN
3 2009-01-01 01:00:00        0           NaN      5.56            NaN

> aur2009ts = ts(aur2009)
> row.names(aur2009ts) = as.character(delme)
> aur2009ts[1:3,1:5]
                     Date.Time E_filled E_filled_flag LE_filled LE_filled_flag
2009-01-01 00:00:00 1230764400        0           NaN      5.86            NaN
2009-01-01 00:30:00 1230766200        0           NaN      5.05            NaN
2009-01-01 01:00:00 1230768000        0           NaN      5.56            NaN

Then:

> aur2009z = zoo(aur2009[,2:12],as.POSIXct(delme))
Warning message:
In zoo(aur2009[, 2:12], as.POSIXct(delme)) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

So I investigate:

> any(duplicated(aur2009ts[,1]))
[1] TRUE

> aur2009ts[(duplicated(aur2009ts[,1])),1:5]
                     Date.Time E_filled E_filled_flag LE_filled LE_filled_flag
2009-03-29 02:00:00 1238284800        0           NaN       1.2            NaN
2009-03-29 02:30:00 1238286600        0           NaN       1.2            NaN

But note the surprise:

> aur2009ts[aur2009ts[,1]==1238284800,1:5]
                     Date.Time E_filled E_filled_flag LE_filled LE_filled_flag
2009-03-29 01:00:00 1238284800        0           NaN     -0.58            NaN
2009-03-29 02:00:00 1238284800        0           NaN      1.20            NaN
> aur2009ts[aur2009ts[,1]==1238286600,1:5]
                     Date.Time E_filled E_filled_flag LE_filled LE_filled_flag
2009-03-29 01:30:00 1238286600        0           NaN     -0.34            NaN
2009-03-29 02:30:00 1238286600        0           NaN      1.20            NaN

The dates detected as duplicated are actually different times that got the same value in the ts version of the object!
What am I doing wrong? They are all observations every 30min, why are these 2 encoded as the same time?

Any help appreciated

Agus
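A note on the numeric Date.Time column in the ts output above: ts() converts a data frame to a numeric matrix, so a POSIXct column ends up stored as seconds since 1970-01-01 UTC. A minimal sketch, assuming a made-up two-row data frame (`toy` is hypothetical and is not the poster's file):

```r
# Hypothetical two-row data frame standing in for the real file
toy <- data.frame(
  Date.Time = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 00:30:00"),
                         tz = "UTC"),
  LE_filled = c(5.86, 5.05)
)

# ts() coerces the data frame to a numeric matrix
toy_ts <- ts(toy)

# The date-time column is now plain seconds since the epoch
secs <- as.numeric(toy_ts[, "Date.Time"])
secs

# Converting back shows which instant each number stands for
as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
```

Two rows with the same seconds value therefore refer to the same instant, whatever their row-name labels say.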
Michael Sumner
2011-May-18 06:55 UTC
[R] Date_Time detected as Duplicated (but they are not!)
See under "Note" in ?strptime:
Remember that in most timezones some times do not occur and some
occur twice because of transitions to/from summer time.
‘strptime’ does not validate such times (it does not assume a
specific timezone), but conversion by ‘as.POSIXct’ will do so.
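The note can be illustrated with a small sketch. The five strings below are made up to straddle the 2009-03-29 transition, and the zone name "Europe/Madrid" is an assumption (the thread never states the session's time zone). What a nonexistent local time converts to depends on the platform and R version, so no output is shown for that column.

```r
fmt <- "%m/%d/%Y %H:%M"

# Made-up half-hourly stamps around a spring-forward transition
raw <- c("3/29/2009 1:00", "3/29/2009 1:30", "3/29/2009 2:00",
         "3/29/2009 2:30", "3/29/2009 3:00")

# Parsed as UTC: no summer-time transitions, so every stamp is distinct
ct_utc <- as.POSIXct(strptime(raw, fmt, tz = "UTC"))
any(duplicated(ct_utc))
diff(ct_utc)

# Parsed in a zone with summer time (assumed zone name): 02:00 and 02:30
# do not exist on this date, so as.POSIXct() has to map them to something
# else (possibly NA, possibly a neighbouring instant)
ct_local <- as.POSIXct(strptime(raw, fmt, tz = "Europe/Madrid"))

# Side-by-side view of the underlying seconds
data.frame(raw, utc = as.numeric(ct_utc), local = as.numeric(ct_local))
```

Comparing the two numeric columns row by row shows where the local-zone conversion stops advancing in 1800-second steps.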
--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsumner@gmail.com
Agustin Lobo
2011-May-18 07:49 UTC
[R] Date_Time detected as Duplicated (but they are not!)
And is it not possible to ignore daylight saving time? My data are in UTC, with no daylight saving time changes.

> delme = strptime(aur2009[,1], "%m/%d/%Y %H:%M",tz="UTC")
> any(duplicated(delme))
[1] TRUE

> delme = as.POSIXct(aur2009[,1], "%m/%d/%Y %H:%M",tz="UTC")
> any(duplicated(delme))
[1] TRUE

Agus
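One thing that may be worth ruling out here (an assumption, since the thread does not show whether the file was re-read): earlier in the session `aur2009[,1]` was overwritten with POSIXct values. If strings in year-month-day form are parsed with the original month/day/year format, every element becomes NA, and duplicated() reports TRUE for repeated NA values. A sketch with made-up strings:

```r
fmt <- "%m/%d/%Y %H:%M"

# Made-up raw strings in the file's original layout
raw <- c("3/29/2009 1:00", "3/29/2009 1:30",
         "3/29/2009 2:00", "3/29/2009 2:30")

# Parsing the raw strings as UTC gives four distinct instants
utc <- as.POSIXct(strptime(raw, fmt, tz = "UTC"))
any(duplicated(utc))

# The same instants rendered as year-month-day text, which is what a
# column that has already been converted would look like as character
iso <- format(utc, "%Y-%m-%d %H:%M:%S")

# Re-parsing that text with the month/day/year format fails for every row
reparsed <- as.POSIXct(strptime(iso, fmt, tz = "UTC"))
all(is.na(reparsed))

# Repeated NA values count as duplicates
any(duplicated(reparsed))
```

Checking `sum(is.na(delme))` before calling duplicated() separates this case from genuinely repeated stamps.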
Timothy Bates
2011-May-18 08:43 UTC
[R] Date_Time detected as Duplicated (but they are not!)
Dear Agustin:

What are the duplicated times? It looks like they really do occur twice or more in your original data: perhaps two stamps closer together than the resolution of your clock?

delme[duplicated(delme)]
aur2009[duplicated(delme),1]
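The two commands above list only the second and later copies of each repeated stamp. A minimal sketch of a variant that shows every row involved, first occurrence included, and compares against the raw strings; the data frame `obs` and its values are hypothetical:

```r
fmt <- "%m/%d/%Y %H:%M"

# Hypothetical data with one stamp genuinely repeated in the raw text
obs <- data.frame(
  Date.Time = c("3/29/2009 1:00", "3/29/2009 1:30",
                "3/29/2009 1:30", "3/29/2009 2:00"),
  LE_filled = c(-0.58, -0.34, 1.20, 1.20),
  stringsAsFactors = FALSE
)

idx <- as.POSIXct(strptime(obs$Date.Time, fmt, tz = "UTC"))

# Flag every member of a duplicated group, not just the later copies
dup_any <- duplicated(idx) | duplicated(idx, fromLast = TRUE)
obs[dup_any, ]

# Same check on the unparsed strings: if the two flags agree, the
# repeats are in the file itself rather than created by the conversion
dup_raw <- duplicated(obs$Date.Time) |
  duplicated(obs$Date.Time, fromLast = TRUE)
identical(dup_any, dup_raw)
```

If the parsed index has duplicates that the raw strings do not, the conversion step (time zone or format) is the more likely cause.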