Daniel Haugstvedt
2013-Aug-22 13:22 UTC
[R] From POSIXct to numeric and back with time zone
From POSIXct to numeric and back with time zone

I am running regressions on data which has time series with different time resolutions. Some data has hourly resolution, while most has either daily or weekly resolution. Aggregation is used to make the hourly data daily, while linear interpolation is used to find daily data from the weekly time series. This data manipulation requires some careful handling of date and time.

I do travel across time zones and want my code to keep working as the system time zone changes.

So far quick fixes have been used to handle problems. Now I am trying to get a grip and make a more robust solution. Google and forums have left me with an increasing number of questions instead of answers.

I have chosen one question and one problem. The question, which should be trivial, should allow me to solve the problem. However, I have been stuck with this all day, so if anyone knows the solution to the problem straight away, it will be highly appreciated.


The question: What does the tz attribute in POSIXct do?

As an example, two dates with different time zone attributes, tmp1 and tmp2, are compared.

> tmp1 = as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "UTC")
> tmp1
[1] "2000-01-30 UTC"

> tmp2 = as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "ETC")
> tmp2
[1] "2000-01-30 UTC"

The time displayed, including the time zone, is the same, but the tzone attributes are not.

> attributes(tmp1)
$class
[1] "POSIXct" "POSIXt"

$tzone
[1] "UTC"

> attributes(tmp2)
$class
[1] "POSIXct" "POSIXt"

$tzone
[1] "ETC"

As a final check the numbers are compared

> as.numeric(tmp1)
[1] 949190400
> as.numeric(tmp2)
[1] 949190400

and they match.

I was under the impression that POSIXct always used UTC and that the tzone attribute was only for displaying and converting to POSIXlt, but that seems wrong in the above example. As far as I can see, the tzone attribute is neither used for display, as both dates display as UTC, nor used to change the origin, as both numbers are the same. My question is, what does the tzone attribute in POSIXct actually do?

I hope increased understanding of that part will let me solve the true problem without further assistance.


The problem: from POSIXct to numeric and back.

> tmp3 = as.POSIXct('2000-01-30', origin = '1970-01-01')
> tmp3
[1] "2000-01-30 CET"

Converting it to numeric and back to POSIXct, it becomes

> as.POSIXct(as.numeric(tmp3), origin = '1970-01-01')
[1] "2000-01-29 23:00:00 CET"

which is "2000-01-30 UTC". By converting to numeric and back to POSIXct, an hour has been added. This is not the behavior I want. I am trying to set the tz attribute, but it does not change the added hour.

Trying to understand more of what is going on and to replicate the original date, I set the time zone to be CET in both conversions.

> as.POSIXct(as.numeric(as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "CET")), origin = '1970-01-01', tz = "CET")
[1] "2000-01-29 23:00:00 CET"

Which is "2000-01-30 UTC". Setting the time zone to UTC in both conversions gives

> as.POSIXct(as.numeric(as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "UTC")), origin = '1970-01-01', tz = "UTC")
[1] "2000-01-30 UTC"

I want to convert the date "2000-01-30 CET" to POSIXct and then over to numeric before finally converting back to POSIXct without changing the date, time or time zone. I seem to get "2000-01-30 UTC" regardless of what I try, so I am definitely missing something obvious.

Best Regards

Daniel Haugstvedt
Ph.d.-student,
NTNU, Trondheim, Norway

PS. I am aware that my spelling is poor. Any comments on how it could be improved are appreciated, but send them to me personally and not to the list.
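The question about the tzone attribute can be probed with a minimal sketch. It assumes the usual description of POSIXct as a count of seconds since 1970-01-01 00:00:00 UTC with an optional tzone attribute. The names secs, as_utc and as_cet are made up for illustration, and structure() is used only to sidestep any interpretation of an origin string.

```r
# Seconds since 1970-01-01 00:00:00 UTC; the value printed by as.numeric(tmp1) above.
secs <- 949190400

# Build two POSIXct objects from the same number, differing only in 'tzone'.
# structure() is used so that no 'origin' string has to be interpreted.
as_utc <- structure(secs, class = c("POSIXct", "POSIXt"), tzone = "UTC")
as_cet <- structure(secs, class = c("POSIXct", "POSIXt"), tzone = "CET")

# The stored number is identical, so both objects describe the same instant.
same_instant <- as.numeric(as_utc) == as.numeric(as_cet)

# Only the printed clock time differs (CET is UTC+1 in January).
shown_utc <- format(as_utc, "%Y-%m-%d %H:%M")  # "2000-01-30 00:00"
shown_cet <- format(as_cet, "%Y-%m-%d %H:%M")  # "2000-01-30 01:00"

print(same_instant)
print(c(UTC = shown_utc, CET = shown_cet))
```

Under these assumptions the attribute changes how the instant is printed, not which instant is stored. A name the system does not recognise, such as "ETC" in the post above, appears to be displayed as UTC; in recent versions of R, OlsonNames() lists the names the system does know.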
Daniel Haugstvedt
2013-Aug-23 10:12 UTC
[R] From POSIXct to numeric and back with time zone
I am replying to my own question in case someone else finds this thread and needs help with the same problem. Thanks to Mark Leeds for helping me on my way. Any errors or flaws are mine, since I have rewritten most of his comments to make sure I understood them correctly.

First, three general recommendations for time zone problems:

1) When asking time zone related questions, always give OS information. It does not hurt to give information on version etc. either. My system is OSX, lion (10.8.4). Using str(R.Version()) to get system information is one option.

> str(R.Version())
List of 14
 $ platform      : chr "x86_64-apple-darwin9.8.0"
 $ arch          : chr "x86_64"
 $ os            : chr "darwin9.8.0"
 $ system        : chr "x86_64, darwin 9.8.0"
 . . .
 $ version.string: chr "R version 2.15.1 (2012-06-22)"
 $ nickname      : chr "Roasted Marshmallows"

2) Before you do ANYTHING with time zones, put Sys.setenv(TZ = "UTC") in your .Rprofile or at the top of the code you're working in. Otherwise, if you start trying to convert date-time objects to plain date objects, things can really get whacked.

3) Check that the time zone you are using is valid. I am no expert on this, but from what I understand, in OSX a valid time zone has the name of one of the files in the folder /usr/share/zoneinfo, with some obvious exceptions like the files "iso3166.tab", "posixrules" and "zone.tab". It can also be one of the entries in the file /usr/share/zoneinfo/zone.tab.

CET and CEST (daylight saving time) are the time zones my system uses when nothing is specified. I am sorry for writing ETC in one of the lines in the first email.

Now, to the problem: How do I change from POSIXct to numeric and back with a time zone other than UTC?

I have tried to simplify the original question and attempted an answer. Please correct me if I am wrong.

Sys.setenv(TZ = "UTC")

## Number of seconds from '1970-01-01 00:00:00 UTC' to '2000-01-30 00:00:00 CET', not
## counting leap seconds. Display as CET date.
tmp = as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "CET")

## Number of seconds from '1970-01-01 00:00:00 UTC' to '2000-01-30 00:00:00 CET', not
## counting leap seconds. Display as UTC date.
tmp2 = as.POSIXct(as.numeric(tmp), origin = '1970-01-01', tz = "UTC")

## What I wanted was to go to numeric and back to the original with the same time zone.
## What I got was the number of seconds from '1970-01-01 00:00:00 UTC' to
## '2000-01-29 23:00:00 CET', not counting leap seconds. Display as CET date.
## Which is 60*60 seconds less than I expect.
tmp3 = as.POSIXct(as.numeric(tmp), origin = '1970-01-01', tz = "CET")

## Solution: Convert to the desired time zone after as.POSIXct has been used with UTC
## to get the correct number of seconds.
tmp4 = tmp2
attributes(tmp4)$tzone = 'CET'

> tmp
[1] "2000-01-30 CET"
> tmp2
[1] "2000-01-29 23:00:00 UTC"
> tmp3
[1] "2000-01-29 23:00:00 CET"
> tmp4
[1] "2000-01-30 CET"
>
> as.numeric(tmp)
[1] 949186800
> as.numeric(tmp2)
[1] 949186800
> as.numeric(tmp3)
[1] 949183200
> as.numeric(tmp4)
[1] 949186800

My conclusions are:

1) The tz argument sets the tzone attribute, but it also determines how the entered date should be interpreted IF the date is entered as a string.

2) If the date is entered as numeric, it is assumed to be the number of seconds from UTC to UTC, and the tz argument is used to add / subtract the number of seconds which converts it to the time zone specified.

Some additional conclusions that I came across while testing a bit. The code which made me draw them is attached at the end.

3) If a time zone is not needed, the tz argument does nothing. It sets the tzone but it does not change it.

4) The origin is assumed to be UTC regardless of what Sys.timezone() says, as long as no time zone for the origin is specified. I checked this by changing the system time zone to CET before running the example again.
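Conclusion 1 can be illustrated with a short sketch. It relies only on as.POSIXct() parsing a character string in the zone given by tz; the variable names are hypothetical, and the expected numbers are the ones printed elsewhere in this thread.

```r
# The same wall-clock string, interpreted in two different zones.
utc_midnight <- as.POSIXct("2000-01-30 00:00:00", tz = "UTC")
cet_midnight <- as.POSIXct("2000-01-30 00:00:00", tz = "CET")

# Seconds since 1970-01-01 00:00:00 UTC for each interpretation.
secs_utc <- as.numeric(utc_midnight)  # 949190400
secs_cet <- as.numeric(cet_midnight)  # 949186800

# Midnight in CET happens one hour before midnight in UTC.
gap_secs <- secs_utc - secs_cet       # 3600

print(c(UTC = secs_utc, CET = secs_cet, gap = gap_secs))
```

For a string input, then, tz changes the stored number as well as the tzone attribute, which is why tmp and the UTC dates in the first message differ by 3600 seconds.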
Best regards

Daniel Haugstvedt
Ph.d student
NTNU, Trondheim, Norway


## If a time zone is not needed the tz argument does nothing. It sets the tzone but it does not change it.
Sys.setenv(TZ = "UTC")
tmp = as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "CET")
tmp2 = as.POSIXct(tmp, tz = "CET")

> tmp
[1] "2000-01-30 CET"
> tmp2
[1] "2000-01-30 CET"

## Sys.setenv does not change the time zone of the origin
Sys.setenv(TZ = "CET")
tmp5 = as.POSIXct('2000-01-30', origin = '1970-01-01', tz = "CET")
tmp6 = as.POSIXct(as.numeric(tmp5), origin = '1970-01-01', tz = "UTC")
tmp7 = as.POSIXct(as.numeric(tmp5), origin = '1970-01-01', tz = "CET")

> tmp5
[1] "2000-01-30 CET"
> tmp6
[1] "2000-01-29 23:00:00 UTC"
> tmp7
[1] "2000-01-29 23:00:00 CET"
>
> as.numeric(tmp5)
[1] 949186800
> as.numeric(tmp6)
[1] 949186800
> as.numeric(tmp7)
[1] 949183200
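The round trip asked for in the first message can be sketched as a pair of helpers. This is one possible approach rather than the method used in the thread: the function names posixct_to_parts and parts_to_posixct are hypothetical, and the time zone is carried alongside the number and re-attached with structure(), so that no origin string has to be interpreted in any zone.

```r
# Split a POSIXct into its stored seconds and its tzone attribute.
posixct_to_parts <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  list(secs = as.numeric(x), tzone = attr(x, "tzone"))
}

# Rebuild a POSIXct from the parts; the seconds are reused unchanged.
parts_to_posixct <- function(parts) {
  structure(parts$secs, class = c("POSIXct", "POSIXt"), tzone = parts$tzone)
}

original <- as.POSIXct("2000-01-30", tz = "CET")
parts    <- posixct_to_parts(original)
restored <- parts_to_posixct(parts)

# Compare the stored number, the zone, and the printed form.
print(as.numeric(restored) == as.numeric(original))
print(attr(restored, "tzone"))
print(format(restored, "%Y-%m-%d %H:%M", usetz = TRUE))
```

Because the seconds are never shifted, the restored value should not depend on the system time zone or on how a given R version interprets origin.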