On Thu, 2 Sep 2021, Jeff Newmiller wrote:

> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.

Jeff,

I definitely agree with the above.

> If you convert the sampdate to POSIXct, the tz IS important, because most
> of us use local timezones that respect daylight savings time, and a naive
> conversion of standard time will run into trouble if R is assuming
> daylight savings time applies. The lubridate package gets around this by
> always assuming UTC and giving you a function to "fix" the timezone after
> the conversion. I prefer to always be specific about timezones, at least
> by using something like
>    Sys.setenv( TZ = "Etc/GMT+8" )
> which does not respect daylight savings.

I'm not following you here. All my projects have always been in a single
time zone and the data might be recorded on June 19th or November 4th but do
not depend on whether the time is PDT or PST. My hosts all set the hardware
clock to local time, not UTC.

As the location(s) at which data are collected remain fixed geographically I
don't understand why daylight savings time, or non-daylight savings time, is
important.

> Regarding using character data for identifying the month, in order to have
> clean plots of the data I prefer to use the trunc function but it returns
> a POSIXlt so I convert it to POSIXct:

I don't use character data for months, as far as I know. If a sample date
is, for example, 2021-09-03 then monthly summaries are based on '09', not
'September.'

I've always valued your inputs to help me understand what I don't. In this
case I'm really lost in understanding your position.

Have a good Labor Day weekend,

Rich

Jeff Newmiller
2021-Sep-04 06:30 UTC
[R] Calculate daily means from 5-minute interval data
On Fri, 3 Sep 2021, Rich Shepard wrote:

> On Thu, 2 Sep 2021, Jeff Newmiller wrote:
>
>> Regardless of whether you use the lower-level split function, or the
>> higher-level aggregate function, or the tidyverse group_by function, the
>> key is learning how to create the column that is the same for all records
>> corresponding to the time interval of interest.
>
> Jeff,
>
> I definitely agree with the above
>
>> If you convert the sampdate to POSIXct, the tz IS important, because most
>> of us use local timezones that respect daylight savings time, and a naive
>> conversion of standard time will run into trouble if R is assuming
>> daylight savings time applies. The lubridate package gets around this by
>> always assuming UTC and giving you a function to "fix" the timezone after
>> the conversion. I prefer to always be specific about timezones, at least
>> by using so something like
>>    Sys.setenv( TZ = "Etc/GMT+8" )
>> which does not respect daylight savings.
>
> I'm not following you here. All my projects have always been in a single
> time zone and the data might be recorded at June 19th or November 4th but do
> not depend on whether the time is PDT or PST. My hosts all set the hardware
> clock to local time, not UTC.

The fact that your projects are in a single time zone is irrelevant. I am
not sure how you can be so confident in saying it does not matter whether
the data were recorded in PDT or PST, since if it were recorded in PDT then
there would be a day in March with 23 hours and another day in November with
25 hours, but if it were recorded in PST then there would always be 24 hours
in every day, and R almost always assumes daylight savings if you don't tell
it otherwise!

I am also normally working with automated collection devices that record
data in standard time year round. But if you fail to tell R that this is the
case, then it will almost always assume your data are stored with daylight
savings time and screw up the conversion to computable time format.
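
A minimal sketch of the day-length point above, assuming a US Pacific site
and a host with the IANA timezone database installed. The timestamps are
made up for illustration; "America/Los_Angeles" stands in for a local zone
that observes daylight saving time, and "Etc/GMT+8" for year-round standard
time (UTC-8).

```r
# Hypothetical midnight timestamps around the 2021 changeover days.
stamps <- c("2021-03-14 00:00:00", "2021-03-15 00:00:00",
            "2021-11-07 00:00:00", "2021-11-08 00:00:00")

# Same strings interpreted in a zone that observes daylight saving time ...
dst <- as.POSIXct(stamps, tz = "America/Los_Angeles")
# ... and in a fixed-offset zone (POSIX sign convention: GMT+8 is UTC-8).
std <- as.POSIXct(stamps, tz = "Etc/GMT+8")

# Hours between consecutive midnights on the March and November days.
dst_hours <- as.numeric(difftime(dst[c(2, 4)], dst[c(1, 3)], units = "hours"))
std_hours <- as.numeric(difftime(std[c(2, 4)], std[c(1, 3)], units = "hours"))

print(dst_hours)
print(std_hours)
```

With the daylight-saving zone the two days come out as 23 and 25 hours long;
with the fixed-offset zone both are 24 hours. Passing tz= explicitly in each
conversion is one way to be specific; setting TZ once with Sys.setenv(), as
in the quoted text, is another.
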
This screw up may include NA values in spring time when standard time has
perfectly valid times between 1am and 2am on the changeover day, but in
daylight time those timestamps would be invalid and will end up as NA values
in your timestamp column.

> As the location(s) at which data are collected remain fixed geographically I
> don't understand why daylight savings time, or non-daylight savings time is
> important.

I am telling you that it is important _TO R_ if you use POSIXt times.
Acknowledge this and move on with life, or avoid POSIXt data. As I said, one
way to acknowledge this while limiting the amount of attention you have to
give to the problem is to use UTC/GMT everywhere... but this can lead to
weird time of day problems as I pointed out in my timestamp cleaning slides:

https://jdnewmil.github.io/time-2018-10/TimestampCleaning.html

If you want to use GMT everywhere... then you have to use GMT explicitly
because the default timezone in R is practically never GMT for most people.
You. Need. To. Be. Explicit. Don't fight it. Just do it. It isn't hard.

>> Regarding using character data for identifying the month, in order to have
>> clean plots of the data I prefer to use the trunc function but it returns
>> a POSIXlt so I convert it to POSIXct:
>
> I don't use character data for months, as far as I know. If a sample data
> is, for example, 2021-09-03 then monthly summaries are based on '09', not
> 'September.'

You are taking this out of context and complaining that it has no context.
This was a reply to a response by Andrew Simmons in which he used the
"format" function to create unique year/month strings to act as group-by
data. Earlier, when I originally responded to clarify how you could use the
dplyr group_by function, I used your character date column without combining
it with time or converting to Date at all.
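
As an illustration of a grouping column that stays a real time value rather
than a character string, here is a small base R sketch. The data frame, its
column names (sampdate, samptime, value) and the readings are hypothetical,
and "Etc/GMT+8" assumes the logger records Pacific standard time year round.

```r
# Hypothetical readings spanning two days, logged in standard time.
readings <- data.frame(
  sampdate = c("2021-03-13", "2021-03-13", "2021-03-14", "2021-03-14"),
  samptime = c("23:50", "23:55", "00:00", "02:30"),
  value    = c(1, 3, 5, 7)
)

# Combine date and time, stating the timezone explicitly.
readings$dtm <- as.POSIXct(paste(readings$sampdate, readings$samptime),
                           format = "%Y-%m-%d %H:%M", tz = "Etc/GMT+8")

# trunc() returns POSIXlt, so convert back to POSIXct before storing it
# as a data frame column.
readings$day <- as.POSIXct(trunc(readings$dtm, units = "days"))

# One mean per day, keyed on the truncated timestamp.
daily <- aggregate(value ~ day, data = readings, FUN = mean)
print(daily)
```

Because the key is still a POSIXct value, it sorts and plots as time without
any further parsing. Using units = "months" in trunc() gives a monthly key
in the same way.
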
If you studied these responses more carefully you would see that you would
indeed have been using character data for grouping in some cases, and my
only point was that doing so can be a shortcut to the immediate answer while
being troublesome later in the analysis. Accusing you of mishandling data
was not my intention.

> I've always valued your inputs to help me understand what I don't. In this
> case I'm really lost in understanding your position.

I hope my comments are clear enough now.

> Have a good Labor Day weekend,

Thanks! (Not relevant to many on this list.)

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k