On Mon, 30 Aug 2021, Richard O'Keefe wrote:

>> x <- rnorm(samples.per.day * 365)
>> length(x)
> [1] 105120
>
> Reshape the fake data into a matrix where each row represents one
> 24-hour period.
>
>> m <- matrix(x, ncol=samples.per.day, byrow=TRUE)

Richard,

Now I understand the need to keep the date and time as a single datetime
column; separately, dplyr's summarize() provides daily means (there are too
many data points to plot over 3-5 years). I reformatted the data to provide
a sampledatetime column and a values column.

If I correctly understand the output of as.POSIXlt, each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either PST or PDT.)

> Now we can summarise the rows any way we want.
> The basic tool here is ?apply.
> ?rowMeans is said to be faster than using apply to calculate means,
> so we'll use that. There is no *rowSds so we have to use apply
> for the standard deviation. I use ?head because I don't want to
> post tens of thousands of meaningless numbers.

If I create a matrix using the above syntax, the resulting rows contain all
recorded values for a specific day. What would be the syntax to collect all
values for each month? This would result in 12 rows per year; the periods of
record for the five variables available from that gauge station vary in
length.

Regards,

Rich
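
One possible way to collect all values for each month, sketched in base R. Because months differ in length, a matrix with one row per month does not fit; this sketch instead builds a year-month key with format() and uses split() to gather the values into a list. The data, the vector names sampdatetime and values, and the UTC time zone are assumptions made for illustration (UTC avoids daylight-saving gaps in the fake series); they are not taken from the gauge record.

```r
# Fake 10-minute data for two years (hypothetical stand-in for the gauge record).
set.seed(42)
sampdatetime <- seq(as.POSIXct("2016-01-01 00:00", tz = "UTC"),
                    as.POSIXct("2017-12-31 23:50", tz = "UTC"),
                    by = "10 min")
values <- rnorm(length(sampdatetime), mean = 150000, sd = 5000)

# A year-month key such as "2016-03"; all samples from the same
# calendar month share the same key.
month_key <- format(sampdatetime, "%Y-%m")

# split() collects the values for each month into a named list, so
# months with different numbers of samples are not a problem.
by_month <- split(values, month_key)

# Summarise each month any way we want.
monthly_mean <- sapply(by_month, mean, na.rm = TRUE)
monthly_sd   <- sapply(by_month, sd, na.rm = TRUE)

monthly <- data.frame(month = names(by_month),
                      mean  = monthly_mean,
                      sd    = monthly_sd,
                      row.names = NULL)
head(monthly)
```

Each year of record contributes 12 rows to monthly, and a variable with a shorter period of record simply yields fewer rows.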
On Thu, 2 Sep 2021, Rich Shepard wrote:

> If I correctly understand the output of as.POSIXlt each date and time
> element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
> 03 12 00 (I've not read how the elements are separated). (The TZ is not
> important because all data are either PST or PDT.)

Using this script:

    discharge <- read.csv('../data/water/discharge.dat', header = TRUE,
                          sep = ',', stringsAsFactors = FALSE)
    discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
                                     format = '%Y-%m-%d %H:%M',
                                     optional = 'logical')
    discharge$cfs <- as.numeric(discharge$cfs, length = 6)

I get this result:

    > head(discharge)
                 sampdate    cfs
    1 2016-03-03 12:00:00 149000
    2 2016-03-03 12:10:00 150000
    3 2016-03-03 12:20:00 151000
    4 2016-03-03 12:30:00 156000
    5 2016-03-03 12:40:00 154000
    6 2016-03-03 12:50:00 150000

I'm completely open to suggestions on using this output to calculate monthly
means and sds.

If dplyr::summarize() will do so, please show me how to modify this command:

    disc_monthly <- ( discharge
        %>% group_by(sampdate)
        %>% summarize(exp_value = mean(cfs, na.rm = TRUE))
    )

because it produces daily means, not monthly means.

TIA,

Rich
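
A minimal sketch of one way the dplyr command could be adapted, under stated assumptions. A POSIXlt value is stored as a list of fields (sec, min, hour, mday, mon, year and so on) even though it prints as a single timestamp, and dplyr has historically not handled POSIXlt columns, so the sketch converts to POSIXct instead. Grouping by the timestamp itself cannot produce monthly groups, so a year-month key is derived first. The fake data, the yr_mon, mean_cfs, sd_cfs and n column names, and the UTC time zone are all illustrative assumptions, not part of the original script.

```r
library(dplyr)

# Hypothetical stand-in for discharge.dat: 10-minute readings over two months,
# with timestamps stored as text the way read.csv() would deliver them.
discharge <- data.frame(
  sampdate = format(seq(as.POSIXct("2016-03-03 12:00", tz = "UTC"),
                        as.POSIXct("2016-04-30 23:50", tz = "UTC"),
                        by = "10 min"),
                    "%Y-%m-%d %H:%M"),
  stringsAsFactors = FALSE
)
discharge$cfs <- 150000 + seq_len(nrow(discharge)) %% 7 * 1000

# POSIXct keeps one number per timestamp, which suits data frame columns.
discharge$sampdate <- as.POSIXct(discharge$sampdate, tz = "UTC",
                                 format = "%Y-%m-%d %H:%M")

disc_monthly <- discharge %>%
  mutate(yr_mon = format(sampdate, "%Y-%m")) %>%   # key such as "2016-03"
  group_by(yr_mon) %>%
  summarize(mean_cfs = mean(cfs, na.rm = TRUE),
            sd_cfs   = sd(cfs, na.rm = TRUE),
            n        = n())

print(disc_monthly)
```

The n column is included only as a check that each month received the expected number of readings.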