On 2013-03-12 17:10, Nathan Miller wrote:
> Hello,
>
> I have a challenge!
>
> I have a large dataset with three columns, "date", "temp", "location".
> "date" is in the format %m/%d/%y %H:%M, with a "temp" recorded every 10
> minutes. These temperatures are surface temperatures and so fluctuate during
> the day, heating up and then cooling down, so the data is a series of peaks
> and troughs. I would like to develop a function that would go through a
> dataset consisting of many sequential dates and determine for each day the
> maximum hourly slope of temp~date for each site (the fastest hourly rate of
> heating). The output would be the date, the maximum hourly slope for that
> date, and the location. It would also be great if I could extract when
> during the day the maximum hourly slope occurred.
>
> I have been playing around with using the package lubridate to identify
> each hour of the day using something like this to create a separate column
> grouping the data into hours
>
> library(lubridate)
> data$date2 <- floor_date(data$date, "hour")
>
> I was then imagining something like this, though this code doesn't work as
> written.
>
> ddply(data, .(location, date2), function(d)
> max(rollapply(slope(d$temp~d$date, data=d)))
>
> Essentially what I'm imagining is calculating the slope (though I'd have to
> write a quick slope function) of the date/temp relationship, use rollapply
> to apply this function across the dataset, and determine the maximum slope,
> grouped by location and hour (using date2). Hmm... and per day!
>
> This seems complicated. Can others think of a simpler, more elegant means
> of extracting this type of data? I struggled to put together a working
> example with a set of data, but if this doesn't make sense let me know and
> I'll see what I can do.
>
>
> Thanks,
> Nate
First, let's ignore location; if you can do it for one location,
you can surely do it for others.
Second, let's ignore date; if you can do it for one date, you
can surely do it for others.
That leaves us with the question of what you want to do for one
given date. If you want the maximum slope for any 60-minute interval
on that date (which I take your question to mean), then rollapply
should do the job. But I'm not very familiar with zoo, so here's a
crude approach:
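# 72 simulated readings; time is the reading index, not clock time
d <- data.frame(time = 1:72, temp = rnorm(72))
slope <- rep(NA_real_, 72)
# Fit a line to each window of 6 consecutive readings; store the slope at
# the window's last index. Widen to (i-6):i (start the loop at 7) if the
# window should span a full 60 minutes of 10-minute readings.
for(i in 6:72) {
  slope[i] <- coef(lm(temp ~ time, data = d, subset = (i-5):i))[2]
}
maxslope <- max(slope, na.rm = TRUE)  # steepest slope, in temp units per reading
idx <- which.max(slope)               # index of the last reading in that window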
Obviously, this can be extended to cover a longer period, such as a full
day or several days.
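To put the "one date, one location" idea back together, here is a rough
base-R sketch that splits the data by location and calendar day and runs
the same windowed regression within each piece. It is only an
illustration under some assumptions: the function name max_hourly_slope,
the helper columns when and day, the output column window_end, and the
simulated data are all made up for the example; timestamps are treated as
UTC; and the window is 7 readings so that it spans a full 60 minutes at
10-minute spacing. Windows do not cross midnight here.

```r
# Sketch only: max_hourly_slope and its column names are hypothetical.
max_hourly_slope <- function(dat, width = 7) {
  # Parse the timestamp format given in the question (UTC is an assumption).
  dat$when <- as.POSIXct(dat$date, format = "%m/%d/%y %H:%M", tz = "UTC")
  dat$day  <- format(dat$when, "%Y-%m-%d")
  groups   <- split(dat, list(dat$location, dat$day), drop = TRUE)

  one_group <- function(g) {
    g <- g[order(g$when), ]
    n <- nrow(g)
    if (n < width) return(NULL)          # too few readings for one window
    # Time in hours, so the fitted slope is in temp units per hour.
    hrs   <- as.numeric(difftime(g$when, g$when[1], units = "hours"))
    slope <- rep(NA_real_, n)
    for (i in width:n) {
      w <- (i - width + 1):i             # the readings in this window
      slope[i] <- coef(lm(g$temp[w] ~ hrs[w]))[2]
    }
    k <- which.max(slope)                # first window with the largest slope
    data.frame(location = g$location[1], day = g$day[1],
               max_slope = slope[k], window_end = g$when[k],
               stringsAsFactors = FALSE)
  }

  out <- do.call(rbind, lapply(groups, one_group))
  rownames(out) <- NULL
  out
}

# Simulated data: two sites, two days, one reading every 10 minutes.
set.seed(42)
times <- seq(as.POSIXct("2013-03-10 00:00", tz = "UTC"),
             by = "10 min", length.out = 288)
hour_of_day <- as.numeric(format(times, "%H")) +
  as.numeric(format(times, "%M")) / 60
dat <- data.frame(
  date     = rep(format(times, "%m/%d/%y %H:%M"), 2),
  temp     = rep(15 + 8 * sin(2 * pi * (hour_of_day - 9) / 24), 2) +
             rnorm(576, sd = 0.3),
  location = rep(c("siteA", "siteB"), each = 288),
  stringsAsFactors = FALSE
)
res <- max_hourly_slope(dat)
print(res)
```

Each row of res gives the location, the day, the steepest fitted slope for
that day, and the time of the last reading in the window where it
occurred. The loop calls lm() once per window, which is slow on a large
dataset but keeps the logic easy to follow.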
Now, let's wait for Gabor to show us the trivial way with zoo::rollapply.
Peter Ehlers