Good day everyone,
I want to use zoo(aggregate) to calculate
15-min mean values from a wind dataset which
has 1-min values. The data I have looks like this:
vector VDATE vector WS
1 1998-10-22:02:11 12.5
2 1998-10-22:02:12 10.1
3 1998-10-22:02:13 11.2
4 1998-10-22:02:14 10.5
5 1998-10-22:02:15 11.5
.
.
.
n 2005-06-30:23:59 9.1
I want to use:
aggregate(zoo(WS),'in 15-min intervals',mean)
How do you specify 'in 15-min intervals' using
vector VDATE? The length of VDATE cannot be
changed, otherwise it would be a trivial problem
because I can generate a 15-min spaced vector
using 'seq'.
Am I missing something?
Thanks a lot,
Augusto
--------------------------------------------
Augusto Sanabria. MSc, PhD.
Mathematical Modeller
Risk Research Group
Geospatial & Earth Monitoring Division
Geoscience Australia (www.ga.gov.au)
Cnr. Jerrabomberra Av. & Hindmarsh Dr.
Symonston ACT 2609
Ph. (02) 6249-9155
Assume VDATE is a character vector. If its a
factor first convert it using
VDATE <- as.character(VDATE)
Lets assume we only need the times portion
and later we handle the full case which may or
may be needed.
We create a times object from the times portion of
vdate and then in the aggregate statement we use
trunc.times -- note that trunc.times is a recent
addition to the chron package so make sure you have
the latest chron and R 2.2.1. See ?trunc.times
# test data
library(chron)
library(zoo)
VDATE <- c("1998-10-22:02:11", "1998-10-22:02:12",
"1998-10-22:02:13", "1998-10-22:02:14",
"1998-10-22:02:15")
WS <- c(12.5, 10.1, 11.2, 10.5, 11.5)
# convert VDATES to times class and aggregate
vtimes <- times(sub(".*:(..:..)", "\\1:00", VDATE))
aggregate(zoo(WS), trunc(vtimes, "00:15:00"), mean)
If we need the day part too then its only a little
harder.
Represent VDATE as a chron object, vdate. We do this
by extracting out the date and time portions
and converting each separately. We use regular
expressions to do that conversion but show in a
comment how to do it without regular expressions.
See R News 4/1 Help Desk for more info on this and
the table at the end of the article in particular.
# alternative way to convert to vdate would be:
# vdate <- chron(dates = as.numeric(as.Date(substring(VDATE, 1, 10))),
# times = paste(substring(VDATE, 12), 0, sep =":"))
vdate <- chron(dates = sub("(....)-(..)-(..).*",
"\\2/\\3/\\1", VDATE),
times = sub(".*:(..:..)", "\\1:00", VDATE))
aggregate(zoo(WS), chron(trunc(times(vdate), "00:15:00")), mean)
On 2/2/06, Augusto.Sanabria at ga.gov.au <Augusto.Sanabria at ga.gov.au>
wrote:>
> Good day everyone,
>
> I want to use zoo(aggregate) to calculate
> 15-min mean values from a wind dataset which
> has 1-min values. The data I have looks like this:
>
> vector VDATE vector WS
> 1 1998-10-22:02:11 12.5
> 2 1998-10-22:02:12 10.1
> 3 1998-10-22:02:13 11.2
> 4 1998-10-22:02:14 10.5
> 5 1998-10-22:02:15 11.5
> .
> .
> .
> n 2005-06-30:23:59 9.1
>
>
> I want to use:
>
> aggregate(zoo(WS),'in 15-min intervals',mean)
>
> How do you specify 'in 15-min intervals' using
> vector VDATE? The length of VDATE cannot be
> changed, otherwise it would be a trivial problem
> because I can generate a 15-min spaced vector
> using 'seq'.
>
> Am I missing something?
>
> Thanks a lot,
>
> Augusto
>
>
> --------------------------------------------
> Augusto Sanabria. MSc, PhD.
> Mathematical Modeller
> Risk Research Group
> Geospatial & Earth Monitoring Division
> Geoscience Australia (www.ga.gov.au)
> Cnr. Jerrabomberra Av. & Hindmarsh Dr.
> Symonston ACT 2609
> Ph. (02) 6249-9155
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
Here's another approach which can be easily implemented in SQL. 1. Start with the dates as character vectors, dt <- as.character(Sys.time()) 2. Extract the minutes and round them to 0,15,30,45: minutes <- floor(as.numeric(substr(dt,15,16))/15)*15 final.mins <- as.character(minutes) final.mins[final.mins == "0"] <- "00" 3. Get the dates you need for aggregating: final.dt <- paste(substr(dt,1,14),final.mins,":00",sep="") (If you had wanted to use 10 minutes, it would have been enough to transform MM:SS to M0:00.) 4. Use aggregate(), SQL GROUP BY etc 5. Finally, convert final.dt from character to datettime.> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor > Grothendieck > Sent: Thursday, February 02, 2006 1:44 AM > To: Augusto.Sanabria at ga.gov.au > Cc: r-help at stat.math.ethz.ch > Subject: Re: [R] 15-min mean values > > Assume VDATE is a character vector. If its a > factor first convert it using > > VDATE <- as.character(VDATE) > > Lets assume we only need the times portion > and later we handle the full case which may or > may be needed. > > We create a times object from the times portion of > vdate and then in the aggregate statement we use > trunc.times -- note that trunc.times is a recent > addition to the chron package so make sure you have > the latest chron and R 2.2.1. See ?trunc.times > > # test data > library(chron) > library(zoo) > VDATE <- c("1998-10-22:02:11", "1998-10-22:02:12", > "1998-10-22:02:13", "1998-10-22:02:14", "1998-10-22:02:15") > WS <- c(12.5, 10.1, 11.2, 10.5, 11.5) > > # convert VDATES to times class and aggregate > vtimes <- times(sub(".*:(..:..)", "\\1:00", VDATE)) > aggregate(zoo(WS), trunc(vtimes, "00:15:00"), mean) > > If we need the day part too then its only a little > harder. > > Represent VDATE as a chron object, vdate. We do this > by extracting out the date and time portions > and converting each separately. We use regular > expressions to do that conversion but show in a > comment how to do it without regular expressions. > See R News 4/1 Help Desk for more info on this and > the table at the end of the article in particular. > > # alternative way to convert to vdate would be: > # vdate <- chron(dates = as.numeric(as.Date(substring(VDATE, 1, 10))), > # times = paste(substring(VDATE, 12), 0, sep =":")) > > > vdate <- chron(dates = sub("(....)-(..)-(..).*", > "\\2/\\3/\\1", VDATE), > times = sub(".*:(..:..)", "\\1:00", VDATE)) > aggregate(zoo(WS), chron(trunc(times(vdate), "00:15:00")), mean) > > On 2/2/06, Augusto.Sanabria at ga.gov.au > <Augusto.Sanabria at ga.gov.au> wrote: > > > > Good day everyone, > > > > I want to use zoo(aggregate) to calculate > > 15-min mean values from a wind dataset which > > has 1-min values. The data I have looks like this: > > > > vector VDATE vector WS > > 1 1998-10-22:02:11 12.5 > > 2 1998-10-22:02:12 10.1 > > 3 1998-10-22:02:13 11.2 > > 4 1998-10-22:02:14 10.5 > > 5 1998-10-22:02:15 11.5 > > . > > . > > . > > n 2005-06-30:23:59 9.1 > > > > > > I want to use: > > > > aggregate(zoo(WS),'in 15-min intervals',mean) > > > > How do you specify 'in 15-min intervals' using > > vector VDATE? The length of VDATE cannot be > > changed, otherwise it would be a trivial problem > > because I can generate a 15-min spaced vector > > using 'seq'. > > > > Am I missing something? > > > > Thanks a lot, > > > > Augusto > > > > > > -------------------------------------------- > > Augusto Sanabria. MSc, PhD. > > Mathematical Modeller > > Risk Research Group > > Geospatial & Earth Monitoring Division > > Geoscience Australia (www.ga.gov.au) > > Cnr. Jerrabomberra Av. & Hindmarsh Dr. > > Symonston ACT 2609 > > Ph. (02) 6249-9155 > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >
Thanks a lot to Gabor for his help with the solution of this problem. The solution using "zoo(aggregate)" is easy to implement & efficient. I have calculated the 15min mean values of a 1min wind speed file containing 2.9 million records x 16 columns (size 179 MB) in just 144 seconds (R-2.1.1 running in a GNU/LINUX machine). Thanks to Bogdan too for his suggested solution using SQL, I have not tried that one yet. Augusto