Dear Lucia,
lucia wrote:
> Hello,
> I'm very new to R, and so my question is simple.
>
> I have data record with 80 years of daily temperatures in one long
> string. The dates are also recorded, in YYMMDD format. I'd like to
> learn an elegant simple way to pull out the annual averages.
> (Obviously, every 4th year has 366 days.)
>
> I know I can set up a formal loop to create annual records and then
> average. But R seems to have such neat methods, is there some better
> way to do this?
For the sake of simplicity, let's say you managed to store your data in a
two-column data frame df, with columns date and temperature.
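The first step is to extract the "year" part of the dates.
The obvious solution is of course as.numeric(substr(date,1,2)), but I'd
rather convert your date variable into a genuine R date object, with
as.Date(date,format="%y%m%d") or
as.POSIXlt(date,format="%y%m%d"),
the latter allowing easy year extraction by reading the "year"
component (which counts years since 1900).
Two caveats: date has to be a character vector for these calls, and on
input %y reads 00 to 68 as 20xx and 69 to 99 as 19xx, so part of an
80-year record with two-digit years will land in the wrong century and
needs correcting.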
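
To make the two extraction routes concrete, here is a small sketch. The
four sample dates are made up for illustration, and I assume they are
stored as character strings; tz="UTC" is only there to keep the result
independent of the local time zone.

```r
# Made-up sample dates in YYMMDD form, stored as character strings
date <- c("450101", "451231", "680615", "690615")

# Route 1: plain substring, gives the two-digit year as a number
two_digit <- as.numeric(substr(date, 1, 2))

# Route 2: parse into a POSIXlt object and read its "year" component,
# which counts years since 1900, hence the + 1900
lt <- as.POSIXlt(date, format = "%y%m%d", tz = "UTC")
full_year <- lt$year + 1900

print(two_digit)
print(full_year)  # note how %y decides the century: 45 and 68 become 20xx, 69 becomes 1969
```

Either result can serve as a grouping variable; the second one only
carries the right century once the %y guess has been checked against
what you know about the record.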
The second step is of course to apply the "mean" function to each of the
relevant subsets of your data; that's what tapply() is meant for.
So, a one-liner for your problem might be:
Means <- tapply(df$temperature,
                as.POSIXlt(df$date, format="%y%m%d")$year,
                FUN=mean, na.rm=TRUE)
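
If you want to try this before touching your real data, the following
sketch builds a synthetic two-year record (1995 with 365 days, 1996 with
366) and runs the same tapply() call on it. The data frame and its
constant temperatures are invented for illustration; only the column
names date and temperature follow the assumption made above.

```r
# Synthetic daily record covering 1995 (365 days) and 1996 (366 days)
days <- seq(as.Date("1995-01-01"), as.Date("1996-12-31"), by = "day")
df <- data.frame(
  date = format(days, "%y%m%d"),  # YYMMDD strings, as in the original record
  temperature = ifelse(format(days, "%Y") == "1995", 10, 20),
  stringsAsFactors = FALSE
)

# Grouping variable: calendar year recovered from the YYMMDD strings
year <- as.POSIXlt(df$date, format = "%y%m%d", tz = "UTC")$year + 1900

# One mean per year; leap years need no special handling
Means  <- tapply(df$temperature, year, FUN = mean, na.rm = TRUE)
Counts <- tapply(df$temperature, year, FUN = length)

print(Means)
print(Counts)
```

The result is a named vector with one entry per year, so Means["1996"]
picks out a single year. Counts is a quick check that each group really
contains 365 or 366 days.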
However, this is only a very crude way to work with time series. I'd
consider permanently converting your date variable to a suitable
date-time representation. There is more than one way to do it: Julian
dates, chron objects and, more recently introduced, the DateTime classes
(POSIXct and POSIXlt), which seem to be "the standard way" to represent
dates and times.
Unfortunately, different packages seem to expect different
representations... <Sigh>.
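
As a rough sketch of such a permanent conversion, using base R only: the
sample dates are made up, and last_year (the final year of the record)
is my assumption, which you would replace with what you know about your
data. It is used to push back by one century any year that %y has placed
in the future.

```r
# Made-up YYMMDD strings from a record assumed to end no later than 2005
date <- c("250101", "991231", "000101", "050630")
last_year <- 2005  # assumption: final year of the record

# First pass: let %y guess the century
d  <- as.Date(date, format = "%y%m%d")
yr <- as.integer(format(d, "%Y"))

# Years beyond the end of the record must belong to the previous century
yr <- ifelse(yr > last_year, yr - 100L, yr)

# Rebuild the dates from the corrected four-digit year plus the original MMDD
d_fixed <- as.Date(paste0(yr, substr(date, 3, 6)), format = "%Y%m%d")

# The same dates in two other representations that some functions expect
as_ct     <- as.POSIXct(d_fixed)  # date-time class
as_julian <- julian(d_fixed)      # days since 1970-01-01

print(d_fixed)
```

Once d_fixed is stored in the data frame, format(d_fixed, "%Y") gives a
grouping variable that can go straight into tapply().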
HTH