We have two time series: the first is a series of weekly counts of
isolates of RSV (respiratory syncytial virus) by pathology laboratories,
and the second is a series of weekly counts of cases of bronchiolitis in
young children presenting to hospital emergency departments.
Bronchiolitis in young children is usually caused by RSV infection, and
simple visual inspection reveals a very close correspondence between the
two series, both of which show strong seasonality and also corresponding
variation from year to year.
My question is how to approach the analysis of these data using R. Here
is what we have done so far (guided by Diggle and MASS):
1) Create two time-series (ts) objects from the data, making sure the
corresponding observations in the two ts are in fact contemporaneous.
2) Decompose each ts into seasonal, trend and remainder components using
stl() and decompose().
3) Examine the cross-correlogram for the raw ts and the decomposed
components using ccf() - this revealed that bronchiolitis cases were
maximally cross-correlated with RSV isolates at a 3-week lag.
4) Examine periodograms of the raw ts and the pre-whitened data (the
remainders) - most of the energy is in the week-to-week variation.
5) Calculate the cross-correlation between the remainders of the two
series using a 3-week lag - it is about 0.55.
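
For concreteness, steps 1 to 5 look roughly like the sketch below. The
data here are simulated stand-ins, not our real counts: the object names
(rsv, bronch), the start date and the built-in 3-week delay are all
assumptions made purely for illustration. Note that ccf(x, y) at lag k
estimates the correlation between x[t+k] and y[t], so RSV leading
bronchiolitis shows up at a negative lag, and that for a ts with
frequency 52 the lags are reported in years rather than weeks.

```r
# Simulated stand-in data (assumption): weekly counts with an annual cycle,
# constructed so that bronchiolitis follows RSV by 3 weeks.
set.seed(1)
n   <- 52 * 6                                  # six years of weekly data
idx <- seq_len(n + 3)
x   <- 40 + 20 * cos(2 * pi * idx / 52) +
       4 * as.numeric(arima.sim(list(ar = 0.5), n = n + 3))

# 1) Two ts objects on the same weekly time base (start date is made up)
rsv    <- ts(round(x[4:(n + 3)]), start = c(1995, 1), frequency = 52)
bronch <- ts(round(0.8 * x[1:n] + rnorm(n, sd = 3)),
             start = c(1995, 1), frequency = 52)
stopifnot(identical(tsp(rsv), tsp(bronch)))    # contemporaneous observations

# 2) Seasonal/trend/remainder decomposition; keep the remainders
rsv_rem    <- stl(rsv,    s.window = "periodic")$time.series[, "remainder"]
bronch_rem <- stl(bronch, s.window = "periodic")$time.series[, "remainder"]

# 3) Cross-correlogram of the remainders, with lags converted to weeks
cc        <- ccf(rsv_rem, bronch_rem, lag.max = 10, plot = FALSE)
lag_weeks <- round(as.numeric(cc$lag) * frequency(rsv))
peak_lag  <- lag_weeks[which.max(as.numeric(cc$acf))]

# 4) Raw periodogram of a remainder series
pg <- spec.pgram(rsv_rem, taper = 0.1, plot = FALSE)

# 5) Cross-correlation with RSV leading by 3 weeks
r3 <- as.numeric(cc$acf)[lag_weeks == -3]
```

Replacing plot = FALSE with plot = TRUE gives the usual correlogram and
periodogram plots; decompose() can be substituted for stl() in step 2.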
OK as far as it goes, but these results only obliquely shed light on the
question we want to answer: "Can lab RSV isolate counts be used to
predict the hospital bronchiolitis case-load a few weeks hence, and if
so, how reliably?"
Stephen Morrell [Morrell S. Time Series (Box-Jenkins) Analysis. In:
Kerr C, Taylor R, Heard G. Handbook of Public Health Methods, McGraw
Hill, Sydney, 1998.] suggests the following approach (direct quote
observing fair-use copyright provisions follows):
"In the first stage of analysis, the outcome and predictor series are
pre-analysed to identify the form of the transfer function. In the
second stage the transfer function is identified and its residuals
computed. Finally, an ARIMA model is fitted to the residuals to assess
the adequacy of the overall model. A ratio of U- and S-polynomials,
U(B)/S(B), called impulse weights, is used to specify the effect of a
unit change in the predictor series on the outcome series. These weights
are initially estimated by a cross-correlation function (CCF), which
assesses the relationship between the de-trended predictor series on the
de-trended outcome series (with autocorrelation influences removed,
called prewhitening)."
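
As far as I can tell, the quoted procedure would translate into base R
along the lines below. This is only a sketch under stated assumptions,
again on simulated stand-in data: the harmonic regression used for
de-seasonalising, the AR order search and all object names are my own
choices, not Morrell's. It also simplifies the transfer function to a
single lagged term (that is, U(B) is one weight at the identified delay
and S(B) = 1), because arima() in base R fits a regression with ARMA
errors rather than a general rational U(B)/S(B).

```r
# Simulated stand-in data (assumption): bronchiolitis = 0.8 * RSV three
# weeks earlier, plus noise, with a shared annual cycle.
set.seed(2)
n   <- 52 * 6
idx <- seq_len(n + 3)
x   <- 40 + 20 * cos(2 * pi * idx / 52) +
       4 * as.numeric(arima.sim(list(ar = 0.5), n = n + 3))
rsv    <- round(x[4:(n + 3)])
bronch <- round(0.8 * x[1:n] + rnorm(n, sd = 3))

# Stage 1a: remove the annual cycle (harmonic regression is my choice here)
wk   <- seq_len(n)
harm <- cbind(cos(2 * pi * wk / 52), sin(2 * pi * wk / 52))
rsv_d    <- as.numeric(residuals(lm(rsv ~ harm)))
bronch_d <- as.numeric(residuals(lm(bronch ~ harm)))

# Stage 1b: prewhiten. Fit an AR model to the predictor, then apply the
# same AR filter to both series.
ar_fit <- ar(rsv_d, order.max = 5, aic = TRUE)
flt    <- c(1, -ar_fit$ar)
alpha  <- as.numeric(stats::filter(rsv_d,    flt, sides = 1))
beta   <- as.numeric(stats::filter(bronch_d, flt, sides = 1))
ok     <- !is.na(alpha) & !is.na(beta)
alpha  <- alpha[ok]
beta   <- beta[ok]

# Stage 1c: initial impulse weights from the CCF of the prewhitened series.
# ccf(beta, alpha) at lag k >= 0 relates the outcome at t+k to the
# predictor at t; scaling by sd(beta)/sd(alpha) gives the weight v_k.
cc   <- ccf(beta, alpha, lag.max = 10, plot = FALSE)
lags <- as.numeric(cc$lag)
keep <- lags >= 0
v    <- as.numeric(cc$acf)[keep] * sd(beta) / sd(alpha)
names(v) <- lags[keep]
delay <- lags[keep][which.max(abs(v))]

# Stage 2: regress the outcome on the predictor at the identified delay,
# with AR(1) errors as a provisional noise model.
y  <- bronch_d[(delay + 1):n]
xl <- cbind(rsv_lag = rsv_d[1:(n - delay)])
tf <- arima(y, order = c(1, 0, 0), xreg = xl)

# Stage 3: check the residuals for leftover autocorrelation.
lb <- Box.test(residuals(tf), lag = 10, type = "Ljung-Box", fitdf = 1)
```

Printing v shows the estimated impulse weights by lag in weeks, tf gives
the fitted coefficient on the lagged RSV series with its standard error,
and a small p-value in lb would suggest the AR(1) noise model is
inadequate.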
Is this a reasonable approach to our question? Hints on how to proceed
are most welcome, and/or references to papers or texts which might
render us a bit less clueless wrt this problem.
Regards,
Tim C
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._