adiamond@fas.harvard.edu
2004-Jul-26 22:07 UTC
[R] qcc package & syndromic surveillance (multivar CUSUM?)
Dear R Community:

I am working on a public health early warning system, and I see that the qcc package allows for CUSUM and other statistical quality-control tests, but I am not sure whether my project is a good match for the qcc functions as written. Any advice you may have is very much appreciated.

I have four years' worth of daily counts of emergency room admissions for different conditions (e.g. respiratory, neurologic, etc.) from several local hospitals. The data look like this:

DAY 1
             Respiratory   Neuro   ...
Hospital A:           10      12
.                      .       .
.                      .       .
Hospital F:            7      14

DAY 2
             Respiratory   Neuro   ...
Hospital A:           10      12
.                      .       .
.                      .       .
Hospital F:            7      14

... etc.

My goal is to run a kind of multivariate quality-control test (without fitting a GLM) each day after the data are updated, one able to answer the question: "Has there been a significant variation in the central tendency of the data?"

An analogous problem would be detecting the early signs of a shift in global trading patterns by examining stock market indexes in different countries around the world, updating and testing the data each business day.

Thank you,

Alexis Diamond
Spencer Graves
2004-Jul-27 01:07 UTC
[R] qcc package & syndromic surveillance (multivar CUSUM?)
What do you think is more plausible: an abrupt jump or a gradual drift?

To detect an abrupt jump from a null hypothesis H0 to an alternative H1, the tool of choice seems to be a cumulative sum (CUSUM) of log(likelihood ratio). If H0 and H1 are normal distributions with equal variances, this general rule specializes to a one-sided cumulative sum of (y[t] - mu.bar), where mu.bar is the average of the means under H0 and H1. However, to detect a gradual drift modeled as a random walk, the theory says that the best tool is something like an exponentially weighted moving average (EWMA).

For monitor design, I like to write the following:

     joint   =  observation * prior  =  posterior * predictive
  f(y & mu)  =  f(y | mu) * f(mu)    =  f(mu | y) * f(y).

When each observation arrives, I test whether it is consistent with the predictive distribution f(y). If it is not, I report a potential problem. If it is, I incorporate it into the EWMA [or CUSUM], as described by the posterior f(mu | y). For more information on this, see "Bayes' Rule of Information" and the other "foundations of monitoring" reports downloadable from "www.prodsyse.com". This kind of use of the predictive distribution is discussed in the West and Harrison (1999) book cited in the "Bayes' Rule" paper, and a Poisson EWMA is derived on p. 5.

What you use, of course, depends on the events you hope to capture. For your application, I might consider running separate monitors on each condition-hospital pair, plus monitors on the totals for each hospital and for each condition, plus one for the overall total. I might use the qcc package to calibrate my thresholds, but do the daily computations in some database system. Selecting thresholds is not easy, in part because the assumptions you make for monitor design will never hold exactly in practice; as a result, any thresholds you compute based purely on theory will be wrong.
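The one-sided cumulative sum of (y[t] - mu.bar) described above can be sketched in a few lines of base R. This is a minimal illustration, not code from the thread: the in-control mean mu0 = 10, shifted mean mu1 = 13, the toy series, and the threshold h = 8 are all made-up values.

```r
# One-sided tabular CUSUM for an abrupt mean shift (illustrative values only)
mu0 <- 10; mu1 <- 13                  # assumed in-control and shifted means
mu.bar <- (mu0 + mu1) / 2             # reference value: average of H0 and H1 means

# Toy series: 5 in-control days, then 5 days after an abrupt jump to 14
y <- c(rep(10, 5), rep(14, 5))

S <- numeric(length(y))
for (t in seq_along(y)) {
  prev <- if (t == 1) 0 else S[t - 1]
  S[t] <- max(0, prev + (y[t] - mu.bar))  # accumulate evidence, resetting at zero
}

h <- 8                                # decision threshold; in practice, tune on history
which(S > h)[1]                       # first day the monitor signals: day 9 here
```

With these numbers the sum stays at zero through day 5 (each in-control day contributes -1.5, clipped at zero), then climbs by 2.5 per day after the jump, crossing the threshold on day 9.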
However, if you tune your thresholds on the years of historical data you have, you should be on safer ground; the theory should get you close to an optimal arrangement. I think I would use quite loose thresholds for the hospital-condition (interaction) monitors, tighter thresholds for the condition and hospital totals, and the tightest threshold for the overall total.

Monitors on specific conditions should be sensitive to epidemics or to an effective biological-warfare terrorist attack; if this is your concern, a CUSUM might be best. Monitors on specific hospitals should be sensitive to changes in the competence of local staff (suggesting a preference for an EWMA) or to a sudden local outbreak of something (suggesting a CUSUM). Monitors on condition-hospital pairs might be sensitive to local changes in preferred diagnoses. I would run CUSUMs or EWMAs but not both: either will catch conditions most quickly caught by the other, with possibly a little longer delay.

Hope this helps.
Spencer Graves
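The monitor layout suggested above (one monitor per hospital-condition cell plus one on the overall total) can be sketched in base R. This is a hypothetical illustration: the counts reuse the toy numbers from the original question, while the day-2 jump and the smoothing constant lambda = 0.2 are assumptions.

```r
# EWMA monitors on each hospital-condition cell plus the overall total
counts <- matrix(c(10, 12,
                    7, 14),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("HospitalA", "HospitalF"),
                                 c("Respiratory", "Neuro")))

# One EWMA update step: blend yesterday's smoothed value with today's count
ewma_update <- function(prev, y, lambda = 0.2) (1 - lambda) * prev + lambda * y

state <- counts                               # start each monitor at day 1's count
day2  <- counts + 3                           # pretend every count jumps by 3 on day 2
state <- ewma_update(state, day2)             # vectorised over all cells at once
total <- ewma_update(sum(counts), sum(day2))  # the "overall" monitor

state["HospitalA", "Respiratory"]             # 10.6 = 0.8*10 + 0.2*13
total                                         # 45.4 = 0.8*43 + 0.2*55
```

Each day, every smoothed value would then be compared against its own threshold, loose for the interaction cells and tightest for the overall total as suggested above, with the thresholds themselves calibrated on the historical data (for which the qcc package's control-chart functions could serve as a starting point).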