Suresh Krishna
2005-Feb-12 11:11 UTC
[R] comparing predicted sequence A'(t) to observed sequence A(t)
Hi,

I have a question that I have not been successful in finding a definitive answer to, and I was hoping someone here could point me to the right place in the literature.

A. We have 4 sets of data, A(t), B(t), C(t), and D(t). Each consists of a series of counts obtained in sequential time intervals. For example, A(t) might look like:

Count A(t): 25, 28, 26, 34 ...
Time (ms):  0-10, 10-20, 20-30, 30-40 ...

Each count in A(t) is the total number of counts observed in that interval, summed over multiple (say 50) independent repetitions of the time series. These counts are generally known to be Poisson distributed, and the 4 processes A(t), B(t), C(t) and D(t) are independent of each other.

B. On visual inspection, the following relationship appears to hold, and it would also be expected on mechanistic grounds:

A(t) = B(t) + C(t) - D(t)

We now want to test this hypothesis statistically. Because successive counts in the sequence are likely to be correlated, isn't it true that neither of the following methods is valid? Perhaps for other reasons as well?

a) A chi-squared test of whether the predicted curve for A(t) deviates significantly from the observed A(t). This also seems not to take the variability of the predicted curve into account.

b) A regression of the predicted values of A(t) against the observed values of A(t), checking for deviations of the slope from 1 and of the intercept from 0. Here, in addition to the lack of independence, the fact that the X values are not fixed (i.e., are variable) and that both X and Y are Poisson-distributed counts should also be taken into account, right?

I would be very grateful if someone could point me to methods for handling this kind of situation, or tell me where to look for them. Is there something in the time-series literature, for instance?

Thanks!!
Suresh
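One way to make option (a) account for the variability of the predicted curve can be sketched as follows, assuming the four series are independent Poisson counts in each bin. Under the hypothesis, d(t) = A(t) - (B(t) + C(t) - D(t)) has mean 0, and because variances of independent counts add (the minus sign on D does not cancel its variance), Var(d) can be estimated by A + B + C + D. The function name `poisson_diff_test` and the B, C, D values are hypothetical; only the four A counts come from the example above. Note that this sketch still treats bins as independent and relies on a normal approximation, so it does not address the serial-correlation concern.

```r
# Hypothetical example: A counts from the question, B, C, D made up
A <- c(25, 28, 26, 34)
B <- c(20, 22, 21, 25)
C <- c(15, 16, 14, 19)
D <- c(10, 11,  9, 12)

poisson_diff_test <- function(A, B, C, D) {
  d <- A - (B + C - D)   # observed minus predicted A(t), bin by bin
  v <- A + B + C + D     # plug-in estimate of Var(d): independent Poisson variances add
  keep <- v > 0          # bins where every count is zero carry no information
  z <- d[keep] / sqrt(v[keep])  # standardized residual per bin
  stat <- sum(z^2)
  df <- sum(keep)
  list(z = z, statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

res <- poisson_diff_test(A, B, C, D)
res$statistic
res$p.value
```

The per-bin `z` values are also worth plotting against time. Runs of same-signed residuals would be a sign that the independence assumption behind the chi-squared reference distribution is doubtful.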
Spencer Graves
2005-Feb-12 15:35 UTC
[R] comparing predicted sequence A'(t) to observed sequence A(t)
What do you mean by the following: A(t) = B(t) + C(t) - D(t)? Since you speak of regressing predicted against actual A(t), I gather this is not what you mean.

Another question: Do you have numbers <= 0 for either predicted or actual A(t)? If so, but only a very few, I might replace the 0's by 0.5 and any negatives by 0.25, take logarithms, then try acf, pacf, ar, arima(..., xreg=A.pred), etc. There are doubtless better methods. However, if I had to have an answer today, I think I'd try this, then discuss its implications and limitations.

If I needed a more sophisticated answer and had a few weeks or months to work on it, I might develop a way to simulate a process that seemed to describe what I thought generated these numbers, and compare the simulated results with the actual ones under a variety of hypotheses, obtaining various kinds of p-values, etc.

hope this helps.
spencer graves
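Spencer's quick recipe might look roughly like this in R. This is a minimal sketch under stated assumptions: the data are simulated (the rates `lamB`, `lamC`, `lamD`, the 200 bins, and the helper `floor_counts` are all hypothetical), A is generated so that the hypothesis holds on average, and the AR(1) error order is an arbitrary choice that acf/pacf/ar would normally guide.

```r
set.seed(1)
n <- 200                                 # hypothetical number of 10-ms bins
bin <- seq_len(n)
lamB <- 20 + 10 * sin(bin / 10)          # hypothetical time-varying Poisson rates
lamC <- 15 + 5 * cos(bin / 15)
lamD <- rep(10, n)
B <- rpois(n, lamB); C <- rpois(n, lamC); D <- rpois(n, lamD)
A <- rpois(n, lamB + lamC - lamD)        # simulated so E[A] = E[B] + E[C] - E[D]
A.pred <- B + C - D

# Adjustment before logging: zeros -> 0.5, negatives -> 0.25
floor_counts <- function(x) {
  x <- as.numeric(x)
  x[x == 0] <- 0.5
  x[x < 0] <- 0.25
  x
}
logA <- log(floor_counts(A))
logA.pred <- log(floor_counts(A.pred))

# Serial correlation of the observed series (drop plot = FALSE to draw correlograms)
acf(logA, plot = FALSE)
pacf(logA, plot = FALSE)
ar(logA)

# Regress log A(t) on log A.pred(t) with AR(1) errors. Under the hypothesis the
# xreg coefficient should be near 1, although noise in A.pred can bias it toward 0
# (the errors-in-variables issue raised in (b)).
fit <- arima(logA, order = c(1, 0, 0), xreg = logA.pred)
fit
```

The fitted coefficients and their standard errors from `fit` are what one would then "discuss implications and limitations" around. The simulation approach Spencer mentions for a longer project would replace the simulated inputs here with a model of the actual generating process.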
Christian Jost
2005-Feb-13 18:55 UTC
[R] Re: comparing predicted sequence A'(t) to observed sequence A(t)
This is a problem I frequently encounter too, when comparing two dynamic processes (e.g., the temporal evolution of the number of ants on two branches). To my knowledge there is no general statistical way to compare two such time series. But in your case you might try a repeated-measures ANOVA. For example, to compare A(t) against B(t)+C(t)-D(t), put the counts for A and then those for B+C-D in a first column 'counts', the corresponding t in a second column 'time', and in a third column 'series' mark the A measures by "A" and the B+C-D measures by "BCD"; then run an ANOVA:

summary(aov(counts ~ series:time + Error(series)))

This works if there are replicates of conditions "A" and "BCD", but I am not a statistician and am not sure whether it applies to your case (though you seem to have repetitions, so you might use that information instead of only looking at the sums).

For a hands-on example with behavioural data from mice (with or without treatment, 4 training sessions for each mouse; does treatment affect training?), see http://cognition.ups-tlse.fr/_christian/M7P14M/TP7/TP-Anova.pdf with the data in http://cognition.ups-tlse.fr/_christian/M7P14M/TP7/tp-anova.rda (well, it's in French, but the R formulas should be understandable ;-)

Well, as I said, I am not a statistician; there might be a logical flaw in applying repeated-measures ANOVA to time series. If anybody out there sees one, please tell us ;-)

Best,
Christian.
--
***********************************************************
http://cognition.ups-tlse.fr/vas-y.php?id=chj
jost at cict.fr
Christian Jost (PhD, MdC)
Centre de Recherches sur la Cognition Animale
Universite Paul Sabatier, Bat IV R3
118 route de Narbonne
31062 Toulouse cedex 4, France
Tel: +33 5 61 55 64 37
Fax: +33 5 61 55 61 54
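The long-format layout Christian describes (columns 'counts', 'time', 'series', with replicates of "A" and "BCD") can be sketched as below. Everything here is hypothetical: the per-repetition rates, 5 repetitions, 4 bins, the helper `sim`, and the arbitrary pairing of B, C and D repetitions to form B+C-D. Treating `time` as a factor is also an assumption; a numeric `time` would instead fit one linear trend per series. The model call is Christian's own formula, which, as he says, may not be the right error structure for time series.

```r
set.seed(2)
n.rep <- 5                       # hypothetical number of repetitions
n.bin <- 4                       # hypothetical number of time bins
lamB <- c(3, 4, 4, 5)            # hypothetical per-repetition Poisson rates per bin
lamC <- c(2, 2, 3, 3)
lamD <- c(1, 1, 2, 1)

# rows = repetitions, columns = time bins; column j uses rate lam[j]
sim <- function(lam) {
  matrix(rpois(n.rep * n.bin, rep(lam, each = n.rep)), nrow = n.rep)
}
A   <- sim(lamB + lamC - lamD)
BCD <- sim(lamB) + sim(lamC) - sim(lamD)   # per-repetition B + C - D

# Long format: A counts first, then B+C-D counts (matrices unroll column by column)
dat <- data.frame(
  counts = c(as.vector(A), as.vector(BCD)),
  time   = factor(rep(rep(seq_len(n.bin), each = n.rep), times = 2)),
  series = factor(rep(c("A", "BCD"), each = n.rep * n.bin))
)

fit <- aov(counts ~ series:time + Error(series), data = dat)
summary(fit)
```

Because `as.vector()` unrolls a matrix column by column, the first `n.rep` values of each series belong to bin 1, which is why `time` repeats each bin index `n.rep` times.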