Chaouch, Aziz
2006-May-25 21:21 UTC
[R] Computing a reliability index of a statistic with missing data
Hi All,

I'd like to compute a kind of reliability index (RI) that would, in a sense, stand as a measure of reliability of a statistic (histogram, etc.) computed on a time series with missing values. The goal is that:

RI = 1 for perfect reliability
RI = 0 for total unreliability (no data at all, as an extreme case)

The percentage of missing data is one indication: the more missing data, the less confidence we can have in the statistic. But the distribution of the missing data throughout the series is important as well: independently of the amount of missing data, if the available data are regularly spaced in time the RI should be higher than if they are irregularly spaced. As a measure of sampling regularity, I thought about computing the time to the next record and then taking its variance over the time interval on which the statistic is computed. The variance of the time to the next record would be a measure of sampling regularity, so that the final RI could be of the form:

RI = 1 when n = 0
RI ~ 1 / (n * var(T)) otherwise

with
n = fraction of missing data
T = time to next record (in hours)

However, I need to "normalize" var(T) to use it in the RI. Does someone have an idea on how to do this (or another proposal for computing the RI)?

Thanks,

Aziz
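A minimal R sketch of one way such an index could be built, assuming the series is sampled at equally spaced (e.g. hourly) time steps with NAs marking the missing records. The normalization of var(T) by the squared mean gap (the squared coefficient of variation) and the exact way the two penalties are combined are arbitrary choices for illustration, not something prescribed in the thread:

## Sketch of a reliability index for a regularly sampled series with NAs.
## Assumption: 'x' is a numeric vector observed at equally spaced times;
## scaling var(gaps) by mean(gaps)^2 is one possible normalization.
reliability_index <- function(x) {
  n_missing <- mean(is.na(x))            # fraction of missing values
  obs_times <- which(!is.na(x))          # indices of available records
  if (length(obs_times) < 2) return(0)   # (almost) no data: no reliability
  gaps <- diff(obs_times)                # time to next record, in steps
  ## Squared coefficient of variation of the gaps: 0 for perfectly regular
  ## sampling, larger when the available data are irregularly spaced.
  cv2 <- var(gaps) / mean(gaps)^2
  ## RI = 1 when nothing is missing; decreases with both the amount of
  ## missing data and the irregularity of the gaps.
  1 / (1 + n_missing * (1 + cv2))
}

## Example: 20% of hourly values missing, irregularly spaced
set.seed(1)
x <- rnorm(500)
x[sample(500, 100)] <- NA
reliability_index(x)

The squared coefficient of variation is unit-free, which sidesteps the question of how to normalize var(T) in hours, but other mappings (e.g. exp(-cv2)) would serve equally well.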
Spencer Graves
2006-May-26 00:12 UTC
[R] Computing a reliability index of a statistic with missing data
Have you considered some kind of binary time series model? 'RSiteSearch("binary time series")' produced 150 hits. One of the first 20 mentioned "continuous-time hidden Markov chains" (http://finzi.psych.upenn.edu/R/library/repeated/html/chidden.html). I don't know if this will help you or not, but it might be worth examining.

Hope this helps.
Spencer Graves
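A brief sketch of one way to start treating the missingness pattern itself as a binary time series, assuming the only input is the 0/1 availability indicator of the series. The simple first-order Markov transition estimate below is just an illustration of the idea, not the hidden Markov machinery of the 'repeated' package linked above:

## Illustrative data: a series with some missing values (hypothetical)
set.seed(2)
x <- rnorm(200)
x[sample(200, 60)] <- NA

## Treat the availability pattern as a binary time series
avail <- as.integer(!is.na(x))   # 1 = observed, 0 = missing

## First-order Markov transition probabilities of the indicator;
## persistent 0s mean missing values come in runs rather than being
## scattered evenly through the series.
trans <- prop.table(table(from = head(avail, -1), to = tail(avail, -1)),
                    margin = 1)
trans

## To locate more elaborate tools (hidden Markov models, etc.):
## RSiteSearch("binary time series")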