Andrew Robinson
2006-Aug-15 21:39 UTC
[R] A model for possibly periodic data with varying amplitude [repost, much edited]
Hi dear R community, I have up to 12 measures of a protein for each of 6 patients, taken every two or three days. The pattern of the protein looks periodic, but the height of the peaks is highly variable. I'm testing for periodicity using a Monte Carlo simulation envelope approach applied to a cumulative periodogram. Now I want to predict the location of the peaks in time. Of course, the peaks might be occurring on unmeasured days. Sadly, an NDA prohibits me from sharing the real data. The data look something like this: ################################################################## patient <- data.frame( day = c(1, 3, 5, 8, 10, 12, 15, 17, 19, 22, 24, 26), protein = c(5, 3, 10, 7, 2, 8, 25, 22, 7, 10, 12, 5) ) plot(patient$day, patient$protein, type="b") # This is my model: wave.form <- deriv3( ~ sin(2*pi*((day-offset)/period + 0.25)) * amplitude + mean, c("period", "offset", "amplitude", "mean"), function(day, period, offset, amplitude, mean){}) curve(wave.form(x, period=7, offset=2, mean=5, amplitude=4), from=1, to=30) require(nlme) wave.1 <- gnls(protein ~ wave.form(day, period, offset, amplitude, mean), start=list(period=7, offset=0, amplitude=10, mean=5), weights=varPower(), data=patient) wave.1 ################################################################# I emphasize that I don't care about the mean or amplitude, just the offset and the period. The problem for my model is that spikes in the data, such as at day 15, seem to make fitting the model quite unstable, (although it is fine in this artificial case). The best I can think of right now is to use the "weights=varPower()" model in gnls(), which I expect will down-weight the extreme spikes. Often that model fails to converge. Then, all I can do is fall back on modelling the log response. If I do both, then when the weights model converges the estimates sometimes differ, and sometimes considerably. I know that I need more data, but more data are not available. So, instead I need to do the best I can :) I have a half-baked idea of conditionally shrinking the peaks, so that they all look like the same size - perhaps by discretizing the data into, say, three possible values defined by quantiles of the response variable or the estimated slope - but I'm concerned that doing so risks creating the periodicity that I want to detect. Also the series is not necessarily stationary, so I wouild have to smooth it somehow first. On the other hand, maybe I can test for periodicity in the original data, and then fit on some transformed data. If I reject the null hypothesis of no periodic oscillation, then is it reasonable to condition subsequent analysis on the belief that periodicity exists, and try to determine its characteristics? I wonder if anyone can suggest an analysis technique or model that more directly accommodates the varying heights of the peaks? Or is this pretty much the best I can do? Any suggestions will be gratefully received. Cheers, Andrew -- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 Email: a.robinson at ms.unimelb.edu.au http://www.ms.unimelb.edu.au
Dieter Menne
2006-Aug-16 06:43 UTC
[R] A model for possibly periodic data with varying amplitude [repost, much edited]
Andrew Robinson <A.Robinson <at> ms.unimelb.edu.au> writes:> I have up to 12 measures of a protein for each of 6 patients, taken > every two or three days. The pattern of the protein looks periodic, > but the height of the peaks is highly variable. I'm testing for > periodicity using a Monte Carlo simulation envelope approach applied > to a cumulative periodogram. Now I want to predict the location of > the peaks in time. Of course, the peaks might be occurring on > unmeasured days.Have you checked one of the methods in Chapter 13. of MASS to obtain a smoothed estimate? Dieter