Hi Folks,

I've just come across a kind of problem which leads
me to wonder how to approach it in R.

Basically, each item in a set is subjected to a series
of "impacts" until it eventually "fails". The "force"
of each impact would depend on covariates X, Y say;
but as a result of preceding impacts an item would be
expected to have a "cumulative frailty" such that the
probability of failure due to a particular impact would
possibly increase according to the series of impacts
already survived.

Without the "cumulative frailty" one could envisage
something like a logistic model for the probability
of failure at each impact, leading to a kind of
generalised "geometric distribution" -- that is,
the likelihood for each item would be of the form

  (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]

where P[i] could have a logistic model in terms of
the values of X[i] and Y[i], and n is the index of
the impact at which failure occurred. That is then
a solvable problem.

Even so, I'm not (so far) finding in the R resources
the appropriate analogue of glm for this kind of
model. I dare say a prolonged trawl through the various
"survival" resources might lead to something applicable,
but ...

And then there's the cumulative frailty ... !

Suggestions welcome!
With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Sep-06   Time: 20:25:12
------------------------------ XFMail ------------------------------
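For concreteness, the likelihood above can be written down directly; a minimal sketch in R, assuming (purely for illustration) a logistic model P[i] = plogis(b0 + b1*X[i] + b2*Y[i]):

    ## Log-likelihood for one item that survives impacts 1..(n-1) and
    ## fails at impact n; X and Y hold that item's covariate values.
    item_loglik <- function(beta, X, Y) {
      n <- length(X)
      P <- plogis(beta[1] + beta[2] * X + beta[3] * Y)
      sum(log(1 - P[-n])) + log(P[n])   # log{(1-P[1])*...*(1-P[n-1])*P[n]}
    }

Summing this over items and maximising (e.g. with optim()) would fit the model directly, though the reply below points to a simpler route.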
On Fri, 22 Sep 2006, Ted.Harding at nessie.mcc.ac.uk wrote:

> Hi Folks,
>
> I've just come across a kind of problem which leads
> me to wonder how to approach it in R.
>
> Basically, each item in a set is subjected to a series
> of "impacts" until it eventually "fails". The "force"
> of each impact would depend on covariates X, Y say;
> but as a result of preceding impacts an item would be
> expected to have a "cumulative frailty" such that the
> probability of failure due to a particular impact would
> possibly increase according to the series of impacts
> already survived.

So this is a discrete-time survival model.

> Without the "cumulative frailty" one could envisage
> something like a logistic model for the probability
> of failure at each impact, leading to a kind of
> generalised "geometric distribution" -- that is,
> the likelihood for each item would be of the form
>
>   (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
>
> where P[i] could have a logistic model in terms of
> the values of X[i] and Y[i], and n is the index of
> the impact at which failure occurred. That is then
> a solvable problem.
>
> Even so, I'm not (so far) finding in the R resources
> the appropriate analogue of glm for this kind of
> model. I dare say a prolonged trawl through the various
> "survival" resources might lead to something applicable,
> but ...

What is inadequate about glm itself? The log-likelihood is a
sum of terms over impacts, so the logistic models for the
individual impacts can be fitted separately. However, nnet()
can fit them simultaneously (and couple them if you want).

> And then there's the cumulative frailty ... !

Add the history to that point into the model.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
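A sketch of the glm() route just described (the data frame 'dat' and its column names are assumptions, not from the thread): expand each item into one row per impact, with an indicator that is 0 for every survived impact and 1 for the fatal one; the likelihood above then factorises into an ordinary logistic-regression likelihood, and the history enters simply as extra columns.

    ## 'dat': one row per impact per item, columns item, X, Y, fail
    ## (fail = 0 for survived impacts, 1 for the final, fatal one).
    dat$impact <- ave(dat$X, dat$item, FUN = seq_along)  # index of this impact
    dat$cumX   <- ave(dat$X, dat$item,                   # force absorbed before it
                      FUN = function(x) c(0, cumsum(x)[-length(x)]))
    fit <- glm(fail ~ X + Y + impact + cumX, family = binomial, data = dat)
    summary(fit)

The terms 'impact' and 'cumX' are just one (hypothetical) way of "adding the history to that point"; any summary of the preceding impacts could be used instead.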
On 22-Sep-06 Ted Harding wrote:
> I've just come across a kind of problem which leads
> me to wonder how to approach it in R.
>
> Basically, each item in a set is subjected to a series
> of "impacts" until it eventually "fails". The "force"
> of each impact would depend on covariates X, Y say;
> [...]
> ... one could envisage
> something like a logistic model for the probability
> of failure at each impact, leading to a kind of
> generalised "geometric distribution" -- that is,
> the likelihood for each item would be of the form
>
>   (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
>
> where P[i] could have a logistic model in terms of
> the values of X[i] and Y[i], and n is the index of
> the impact at which failure occurred. That is then
> a solvable problem.

I may be getting closer, but am well off target still!

Starting with the case of no covariates, one has

  p*(1-p)^(n-1)   (n = 1,2,...)   or   p*(1-p)^y   (y = 0,1,...)

which is a particular case of a negative binomial, with
"target successes" = 1. In terms of the two-stage model for
a negative binomial (see V&R MASS section 7.4), this
corresponds to

  (mu^y * theta^theta)/(mu + theta)^(theta + y)
    * gamma(theta + y)/(gamma(theta) * y!)

with theta = 1 and p = theta/(mu + theta) = 1/(mu + 1).

This was in the context of having landed on glm.nb in MASS.
However, glm.nb fits theta, which I would want to fix at 1.
I don't see anything in ?glm.nb which allows theta to be
held at a fixed value.

The next snag is that it would not be straightforward, as far
as I can see, to introduce covariates. The typical data set
would be a set of sequences, each of the form

  X1 Y1 0
  X2 Y2 0
  .......
  Xn Yn 1

where the value of n is random, so varies from sequence to
sequence. In the above negative binomial framework, y = (n-1),
and the covariates for that value of y would be the set
(X1, X2, ..., Xn, Y1, Y2, ..., Yn), and therefore of variable
length for each observation (i.e. sequence as above, or value
of y per sequence). I don't know how one can accommodate a
variable length of covariates per observation.

So it looks as though glm.nb, while thinking along the lines
I want, won't fit the bill! However, other features of glm.nb
would be suitable, since p/(1-p) = 1/mu, so that a logistic
model for p means a linear fit to log(mu), and glm.nb allows
a log link.

Comments welcome!
With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Sep-06   Time: 16:15:14
------------------------------ XFMail ------------------------------
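On the fixed-theta snag specifically: MASS also supplies negative.binomial(), a family object for plain glm() in which theta is held fixed rather than estimated as it is by glm.nb(). A minimal sketch, assuming a data frame 'dat' with one row per sequence, y = n-1, and (this is the awkward part noted above) some fixed-length summary of the covariates standing in for X and Y:

    ## theta fixed at 1 gives the geometric case; link = "log" models log(mu).
    library(MASS)
    fit <- glm(y ~ X + Y,
               family = negative.binomial(theta = 1, link = "log"),
               data = dat)

This removes the theta obstacle but not the variable-length-covariates one, for which the one-row-per-impact logistic fit sketched earlier in the thread remains the natural route.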