There are numerous issues, some of which David has pointed out. I will add some
and address some:
1. As far as I understand, you look at only one population. For a survival
model, you would need an indicator of whether (and when) the species went
extinct, rather than a probability. However, with only one population, and
hence at most one extinction event, a survival model cannot be estimated.
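To illustrate what a survival model expects, here is a minimal sketch. It
assumes the survival package is installed, and the data frame pops (several
populations, each with a follow-up time, a 0/1 extinction indicator and a mean
temperature) is entirely made up for illustration; it is not your data.

```r
library(survival)

# Hypothetical data: one row per population, not one row per year.
pops <- data.frame(
  time        = c(50, 120, 200, 80, 200, 150, 30, 200),  # years observed
  extinct     = c(1, 1, 0, 1, 0, 1, 1, 0),               # 1 = went extinct, 0 = censored
  temperature = c(30, 25, 12, 22, 18, 15, 28, 20)
)

# The second argument of Surv() is an event indicator, not a probability.
fit <- coxph(Surv(time, extinct) ~ temperature, data = pops)
summary(fit)
```

The point is only the shape of the data: many units, each with a time and a
0/1 status. A single population followed over 200 years does not have that
shape.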
2. Your dependent variable, however, is decline (or, more likely, a predicted
share of the existing population relative to its baseline at date t=0; that
would be my guess). Echoing David, what was this logistic regression (what was
the model)? Is it derived from a count of the animals in each time period?
Using fitted values as your dependent variable can create all sorts of issues
(issues that can bias your results), and you may be better off working with the
original data. Please provide us with more info on your dependent variable and
on what this logistic regression was.
3. Your current dependent variable has a time-series nature, so you may be
facing autocorrelation of the error term across observations. My best guess is
that you would do better to model this as a time series, but again, we need
more information.
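A rough sketch of how one could check for autocorrelation, using only base R.
The series below is simulated (a linear decline plus AR(1) errors with made-up
parameters) and merely stands in for a yearly series like yours; the linear
model is a placeholder, not a recommendation.

```r
set.seed(1)

# Simulated stand-in for a yearly series: linear trend plus AR(1) errors.
year    <- 1800:2000
errors  <- as.numeric(arima.sim(model = list(ar = 0.8), n = length(year), sd = 0.02))
decline <- 0.95 - 0.0046 * (year - 1800) + errors

# Placeholder regression that ignores the time-series structure.
fit <- lm(decline ~ year)
res <- residuals(fit)

# Lag-1 autocorrelation of the residuals and a Ljung-Box test.
lag1 <- acf(res, plot = FALSE)$acf[2]
lb   <- Box.test(res, lag = 10, type = "Ljung-Box")
lag1
lb
```

A large lag-1 autocorrelation and a small Ljung-Box p-value would suggest that
the usual standard errors from lm() are not to be trusted.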
4. As for the missing values, there are several ways to address this issue.
1st. Imputation (this is probably not the right way to go when large amounts of
data are missing; there is a large literature on imputation). 2nd.
Missing-value indicator coding: for each variable that contains NAs, you create
a second variable, a missing-value indicator (MVI), coded 1 if the underlying
variable is NA and 0 if the underlying variable has a numeric value. All NAs in
the underlying variable are then recoded to 0.
Example of missing-value indicator coding (Oxygen = variable with NAs, Recoded =
recoded oxygen variable, MVI = missing-value indicator):
Oxygen  Recoded  MVI
3       3        0
5       5        0
NA      0        1
NA      0        1
6       6        0
NA      0        1
4       4        0
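In R, the recoding in the example above could be sketched as follows (base R
only; the vector oxygen simply reproduces the Oxygen column of the example):

```r
# Oxygen column from the example, with NAs.
oxygen <- c(3, 5, NA, NA, 6, NA, 4)

# Missing-value indicator: 1 where oxygen is NA, 0 otherwise.
mvi <- as.integer(is.na(oxygen))

# Recoded variable: NAs replaced by 0, observed values left unchanged.
recoded <- ifelse(is.na(oxygen), 0, oxygen)

data.frame(Oxygen = oxygen, Recoded = recoded, MVI = mvi)
```

Both Recoded and MVI would then enter the model as regressors in place of the
original variable.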
If the data are missing at random, the coefficient on the missing-value
indicator should be insignificant. If it comes out significant, it tells you
that the years for which your data are missing differ in some way from the
years for which you have observed the independent variables. But that requires
us to figure out which model to use in the first place.
Best,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of FishR
Sent: Wednesday, February 17, 2010 1:55 PM
To: r-help at r-project.org
Subject: [R] Survival analysis
Dear all
I have a dataset examining the probability of a population surviving
(calculated from a logistic regression) of a species over a 200yr period.
The predictor variables are either continuous but non-normal (e.g.
temperature, oxygen) or categorical (e.g. channelisation), unfortunately I
also have a large amount of missing values.
Year  Decline      Temperature  Oxygen  Channelisation
1800  0.947758115  36.6         NA      NA
1801  0.946135961  25.2         NA      NA
1802  0.944466388  28.5         NA      NA
1803  0.942748196  35.5         NA      NA
1804  0.940980166  33           NA      NA
1805  0.93916106   30.2         NA      NA
truncated ...
1999  0.028531339  10.5         NA      5
2000  0.027649801  8.4          NA      5
I have been trying to run a Cox Proportional Hazards Model with the code
model<-coxph(Surv(Year, Decline) ~ Temperature + Oxygen + Channelisation)
but keep getting an error message "Invalid status value".
Have I inputted the data in the wrong format or am I trying to run a totally
unsuitable model?
Any help would be greatly appreciated
Tom