thr3ads.net - R help - [R] R Kaplan-Meier plotting quirks? [Oct 2012]

Michael Rentz

2012-Oct-16 16:36 UTC

[R] R Kaplan-Meier plotting quirks?

Hello. I apologize in advance for the VERY lengthy e-mail; I have tried to
include enough detail.

I have a question about survival curves that I have been battling off and on
for a few months. No one locally seems to be able to help, so I turn here. The
issue seems to be either how R calculates Kaplan-Meier plots or something
about the underlying statistic itself that I am misunderstanding. Basically,
longer survival times are yielding steeper drops in survival than a set of
shorter survival times with the same number of loss and retention events.

As a minor part of my research I have been comparing tag survival in marked
wild rodents, comparing a standard ear tag with a relatively new technique.
The newer tag clearly "wins" using survival tests, but the resulting
Kaplan-Meier plot does not seem to make sense. Because I am working with wild
animals and trapped only a few days each month, the data are fairly messy,
with gaps in capture histories that require assumptions about tag survival.
An animal that is tagged, recaptured 2 days later with its tag, and recaptured
30 days later without it could be assigned a tag retention of 2 days (minimum
confirmed) or 30 days (maximum possible).
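
A minimal sketch of the two codings described above, in the time/status layout that survival::Surv() expects. The column names `last_with_tag` and `first_without_tag`, and the four records, are hypothetical; they are not from the study's data.

```r
# Hypothetical capture records: one row per tagged animal (days since tagging).
# last_with_tag     = last capture at which the tag was still present
# first_without_tag = first capture at which the tag was missing (NA = never lost)
tags <- data.frame(
  id                = 1:4,
  last_with_tag     = c(2, 10, 35, 60),
  first_without_tag = c(30, NA, 62, NA)
)

# Event indicator: 1 = tag loss observed, 0 = right-censored (the animal died
# or was not recaptured again while the tag was still on).
tags$lost <- as.integer(!is.na(tags$first_without_tag))

# Minimum confirmed retention: the loss is dated to the last sighting with the tag.
tags$mintime <- tags$last_with_tag

# Maximum possible retention: the loss is dated to the first sighting without it.
# Censored animals contribute the same time under both codings.
tags$maxtime <- ifelse(tags$lost == 1, tags$first_without_tag, tags$last_with_tag)

tags
```

Only the observed losses move between the two codings; the censored animals stay where they are, which is what changes the order of events and censorings.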

Both comparisons are significant with a survtest, but the K-M plots differ.
A plot of the minimum confirmed times (overall harsher data, with many times
of 0, 1, or 2 days) yields a curve with a steep initial drop in "survival"
that then levels off into a straight line at about 80% survival. Plotting the
maximum possible times (the same number of losses and retentions, but longer
retention times, running to the next capture without a tag, typically 25-30
days or more) does not show as steep a drop in the first few days, but at
about the point where the minimum estimate levels off, this curve begins
dropping steeply. At 400 days, the plot of minimum estimates shows tag
survival of about 80%, whereas the plot with the same loss rate but longer
assumed survival times shows only about 20%. Complicating this, of course, is
the fact that the great majority of the animals die before the tag is lost;
the rodents themselves survive on the order of months.
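
Assuming the "survtest" mentioned above is a log-rank test such as `survival::survdiff()`, the comparison of tag types can be run under both codings as sketched below. The data frame, its column names and every value in it are made up for illustration.

```r
library(survival)

# Hypothetical data (not the study's): two tag types, minimum and maximum
# retention times in days, and lost = 1 for an observed tag loss or
# 0 for right-censoring (tag still on at the last capture).
d <- data.frame(
  tag     = factor(rep(c("ear", "new"), each = 6)),
  mintime = c(1, 1, 2, 2, 40, 90,     5, 30, 60, 120, 200, 300),
  maxtime = c(28, 30, 29, 31, 40, 90, 5, 30, 60, 150, 200, 300),
  lost    = c(1, 1, 1, 1, 0, 0,       0, 0, 0, 1, 0, 0)
)

# Log-rank test of tag type under each coding of the loss date.
lr_min <- survdiff(Surv(mintime, lost) ~ tag, data = d)
lr_max <- survdiff(Surv(maxtime, lost) ~ tag, data = d)

# survdiff() stores the chi-square statistic; with two groups it has 1 df.
p_values <- c(
  min = pchisq(lr_min$chisq, df = 1, lower.tail = FALSE),
  max = pchisq(lr_max$chisq, df = 1, lower.tail = FALSE)
)
p_values
```

The number of observed losses is identical under both codings; only their timing relative to the censored animals changes.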

I really am not sure what is going on, unless the high number of events in
the first few days, followed by few events thereafter, somehow leads to the
conclusion that tag survival is high after those first few days. The plot of
maximum times has a more even distribution of events, rather than a clump in
the first few days, so I guess the model assumes relatively constant hazards?
As an aside, a plot of the mean of the minimum and maximum almost mirrors the
maximum plot. Adding five days to the minimum, whenever the minimum plus 5 is
less than the maximum, returns a plot with a steeper initial drop that is
then constant, mimicking the minimum plot but at a lower final survival rate.
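
The midpoint and "minimum plus 5" rules can be sketched as follows. This assumes that when the minimum plus 5 is not below the maximum, the minimum is kept unchanged (one reading of the description above); the data are hypothetical and the 36-day landmark is an arbitrary choice.

```r
library(survival)

# Hypothetical minimum/maximum retention times (days) and loss indicator.
d <- data.frame(
  mintime = c(1, 2, 3, 20, 40, 90),
  maxtime = c(28, 30, 5, 50, 40, 90),
  lost    = c(1, 1, 1, 1, 0, 0)
)

# Midpoint rule: halfway between the minimum and maximum dates.
d$midtime <- (d$mintime + d$maxtime) / 2

# "Minimum plus 5" rule: add 5 days only when that stays below the maximum;
# otherwise keep the minimum (assumed reading of the rule).
d$plus5 <- ifelse(d$mintime + 5 < d$maxtime, d$mintime + 5, d$mintime)

# One Kaplan-Meier fit per coding, each read off at the same landmark day.
rules <- c("mintime", "midtime", "plus5", "maxtime")
fits <- lapply(rules, function(v) survfit(Surv(d[[v]], d$lost) ~ 1))
names(fits) <- rules
surv_36 <- sapply(fits, function(f) summary(f, times = 36)$surv)
round(surv_36, 3)
```

Reading every curve at a common landmark makes it easier to see which coding moves losses past the censored animals.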

Basically, I am at a loss as to why surviving longer would *decrease* the
survival rate.

My co-author wants to drop the K-M graph given the confusion, but I think
it would be odd to publish a survival paper without one. I am also not sure
which graph to use: they say very different things, while the actual
statistics do not differ that greatly.

I am more than happy to provide the data and code for anyone who would like 
to help if the above is not explanation enough. Thank you in advance.

Mike.


-- 
Michael S. Rentz
PhD Candidate, Conservation Biology
University of Minnesota
5122 Idlewild Street
Duluth, MN 55804
(218) 525-3299
rent0009 at umn.edu

Andrews, Chris

2012-Oct-17 11:24 UTC

[R] R Kaplan-Meier plotting quirks?

Mike,

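My guess is that you have censored observations in the middle.
When using the minimum time, the events happen before the censorings. Then
the risk set is large and the curve decreases only slightly.
When using the maximum time, the events happen after the censorings. Then
the risk set is small and the curve decreases quickly. (At each event time
the Kaplan-Meier curve is multiplied by 1 - d/n, where d is the number of
events and n the number still at risk, so the same event costs more when n
is small.)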

For example, moving the first event from time 1 to time 5 causes the final
survival estimate to be lower when using max time (.375) than min time (.533):

library(survival)
# Toy data: the same two events (Delta = 1) and three censorings (Delta = 0);
# only the first event's time differs between the codings (1 vs. 5).
df <- data.frame(mintime = c(1,2,3,4,6), maxtime = c(5,2,3,4,6),
                 Delta = c(1,0,1,0,0))
# Black: minimum times; red: maximum times.
plot(survfit(Surv(mintime,Delta)~1,data=df), conf.int=FALSE, xlim=c(0,7))
lines(survfit(Surv(maxtime,Delta)~1,data=df), col=2)

> summary(survfit(Surv(mintime,Delta)~1,data=df))
Call: survfit(formula = Surv(mintime, Delta) ~ 1, data = df)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1      5       1    0.800   0.179        0.516            1
    3      3       1    0.533   0.248        0.214            1

> summary(survfit(Surv(maxtime,Delta)~1,data=df))
Call: survfit(formula = Surv(maxtime, Delta) ~ 1, data = df)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3      4       1    0.750   0.217       0.4259            1
    5      2       1    0.375   0.286       0.0839            1
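
To see the risk-set effect directly, the product-limit estimate can be computed by hand. `km_by_hand()` is a hypothetical helper written for this illustration, not part of the survival package; on the toy data above it should reproduce the survival column from `summary(survfit(...))` (0.800 and 0.533 for the minimum times, 0.750 and 0.375 for the maximum times).

```r
# Product-limit (Kaplan-Meier) estimate by hand: at each distinct event time,
# multiply the running survival by (1 - events / number still at risk).
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  # At risk = still under observation at t (censorings at t count as at risk).
  n_risk  <- sapply(event_times, function(t) sum(time >= t))
  n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             survival = cumprod(1 - n_event / n_risk))
}

Delta  <- c(1, 0, 1, 0, 0)
km_min <- km_by_hand(c(1, 2, 3, 4, 6), Delta)  # minimum times
km_max <- km_by_hand(c(5, 2, 3, 4, 6), Delta)  # maximum times
km_min
km_max
```

Moving the first event from day 1 to day 5 puts it after two censorings, so it is divided among 2 animals at risk instead of 5.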

Given that you have interval-censored data, you could consider fitting the
survival curve with interval-censoring techniques. For example, survreg fits
a parametric curve.
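
A minimal sketch of the survreg suggestion, assuming each tag loss is only known to fall between the last capture with the tag and the first capture without it. The column names `left`/`right`, the data, and the Weibull distribution are assumptions for illustration; survreg offers other distributions (e.g. "lognormal"), and that choice shapes the fitted curve.

```r
library(survival)

# Hypothetical interval-censored tag records (days since tagging):
# left  = last capture with the tag (tag known present)
# right = first capture without the tag (NA = never seen without it,
#         i.e. right-censored at 'left')
tags <- data.frame(
  left  = c(2, 1, 3, 10, 35, 60, 90, 120),
  right = c(30, 28, 33, NA, 62, NA, NA, NA)
)

# type = "interval2" reads each row as "loss occurred in (left, right]";
# an NA right end means right-censored, an NA left end means left-censored.
y <- Surv(tags$left, tags$right, type = "interval2")

# Parametric Weibull fit that uses the whole interval instead of picking
# either the minimum or the maximum date.
fit <- survreg(y ~ 1, dist = "weibull")

# Weibull survival from survreg's parameterization:
# S(t) = exp(-(t / exp(intercept))^(1 / fit$scale))
days    <- c(30, 100, 400)
shape   <- 1 / fit$scale
scale_w <- unname(exp(coef(fit)[1]))
surv_hat <- exp(-(days / scale_w)^shape)
round(surv_hat, 3)
```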

Chris

rent0009 at umn.edu

2012-Oct-17 15:38 UTC

[R] R Kaplan-Meier plotting quirks?

Thank you, Chris! That makes very good sense; I was just so in the weeds that
I could not see it. I will mess with survreg after I am done teaching today.
Thank you for the fresh set of eyes and the advice.

Mike.
