POLWART, Calum (SOUTH TEES HOSPITALS NHS FOUNDATION TRUST)
2020-Sep-04 15:45 UTC
[R] Survival Object - is 12month survival = 365days
Using survfit I can get the '1 year' Survival from this dataset which holds survival in days: require (survival) survfit( Surv(time, status) ~sex, data=colon) summary (fit, 365) My current real world data I'm calculating time using lubridate to calculate time and since it made the axis easy I just told it to do and so my "time" appears to be a float in months. time <- time_length(interval(startDate, endDate), "months") Is there a "right" approach to this (as in a convention). If I use 12months as a year and describe it in the write up as 12, 24 and 36 month survival rather than 1, 2 and 3 year presumably that is OK.. I've been asked to report 30, 60 & 90day. Then 6month, 1, 2 and 3 year survival. Should I calculate time 3 times, (interval day, month and year) and run the survival on each to get the requested outputs or would people just provide something close. Should I run a campaign to decimilise time? Sent from Nine<http://www.9folders.com/> ******************************************************************************************************************** This message may contain confidential information. If you are not the intended recipient please inform the sender that you have received the message in error before deleting it. Please do not disclose, copy or distribute information in this e-mail or take any action in relation to its contents. To do so is strictly prohibited and may be unlawful. Thank you for your co-operation. NHSmail is the secure email and directory service available for all NHS staff in England and Scotland. NHSmail is approved for exchanging patient data and other sensitive information with NHSmail and other accredited email services. For more information and to find out how you can switch, https://portal.nhs.net/help/joiningnhsmail [[alternative HTML version deleted]]
On Sep 4, 2020, at 11:45 AM, POLWART, Calum (SOUTH TEES HOSPITALS NHS FOUNDATION TRUST) via R-help <r-help at r-project.org> wrote:> > Using survfit I can get the '1 year' Survival from this dataset which holds survival in days: > > require (survival) > survfit( Surv(time, status) ~sex, data=colon) > summary (fit, 365) > > My current real world data I'm calculating time using lubridate to calculate time and since it made the axis easy I just told it to do and so my "time" appears to be a float in months. > > time <- time_length(interval(startDate, endDate), "months") > > Is there a "right" approach to this (as in a convention). If I use 12months as a year and describe it in the write up as 12, 24 and 36 month survival rather than 1, 2 and 3 year presumably that is OK.. > > I've been asked to report 30, 60 & 90day. Then 6month, 1, 2 and 3 year survival. > > Should I calculate time 3 times, (interval day, month and year) and run the survival on each to get the requested outputs or would people just provide something close. > > Should I run a campaign to decimilise time?Hi, The answer may depend upon whether you are presenting the results in a tabular fashion, in the body of a manuscript, or in a figure. Also, what may be the community conventions in your domain. If you want to get the irregular time points out in a single output, you can use the times argument to do this, remembering that the default time intervals are in days for this dataset:> summary(fit, times = c(30, 60, 90, 180, 365.25, 2 * 365.25, 3 * 365.25))Call: survfit(formula = Surv(time, status) ~ sex, data = colon) sex=0 time n.risk n.event survival std.err lower 95% CI upper 95% CI 30 887 2 0.998 0.00159 0.995 1.000 60 880 6 0.991 0.00317 0.985 0.997 90 869 11 0.979 0.00485 0.969 0.988 180 827 42 0.931 0.00849 0.915 0.948 365 731 94 0.825 0.01274 0.801 0.851 730 595 135 0.673 0.01576 0.643 0.705 1096 536 57 0.608 0.01641 0.577 0.641 sex=1 time n.risk n.event survival std.err lower 95% CI upper 95% CI 30 962 5 0.995 0.00230 0.990 0.999 60 955 6 0.989 0.00341 0.982 0.995 90 947 8 0.980 0.00446 0.972 0.989 180 906 41 0.938 0.00776 0.923 0.953 365 819 85 0.850 0.01150 0.828 0.873 730 679 133 0.711 0.01462 0.683 0.740 1096 592 84 0.623 0.01566 0.593 0.654 Now, the time output there is arguably a bit cumbersome to read...but, at least you get the relevant values in a single output. You can transform those values as you may require. Another option is to use the scale argument, but I just noted that, unless I am missing something, I think that there may be a lingering buglet in the code for summary.survfit(), and I am adding Terry Therneau here as a cc:, if that is correct. The behavior of the interaction between the times and scale arguments changed in 2009 after an exchange I had with Thomas Lumley: https://stat.ethz.ch/pipermail/r-devel/2009-April/052901.html and it is not clear to me if the current behavior is or is not intended after all this time. Albeit, it may be the defacto behavior at this point in either case, given some volume of code written over the years that may depend upon this behavior. Thus, this may be better for you, using the current behavior:> summary(fit, scale = 30.44, times = c(1, 2, 3, 6, 12, 24, 36) * 30.44)Call: survfit(formula = Surv(time, status) ~ sex, data = colon) sex=0 time n.risk n.event survival std.err lower 95% CI upper 95% CI 1 887 2 0.998 0.00159 0.995 1.000 2 880 6 0.991 0.00317 0.985 0.997 3 868 12 0.977 0.00498 0.968 0.987 6 826 42 0.930 0.00855 0.914 0.947 12 731 93 0.825 0.01274 0.801 0.851 24 595 135 0.673 0.01576 0.643 0.705 36 536 57 0.608 0.01641 0.577 0.641 sex=1 time n.risk n.event survival std.err lower 95% CI upper 95% CI 1 962 5 0.995 0.00230 0.990 0.999 2 955 6 0.989 0.00341 0.982 0.995 3 946 9 0.979 0.00458 0.970 0.988 6 906 40 0.938 0.00776 0.923 0.953 12 819 85 0.850 0.01150 0.828 0.873 24 679 133 0.711 0.01462 0.683 0.740 36 592 84 0.623 0.01566 0.593 0.654 where the times values are now in months over the range of values, instead of days. I don't use the lubridate package, so there may be other options for you there, but the above will work, if your underlying time intervals in the source data frame for the model are still in days as a unit of measurement. Using the base graphics functions, albeit perhaps you are using ggplot or similar, you can plot the above model with axis markings at the irregular time intervals, using something like the following: plot(fit, xaxt = "n", las = 1, xlim = c(0, 36 * 30.44)) axis(1, at = c(1, 2, 3, 6, 12, 24, 36) * 30.44, labels = c(1, 2, 3, 6, 12, 24, 36), cex.axis = 0.65) This essentially truncates the x axis to 36 months, since the intervals in the example colon dataset go to about 9 years or so, and does not label the x axis. Bearing in mind that the underlying x axis unit is in days, the axis() function then places labels at the irregular intervals. You could then annotate the plot further as you may desire. Regards, Marc Schwartz
POLWART, Calum (SOUTH TEES HOSPITALS NHS FOUNDATION TRUST)
2020-Sep-04 17:43 UTC
[R] Survival Object - is 12month survival = 365days
Hi Mark Huge thanks for the detailed answer. At the moment, likely to be a mix of tabulation (30, 60, 90), a plot and some narrative with the 6mo, 12mo and so on. I think! So it sounds like days is the answer, and then estimate months and years from days. Sent from Nine<http://www.9folders.com/> ________________________________ From: Marc Schwartz <marc_schwartz at me.com> Sent: Friday, 4 September 2020 18:32 To: POLWART, Calum (SOUTH TEES HOSPITALS NHS FOUNDATION TRUST) Cc: R-help; Terry Therneau Subject: Re: [R] Survival Object - is 12month survival = 365days On Sep 4, 2020, at 11:45 AM, POLWART, Calum (SOUTH TEES HOSPITALS NHS FOUNDATION TRUST) via R-help <r-help at r-project.org> wrote:> > Using survfit I can get the '1 year' Survival from this dataset which holds survival in days: > > require (survival) > survfit( Surv(time, status) ~sex, data=colon) > summary (fit, 365) > > My current real world data I'm calculating time using lubridate to calculate time and since it made the axis easy I just told it to do and so my "time" appears to be a float in months. > > time <- time_length(interval(startDate, endDate), "months") > > Is there a "right" approach to this (as in a convention). If I use 12months as a year and describe it in the write up as 12, 24 and 36 month survival rather than 1, 2 and 3 year presumably that is OK.. > > I've been asked to report 30, 60 & 90day. Then 6month, 1, 2 and 3 year survival. > > Should I calculate time 3 times, (interval day, month and year) and run the survival on each to get the requested outputs or would people just provide something close. > > Should I run a campaign to decimilise time?Hi, The answer may depend upon whether you are presenting the results in a tabular fashion, in the body of a manuscript, or in a figure. Also, what may be the community conventions in your domain. If you want to get the irregular time points out in a single output, you can use the times argument to do this, remembering that the default time intervals are in days for this dataset:> summary(fit, times = c(30, 60, 90, 180, 365.25, 2 * 365.25, 3 * 365.25))Call: survfit(formula = Surv(time, status) ~ sex, data = colon) sex=0 time n.risk n.event survival std.err lower 95% CI upper 95% CI 30 887 2 0.998 0.00159 0.995 1.000 60 880 6 0.991 0.00317 0.985 0.997 90 869 11 0.979 0.00485 0.969 0.988 180 827 42 0.931 0.00849 0.915 0.948 365 731 94 0.825 0.01274 0.801 0.851 730 595 135 0.673 0.01576 0.643 0.705 1096 536 57 0.608 0.01641 0.577 0.641 sex=1 time n.risk n.event survival std.err lower 95% CI upper 95% CI 30 962 5 0.995 0.00230 0.990 0.999 60 955 6 0.989 0.00341 0.982 0.995 90 947 8 0.980 0.00446 0.972 0.989 180 906 41 0.938 0.00776 0.923 0.953 365 819 85 0.850 0.01150 0.828 0.873 730 679 133 0.711 0.01462 0.683 0.740 1096 592 84 0.623 0.01566 0.593 0.654 Now, the time output there is arguably a bit cumbersome to read...but, at least you get the relevant values in a single output. You can transform those values as you may require. Another option is to use the scale argument, but I just noted that, unless I am missing something, I think that there may be a lingering buglet in the code for summary.survfit(), and I am adding Terry Therneau here as a cc:, if that is correct. The behavior of the interaction between the times and scale arguments changed in 2009 after an exchange I had with Thomas Lumley: https://stat.ethz.ch/pipermail/r-devel/2009-April/052901.html and it is not clear to me if the current behavior is or is not intended after all this time. Albeit, it may be the defacto behavior at this point in either case, given some volume of code written over the years that may depend upon this behavior. Thus, this may be better for you, using the current behavior:> summary(fit, scale = 30.44, times = c(1, 2, 3, 6, 12, 24, 36) * 30.44)Call: survfit(formula = Surv(time, status) ~ sex, data = colon) sex=0 time n.risk n.event survival std.err lower 95% CI upper 95% CI 1 887 2 0.998 0.00159 0.995 1.000 2 880 6 0.991 0.00317 0.985 0.997 3 868 12 0.977 0.00498 0.968 0.987 6 826 42 0.930 0.00855 0.914 0.947 12 731 93 0.825 0.01274 0.801 0.851 24 595 135 0.673 0.01576 0.643 0.705 36 536 57 0.608 0.01641 0.577 0.641 sex=1 time n.risk n.event survival std.err lower 95% CI upper 95% CI 1 962 5 0.995 0.00230 0.990 0.999 2 955 6 0.989 0.00341 0.982 0.995 3 946 9 0.979 0.00458 0.970 0.988 6 906 40 0.938 0.00776 0.923 0.953 12 819 85 0.850 0.01150 0.828 0.873 24 679 133 0.711 0.01462 0.683 0.740 36 592 84 0.623 0.01566 0.593 0.654 where the times values are now in months over the range of values, instead of days. I don't use the lubridate package, so there may be other options for you there, but the above will work, if your underlying time intervals in the source data frame for the model are still in days as a unit of measurement. Using the base graphics functions, albeit perhaps you are using ggplot or similar, you can plot the above model with axis markings at the irregular time intervals, using something like the following: plot(fit, xaxt = "n", las = 1, xlim = c(0, 36 * 30.44)) axis(1, at = c(1, 2, 3, 6, 12, 24, 36) * 30.44, labels = c(1, 2, 3, 6, 12, 24, 36), cex.axis = 0.65) This essentially truncates the x axis to 36 months, since the intervals in the example colon dataset go to about 9 years or so, and does not label the x axis. Bearing in mind that the underlying x axis unit is in days, the axis() function then places labels at the irregular intervals. You could then annotate the plot further as you may desire. Regards, Marc Schwartz ******************************************************************************************************************** This message may contain confidential information. If yo...{{dropped:19}}