thr3ads.net - R help - [R] Survival analysis with truncated data [Nov 2013]

Terry Therneau

2013-Nov-14 15:44 UTC

[R] Survival analysis with truncated data

I think that your data is censored, not truncated.
   For a fault introduced 1/2005 and erased 2/2006, duration = 13 months.
   For a fault introduced 4/2010 and still in existence at the last
   observation 12/2010, duration > 8 months.
   For a fault introduced before 2004, erased 3/2005, in a machine installed
   2/1998, the duration is somewhere between 15 and 87 months.
   For a fault introduced before 2004, machine installed 5/2000, still
   present 11/2010 at last check, the duration is > 126 months.

For type=interval2 the data would be (13,13), (8,NA), (15,87), (126, NA).
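
As a minimal sketch of how these four pairs could be passed to the survival package (the variable names lower, upper, fault_life and fit are illustrative and not from the original message), something like the following should work. With type = "interval2", equal bounds are read as an exact event time, an NA upper bound as right censoring, and differing bounds as interval censoring.

```r
library(survival)

# Bounds in months for the four example faults listed above.
lower <- c(13, 8, 15, 126)
upper <- c(13, NA, 87, NA)   # NA means no known upper bound (right censored)

# Build the response; Surv infers the censoring type of each row
# from the pair of bounds.
fault_life <- Surv(lower, upper, type = "interval2")
print(fault_life)

# Nonparametric survival curve for the fault lifespan (no covariates).
fit <- survfit(fault_life ~ 1)
print(summary(fit))
```

Four observations are far too few for a meaningful curve; the point is only to show the data layout.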

Terry T.

On 11/14/2013 05:00 AM, r-help-request at r-project.org wrote:
> Hi,
>
> I would like to know how to handle truncated data.
> My intent is to obtain the survival curve of a software fault in order
> to have some information about fault lifespan.
>
> I have some observations of a software system between 2004 and 2010.
> The system was first released in 1994.
> The event considered is the disappearance of a software fault. The
> faults can have been introduced at any time between 1994 and 2010.
> But for faults introduced before 2004, there is no way to know their age.
>
> I used the Surv and survfit functions with type interval2.
> For the faults that are first observed in 2004, I set the lower bound
> to the lifespan observed between 2004 and 2010.
>
> How could I set the upper bound? Using 1994 as a starting point does not
> seem to be meaningful. Neither does using only the lower bound.
>
> Should I consider another survival estimator?
>
> Thanks in advance.

Nicolas Palix

2013-Nov-14 17:11 UTC

[R] Survival analysis with truncated data

Hi,

Thanks for your response.

On Thu, Nov 14, 2013 at 4:44 PM, Terry Therneau <therneau at mayo.edu> wrote:
> I think that your data is censored, not truncated.
>   For a fault introduced 1/2005 and erased 2/2006, duration = 13 months.
>   For a fault introduced 4/2010 and still in existence at the last
>   observation 12/2010, duration > 8 months.
>   For a fault introduced before 2004, erased 3/2005, in a machine
>   installed 2/1998, the duration is somewhere between 15 and 87 months.
>   For a fault introduced before 2004, machine installed 5/2000, still
>   present 11/2010 at last check, the duration is > 126 months.
>
> For type=interval2 the data would be (13,13), (8,NA), (15,87), (126, NA).
That is how I did it. My problem is that I have no information about
when a fault was introduced if it predates 2004. Indeed, this is about the
lifespan of software faults in the code.

In your example, this means I could not set the upper bound to 87 months.
All I know for sure is that the first software release was in 1994. For a
fault that is observed from 2004 up to 2005, I set the range to (12, 120+12),
that is, 12 months observed plus the 10 years from 1994 to 2004. The estimate
is almost the same if I use (12, NA), and it gives me an upper bound.
I also tried (12, 12) to get the lower bound.
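
The arithmetic behind these three codings can be sketched as follows. The exact start dates are assumptions (the message gives only the years 1994 and 2004), and months_between, max_prior and codings are illustrative names rather than anything from the original analysis.

```r
library(survival)

# Assumed dates: only the years are stated in the message.
first_release <- as.Date("1994-01-01")
obs_start     <- as.Date("2004-01-01")

# Whole calendar months between two dates.
months_between <- function(from, to) {
  a <- as.POSIXlt(from)
  b <- as.POSIXlt(to)
  (b$year - a$year) * 12 + (b$mon - a$mon)
}

observed  <- 12                                        # months seen, 2004 to 2005
max_prior <- months_between(first_release, obs_start)  # months before observation

# The same fault coded three ways:
#   row 1: (12, 12 + max_prior)  interval censored, widest plausible range
#   row 2: (12, NA)              right censored
#   row 3: (12, 12)              treated as an exact lifespan
lower <- c(observed, observed, observed)
upper <- c(observed + max_prior, NA, observed)
codings <- Surv(lower, upper, type = "interval2")
print(codings)
```

Under these assumed dates max_prior is 120, which reproduces the (12, 120+12) range quoted above.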


I also tried 5 years instead of 10. This seems to give an
overestimate too.

Could I use some properties of the data from ]2004;2010] to assign
an average extension to these faults? The mean or the median,
for instance.

Thanks in advance.



-- 
Nicolas Palix
Tel: +33 4 76 51 46 27
membres-liglab.imag.fr/palix
