thr3ads.net - R help - [R] comparing lm(), survreg( ... , dist="gaussian") and survreg( ... , dist="lognormal") [May 2005]

If this information is useful, please help other people find it:
Share via:

Charles Annis, P.E.

2005-May-03 03:37 UTC

[R] comparing lm(), survreg( ... , dist="gaussian") and survreg( ... , dist="lognormal")

Dear R-Helpers:

I have tried everything I can think of and hope not to appear too foolish
when my error is pointed out to me.

I have some real data (18 points) that look linear on a log-log plot so I
used them for a comparison of lm() and survreg.  There are no suspensions.  

survreg.df <- data.frame(Cycles=c(2009000, 577000, 145000, 376000, 37000,
979000, 17420000, 71065000, 46397000, 70168000, 69120000, 68798000,
72615000, 133051000, 38384000, 15204000, 1558000, 14181000), stress=c(90,
100, 110, 90, 100, 80, 70, 60, 56, 62, 62, 59, 56, 53, 59, 70, 90, 70),
event=rep(1, 18))


sN.lm<- lm(log(Cycles) ~ log10(stress), data=survreg.df)

and 
                                             vvvvvvvvvvv
gaussian.survreg<- survreg(formula=Surv(time=log(Cycles), event) ~
log10(stress), dist="gaussian", data=survreg.df)

produce identical parameter estimates and differ slightly in the residual
standard error and scale, which is accounted for by scale being the MLE and
thus biased.  Correcting by sqrt(18/16) produces agreement.  Using predict()
for the lm, and predict.survreg() for the survreg model and correcting for
the differences in stdev, produces identical plots of the fit and the upper
and lower confidence intervals.  All of this is as it should be.

And, 
                                               vvvvvv
lognormal.survreg<- survreg(formula=Surv(time=(Cycles), event) ~
log10(stress), dist="lognormal", data=survreg.df)

produces summary() results that are identical to the earlier call to
survreg(), except for the call, of course.  The parameter estimates and SE
are identical.  Again this is as I would expect it.

But since the call uses Cycles, rather than log(Cycles) predict.survreg()
returns $fit in Cycles units, rather than logs, and of course the fits are
identical when plotted on a log-log grid and also agree with lm()

Here is the fly in the ointment:  The upper and lower confidence intervals,
based on the $se.fit for the dist="lognormal" are quite obviously
different
from the other two methods, and although I have tried everything I could
imagine I cannot reconcile the differences.

I believe that the confidence bounds for both models should agree.  After
all, both calls to survreg() produce identical parameter estimates.  

So I have missed something.  Would some kind soul please point out my error?

Thanks.


Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:?? 614-455-3265
http://www.StatisticalEngineering.com
??

Prof Brian Ripley

2005-May-03 06:54 UTC

head link

[R] comparing lm(), survreg( ... , dist="gaussian") and survreg( ... , dist="lognormal")

On Mon, 2 May 2005, Charles Annis, P.E. wrote:
> I have tried everything I can think of and hope not to appear too foolish
> when my error is pointed out to me.
>
> I have some real data (18 points) that look linear on a log-log plot so I
> used them for a comparison of lm() and survreg.  There are no suspensions.
>
> survreg.df <- data.frame(Cycles=c(2009000, 577000, 145000, 376000,
37000,
> 979000, 17420000, 71065000, 46397000, 70168000, 69120000, 68798000,
> 72615000, 133051000, 38384000, 15204000, 1558000, 14181000), stress=c(90,
> 100, 110, 90, 100, 80, 70, 60, 56, 62, 62, 59, 56, 53, 59, 70, 90, 70),
> event=rep(1, 18))
>
>
> sN.lm<- lm(log(Cycles) ~ log10(stress), data=survreg.df)
>
> and
>                                             vvvvvvvvvvv
> gaussian.survreg<- survreg(formula=Surv(time=log(Cycles), event) ~
> log10(stress), dist="gaussian", data=survreg.df)
>
> produce identical parameter estimates and differ slightly in the residual
> standard error and scale, which is accounted for by scale being the MLE and
> thus biased.  Correcting by sqrt(18/16) produces agreement.  Using
predict()
> for the lm, and predict.survreg() for the survreg model and correcting for
> the differences in stdev, produces identical plots of the fit and the upper
> and lower confidence intervals.  All of this is as it should be.
I trust you called predict() on both and let R choose the method.
> And,
>                                               vvvvvv
> lognormal.survreg<- survreg(formula=Surv(time=(Cycles), event) ~
> log10(stress), dist="lognormal", data=survreg.df)
>
> produces summary() results that are identical to the earlier call to
> survreg(), except for the call, of course.  The parameter estimates and SE
> are identical.  Again this is as I would expect it.
>
> But since the call uses Cycles, rather than log(Cycles) predict.survreg()
> returns $fit in Cycles units, rather than logs, and of course the fits are
> identical when plotted on a log-log grid and also agree with lm()
>
> Here is the fly in the ointment:  The upper and lower confidence intervals,
> based on the $se.fit for the dist="lognormal" are quite obviously
different
> from the other two methods, and although I have tried everything I could
> imagine I cannot reconcile the differences.
How did you do this?  (BTW, I assume you mean upper and lower confidence
>limits< for the predicted means.)  For the predictions and standard errors are (or should be) on the response scale, a non-linear function of 
the parameters.  In that case it is normal to form confidence limits on 
the linear predictor scale and transform.
> I believe that the confidence bounds for both models should agree.  After
> all, both calls to survreg() produce identical parameter estimates.
They will, if computed on the same basis.  On log-scale (to avoid large 
numbers)

pr1 <- predict(lognormal.survreg, se.fit=T)
log(cbind(pr1$fit - 1.96*pr1$se.fit, pr1$fit + 1.96*pr1$se.fit))
pr2 <- predict(gaussian.survreg, se.fit=T)
cbind(pr2$fit - 1.96*pr2$se.fit, pr2$fit + 1.96*pr2$se.fit)

are really pretty close.  The main difference is a slight shift, which 
comes about because the mean of a log(X) is not log(mean(X)).  Note that 
the second set at the preferred ones.  Transforming to log scale before 
making the confidence limits:

cbind(log(pr1$fit) - 1.96*pr1$se.fit/pr1$fit, log(pr1$fit) +
1.96*pr1$se.fit/pr1$fit)

does give identical answers.

Consider care is needed in interpreting what predict() is actually 
predicting in non-linear models.  For both glm() and survreg() it is 
closer to the median of the uncertainty in the predictions than to the 
mean.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Charles Annis, P.E.

2005-May-03 13:20 UTC

head link

[R] comparing lm(), survreg( ... , dist="gaussian") and survreg(... , dist="lognormal")

Thank you, Professor Ripley.

cbind(log(pr1$fit) - 1.96*pr1$se.fit/pr1$fit, log(pr1$fit) +
1.96*pr1$se.fit/pr1$fit)

... is precisely what had eluded me, self-evident though it appears after
you have illuminated the way.

Again, thank you. 


Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian Ripley
Sent: Tuesday, May 03, 2005 2:55 AM
To: Charles Annis, P.E.
Cc: R-help at stat.math.ethz.ch
Subject: Re: [R] comparing lm(), survreg( ... , dist="gaussian") and
survreg(... , dist="lognormal")

On Mon, 2 May 2005, Charles Annis, P.E. wrote:
> I have tried everything I can think of and hope not to appear too foolish
> when my error is pointed out to me.
>
> I have some real data (18 points) that look linear on a log-log plot so I
> used them for a comparison of lm() and survreg.  There are no suspensions.
>
> survreg.df <- data.frame(Cycles=c(2009000, 577000, 145000, 376000,
37000,
> 979000, 17420000, 71065000, 46397000, 70168000, 69120000, 68798000,
> 72615000, 133051000, 38384000, 15204000, 1558000, 14181000), stress=c(90,
> 100, 110, 90, 100, 80, 70, 60, 56, 62, 62, 59, 56, 53, 59, 70, 90, 70),
> event=rep(1, 18))
>
>
> sN.lm<- lm(log(Cycles) ~ log10(stress), data=survreg.df)
>
> and
>                                             vvvvvvvvvvv
> gaussian.survreg<- survreg(formula=Surv(time=log(Cycles), event) ~
> log10(stress), dist="gaussian", data=survreg.df)
>
> produce identical parameter estimates and differ slightly in the residual
> standard error and scale, which is accounted for by scale being the MLE
and> thus biased.  Correcting by sqrt(18/16) produces agreement.  Using
predict()> for the lm, and predict.survreg() for the survreg model and correcting for
> the differences in stdev, produces identical plots of the fit and the
upper> and lower confidence intervals.  All of this is as it should be.
I trust you called predict() on both and let R choose the method.
> And,
>                                               vvvvvv
> lognormal.survreg<- survreg(formula=Surv(time=(Cycles), event) ~
> log10(stress), dist="lognormal", data=survreg.df)
>
> produces summary() results that are identical to the earlier call to
> survreg(), except for the call, of course.  The parameter estimates and SE
> are identical.  Again this is as I would expect it.
>
> But since the call uses Cycles, rather than log(Cycles) predict.survreg()
> returns $fit in Cycles units, rather than logs, and of course the fits are
> identical when plotted on a log-log grid and also agree with lm()
>
> Here is the fly in the ointment:  The upper and lower confidence
intervals,> based on the $se.fit for the dist="lognormal" are quite obviously
different> from the other two methods, and although I have tried everything I could
> imagine I cannot reconcile the differences.
How did you do this?  (BTW, I assume you mean upper and lower confidence
>limits< for the predicted means.)  For the predictions and standard errors are (or should be) on the response scale, a non-linear function of 
the parameters.  In that case it is normal to form confidence limits on 
the linear predictor scale and transform.
> I believe that the confidence bounds for both models should agree.  After
> all, both calls to survreg() produce identical parameter estimates.
They will, if computed on the same basis.  On log-scale (to avoid large 
numbers)

pr1 <- predict(lognormal.survreg, se.fit=T)
log(cbind(pr1$fit - 1.96*pr1$se.fit, pr1$fit + 1.96*pr1$se.fit))
pr2 <- predict(gaussian.survreg, se.fit=T)
cbind(pr2$fit - 1.96*pr2$se.fit, pr2$fit + 1.96*pr2$se.fit)

are really pretty close.  The main difference is a slight shift, which 
comes about because the mean of a log(X) is not log(mean(X)).  Note that 
the second set at the preferred ones.  Transforming to log scale before 
making the confidence limits:

cbind(log(pr1$fit) - 1.96*pr1$se.fit/pr1$fit, log(pr1$fit) +
1.96*pr1$se.fit/pr1$fit)

does give identical answers.

Consider care is needed in interpreting what predict() is actually 
predicting in non-linear models.  For both glm() and survreg() it is 
closer to the median of the uncertainty in the predictions than to the 
mean.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Apparently Analagous Threads

Search for more reasonably related threads

R help - May 2005 - comparing lm(), survreg( ... , dist="gaussian") and survreg( ... , dist="lognormal")

[R] comparing lm(), survreg( ... , dist="gaussian") and survreg( ... , dist="lognormal")

[R] comparing lm(), survreg( ... , dist="gaussian") and survreg( ... , dist="lognormal")

[R] comparing lm(), survreg( ... , dist="gaussian") and survreg(... , dist="lognormal")

Apparently Analagous Threads