thr3ads.net - R help - [R] [Fwd: Re: Coefficients of Logistic Regression from bootstrap

Michal Figurski

2008-Jul-23 13:14 UTC

[R] [Fwd: Re: Coefficients of Logistic Regression from bootstrap - how to get them?]

I think the argument supporting the use of bootstrap to determine
coefficients, as opposed to just running linear regression on the whole
dataset, is the comparison of Rsq and prediction errors between these
two approaches - page 1502. There's a substantial difference in favor of
the bootstrap approach.

--
Michal J. Figurski

Gustaf Rydevik wrote:
> The url  for the mentioned paper is at:
> http://www.clinchem.org/cgi/content/full/48/9/1497
> 
> The bootstrap as applied in that paper is used to evaluate different
> regression models against each other (though I wonder how sensible it
> is to look at 26 different models with only 50 data points), which to
> me seems like an ok usage.
> The use of the median of bootstrap coefficients for the final
> estimates seems more like an afterthought, probably with the hope of
> reducing bias, but without any supporting argument.
> 
> /Gustaf

Gustaf Rydevik

2008-Jul-23 13:45 UTC

[R] [Fwd: Re: Coefficients of Logistic Regression from bootstrap - how to get them?]

On Wed, Jul 23, 2008 at 3:14 PM, Michal Figurski
<figurski at mail.med.upenn.edu> wrote:
> I think the argument supporting the use of bootstrap to determine
> coefficients, as opposed to just running linear regression on the whole
> dataset, is the comparison of Rsq and prediction errors between these
> two approaches - page 1502. There's a substantial difference in favor of
> the bootstrap approach.
>
> --
> Michal J. Figurski

Are you talking about this passage?

"A commonly used approach for establishing estimation
models is to perform a multiple stepwise linear
regression on the total set of full AUCs (19). When we
used that approach, we obtained a r^2 value of 0.74 and a
prediction error of 7.6% ± 26.7%, (median, 6.5%; 95% CI,
-51.9% to 67.5%), and the model estimated MPA AUC to
within 15% of the full value in 56% of the profiles. Our
estimation model using the repeated cross-validation approach
was significantly better, with a r^2 value of 0.862,
prediction error of 6.1% ± 19%, (median, 3.0%; 95% CI,
-33.1% to 32%), and estimation of MPA AUC to within
15% of the value (when all 12 samples are used to
calculate MPA AUC) in 82% of the profiles".

As far as I can tell, they are talking about the disadvantage of using
stepwise regression to determine the optimal variables in the
regression, versus the bootstrap/CV approach. And this might well be
true.
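
To make the model-comparison use concrete, here is a minimal sketch of
comparing two candidate models by bootstrap out-of-bag prediction error.
Everything in it is an assumption for illustration: the data are
simulated, the column names c0, c05, c2 and auc are only borrowed from the
variables mentioned in this thread, and oob_rmse is a hypothetical helper,
not the paper's actual procedure.

```r
set.seed(1)
n <- 50

# Simulated stand-in data (NOT the paper's data)
dat <- data.frame(c0 = rnorm(n), c05 = rnorm(n), c2 = rnorm(n))
dat$auc <- 10 + 2 * dat$c0 + 3 * dat$c05 + 1.5 * dat$c2 + rnorm(n)

# Two candidate models to compare
models <- list(small = auc ~ c0,
               full  = auc ~ c0 + c05 + c2)

# Hypothetical helper: fit on a bootstrap sample, predict the rows that
# were left out of that sample, and average the RMSE over B resamples.
# The response column is assumed to be called "auc".
oob_rmse <- function(formula, data, B = 200) {
  errs <- replicate(B, {
    idx <- sample(nrow(data), replace = TRUE)
    oob <- setdiff(seq_len(nrow(data)), idx)
    fit <- lm(formula, data = data[idx, ])
    pred <- predict(fit, newdata = data[oob, ])
    sqrt(mean((data$auc[oob] - pred)^2))
  })
  mean(errs)
}

rmse <- sapply(models, oob_rmse, data = dat)
print(rmse)
```

The model with the lower out-of-bag error is the one this kind of
procedure would favour; that is a statement about which model to pick,
not about how to estimate its coefficients afterwards.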

It is the following part of the methods description that seems unmotivated to me:
"Once the general model (of the 26) was
selected, the proposed regression coefficients were
taken as the median of the distribution of regression
coefficient values described in step 2."

That is, after having decided upon the model that uses C0, C0.5 and C2,
they use the median of the bootstrap estimates (which is what the R code
I wrote does, more or less) instead of fitting that model on the
entire data set. I don't see how this could be better,
since we can't get any more information out of the data than
what was there from the beginning. And I believe this is what all
the other people on the list are trying to tell you: that it's a step
without purpose.

You have to distinguish between finding out which model is best, which
the bootstrap can be useful for, and estimating the parameters of the
final, chosen model, where bootstrapping several regressions and
taking the median is most likely no better than standard regression.
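
A quick way to see the second point for yourself is to put the two sets of
estimates side by side. The sketch below is only an illustration under
stated assumptions: the data are simulated, the column names c0, c05, c2
and auc merely echo the variables discussed above, and it uses plain lm()
rather than anything taken from the paper.

```r
set.seed(42)
n <- 50

# Simulated stand-in data (NOT the paper's data)
dat <- data.frame(c0 = rnorm(n), c05 = rnorm(n), c2 = rnorm(n))
dat$auc <- 10 + 2 * dat$c0 + 3 * dat$c05 + 1.5 * dat$c2 + rnorm(n)

# Ordinary regression on the entire data set
full_fit <- coef(lm(auc ~ c0 + c05 + c2, data = dat))

# Refit the same model on 1000 bootstrap resamples;
# the result is a matrix with one row per coefficient
boot_coefs <- replicate(1000, {
  idx <- sample(n, replace = TRUE)
  coef(lm(auc ~ c0 + c05 + c2, data = dat[idx, ]))
})

# Median of each coefficient across the resamples
boot_median <- apply(boot_coefs, 1, median)

print(round(cbind(full_fit, boot_median), 3))
```

In a run like this the two columns should come out very close to each
other, which is the sense in which the median step adds nothing. The
spread of the rows of boot_coefs is still useful, but as a measure of
uncertainty (for example via quantile()), not as a better point estimate.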

best regards,

Gustaf
-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
