khosoda at med.kobe-u.ac.jp
2011-May-15 05:31 UTC
[R] Question on approximations of full logistic regression model
Hi,

I am trying to construct a logistic regression model from my data (104 patients and 25 events). I built a full model consisting of five predictors, using penalization via the rms package (lrm, pentrace, etc.) because of the events-per-variable issue. I then tried to approximate the full model with the step-down technique, predicting the linear predictor L from all of the component variables using ordinary least squares (ols in the rms package), as follows. I would like to know whether I am doing this right.

> library(rms)
> plogit <- predict(full.model)
> full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
> fastbw(full.ols, aics=1e10)

 Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
 stenosis        1.41 1    0.2354     1.41 1    0.2354  -0.59 0.991
 x2             16.78 1    0.0000    18.19 2    0.0001  14.19 0.882
 procedure      26.12 1    0.0000    44.31 3    0.0000  38.31 0.711
 ClinicalScore  25.75 1    0.0000    70.06 4    0.0000  62.06 0.544
 x1             83.42 1    0.0000   153.49 5    0.0000 143.49 0.000

Then I fitted an approximation to the full model using the most important variables (keeping variables until the R^2 for predictions from the reduced model against the original Y drops below 0.95), that is, dropping "stenosis":

> full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
> full.ols.approx$stats
          n  Model L.R.        d.f.          R2           g       Sigma
104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622

This approximate model had an R^2 against the full model of 0.99. Therefore, I updated the original full logistic model, dropping "stenosis" as a predictor:

> full.approx.lrm <- update(full.model, ~ . -stenosis)

> validate(full.model, bw=F, B=1000)
          index.orig training   test optimism index.corrected    n
Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000

> validate(full.approx.lrm, bw=F, B=1000)
          index.orig training   test optimism index.corrected    n
Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000

Validation revealed this approximation was not bad. Then I made a nomogram:

> full.approx.lrm.nom <- nomogram(full.approx.lrm,
+   fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
> plot(full.approx.lrm.nom)

and another nomogram using the OLS model:

> full.ols.approx.nom <- nomogram(full.ols.approx,
+   fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
> plot(full.ols.approx.nom)

These two nomograms are very similar but slightly different.

My questions are:

1. Am I doing this right?
2. Which nomogram is correct?

I would appreciate your help.

--
KH
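[For readers without the original data, here is a minimal self-contained sketch of the same full-model-approximation workflow on simulated data. All names (d, full, L, a) and the simulated predictors are hypothetical, not the poster's.]

  library(rms)

  ## Simulated data of roughly the poster's size (hypothetical variables)
  set.seed(1)
  n <- 104
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
  d$y <- rbinom(n, 1, plogis(-1.2 + 0.9 * d$x1 + 0.4 * d$x2))
  dd <- datadist(d); options(datadist = "dd")

  ## Full logistic model; x=TRUE, y=TRUE so validate()/bootcov() work later
  full <- lrm(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)
  L <- predict(full)                 # linear predictor of the full model

  ## Approximate L by OLS and rank variables by the loss in approximation
  ## accuracy when deleted (aics=1e10 forces a complete deletion ranking)
  a <- ols(L ~ x1 + x2 + x3, data = d, sigma = 1)
  fastbw(a, aics = 1e10)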
Frank Harrell
2011-May-16 03:25 UTC
[R] Question on approximations of full logistic regression model
I think you are doing this correctly except for one thing. The validation and other inferential calculations should be done on the full model. Use the approximate model to get a simpler nomogram, but not to get standard errors. Since you are only dropping one variable, you might consider just running the nomogram on the entire model.

Frank
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
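[A sketch of the division of labor Prof. Harrell describes, continuing the simulated objects full, L, and d from the sketch above; the reduced formula here is purely illustrative.]

  ## Inference and validation: always on the full model
  validate(full, bw = FALSE, B = 200)

  ## Presentation: a reduced approximation, used only for the nomogram
  approx.ols <- ols(L ~ x1 + x2, data = d)
  nom <- nomogram(approx.ols, fun = plogis,
                  fun.at = c(.05, .1, .25, .5, .75, .9, .95),
                  funlabel = "Predicted probability")
  plot(nom)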
khosoda at med.kobe-u.ac.jp
2011-May-16 04:49 UTC
[R] Question on approximations of full logistic regression model
Thank you for your reply, Prof. Harrell.

I agree with you; dropping only one variable does not actually help a lot.

I have one more question. During analysis of this model I found that the confidence intervals (CIs) of some coefficients provided by bootstrapping (the bootcov function in the rms package) were narrower than the CIs provided by the usual variance-covariance matrix, while the CIs of other coefficients were wider. My data has no cluster structure. I am wondering which CIs are better. I would guess the bootstrap ones, but is that right?

I would appreciate your help.

--
KH

E-mail address
Office: khosoda at med.kobe-u.ac.jp
Home : khosoda at venus.dti.ne.jp
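[A sketch of the comparison being asked about, continuing the simulated fit `full` from the first sketch; names are hypothetical.]

  ## Bootstrap covariance estimate replaces the information-matrix one
  full.boot <- bootcov(full, B = 1000, coef.reps = TRUE)

  ## Wald 0.95 CIs from the usual (information-matrix) covariance estimate
  se.wald <- sqrt(diag(vcov(full)))
  cbind(lower = coef(full) - 1.96 * se.wald,
        upper = coef(full) + 1.96 * se.wald)

  ## The same, using the bootstrap covariance matrix stored by bootcov()
  se.boot <- sqrt(diag(vcov(full.boot)))
  cbind(lower = coef(full.boot) - 1.96 * se.boot,
        upper = coef(full.boot) + 1.96 * se.boot)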
Frank Harrell
2011-May-16 13:01 UTC
[R] Question on approximations of full logistic regression model
The choice is not clear, and requires some simulations to estimate the average absolute error of the covariance matrix estimators.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
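[Prof. Harrell leaves the simulation design open. One possible reading, as a sketch on simulated data (entirely illustrative, not his code): draw repeated samples from a known logistic model, then compare the Wald and bootstrap standard errors against the empirical sampling SD of the coefficient estimates.]

  library(rms)
  set.seed(2)
  nsim <- 100; n <- 104
  beta <- c(-1.2, 0.9)                       # true intercept and slope
  est <- wald.se <- boot.se <- matrix(NA, nsim, 2)
  for (i in 1:nsim) {
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(beta[1] + beta[2] * x))
    f  <- lrm(y ~ x, x = TRUE, y = TRUE)
    fb <- bootcov(f, B = 200)
    est[i, ]     <- coef(f)
    wald.se[i, ] <- sqrt(diag(vcov(f)))
    boot.se[i, ] <- sqrt(diag(vcov(fb)))
  }
  true.sd <- apply(est, 2, sd)               # empirical sampling SD
  colMeans(abs(sweep(wald.se, 2, true.sd)))  # mean |error|, Wald SEs
  colMeans(abs(sweep(boot.se, 2, true.sd)))  # mean |error|, bootstrap SEs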
khosoda at med.kobe-u.ac.jp
2011-May-16 14:19 UTC
[R] Question on approximations of full logistic regression model
Thank you for your comment, Prof. Harrell.

I would appreciate it very much if you could teach me how to run the simulations for that estimation. For reference, the following is what I did (bootcov, summary, and validate):

> MyFullModel.boot <- bootcov(MyFullModel, B=1000, coef.reps=T)

> summary(MyFullModel, stenosis=c(70, 80), x1=c(1.5, 2.0), x2=c(1.5, 2.0))
             Effects              Response : outcome

 Factor             Low  High Diff. Effect S.E. Lower 0.95 Upper 0.95
 stenosis           70.0 80   10.0  -0.11  0.24 -0.59       0.37
  Odds Ratio        70.0 80   10.0   0.90  NA    0.56       1.45
 x1                  1.5  2    0.5   1.21  0.37  0.49       1.94
  Odds Ratio         1.5  2    0.5   3.36  NA    1.63       6.95
 x2                  1.5  2    0.5  -0.29  0.19 -0.65       0.08
  Odds Ratio         1.5  2    0.5   0.75  NA    0.52       1.08
 ClinicalScore       3.0  5    2.0   0.61  0.38 -0.14       1.36
  Odds Ratio         3.0  5    2.0   1.84  NA    0.87       3.89
 procedure - CA:CE   2.0  1    NA    0.83  0.46 -0.07       1.72
  Odds Ratio         2.0  1    NA    2.28  NA    0.93       5.59

> summary(MyFullModel.boot, stenosis=c(70, 80), x1=c(1.5, 2.0), x2=c(1.5, 2.0))
             Effects              Response : outcome

 Factor               Low  High Diff. Effect S.E. Lower 0.95 Upper 0.95
 stenosis             70.0 80   10.0  -0.11  0.28 -0.65       0.43
  Odds Ratio          70.0 80   10.0   0.90  NA    0.52       1.54
 x1                    1.5  2    0.5   1.21  0.29  0.65       1.77
  Odds Ratio           1.5  2    0.5   3.36  NA    1.92       5.89
 x2                    1.5  2    0.5  -0.29  0.16 -0.59       0.02
  Odds Ratio           1.5  2    0.5   0.75  NA    0.55       1.02
 ClinicalScore         3.0  5    2.0   0.61  0.45 -0.28       1.50
  Odds Ratio           3.0  5    2.0   1.84  NA    0.76       4.47
 procedure - CAS:CEA   2.0  1    NA    0.83  0.38  0.07       1.58
  Odds Ratio           2.0  1    NA    2.28  NA    1.08       4.85

> validate(MyFullModel, bw=F, B=1000)
          index.orig training    test optimism index.corrected    n
Dxy           0.6425   0.7054  0.6122   0.0932          0.5493 1000
R2            0.3270   0.3745  0.3330   0.0415          0.2855 1000
Intercept     0.0000   0.0000  0.0683  -0.0683          0.0683 1000
Slope         1.0000   1.0000  1.0465  -0.0465          1.0465 1000
Emax          0.0000   0.0000  0.0221   0.0221          0.0221 1000
D             0.2715   0.2795  0.2424   0.0371          0.2345 1000
U            -0.0192  -0.0192 -0.0035  -0.0157         -0.0035 1000
Q             0.2908   0.2987  0.2460   0.0528          0.2380 1000
B             0.1265   0.1164  0.1332  -0.0168          0.1433 1000
g             1.3366   1.5041  1.5495  -0.0455          1.3821 1000
gp            0.2082   0.2172  0.2258  -0.0087          0.2169 1000

> validate(MyFullModel.boot, bw=F, B=1000)
          index.orig training    test optimism index.corrected    n
Dxy           0.6425   0.7015  0.6139   0.0877          0.5549 1000
R2            0.3270   0.3738  0.3346   0.0392          0.2878 1000
Intercept     0.0000   0.0000  0.0613  -0.0613          0.0613 1000
Slope         1.0000   1.0000  1.0569  -0.0569          1.0569 1000
Emax          0.0000   0.0000  0.0226   0.0226          0.0226 1000
D             0.2715   0.2805  0.2438   0.0367          0.2348 1000
U            -0.0192  -0.0192 -0.0039  -0.0153         -0.0039 1000
Q             0.2908   0.2997  0.2477   0.0521          0.2387 1000
B             0.1265   0.1177  0.1329  -0.0153          0.1417 1000
g             1.3366   1.5020  1.5523  -0.0503          1.3869 1000
gp            0.2082   0.2191  0.2263  -0.0072          0.2154 1000

--
KH

Phone: 078-382-5966   Fax: 078-382-5979
E-mail address
Office: khosoda at med.kobe-u.ac.jp
Home : khosoda at venus.dti.ne.jp
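[Two points in this output can be checked on the simulated fits from the earlier sketches (full and full.boot; hypothetical names). The sketch below assumes that summary() draws its CIs from whichever covariance matrix the fit stores, and that validate() refits on resamples without using that matrix, so the two validate() runs above should differ only by bootstrap noise; that reading is an assumption worth checking, not a documented guarantee.]

  ## CIs change with the stored covariance matrix...
  summary(full, x1 = c(0, 1))        # information-matrix (Wald) intervals
  summary(full.boot, x1 = c(0, 1))   # bootstrap intervals

  ## ...but validation indexes should not (same seed -> same resamples)
  set.seed(4); validate(full,      bw = FALSE, B = 200)
  set.seed(4); validate(full.boot, bw = FALSE, B = 200)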
khosoda at med.kobe-u.ac.jp
2011-May-18 13:01 UTC
[R] Question on approximations of full logistic regression model
I tried to make a histogram of the bootstrap distribution for my logistic model, following "Regression Modeling Strategies" (pp. 197-200). Attached is the histogram I made. The figure shows the bootstrap distribution of a log odds ratio from my logistic model. The solid curve is a kernel density estimate, and the dashed curve is a normal density with the same mean and standard deviation as the bootstrapped values. Vertical lines indicate asymmetric 0.9, 0.95, and 0.99 two-sided confidence limits for the log odds ratio based on quantiles of the bootstrap values. It seems to me that the bootstrap distribution is normal and that the confidence-interval estimation is, ummm, accurate. Am I right?

The code I used follows:

> library(rms)
> b <- bootcov(MyModel.penalized, B=1000, coef.reps=T)
> s <- summary(MyModel.penalized, x1=c(1.0, 1.5), x2=c(1.0, 1.5),
+   ClinicalScore=c(4,6), procedure=c("E", "A"))
> X <- predict(MyModel.penalized, data.frame(x1=c(1.0, 1.5), x2=c(1.0, 1.5),
+   ClinicalScore=c(4,6), procedure=c("E", "A")), type="x")
> X
  Intercept  x1  x2 ClinicalScore procedure=E
1         1 1.0 1.0             4           1
2         1 1.5 1.5             6           0
> Xdif <- X[2, , drop=F] - X[1, , drop=F]
> Xdif
  Intercept  x1  x2 ClinicalScore procedure=E
2         0 0.5 0.5             2          -1
> conf <- c(.9, .95, .99)
> bootplot(b, X=Xdif, conf.int=conf, xlim=c(0, 6))
> boot.log.odds.ratio <- b$boot.Coef %*% t(Xdif)
> sd <- sqrt(var(boot.log.odds.ratio))
> sd
          2
2 0.7412509
> z <- seq(0, 6, length=104)
> lines(z, dnorm(z, mean=mean(boot.log.odds.ratio), sd=sd), lty=2)

[Attachment: x5factor_final-CI-histogram.pdf (application/pdf) —
https://stat.ethz.ch/pipermail/r-help/attachments/20110518/00b34a96/attachment.pdf]
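[The same bootstrap-distribution check can be sketched self-contained on simulated data; names are hypothetical, and the contrast is reduced to a single coefficient for simplicity, with the normal overlay and quantile lines following the poster's code.]

  library(rms)
  set.seed(3)
  n <- 104
  d <- data.frame(x1 = rnorm(n))
  d$y <- rbinom(n, 1, plogis(-1 + d$x1))
  dd <- datadist(d); options(datadist = "dd")

  f <- lrm(y ~ x1, data = d, x = TRUE, y = TRUE)
  b <- bootcov(f, B = 1000, coef.reps = TRUE)

  ## For the contrast x1: 0 -> 1 the log odds ratio is just the x1
  ## coefficient; for multi-variable contrasts build Xdif as in the post
  lor <- b$boot.Coef[, "x1"]

  hist(lor, prob = TRUE, breaks = 30, xlab = "log OR",
       main = "Bootstrap distribution of log odds ratio")
  curve(dnorm(x, mean(lor), sd(lor)), add = TRUE, lty = 2)  # normal overlay
  abline(v = quantile(lor, c(.05, .95, .025, .975, .005, .995)),
         lty = 3)                    # 0.90 / 0.95 / 0.99 percentile limits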