khosoda at med.kobe-u.ac.jp
2011-Apr-20 08:23 UTC
[R] BMA, logistic regression, odds ratio, model reduction etc
Hi everybody, I apologize for long mail in advance. I have data of 104 patients, which consists of 15 explanatory variables and one binary outcome (poor/good). The outcome consists of 25 poor results and 79 good results. I tried to analyze the data with logistic regression. However, the 15 variables and 25 events means events per variable (EPV) is much less than 10 (rule of thumb). Therefore, I used R package, "BMA" to perform logistic regression with BMA to avoid this problem. model 1 (full model): x1, x2, x3, x4 are continuous variables and others are binary data.> x16.bic.glm <- bic.glm(outcome ~ ., data=x16.df,glm.family="binomial", OR20, strict=FALSE)> summary(x16.bic.glm)(The output below has been cut off at the right edge to save space) 62 models were selected Best 5 models (cumulative posterior probability = 0.3606 ): p!=0 EV SD model 1 model2 Intercept 100 -5.1348545 1.652424 -4.4688 -5.15 -5.1536 age 3.3 0.0001634 0.007258 . sex 4.0 .M -0.0243145 0.220314 . side 10.8 .R 0.0811227 0.301233 . procedure 46.9 -0.5356894 0.685148 . -1.163 symptom 3.8 -0.0099438 0.129690 . . stenosis 3.4 -0.0003343 0.005254 . x1 3.7 -0.0061451 0.144084 . x2 100.0 3.1707661 0.892034 3.2221 3.11 x3 51.3 -0.4577885 0.551466 -0.9154 . HT 4.6 .positive 0.0199299 0.161769 . . DM 3.3 .positive -0.0019986 0.105910 . . IHD 3.5 .positive 0.0077626 0.122593 . . smoking 9.1 .positive 0.0611779 0.258402 . . hyperlipidemia 16.0 .positive 0.1784293 0.512058 . . x4 8.2 0.0607398 0.267501 . . nVar 2 2 1 3 3 BIC -376.9082 -376.5588 -376.3094 -375.8468 -374.5582 post prob 0.104 0.087 0.077 0.061 0.032 [Question 1] Is it O.K to calculate odds ratio and its 95% confidence interval from "EV" (posterior distribution mean) and?SD?(posterior distribution standard deviation)? For example, 95%CI of EV of x2 can be calculated as;> exp(3.1707661)[1] 23.82573 -----> odds ratio> exp(3.1707661+1.96*0.892034)[1] 136.8866> exp(3.1707661-1.96*0.892034)[1] 4.146976 ------------------> 95%CI (4.1 to 136.9) Is this O.K.? [Question 2] Is it permissible to delete variables with small value of "p!=0" and "EV", such as age (3.3% and 0.0001634) to reduce the number of explanatory variables and reconstruct new model without those variables for new session of BMA? model 2 (reduced model): I used R package, "pvclust", to reduce the model. The result suggested x1, x2 and x4 belonged to the same cluster, so I picked up only x2. Based on the subject knowledge, I made a simple unweighted sum, by counting the number of clinical features. For 9 features (sex, side, HT2, hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges from 0 to 9. This score was defined as ClinicalScore. Consequently, I made up new data set (x6.df), which consists of 5 variables (stenosis, x2, x3, procedure, and ClinicalScore) and one binary outcome (poor/good). Then, for alternative BMA session...> BMAx6.glm <- bic.glm(postopDWI_HI ~ ., data=x6.df,glm.family="binomial", OR=20, strict=FALSE)> summary(BMAx6.glm)(The output below has been cut off at the right edge to save space) Call: bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family "binomial", strict = FALSE, OR = 20) 13 models were selected Best 5 models (cumulative posterior probability = 0.7626 ): p!=0 EV SD model 1 model 2 Intercept 100 -5.6918362 1.81220 -4.4688 -6.3166 stenosis 8.1 -0.0008417 0.00815 . . x2 100.0 3.0606165 0.87765 3.2221 3.1154 x3 46.5 -0.3998864 0.52688 -0.9154 . procedure 49.3 0.5747013 0.70164 . 1.1631 ClinicalScore 27.1 0.0966633 0.19645 . . nVar 2 2 1 3 3 BIC -376.9082 -376.5588 -376.3094 -375.8468 -375.5025 post prob 0.208 0.175 0.154 0.122 0.103 [Question 3] Am I doing it correctly or not? I mean this kind of model reduction is permissible for BMA? [Question 4] I still have 5 variables, which violates the rule of thumb, "EPV > 10". Is it permissible to delete "stenosis" variable because of small value of "EV"? Or is it O.K. because this is BMA? Sorry for long post. I appreciate your help very much in advance. -- KH
Frank Harrell
2011-Apr-20 13:01 UTC
[R] BMA, logistic regression, odds ratio, model reduction etc
Deleting variables is a bad idea unless you make that a formal part of the BMA so that the attempt to delete variables is penalized for. Instead of BMA I recommend simple penalized maximum likelihood estimation (see the lrm function in the rms package) or pre-modeling data reduction that is blinded to the outcome variable. Frank ???? wrote:> > Hi everybody, > I apologize for long mail in advance. > > I have data of 104 patients, which consists of 15 explanatory variables > and one binary outcome (poor/good). The outcome consists of 25 poor > results and 79 good results. I tried to analyze the data with logistic > regression. However, the 15 variables and 25 events means events per > variable (EPV) is much less than 10 (rule of thumb). Therefore, I used R > package, "BMA" to perform logistic regression with BMA to avoid this > problem. > > model 1 (full model): > x1, x2, x3, x4 are continuous variables and others are binary data. > >> x16.bic.glm <- bic.glm(outcome ~ ., data=x16.df, > glm.family="binomial", OR20, strict=FALSE) >> summary(x16.bic.glm) > (The output below has been cut off at the right edge to save space) > > 62 models were selected > Best 5 models (cumulative posterior probability = 0.3606 ): > > p!=0 EV SD model 1 model2 > Intercept 100 -5.1348545 1.652424 -4.4688 -5.15 > -5.1536 > age 3.3 0.0001634 0.007258 . > sex 4.0 > .M -0.0243145 0.220314 . > side 10.8 > .R 0.0811227 0.301233 . > procedure 46.9 -0.5356894 0.685148 . -1.163 > symptom 3.8 -0.0099438 0.129690 . . > stenosis 3.4 -0.0003343 0.005254 . > x1 3.7 -0.0061451 0.144084 . > x2 100.0 3.1707661 0.892034 3.2221 3.11 > x3 51.3 -0.4577885 0.551466 -0.9154 . > HT 4.6 > .positive 0.0199299 0.161769 . . > DM 3.3 > .positive -0.0019986 0.105910 . . > IHD 3.5 > .positive 0.0077626 0.122593 . . > smoking 9.1 > .positive 0.0611779 0.258402 . . > hyperlipidemia 16.0 > .positive 0.1784293 0.512058 . . > x4 8.2 0.0607398 0.267501 . . > > > nVar 2 2 > 1 3 3 > BIC -376.9082 > -376.5588 -376.3094 -375.8468 -374.5582 > post prob 0.104 > 0.087 0.077 0.061 0.032 > > [Question 1] > Is it O.K to calculate odds ratio and its 95% confidence interval from > "EV" (posterior distribution mean) and?SD?(posterior distribution > standard deviation)? > For example, 95%CI of EV of x2 can be calculated as; >> exp(3.1707661) > [1] 23.82573 -----> odds ratio >> exp(3.1707661+1.96*0.892034) > [1] 136.8866 >> exp(3.1707661-1.96*0.892034) > [1] 4.146976 > ------------------> 95%CI (4.1 to 136.9) > Is this O.K.? > > [Question 2] > Is it permissible to delete variables with small value of "p!=0" and > "EV", such as age (3.3% and 0.0001634) to reduce the number of > explanatory variables and reconstruct new model without those variables > for new session of BMA? > > model 2 (reduced model): > I used R package, "pvclust", to reduce the model. The result suggested > x1, x2 and x4 belonged to the same cluster, so I picked up only x2. > Based on the subject knowledge, I made a simple unweighted sum, by > counting the number of clinical features. For 9 features (sex, side, > HT2, hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges > from 0 to 9. This score was defined as ClinicalScore. Consequently, I > made up new data set (x6.df), which consists of 5 variables (stenosis, > x2, x3, procedure, and ClinicalScore) and one binary outcome > (poor/good). Then, for alternative BMA session... > >> BMAx6.glm <- bic.glm(postopDWI_HI ~ ., data=x6.df, > glm.family="binomial", OR=20, strict=FALSE) >> summary(BMAx6.glm) > (The output below has been cut off at the right edge to save space) > Call: > bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family > "binomial", strict = FALSE, OR = 20) > > > 13 models were selected > Best 5 models (cumulative posterior probability = 0.7626 ): > > p!=0 EV SD model 1 model 2 > Intercept 100 -5.6918362 1.81220 -4.4688 -6.3166 > stenosis 8.1 -0.0008417 0.00815 . . > x2 100.0 3.0606165 0.87765 3.2221 3.1154 > x3 46.5 -0.3998864 0.52688 -0.9154 . > procedure 49.3 0.5747013 0.70164 . 1.1631 > ClinicalScore 27.1 0.0966633 0.19645 . . > > > nVar 2 2 1 > 3 3 > BIC -376.9082 -376.5588 > -376.3094 -375.8468 -375.5025 > post prob 0.208 0.175 > 0.154 0.122 0.103 > > [Question 3] > Am I doing it correctly or not? > I mean this kind of model reduction is permissible for BMA? > > [Question 4] > I still have 5 variables, which violates the rule of thumb, "EPV > 10". > Is it permissible to delete "stenosis" variable because of small value > of "EV"? Or is it O.K. because this is BMA? > > Sorry for long post. > > I appreciate your help very much in advance. > > -- > KH > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3462919.html Sent from the R help mailing list archive at Nabble.com.
khosoda at med.kobe-u.ac.jp
2011-Apr-20 14:44 UTC
[R] BMA, logistic regression, odds ratio, model reduction etc
Dear Prof. Harrel, Thank you very much for your quick advice. I will try rms package. Regarding model reduction, is my model 2 method (clustering and recoding that are blinded to the outcome) permissible? Sincerely, -- KH (11/04/20 22:01), Frank Harrell wrote:> Deleting variables is a bad idea unless you make that a formal part of the > BMA so that the attempt to delete variables is penalized for. Instead of > BMA I recommend simple penalized maximum likelihood estimation (see the lrm > function in the rms package) or pre-modeling data reduction that is blinded > to the outcome variable. > Frank > > > ???? wrote: >> >> Hi everybody, >> I apologize for long mail in advance. >> >> I have data of 104 patients, which consists of 15 explanatory variables >> and one binary outcome (poor/good). The outcome consists of 25 poor >> results and 79 good results. I tried to analyze the data with logistic >> regression. However, the 15 variables and 25 events means events per >> variable (EPV) is much less than 10 (rule of thumb). Therefore, I used R >> package, "BMA" to perform logistic regression with BMA to avoid this >> problem. >> >> model 1 (full model): >> x1, x2, x3, x4 are continuous variables and others are binary data. >> >>> x16.bic.glm<- bic.glm(outcome ~ ., data=x16.df, >> glm.family="binomial", OR20, strict=FALSE) >>> summary(x16.bic.glm) >> (The output below has been cut off at the right edge to save space) >> >> 62 models were selected >> Best 5 models (cumulative posterior probability = 0.3606 ): >> >> p!=0 EV SD model 1 model2 >> Intercept 100 -5.1348545 1.652424 -4.4688 -5.15 >> -5.1536 >> age 3.3 0.0001634 0.007258 . >> sex 4.0 >> .M -0.0243145 0.220314 . >> side 10.8 >> .R 0.0811227 0.301233 . >> procedure 46.9 -0.5356894 0.685148 . -1.163 >> symptom 3.8 -0.0099438 0.129690 . . >> stenosis 3.4 -0.0003343 0.005254 . >> x1 3.7 -0.0061451 0.144084 . >> x2 100.0 3.1707661 0.892034 3.2221 3.11 >> x3 51.3 -0.4577885 0.551466 -0.9154 . >> HT 4.6 >> .positive 0.0199299 0.161769 . . >> DM 3.3 >> .positive -0.0019986 0.105910 . . >> IHD 3.5 >> .positive 0.0077626 0.122593 . . >> smoking 9.1 >> .positive 0.0611779 0.258402 . . >> hyperlipidemia 16.0 >> .positive 0.1784293 0.512058 . . >> x4 8.2 0.0607398 0.267501 . . >> >> >> nVar 2 2 >> 1 3 3 >> BIC -376.9082 >> -376.5588 -376.3094 -375.8468 -374.5582 >> post prob 0.104 >> 0.087 0.077 0.061 0.032 >> >> [Question 1] >> Is it O.K to calculate odds ratio and its 95% confidence interval from >> "EV" (posterior distribution mean) and?SD?(posterior distribution >> standard deviation)? >> For example, 95%CI of EV of x2 can be calculated as; >>> exp(3.1707661) >> [1] 23.82573 -----> odds ratio >>> exp(3.1707661+1.96*0.892034) >> [1] 136.8866 >>> exp(3.1707661-1.96*0.892034) >> [1] 4.146976 >> ------------------> 95%CI (4.1 to 136.9) >> Is this O.K.? >> >> [Question 2] >> Is it permissible to delete variables with small value of "p!=0" and >> "EV", such as age (3.3% and 0.0001634) to reduce the number of >> explanatory variables and reconstruct new model without those variables >> for new session of BMA? >> >> model 2 (reduced model): >> I used R package, "pvclust", to reduce the model. The result suggested >> x1, x2 and x4 belonged to the same cluster, so I picked up only x2. >> Based on the subject knowledge, I made a simple unweighted sum, by >> counting the number of clinical features. For 9 features (sex, side, >> HT2, hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges >> from 0 to 9. This score was defined as ClinicalScore. Consequently, I >> made up new data set (x6.df), which consists of 5 variables (stenosis, >> x2, x3, procedure, and ClinicalScore) and one binary outcome >> (poor/good). Then, for alternative BMA session... >> >>> BMAx6.glm<- bic.glm(postopDWI_HI ~ ., data=x6.df, >> glm.family="binomial", OR=20, strict=FALSE) >>> summary(BMAx6.glm) >> (The output below has been cut off at the right edge to save space) >> Call: >> bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family >> "binomial", strict = FALSE, OR = 20) >> >> >> 13 models were selected >> Best 5 models (cumulative posterior probability = 0.7626 ): >> >> p!=0 EV SD model 1 model 2 >> Intercept 100 -5.6918362 1.81220 -4.4688 -6.3166 >> stenosis 8.1 -0.0008417 0.00815 . . >> x2 100.0 3.0606165 0.87765 3.2221 3.1154 >> x3 46.5 -0.3998864 0.52688 -0.9154 . >> procedure 49.3 0.5747013 0.70164 . 1.1631 >> ClinicalScore 27.1 0.0966633 0.19645 . . >> >> >> nVar 2 2 1 >> 3 3 >> BIC -376.9082 -376.5588 >> -376.3094 -375.8468 -375.5025 >> post prob 0.208 0.175 >> 0.154 0.122 0.103 >> >> [Question 3] >> Am I doing it correctly or not? >> I mean this kind of model reduction is permissible for BMA? >> >> [Question 4] >> I still have 5 variables, which violates the rule of thumb, "EPV> 10". >> Is it permissible to delete "stenosis" variable because of small value >> of "EV"? Or is it O.K. because this is BMA? >> >> Sorry for long post. >> >> I appreciate your help very much in advance. >> >> -- >> KH >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > ----- > Frank Harrell > Department of Biostatistics, Vanderbilt University > -- > View this message in context: http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3462919.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- ************************************************* ????????????? ???????? ??? ?? ? ??650-0017?????????7??5-1 Phone: 078-382-5966 Fax : 078-382-5979 E-mail address Office: khosoda at med.kobe-u.ac.jp Home : khosoda at venus.dti.ne.jp