thr3ads.net - R help - [R] BMA, logistic regression, odds ratio, model reduction etc [Apr 2011]

khosoda at med.kobe-u.ac.jp

2011-Apr-20 08:23 UTC

[R] BMA, logistic regression, odds ratio, model reduction etc

Hi everybody,
I apologize in advance for the long mail.

I have data on 104 patients, consisting of 15 explanatory variables
and one binary outcome (poor/good). The outcome consists of 25 poor
results and 79 good results. I tried to analyze the data with logistic
regression. However, with 15 variables and 25 events, the number of events
per variable (EPV) is much less than 10 (the rule of thumb). Therefore, I
used the R package "BMA" to perform logistic regression with Bayesian
model averaging (BMA) to avoid this problem.
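
For reference, the events-per-variable arithmetic behind this concern can be sketched in R. This is only an illustration using the counts stated above; taking the smaller outcome class as the number of events is a common convention and is assumed here.

```r
# Counts taken from the description above
n_poor <- 25
n_good <- 79

# Assumption: "events" means the less frequent outcome class
events <- min(n_poor, n_good)

# Events per variable for the full model (15 predictors)
epv_full <- events / 15

# Events per variable for a model with 5 predictors
epv_reduced <- events / 5

round(c(full = epv_full, reduced = epv_reduced), 2)
```

Both values fall below the rule-of-thumb threshold of 10.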

model 1 (full model):
x1, x2, x3, x4 are continuous variables and the others are binary.
> x16.bic.glm <- bic.glm(outcome ~ ., data=x16.df,
+     glm.family="binomial", OR=20, strict=FALSE)
> summary(x16.bic.glm)
(The output below has been cut off at the right edge to save space)

  62  models were selected
 Best  5  models (cumulative posterior probability =  0.3606 ):

                         p!=0    EV         SD        model 1    model2
Intercept                100    -5.1348545  1.652424    -4.4688  -5.15  -5.1536
age                        3.3   0.0001634  0.007258      .
sex                        4.0
   .M                           -0.0243145  0.220314      .
side                      10.8
    .R                           0.0811227  0.301233      .
procedure                 46.9  -0.5356894  0.685148      .      -1.163
symptom                    3.8  -0.0099438  0.129690      .          .
stenosis                   3.4  -0.0003343  0.005254      .
x1                        3.7  -0.0061451  0.144084      .
x2                       100.0   3.1707661  0.892034     3.2221     3.11
x3                        51.3  -0.4577885  0.551466    -0.9154     .
HT                         4.6
  .positive                      0.0199299  0.161769      .          .
DM                         3.3
  .positive                     -0.0019986  0.105910      .          .
IHD                        3.5
   .positive                     0.0077626  0.122593      .          .
smoking                    9.1
       .positive                 0.0611779  0.258402      .          .
hyperlipidemia            16.0
              .positive          0.1784293  0.512058      .          .
x4                         8.2   0.0607398  0.267501      .          .


nVar                                                       2          2          1          3          3
BIC                                                   -376.9082  -376.5588  -376.3094  -375.8468  -374.5582
post prob                                                0.104      0.087      0.077      0.061      0.032
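
As a rough check on how the "BIC" and "post prob" rows relate, the sketch below compares the reported posterior model probabilities with weights proportional to exp(-BIC/2). It assumes equal prior probabilities for all models, which is an assumption here and not something stated in the output; the numbers are copied from the summary above.

```r
# BIC and posterior model probabilities of the best 5 models (from the output)
bic  <- c(-376.9082, -376.5588, -376.3094, -375.8468, -374.5582)
post <- c(0.104, 0.087, 0.077, 0.061, 0.032)

# Assumption: equal prior model probabilities, so that the posterior
# probability of a model is proportional to exp(-BIC / 2)
rel_from_bic <- exp(-(bic - bic[1]) / 2)

# Reported probabilities, expressed relative to the best model
rel_reported <- post / post[1]

round(cbind(rel_from_bic, rel_reported), 2)

# Cumulative posterior probability of the best 5 models
sum(post)
```

The two columns agree to within rounding, and the sum matches the reported cumulative posterior probability of about 0.36.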

[Question 1]
Is it O.K. to calculate an odds ratio and its 95% confidence interval from
"EV" (posterior distribution mean) and "SD" (posterior distribution
standard deviation)?
For example, the odds ratio and 95% CI for x2 can be calculated as:
> exp(3.1707661)
[1] 23.82573     -----> odds ratio
> exp(3.1707661+1.96*0.892034)
[1] 136.8866
> exp(3.1707661-1.96*0.892034)
[1] 4.146976
------------------> 95%CI (4.1 to 136.9)
Is this O.K.?
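
The same calculation can be written as a short reusable sketch. It treats the posterior of the coefficient as approximately normal, which is an assumption; for a variable whose "p!=0" is well below 100, the model-averaged posterior also puts mass at exactly zero, so an interval of this form may be a poor summary. The variable names are illustrative.

```r
# Posterior mean (EV) and SD for x2, copied from the summary output
ev      <- 3.1707661
sd_post <- 0.892034

# Assumption: the posterior of the coefficient is roughly normal
z <- qnorm(0.975)   # about 1.96

# Odds ratio and approximate 95% interval on the odds-ratio scale
or_x2       <- exp(ev)
or_interval <- exp(ev + c(-1, 1) * z * sd_post)

round(c(OR = or_x2, lower = or_interval[1], upper = or_interval[2]), 1)
```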

[Question 2]
Is it permissible to delete variables with small values of "p!=0" and
"EV", such as age (3.3% and 0.0001634), to reduce the number of
explanatory variables, and then rebuild the model without those variables
in a new BMA session?

model 2 (reduced model):
I used the R package "pvclust" to reduce the model. The result suggested
that x1, x2 and x4 belonged to the same cluster, so I kept only x2.
Based on subject knowledge, I made a simple unweighted sum by
counting the number of clinical features. For 9 features (sex, side,
HT2, hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges
from 0 to 9. This score was defined as ClinicalScore. Consequently, I
made a new data set (x6.df), which consists of 5 variables (stenosis,
x2, x3, procedure, and ClinicalScore) and one binary outcome
(poor/good). Then, for an alternative BMA session...
> BMAx6.glm <- bic.glm(postopDWI_HI ~ ., data=x6.df,
+     glm.family="binomial", OR=20, strict=FALSE)
> summary(BMAx6.glm)
(The output below has been cut off at the right edge to save space)
Call:
bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family = "binomial",
    strict = FALSE, OR = 20)


  13  models were selected
 Best  5  models (cumulative posterior probability =  0.7626 ):

                p!=0    EV         SD       model 1    model 2
Intercept       100    -5.6918362  1.81220    -4.4688    -6.3166
stenosis          8.1  -0.0008417  0.00815      .          .
x2              100.0   3.0606165  0.87765     3.2221     3.1154
x3               46.5  -0.3998864  0.52688    -0.9154      .
procedure       49.3   0.5747013  0.70164      .         1.1631
ClinicalScore   27.1   0.0966633  0.19645      .          .


nVar                                             2          2          1          3          3
BIC                                         -376.9082  -376.5588  -376.3094  -375.8468  -375.5025
post prob                                      0.208      0.175      0.154      0.122      0.103
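
To make the ClinicalScore construction described above concrete, here is a minimal sketch of an unweighted sum of 0/1 indicators. The data frame `toy` and its values are hypothetical and only three of the nine features are shown; how the non-binary features such as age were converted to indicators is not stated in the post, so they are assumed to be already coded 0/1.

```r
# Hypothetical toy data: three patients, features already coded 0/1
toy <- data.frame(
  HT      = c(1, 0, 1),
  DM      = c(0, 0, 1),
  smoking = c(1, 0, 1)
)

# Unweighted sum: count how many features are present for each patient.
# The outcome variable is not used anywhere in this step.
features <- c("HT", "DM", "smoking")
toy$ClinicalScore <- rowSums(toy[, features])

toy
```

Because the score is built without looking at the outcome, it costs one degree of freedom in the later model instead of one per feature.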

[Question 3]
Am I doing this correctly?
I mean, is this kind of model reduction permissible for BMA?

[Question 4]
I still have 5 variables, which violates the rule of thumb "EPV > 10".
Is it permissible to delete the "stenosis" variable because of its small
"EV"? Or is it O.K. because this is BMA?

Sorry for the long post.

Thank you very much in advance for your help.

--
KH

Frank Harrell

2011-Apr-20 13:01 UTC


Deleting variables is a bad idea unless you make that a formal part of the
BMA so that the attempt to delete variables is penalized for.  Instead of
BMA I recommend simple penalized maximum likelihood estimation (see the lrm
function in the rms package) or pre-modeling data reduction that is blinded
to the outcome variable.
Frank
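
To illustrate what penalized maximum likelihood means for logistic regression, here is a minimal base-R sketch with a quadratic (ridge) penalty on the slopes. It is not the rms implementation: in rms, lrm() takes a penalty argument and pentrace() can help choose it. The data are simulated purely for illustration, and the function names and the penalty value are assumptions.

```r
# Simulated, purely illustrative data (not the patient data from this thread)
set.seed(1)
n <- 104
x <- matrix(rnorm(n * 3), n, 3)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x[, 1]))

# Negative log-likelihood plus a quadratic penalty on the slopes;
# the intercept (beta[1]) is left unpenalized
pen_negloglik <- function(beta, x, y, lambda) {
  eta <- beta[1] + x %*% beta[-1]
  -sum(y * eta - log1p(exp(eta))) + 0.5 * lambda * sum(beta[-1]^2)
}

# Fit by direct numerical minimization for a given penalty
fit_pen <- function(lambda) {
  start <- rep(0, ncol(x) + 1)
  optim(start, function(b) pen_negloglik(b, x, y, lambda),
        method = "BFGS")$par
}

b_unpen <- fit_pen(0)   # ordinary maximum likelihood
b_pen   <- fit_pen(5)   # penalized fit; lambda = 5 is arbitrary

round(rbind(unpenalized = b_unpen, penalized = b_pen), 3)
```

The penalized slopes are shrunk toward zero, and all variables stay in the model instead of being deleted.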



-----
Frank Harrell
Department of Biostatistics, Vanderbilt University

khosoda at med.kobe-u.ac.jp

2011-Apr-20 14:44 UTC


Dear Prof. Harrell,

Thank you very much for your quick advice.
I will try the rms package.

Regarding model reduction, is my model 2 method (clustering and recoding 
that are blinded to the outcome) permissible?
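
For concreteness, clustering that is blinded to the outcome can be sketched as follows: only the predictors are used to form the clusters. This uses base-R hclust() on a correlation-based distance in place of pvclust, and the data are simulated, so the names and values are assumptions and not the data from this thread.

```r
# Simulated predictors only; no outcome variable is involved
set.seed(2)
n <- 104
z <- rnorm(n)
pred <- data.frame(
  x1 = z + rnorm(n, sd = 0.3),
  x2 = z + rnorm(n, sd = 0.3),
  x3 = rnorm(n),
  x4 = z + rnorm(n, sd = 0.3)
)

# Distance between variables: 1 - |correlation|
d <- as.dist(1 - abs(cor(pred)))

# Hierarchical clustering of the variables, cut into two groups
hc     <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)

groups
```

In this simulation x1, x2 and x4 fall into one cluster and x3 into another, so one representative per cluster could be chosen before any model is fitted.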

Sincerely,

--
KH

(11/04/20 22:01), Frank Harrell wrote:
> Deleting variables is a bad idea unless you make that a formal part of the
> BMA so that the attempt to delete variables is penalized for.  Instead of
> BMA I recommend simple penalized maximum likelihood estimation (see the lrm
> function in the rms package) or pre-modeling data reduction that is blinded
> to the outcome variable.
> Frank

-- 
     Phone: 078-382-5966
     Fax  : 078-382-5979
     E-mail address
         Office: khosoda at med.kobe-u.ac.jp
	Home  : khosoda at venus.dti.ne.jp
