Bin Yue
2007-Dec-07 07:55 UTC
[R] paradox about the degree of freedom in a logistic regression model
Dear all: "predict.glm" provides an example to perform logistic regression when the response variable is a tow-columned matrix. I find some paradox about the degree of freedom . > summary(budworm.lg) Call: glm(formula = SF ~ sex * ldose, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.39849 -0.32094 -0.07592 0.38220 1.10375 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.9935 0.5527 -5.416 6.09e-08 *** sexM 0.1750 0.7783 0.225 0.822 ldose 0.9060 0.1671 5.422 5.89e-08 *** sexM:ldose 0.3529 0.2700 1.307 0.191 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 124.8756 on 11 degrees of freedom Residual deviance: 4.9937 on 8 degrees of freedom AIC: 43.104 Number of Fisher Scoring iterations: 4 This is the data set used in regression: numdead numalive sex ldose 1 1 19 M 0 2 4 16 M 1 3 9 11 M 2 4 13 7 M 3 5 18 2 M 4 6 20 0 M 5 7 0 20 F 0 8 2 18 F 1 9 6 14 F 2 10 10 10 F 3 11 12 8 F 4 12 16 4 F 5 The degree of freedom is 8. Each row in the example is thought to be one observation. If I extend it to be a three column data.frame, the first denoting the whether the individual is alive , the secode denoting the sex, and the third "ldose",there will be 12*20=240 observations. Since my data set is one of the second type , I wish to know whether the form of data set affects the result of regression ,such as the degree of freedom. Dose anybody have any idea about this? Thank all who read this message. Regards, Bin Yue ----- Best regards, Bin Yue ************* student for a Master program in South Botanical Garden , CAS -- View this message in context: http://www.nabble.com/paradox-about-the-degree-of-freedom-in-a-logistic-regression-model-tf4960753.html#a14208306 Sent from the R help mailing list archive at Nabble.com.
Peter Dalgaard
2007-Dec-07 14:14 UTC
[R] paradox about the degree of freedom in a logistic regression model
Bin Yue wrote:
> "predict.glm" provides an example of logistic regression in which the
> response variable is a two-column matrix. I find a paradox about the
> degrees of freedom.
> [...]
> The residual degrees of freedom are 8, so each row of the table is counted
> as one observation. If I instead expand the data into a three-column
> data.frame (dead/alive, sex, ldose), there will be 12*20 = 240 observations.
> Since my own data set is of this second type, I would like to know whether
> the form of the data set affects the results of the regression, such as
> the degrees of freedom.

Yes. Never use the deviance in binary logistic regression. Only use
differences in deviance between models, each of which satisfies the
requirements for asymptotic theory (in your case, you could compare your
model with the one described by sex*factor(ldose)).

Another striking example is this:

y <- rbinom(1000, prob=.5, size=1)
summary(glm(y ~ -1, binomial))

Now try it with different data:

y <- rbinom(1000, prob=.01, size=1)
summary(glm(y ~ -1, binomial))

and think about it. Then consider the same thing with y ~ 1.

As Brian keeps telling me, there IS a sense in which the residual deviances
make sense in such cases, but not as a means of testing model adequacy.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
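A sketch of the comparison described above (it assumes budworm.lg and the grouped variables SF, sex and ldose from the earlier sketch; the name budworm.sat is illustrative): the fitted model is tested against the richer sex * factor(ldose) model by a difference in deviance, rather than by reading the residual deviance as a goodness-of-fit statistic.

## Model with a separate dose effect at each level, per sex.
budworm.sat <- glm(SF ~ sex * factor(ldose), family = binomial)

## Chi-squared test on the deviance difference between the two models.
anova(budworm.lg, budworm.sat, test = "Chisq")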