thr3ads.net - R help - [R] poisson regression with robust error variance ('eyestudy [May 2008]

If this information is useful, please help other people find it:
Share via:

Paul Johnson

2008-May-08 20:35 UTC

[R] poisson regression with robust error variance ('eyestudy

Ted Harding said:> I can get the estimated RRs from
> RRs <- exp(summary(GLM)$coef[,1])
> but do not see how to implement confidence intervals based
> on "robust error variances" using the output in GLM.

Thanks for the link to the data.  Here's my best guess.   If you use
the following approach, with the HC0 type of robust standard errors in
the "sandwich" package (thanks to Achim Zeileis), you get
"almost" the
same numbers as that Stata output gives. The estimated b's from the
glm match exactly, but the robust standard errors are a bit off.



### Paul Johnson 2008-05-08
### sandwichGLM.R
system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")

library(foreign)

dat <- read.dta("eyestudy.dta")

### Ach, stata codes factor contrasts backwards

dat$carrot0 <- ifelse(dat$carrot==0,1,0)
dat$gender1 <- ifelse(dat$gender==1,1,0)

glm1 <- glm(lenses~carrot0,  data=dat, family=poisson(link=log))

summary(glm1)

library(sandwich)

vcovHC(glm1)


sqrt(diag(vcovHC(glm1)))


sqrt(diag(vcovHC(glm1, type="HC0")))

### Result:
# > summary(glm1)
# Call:
# glm(formula = lenses ~ carrot0, family = poisson(link = log),
#    data = dat)

# Deviance Residuals:
#     Min       1Q   Median       3Q      Max
# -1.1429  -0.9075   0.3979   0.3979   0.7734

# Coefficients:
#            Estimate Std. Error z value Pr(>|z|)
# (Intercept)  -0.8873     0.2182  -4.066 4.78e-05 ***
# carrot0       0.4612     0.2808   1.642    0.101
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

# (Dispersion parameter for poisson family taken to be 1)

#     Null deviance: 67.297  on 99  degrees of freedom
# Residual deviance: 64.536  on 98  degrees of freedom
# AIC: 174.54

# Number of Fisher Scoring iterations: 5

# > sqrt(diag(vcovHC(glm1, type="HC0")))
# (Intercept)     carrot0
#  0.1673655   0.1971117
# >

### Compare against
## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
## robust standard errors are:
## .1682086  .1981048


glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
family=poisson(link=log))


sqrt(diag(vcovHC(glm2, type="HC0")))

### Result:


# > summary(glm2)

#Call:
# glm(formula = lenses ~ carrot0 + gender1 + latitude, family poisson(link =
log),
#     data = dat)

# Deviance Residuals:
#     Min       1Q   Median       3Q      Max
# -1.2332  -0.9316   0.2437   0.5470   0.9466

# Coefficients:
#            Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.65212    0.69820  -0.934   0.3503
# carrot0      0.48322    0.28310   1.707   0.0878 .
# gender1      0.20520    0.27807   0.738   0.4605
# latitude    -0.01001    0.01898  -0.527   0.5980
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

# (Dispersion parameter for poisson family taken to be 1)

#    Null deviance: 67.297  on 99  degrees of freedom
# Residual deviance: 63.762  on 96  degrees of freedom
# AIC: 177.76

# Number of Fisher Scoring iterations: 5

# > sqrt(diag(vcovHC(glm2, type="HC0")))
# (Intercept)     carrot0     gender1    latitude
# 0.49044963  0.19537743  0.18481067  0.01275001

### UCLA site has:

## .4929193   .1963616  .1857414  .0128142


So, either there is some "rounding error" or Stata is not using
exactly the same algorithm as vcovHC in sandwich.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

(Ted Harding)

2008-May-08 21:28 UTC

head link

[R] poisson regression with robust error variance ('eyestudy

Once again, Paul, many thanks for your thorough examination
of this question! And for spelling out your approach!!!

It certainly looks as though you're very close to target
(or even spot-on).

I've only one comment -- see at end.

On 08-May-08 20:35:38, Paul Johnson wrote:> Ted Harding said:
>> I can get the estimated RRs from
> 
>> RRs <- exp(summary(GLM)$coef[,1])
> 
>> but do not see how to implement confidence intervals based
>> on "robust error variances" using the output in GLM.
> 
> Thanks for the link to the data. Here's my best guess. If you
> use the following approach, with the HC0 type of robust standard
> errors in the "sandwich" package (thanks to Achim Zeileis),
> you get "almost" the same numbers as that Stata output gives.
> The estimated b's from the glm match exactly, but the robust
> standard errors are a bit off.
> 
>### Paul Johnson 2008-05-08
>### sandwichGLM.R
> system("wget
http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
> library(foreign)
> 
> dat <- read.dta("eyestudy.dta")
>### Ach, stata codes factor contrasts backwards
> 
> dat$carrot0 <- ifelse(dat$carrot==0,1,0)
> dat$gender1 <- ifelse(dat$gender==1,1,0)
> 
> glm1 <- glm(lenses~carrot0,  data=dat, family=poisson(link=log))
> summary(glm1)
> 
> library(sandwich)
> vcovHC(glm1)
> sqrt(diag(vcovHC(glm1)))
> sqrt(diag(vcovHC(glm1, type="HC0")))
> 
>### Result:
># > summary(glm1)
># Call:
># glm(formula = lenses ~ carrot0, family = poisson(link = log),
>#    data = dat)
> 
># Deviance Residuals:
>#     Min       1Q   Median       3Q      Max
># -1.1429  -0.9075   0.3979   0.3979   0.7734
> 
># Coefficients:
>#            Estimate Std. Error z value Pr(>|z|)
># (Intercept)  -0.8873     0.2182  -4.066 4.78e-05 ***
># carrot0       0.4612     0.2808   1.642    0.101
># ---
># Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
> 
># (Dispersion parameter for poisson family taken to be 1)
> 
>#     Null deviance: 67.297  on 99  degrees of freedom
># Residual deviance: 64.536  on 98  degrees of freedom
># AIC: 174.54
> 
># Number of Fisher Scoring iterations: 5
> 
># > sqrt(diag(vcovHC(glm1, type="HC0")))
># (Intercept)     carrot0
>#  0.1673655   0.1971117
> 
>### Compare against
>## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
>## robust standard errors are:
>## .1682086  .1981048
> 
> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
> family=poisson(link=log))
> 
> sqrt(diag(vcovHC(glm2, type="HC0")))
>### Result:
># > summary(glm2)
> 
>#Call:
># glm(formula = lenses ~ carrot0 + gender1 + latitude, family >
poisson(link = log),
>#     data = dat)
> 
># Deviance Residuals:
>#     Min       1Q   Median       3Q      Max
># -1.2332  -0.9316   0.2437   0.5470   0.9466
> 
># Coefficients:
>#            Estimate Std. Error z value Pr(>|z|)
># (Intercept) -0.65212    0.69820  -0.934   0.3503
># carrot0      0.48322    0.28310   1.707   0.0878 .
># gender1      0.20520    0.27807   0.738   0.4605
># latitude    -0.01001    0.01898  -0.527   0.5980
># ---
># Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
> 
># (Dispersion parameter for poisson family taken to be 1)
> 
>#    Null deviance: 67.297  on 99  degrees of freedom
># Residual deviance: 63.762  on 96  degrees of freedom
># AIC: 177.76
> 
># Number of Fisher Scoring iterations: 5
> 
># > sqrt(diag(vcovHC(glm2, type="HC0")))
># (Intercept)     carrot0     gender1    latitude
># 0.49044963  0.19537743  0.18481067  0.01275001
> 
>### UCLA site has:
> 
>## .4929193   .1963616  .1857414  .0128142
> 
> 
> So, either there is some "rounding error" or Stata is not using
> exactly the same algorithm as vcovHC in sandwich.
The above differences look somewhat systematic (though very
small). The percentage differences (vcovHC relative to STATA)
for the two cases you analyse above are

vcovHC "HC0":  0.1673655   0.1971117
STATA       :  0.1682086   0.1981048
-------------------------------------
% Difference: -0.5012229  -0.5013003

vcovHC "HC0":  0.49044963  0.19537743  0.18481067  0.01275001
STATA:         0.4929193   0.1963616   0.1857414    .0128142
-------------------------------------------------------------
% Difference: -0.5010293  -0.5012029  -0.5010891  -0.5009287

so, in all cases, very close to -0.501%, despite varying absolute
values. This suggests a very subtle difference in algorithm,
rather than rounding error.
Though there is conceivably a convergence issue in the background.

Does ANYONE here know exactly how STATA does it?

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-May-08                                       Time: 22:28:36
------------------------------ XFMail ------------------------------

Achim.Zeileis at R-project.org

2008-May-08 21:31 UTC

head link

[R] poisson regression with robust error variance ('eyestudy

Paul & Ted:
> > I can get the estimated RRs from
>
> > RRs <- exp(summary(GLM)$coef[,1])
>
> > but do not see how to implement confidence intervals based
> > on "robust error variances" using the output in GLM.
>
>
> Thanks for the link to the data.  Here's my best guess.   If you use
> the following approach, with the HC0 type of robust standard errors in
> the "sandwich" package (thanks to Achim Zeileis), you get
"almost" the
> same numbers as that Stata output gives. The estimated b's from the
> glm match exactly, but the robust standard errors are a bit off.
Thanks for posting the code reproducing this example!
> ### Paul Johnson 2008-05-08
> ### sandwichGLM.R
> system("wget
http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
>
> library(foreign)
>
> dat <- read.dta("eyestudy.dta")
>
> ### Ach, stata codes factor contrasts backwards
> dat$carrot0 <- ifelse(dat$carrot==0,1,0)
> dat$gender1 <- ifelse(dat$gender==1,1,0)
>
> glm1 <- glm(lenses~carrot0,  data=dat, family=poisson(link=log))
>
> library(sandwich)
> sqrt(diag(vcovHC(glm1, type="HC0")))
> # (Intercept)     carrot0
> #  0.1673655   0.1971117
>
> ### Compare against
> ## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
> ## robust standard errors are:
> ## .1682086  .1981048
I have the solution. Note that the ratio of both standard errors to those
from sandwich is almost constant which suggests a scaling difference. In
"sandwich" I have implemented two scaling strategies: divide by
"n"
(number of observations) or by "n-k" (residual degrees of freedom).
This
leads to

R> sqrt(diag(sandwich(glm1)))
(Intercept)     carrot0
  0.1673655   0.1971117
R> sqrt(diag(sandwich(glm1, adjust = TRUE)))
(Intercept)     carrot0
  0.1690647   0.1991129

(Equivalently, you could youse vcovHC() with types "HC0" and
"HC1",
respectively.)

Their standard errors are between those, so I thought that they used a
different scaling. Here, n = 100 and n-k = 98 which only left

R> sqrt(diag(sandwich(glm1) * 100/99))
(Intercept)     carrot0
  0.1682086   0.1981048

Bingo! So they scaled with n-1 rather than n or n-k. This is, of course,
also consistent (if the other estimators are), but I wouldn't know of a
good theoretical underpinning for it.
> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
> family=poisson(link=log))
>
> ### UCLA site has:
> ## .4929193   .1963616  .1857414  .0128142
R> round(sqrt(diag(sandwich(glm2) * 100/99)), digits = 7)
(Intercept)     carrot0     gender1    latitude
  0.4929204   0.1963617   0.1857417   0.0128142

Only the constant is a bit off, but that is not unusual in replication
exercises.

Some further advertising: If you want to get the full table of
coefficients, you can use coeftest() from "lmtest"

R> library("lmtest")
R> coeftest(glm2, sandwich(glm2) * 100/99)

z test of coefficients:

             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.652122   0.492920 -1.3230  0.18584
carrot0      0.483220   0.196362  2.4609  0.01386 *
gender1      0.205201   0.185742  1.1048  0.26926
latitude    -0.010009   0.012814 -0.7811  0.43474
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

Best,
Z

Maybe Matching Threads

Search for more maybe matching threads

R help - May 2008 - poisson regression with robust error variance ('eyestudy

[R] poisson regression with robust error variance ('eyestudy

[R] poisson regression with robust error variance ('eyestudy

[R] poisson regression with robust error variance ('eyestudy

Maybe Matching Threads