Paul Johnson
2008-May-08 20:35 UTC
[R] poisson regression with robust error variance ('eyestudy
Ted Harding said:> I can get the estimated RRs from> RRs <- exp(summary(GLM)$coef[,1])> but do not see how to implement confidence intervals based > on "robust error variances" using the output in GLM.Thanks for the link to the data. Here's my best guess. If you use the following approach, with the HC0 type of robust standard errors in the "sandwich" package (thanks to Achim Zeileis), you get "almost" the same numbers as that Stata output gives. The estimated b's from the glm match exactly, but the robust standard errors are a bit off. ### Paul Johnson 2008-05-08 ### sandwichGLM.R system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta") library(foreign) dat <- read.dta("eyestudy.dta") ### Ach, stata codes factor contrasts backwards dat$carrot0 <- ifelse(dat$carrot==0,1,0) dat$gender1 <- ifelse(dat$gender==1,1,0) glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log)) summary(glm1) library(sandwich) vcovHC(glm1) sqrt(diag(vcovHC(glm1))) sqrt(diag(vcovHC(glm1, type="HC0"))) ### Result: # > summary(glm1) # Call: # glm(formula = lenses ~ carrot0, family = poisson(link = log), # data = dat) # Deviance Residuals: # Min 1Q Median 3Q Max # -1.1429 -0.9075 0.3979 0.3979 0.7734 # Coefficients: # Estimate Std. Error z value Pr(>|z|) # (Intercept) -0.8873 0.2182 -4.066 4.78e-05 *** # carrot0 0.4612 0.2808 1.642 0.101 # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # (Dispersion parameter for poisson family taken to be 1) # Null deviance: 67.297 on 99 degrees of freedom # Residual deviance: 64.536 on 98 degrees of freedom # AIC: 174.54 # Number of Fisher Scoring iterations: 5 # > sqrt(diag(vcovHC(glm1, type="HC0"))) # (Intercept) carrot0 # 0.1673655 0.1971117 # > ### Compare against ## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm ## robust standard errors are: ## .1682086 .1981048 glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat, family=poisson(link=log)) sqrt(diag(vcovHC(glm2, type="HC0"))) ### Result: # > summary(glm2) #Call: # glm(formula = lenses ~ carrot0 + gender1 + latitude, family poisson(link = log), # data = dat) # Deviance Residuals: # Min 1Q Median 3Q Max # -1.2332 -0.9316 0.2437 0.5470 0.9466 # Coefficients: # Estimate Std. Error z value Pr(>|z|) # (Intercept) -0.65212 0.69820 -0.934 0.3503 # carrot0 0.48322 0.28310 1.707 0.0878 . # gender1 0.20520 0.27807 0.738 0.4605 # latitude -0.01001 0.01898 -0.527 0.5980 # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # (Dispersion parameter for poisson family taken to be 1) # Null deviance: 67.297 on 99 degrees of freedom # Residual deviance: 63.762 on 96 degrees of freedom # AIC: 177.76 # Number of Fisher Scoring iterations: 5 # > sqrt(diag(vcovHC(glm2, type="HC0"))) # (Intercept) carrot0 gender1 latitude # 0.49044963 0.19537743 0.18481067 0.01275001 ### UCLA site has: ## .4929193 .1963616 .1857414 .0128142 So, either there is some "rounding error" or Stata is not using exactly the same algorithm as vcovHC in sandwich. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
(Ted Harding)
2008-May-08 21:28 UTC
[R] poisson regression with robust error variance ('eyestudy
Once again, Paul, many thanks for your thorough examination of this question! And for spelling out your approach!!! It certainly looks as though you're very close to target (or even spot-on). I've only one comment -- see at end. On 08-May-08 20:35:38, Paul Johnson wrote:> Ted Harding said: >> I can get the estimated RRs from > >> RRs <- exp(summary(GLM)$coef[,1]) > >> but do not see how to implement confidence intervals based >> on "robust error variances" using the output in GLM. > > Thanks for the link to the data. Here's my best guess. If you > use the following approach, with the HC0 type of robust standard > errors in the "sandwich" package (thanks to Achim Zeileis), > you get "almost" the same numbers as that Stata output gives. > The estimated b's from the glm match exactly, but the robust > standard errors are a bit off. > >### Paul Johnson 2008-05-08 >### sandwichGLM.R > system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta") > library(foreign) > > dat <- read.dta("eyestudy.dta") >### Ach, stata codes factor contrasts backwards > > dat$carrot0 <- ifelse(dat$carrot==0,1,0) > dat$gender1 <- ifelse(dat$gender==1,1,0) > > glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log)) > summary(glm1) > > library(sandwich) > vcovHC(glm1) > sqrt(diag(vcovHC(glm1))) > sqrt(diag(vcovHC(glm1, type="HC0"))) > >### Result: ># > summary(glm1) ># Call: ># glm(formula = lenses ~ carrot0, family = poisson(link = log), ># data = dat) > ># Deviance Residuals: ># Min 1Q Median 3Q Max ># -1.1429 -0.9075 0.3979 0.3979 0.7734 > ># Coefficients: ># Estimate Std. Error z value Pr(>|z|) ># (Intercept) -0.8873 0.2182 -4.066 4.78e-05 *** ># carrot0 0.4612 0.2808 1.642 0.101 ># --- ># Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > ># (Dispersion parameter for poisson family taken to be 1) > ># Null deviance: 67.297 on 99 degrees of freedom ># Residual deviance: 64.536 on 98 degrees of freedom ># AIC: 174.54 > ># Number of Fisher Scoring iterations: 5 > ># > sqrt(diag(vcovHC(glm1, type="HC0"))) ># (Intercept) carrot0 ># 0.1673655 0.1971117 > >### Compare against >## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm >## robust standard errors are: >## .1682086 .1981048 > > glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat, > family=poisson(link=log)) > > sqrt(diag(vcovHC(glm2, type="HC0"))) >### Result: ># > summary(glm2) > >#Call: ># glm(formula = lenses ~ carrot0 + gender1 + latitude, family > poisson(link = log), ># data = dat) > ># Deviance Residuals: ># Min 1Q Median 3Q Max ># -1.2332 -0.9316 0.2437 0.5470 0.9466 > ># Coefficients: ># Estimate Std. Error z value Pr(>|z|) ># (Intercept) -0.65212 0.69820 -0.934 0.3503 ># carrot0 0.48322 0.28310 1.707 0.0878 . ># gender1 0.20520 0.27807 0.738 0.4605 ># latitude -0.01001 0.01898 -0.527 0.5980 ># --- ># Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > ># (Dispersion parameter for poisson family taken to be 1) > ># Null deviance: 67.297 on 99 degrees of freedom ># Residual deviance: 63.762 on 96 degrees of freedom ># AIC: 177.76 > ># Number of Fisher Scoring iterations: 5 > ># > sqrt(diag(vcovHC(glm2, type="HC0"))) ># (Intercept) carrot0 gender1 latitude ># 0.49044963 0.19537743 0.18481067 0.01275001 > >### UCLA site has: > >## .4929193 .1963616 .1857414 .0128142 > > > So, either there is some "rounding error" or Stata is not using > exactly the same algorithm as vcovHC in sandwich.The above differences look somewhat systematic (though very small). The percentage differences (vcovHC relative to STATA) for the two cases you analyse above are vcovHC "HC0": 0.1673655 0.1971117 STATA : 0.1682086 0.1981048 ------------------------------------- % Difference: -0.5012229 -0.5013003 vcovHC "HC0": 0.49044963 0.19537743 0.18481067 0.01275001 STATA: 0.4929193 0.1963616 0.1857414 .0128142 ------------------------------------------------------------- % Difference: -0.5010293 -0.5012029 -0.5010891 -0.5009287 so, in all cases, very close to -0.501%, despite varying absolute values. This suggests a very subtle difference in algorithm, rather than rounding error. Though there is conceivably a convergence issue in the background. Does ANYONE here know exactly how STATA does it? Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 08-May-08 Time: 22:28:36 ------------------------------ XFMail ------------------------------
Achim.Zeileis at R-project.org
2008-May-08 21:31 UTC
[R] poisson regression with robust error variance ('eyestudy
Paul & Ted:> > I can get the estimated RRs from > > > RRs <- exp(summary(GLM)$coef[,1]) > > > but do not see how to implement confidence intervals based > > on "robust error variances" using the output in GLM. > > > Thanks for the link to the data. Here's my best guess. If you use > the following approach, with the HC0 type of robust standard errors in > the "sandwich" package (thanks to Achim Zeileis), you get "almost" the > same numbers as that Stata output gives. The estimated b's from the > glm match exactly, but the robust standard errors are a bit off.Thanks for posting the code reproducing this example!> ### Paul Johnson 2008-05-08 > ### sandwichGLM.R > system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta") > > library(foreign) > > dat <- read.dta("eyestudy.dta") > > ### Ach, stata codes factor contrasts backwards > dat$carrot0 <- ifelse(dat$carrot==0,1,0) > dat$gender1 <- ifelse(dat$gender==1,1,0) > > glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log)) > > library(sandwich) > sqrt(diag(vcovHC(glm1, type="HC0"))) > # (Intercept) carrot0 > # 0.1673655 0.1971117 > > ### Compare against > ## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm > ## robust standard errors are: > ## .1682086 .1981048I have the solution. Note that the ratio of both standard errors to those from sandwich is almost constant which suggests a scaling difference. In "sandwich" I have implemented two scaling strategies: divide by "n" (number of observations) or by "n-k" (residual degrees of freedom). This leads to R> sqrt(diag(sandwich(glm1))) (Intercept) carrot0 0.1673655 0.1971117 R> sqrt(diag(sandwich(glm1, adjust = TRUE))) (Intercept) carrot0 0.1690647 0.1991129 (Equivalently, you could youse vcovHC() with types "HC0" and "HC1", respectively.) Their standard errors are between those, so I thought that they used a different scaling. Here, n = 100 and n-k = 98 which only left R> sqrt(diag(sandwich(glm1) * 100/99)) (Intercept) carrot0 0.1682086 0.1981048 Bingo! So they scaled with n-1 rather than n or n-k. This is, of course, also consistent (if the other estimators are), but I wouldn't know of a good theoretical underpinning for it.> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat, > family=poisson(link=log)) > > ### UCLA site has: > ## .4929193 .1963616 .1857414 .0128142R> round(sqrt(diag(sandwich(glm2) * 100/99)), digits = 7) (Intercept) carrot0 gender1 latitude 0.4929204 0.1963617 0.1857417 0.0128142 Only the constant is a bit off, but that is not unusual in replication exercises. Some further advertising: If you want to get the full table of coefficients, you can use coeftest() from "lmtest" R> library("lmtest") R> coeftest(glm2, sandwich(glm2) * 100/99) z test of coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.652122 0.492920 -1.3230 0.18584 carrot0 0.483220 0.196362 2.4609 0.01386 * gender1 0.205201 0.185742 1.1048 0.26926 latitude -0.010009 0.012814 -0.7811 0.43474 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Best, Z