thr3ads.net - R help - [R] Exactly Replicating Stata's Survey Data Confidence Intervals in R [Sep 2012]

If this information is useful, please help other people find it:
Share via:

Anthony Damico

2012-Sep-21 14:51 UTC

[R] Exactly Replicating Stata's Survey Data Confidence Intervals in R

Hi everyone, apologies if the answer to this is in an obvious place.  I've
been searching for about a day and haven't found anything..


I'm trying to replicate Stata's confidence intervals in R with the
survey
package, and the numbers are very very close but not exact.  My ultimate
goal is to replicate Berkeley's SDA website with R
(sda.berkeley.edu),
which seems to match Stata precisely.  Everything lines up, except for the
confidence intervals, and I'm wondering if there's a relatively
straightforward reason why the numbers are different.


To review the two major cross-package comparison documents on Dr. Lumley's
homepage of the survey package --

faculty.washington.edu/tlumley/survey/ucla-examples.pdf -- doesn't
contain confidence interval output.

faculty.washington.edu/tlumley/survey/YRBS-report-extension.pdf --
confirms that the confidence interval differences between SUDAAN and R in
Table 3 exist, but doesn't provide much detail about why.

I tried running the analysis example below in SUDAAN as well, which
calculated confidence intervals not matching either R or Stata -- I'm
confused why all three would be different.

I understand (as the report above quotes) these differences are
statistically inconsequential, but I aim to convince people to switch to R,
so hitting numbers dead-on might provide some reassurance to skeptics..



I've pasted some ultra-simple Stata code below and (more importantly) the
output it generates.  Below that, I've pasted some commented R code that
shows how everything matches exactly except for the confidence intervals,
and then displays a number of my failed attempts at hitting the numbers
right on the nose.


Thanks!!

Anthony Damico
Kaiser Family Foundation




* Stata/MP 11.2

* example stata code to replicate in R

use ats.ucla.edu/stat/stata/library/apiclus1, clear
svyset dnum [pw=pw], fpc(fpc)
gen ell0 = 1
replace ell0 = 0 if ell != 0
svy: mean ell0


* results of the final command above:

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     183
Number of PSUs   =      15          Population size  =  9235.4
                                    Design df        =      14

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ell0 |   .0218579   .0226225     -.0266624    .0703783
--------------------------------------------------------------






# R x64

# simple example using example code pulled from ?svymean
options(digits=20)
library(survey)
library(foreign)

# read the data file from a website, to make sure you're using the same
data in both R and stata
x <- read.dta(
"ats.ucla.edu/stat/stata/library/apiclus1.dta" )
dclus1<-svydesign(id=~dnum, fpc=~fpc, data=x)

# mean matches exactly
coef(svymean(~I(ell==0), dclus1))

# SE matches exactly
SE(svymean(~I(ell==0), dclus1))

# design df matches exactly
degf(dclus1)

# unwtd count matches exactly
unwtd.count( ~ell, dclus1)

# wtd count matches exactly
svytotal( ~!is.na(ell), dclus1)

# number of clusters match exactly
dclus1


# none of the confidence interval options match exactly

# the standard confint() will certainly be wrong,
# since stata gives an asymmetric confidence interval

confint(svymean(~I(ell==0), dclus1))
svyciprop(~I(ell==0), dclus1, method="me")    # 2nd way to perform
confint()


# but none of the other methods match either..
svyciprop(~I(ell==0), dclus1, method="li")
svyciprop(~I(ell==0), dclus1, method="lo")
svyciprop(~I(ell==0), dclus1, method="as")
svyciprop(~I(ell==0), dclus1, method="be")



In case it's of any value, here's my R session info:
> sessionInfo()R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252
LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] foreign_0.8-50 survey_3.28-2

loaded via a namespace (and not attached):
[1] MASS_7.3-18  tools_2.15.1

	[[alternative HTML version deleted]]

Thomas Lumley

2012-Sep-23 20:30 UTC

head link

[R] Exactly Replicating Stata's Survey Data Confidence Intervals in R

On Sat, Sep 22, 2012 at 2:51 AM, Anthony Damico <ajdamico at gmail.com>
wrote:
> Survey: Mean estimation
>
> Number of strata =       1          Number of obs    =     183
> Number of PSUs   =      15          Population size  =  9235.4
>                                     Design df        =      14
>
> --------------------------------------------------------------
>              |             Linearized
>              |       Mean   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
>         ell0 |   .0218579   .0226225     -.0266624    .0703783
> --------------------------------------------------------------
This matches
> svyciprop(~I(ell==0),dclus1,df=14,method="mean")                             2.5%   97.5%
I(ell == 0)  0.0218579 -0.0266624 0.07038

as does this
> confint(svymean(~I(ell==0),dclus1),df=14)                          2.5 %        97.5 %
I(ell == 0)FALSE  0.92962174505 1.02666240796
I(ell == 0)TRUE  -0.02666240796 0.07037825495

The df= argument is not explicitly documented in ?svyciprop, but it is
in ?confint.svystat and ?svymean

[I was slowed down a bit by the claim that the Stata intervals were
asymmetric, but in fact they aren't]

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

Seemingly Similar Threads

Search for more maybe matching threads

R help - Sep 2012 - Exactly Replicating Stata's Survey Data Confidence Intervals in R

[R] Exactly Replicating Stata's Survey Data Confidence Intervals in R

[R] Exactly Replicating Stata's Survey Data Confidence Intervals in R

Seemingly Similar Threads