Muhuri, Pradip (AHRQ/CFACT)
2016-Jun-23 01:32 UTC
[R] svymean using the survey package - strata containing no subpopulation members
Hi,
Below is a reproducible example that produces the estimate of
"totexp13" (total health care expenditure 2013) for the subpopulation
that includes "Asians with diabetes diagnosed" in MEPS. The R script
below downloads file from the web for processing.
Issue/Question: The R/survey package does not seem to provide a NOTE regarding
the number of strata containing NO SUBOPOPULATION MEMBERS (in this case - Asians
with diabetes diagnosed in MEPS 2013). Is there a way to get this count or ask R
to provide this information? Any hints will be appreciated.
Acknowledgements: The current R script is a tweaked-version of the code
originally sent (on this forum) by Anthony Damico for another application.
Thanks to Anthony!
Good news: The estimate is almost the same as the estimates obtained from SAS,
SUDAAN and STATA runs.
Additional Information: STATA provides a NOTE that " 84 strata omitted
because no subpopulation members".
SAS LOG (proc surveymeans) provides a NOTE that "Only one cluster in a
stratum in domain Asian_with_diab for variable(s) TOTEXP13. The estimate of
variance for TOTEXP13 will
omit this stratum".
Thanks,
Pradip Muhuri
library(foreign)
library(survey)
library(dplyr)
tf <- tempfile()
download.file(
"https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip", tf , mode
= 'wb' )
z <- unzip( tf , exdir = tempdir() )
x <- read.xport( z )
names( x ) <- tolower( names( x ) )
mydata <- select(x, varstr, varpsu, perwt13f, diabdx, totexp13, racethx)
mydata[mydata <=0] <- NA
design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x,
nest=TRUE)
# include missings as "No" values here
#design <-
# update(design,
# xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'),
# xcholck53 = ifelse(cholck53 ==1, 'yes','no or
missing')
#)
# get the estimate for "totexp13" for the subset that includes Asians
with diabetes diagnosed
svymean(~ totexp13, subset(design, racethx==4 & diabdx==1))
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
[[alternative HTML version deleted]]
Anthony Damico
2016-Jun-23 03:54 UTC
[R] svymean using the survey package - strata containing no subpopulation members
hi pradip, with meps you should be able to match precisely between r+survey
and those other languages[1]
if i had to guess, i would say that your sas and stata code is actually
doing the equivalent of this, which is not correct. check the journal
article table #1 for syntax comparisons
design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
data=subset(x, racethx==4 & diabdx==1), nest=TRUE)
svymean(~ totexp13, design)
#Error in onestrat(x[index, , drop = FALSE], clusters[index],
nPSU[index][1], :
# Stratum (1004) has only one PSU at stage 1
more detailed discussion of lonely psu behavior at [2] including how to
override this error -- but i think it comes from faulty design
specification so should not be necessary? thanks
[1] https://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
[2] http://faculty.washington.edu/tlumley/old-survey/exmample-lonely.html
On Wed, Jun 22, 2016 at 9:32 PM, Muhuri, Pradip (AHRQ/CFACT) <
Pradip.Muhuri at ahrq.hhs.gov> wrote:
> Hi,
>
> Below is a reproducible example that produces the estimate of
"totexp13"
> (total health care expenditure 2013) for the subpopulation that includes
> "Asians with diabetes diagnosed" in MEPS. The R script below
downloads file
> from the web for processing.
>
> Issue/Question: The R/survey package does not seem to provide a NOTE
> regarding the number of strata containing NO SUBOPOPULATION MEMBERS (in
> this case - Asians with diabetes diagnosed in MEPS 2013). Is there a way to
> get this count or ask R to provide this information? Any hints will be
> appreciated.
>
> Acknowledgements: The current R script is a tweaked-version of the code
> originally sent (on this forum) by Anthony Damico for another application.
> Thanks to Anthony!
>
> Good news: The estimate is almost the same as the estimates obtained from
> SAS, SUDAAN and STATA runs.
>
> Additional Information: STATA provides a NOTE that " 84 strata
omitted
> because no subpopulation members".
> SAS LOG (proc surveymeans) provides a NOTE that "Only one cluster in a
> stratum in domain Asian_with_diab for variable(s) TOTEXP13. The estimate of
> variance for TOTEXP13 will
> omit this stratum".
>
>
> Thanks,
>
> Pradip Muhuri
>
>
>
>
> library(foreign)
> library(survey)
> library(dplyr)
>
> tf <- tempfile()
>
> download.file(
"https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip",
> tf , mode = 'wb' )
>
> z <- unzip( tf , exdir = tempdir() )
>
> x <- read.xport( z )
>
> names( x ) <- tolower( names( x ) )
>
> mydata <- select(x, varstr, varpsu, perwt13f, diabdx, totexp13, racethx)
>
> mydata[mydata <=0] <- NA
>
> design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x,
> nest=TRUE)
>
>
> # include missings as "No" values here
> #design <-
> # update(design,
> # xbpchek53 = ifelse(bpchek53 ==1,'yes','no or
missing'),
> # xcholck53 = ifelse(cholck53 ==1, 'yes','no or
missing')
> #)
>
> # get the estimate for "totexp13" for the subset that includes
Asians with
> diabetes diagnosed
> svymean(~ totexp13, subset(design, racethx==4 & diabdx==1))
>
> Pradip K. Muhuri, AHRQ/CFACT
> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]