Muhuri, Pradip (AHRQ/CFACT)
2016-Jun-23 01:32 UTC
[R] svymean using the survey package - strata containing no subpopulation members
Hi, Below is a reproducible example that produces the estimate of "totexp13" (total health care expenditure 2013) for the subpopulation that includes "Asians with diabetes diagnosed" in MEPS. The R script below downloads file from the web for processing. Issue/Question: The R/survey package does not seem to provide a NOTE regarding the number of strata containing NO SUBOPOPULATION MEMBERS (in this case - Asians with diabetes diagnosed in MEPS 2013). Is there a way to get this count or ask R to provide this information? Any hints will be appreciated. Acknowledgements: The current R script is a tweaked-version of the code originally sent (on this forum) by Anthony Damico for another application. Thanks to Anthony! Good news: The estimate is almost the same as the estimates obtained from SAS, SUDAAN and STATA runs. Additional Information: STATA provides a NOTE that " 84 strata omitted because no subpopulation members". SAS LOG (proc surveymeans) provides a NOTE that "Only one cluster in a stratum in domain Asian_with_diab for variable(s) TOTEXP13. The estimate of variance for TOTEXP13 will omit this stratum". Thanks, Pradip Muhuri library(foreign) library(survey) library(dplyr) tf <- tempfile() download.file( "https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip", tf , mode = 'wb' ) z <- unzip( tf , exdir = tempdir() ) x <- read.xport( z ) names( x ) <- tolower( names( x ) ) mydata <- select(x, varstr, varpsu, perwt13f, diabdx, totexp13, racethx) mydata[mydata <=0] <- NA design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x, nest=TRUE) # include missings as "No" values here #design <- # update(design, # xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'), # xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing') #) # get the estimate for "totexp13" for the subset that includes Asians with diabetes diagnosed svymean(~ totexp13, subset(design, racethx==4 & diabdx==1)) Pradip K. Muhuri, AHRQ/CFACT 5600 Fishers Lane # 7N142A, Rockville, MD 20857 Tel: 301-427-1564 [[alternative HTML version deleted]]
Anthony Damico
2016-Jun-23 03:54 UTC
[R] svymean using the survey package - strata containing no subpopulation members
hi pradip, with meps you should be able to match precisely between r+survey and those other languages[1] if i had to guess, i would say that your sas and stata code is actually doing the equivalent of this, which is not correct. check the journal article table #1 for syntax comparisons design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=subset(x, racethx==4 & diabdx==1), nest=TRUE) svymean(~ totexp13, design) #Error in onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1], : # Stratum (1004) has only one PSU at stage 1 more detailed discussion of lonely psu behavior at [2] including how to override this error -- but i think it comes from faulty design specification so should not be necessary? thanks [1] https://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf [2] http://faculty.washington.edu/tlumley/old-survey/exmample-lonely.html On Wed, Jun 22, 2016 at 9:32 PM, Muhuri, Pradip (AHRQ/CFACT) < Pradip.Muhuri at ahrq.hhs.gov> wrote:> Hi, > > Below is a reproducible example that produces the estimate of "totexp13" > (total health care expenditure 2013) for the subpopulation that includes > "Asians with diabetes diagnosed" in MEPS. The R script below downloads file > from the web for processing. > > Issue/Question: The R/survey package does not seem to provide a NOTE > regarding the number of strata containing NO SUBOPOPULATION MEMBERS (in > this case - Asians with diabetes diagnosed in MEPS 2013). Is there a way to > get this count or ask R to provide this information? Any hints will be > appreciated. > > Acknowledgements: The current R script is a tweaked-version of the code > originally sent (on this forum) by Anthony Damico for another application. > Thanks to Anthony! > > Good news: The estimate is almost the same as the estimates obtained from > SAS, SUDAAN and STATA runs. > > Additional Information: STATA provides a NOTE that " 84 strata omitted > because no subpopulation members". > SAS LOG (proc surveymeans) provides a NOTE that "Only one cluster in a > stratum in domain Asian_with_diab for variable(s) TOTEXP13. The estimate of > variance for TOTEXP13 will > omit this stratum". > > > Thanks, > > Pradip Muhuri > > > > > library(foreign) > library(survey) > library(dplyr) > > tf <- tempfile() > > download.file( "https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip", > tf , mode = 'wb' ) > > z <- unzip( tf , exdir = tempdir() ) > > x <- read.xport( z ) > > names( x ) <- tolower( names( x ) ) > > mydata <- select(x, varstr, varpsu, perwt13f, diabdx, totexp13, racethx) > > mydata[mydata <=0] <- NA > > design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x, > nest=TRUE) > > > # include missings as "No" values here > #design <- > # update(design, > # xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'), > # xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing') > #) > > # get the estimate for "totexp13" for the subset that includes Asians with > diabetes diagnosed > svymean(~ totexp13, subset(design, racethx==4 & diabdx==1)) > > Pradip K. Muhuri, AHRQ/CFACT > 5600 Fishers Lane # 7N142A, Rockville, MD 20857 > Tel: 301-427-1564 > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]