David L. Van Brunt, Ph.D.
2005-Oct-04 21:44 UTC
[R] "Survey" package and NAMCS data... unsure of specification
Hello, all.

I wanted to use the "survey" package to analyze data from the National Ambulatory Medical Care Survey, and am having some difficulty translating the analysis keywords from one package (Stata) to the other (R). The data were collected using multistage probability sampling, and there are variables included to identify the sampling units and weights. Documentation from the NAMCS describes this for Stata as follows (note the variable names in the data are in caps):

The pweight (PATWT), strata (CSTRATM), and PSU (CPSUM) are set with the svyset command as follows:

    svyset pweight PATWT
    svyset strata CSTRATM
    svyset psu CPSUM

They provide similar instructions for SUDAAN:

SUDAAN 1-stage WR Option

The program below provides a with-replacement ultimate cluster (1-stage) estimate of standard errors for a cross-tabulation.

    PROC CROSSTAB DATA=COMB1 DESIGN=WR FILETYPE=SAS;
    NEST CSTRATM CPSUM/MISSUNIT;

In R, the svydesign command is used to set the sampling scheme, but as follows (example from the documentation):

    dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

stratified on stype, with sampling weights pw. The fpc variable contains the population size for the stratum. As the schools are sampled independently, each record in the data frame is a separate PSU. This is indicated by id=~1. Since the sampling weights could have been determined from the population size, an equivalent declaration would be

    dstrat <- svydesign(id=~1, strata=~stype, data=apistrat, fpc=~fpc)

I get that the "weights" should be PATWT, and it seems that the "strata" should be CSTRATM, but I'm unsure of how to handle the primary sampling units (CPSUM).

Does anyone have any suggestions?

--
---------------------------------------
David L. Van Brunt, Ph.D.
mailto:dlvanbrunt@gmail.com
Thomas Lumley
2005-Oct-04 23:21 UTC
[R] "Survey" package and NAMCS data... unsure of specification
On Tue, 4 Oct 2005, David L. Van Brunt, Ph.D. wrote:

> Hello, all.
>
> I wanted to use the "survey" package to analyze data from the National
> Ambulatory Medical Care Survey, and am having some difficulty translating
> the analysis keywords from one package (Stata) to the other (R). The data
> were collected using a multistage probability sampling, and there are
> variables included to identify the sampling units and weights. Documentation
> from the NAMCS describes this for Stata as follows (note the variable names
> in the data are in caps):
>
> The pweight (PATWT), strata (CSTRATM), and PSU (CPSUM) are set with the
> svyset command as
> follows:
> svyset pweight PATWT
> svyset strata CSTRATM
> svyset psu CPSUM
>

Supposing your data frame is called 'namcs':

    dnamcs <- svydesign(id=~CPSUM, strata=~CSTRATM, weights=~PATWT, data=namcs)

or perhaps

    dnamcs <- svydesign(id=~CPSUM, strata=~CSTRATM, weights=~PATWT, data=namcs, nest=TRUE)

(nest=TRUE is needed if CPSUM repeats the same values in different strata).

Also, if you have access to design variables for the multistage design you can use them (but it probably won't make much difference).

There's a very brief example using the National Health Interview Survey at http://faculty.washington.edu/tlumley/survey/example-twostage.html

-thomas
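
For anyone wanting to try the design declaration before loading the real file, here is a minimal sketch on made-up data. Only the column names CSTRATM, CPSUM and PATWT come from the thread; the values, the 'visits' outcome column, and the two-strata layout are assumptions for illustration, not NAMCS data. The PSU ids deliberately restart at 1 in each stratum, which is the situation where nest=TRUE matters.

```r
library(survey)

# Synthetic stand-in for the NAMCS file (hypothetical values):
# 2 strata, 2 PSUs per stratum, PSU ids restarting at 1 within each stratum.
namcs <- data.frame(
  CSTRATM = rep(c(1, 2), each = 6),
  CPSUM   = rep(c(1, 1, 1, 2, 2, 2), times = 2),
  PATWT   = c(10, 12, 11, 9, 10, 13, 20, 22, 19, 21, 18, 20),
  visits  = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1)  # hypothetical 0/1 outcome
)

# PSU ids repeat across strata here, so nest=TRUE tells svydesign to
# treat PSU 1 in stratum 1 and PSU 1 in stratum 2 as different clusters.
dnamcs <- svydesign(ids = ~CPSUM, strata = ~CSTRATM, weights = ~PATWT,
                    data = namcs, nest = TRUE)

# Weighted mean of the outcome with a design-based standard error.
est <- svymean(~visits, dnamcs)
print(est)
```

With ids that repeat across strata, leaving out nest=TRUE should make svydesign complain that the clusters are not nested in strata, so an error at that step is a hint to add it rather than a sign that the variables are wrong.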