Michael.Laviolette at dhhs.state.nh.us
2014-Sep-30 13:17 UTC
[R] Using "survey" package with ACS PUMS
I'm trying to reproduce some results from the American Community Survey PUMS data using the "survey" package. I'm using the one-year 2012 estimates for New Hampshire (http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the estimates for user verification from http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv Once the age groups are set up as specified in the verification estimates, the following SAS code produces the correct estimated totals with standard errors: proc surveyfreq data = acs2012 varmethod = jackknife; weight pwgtp; repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05; table SEX agegroup; run; I've not been successful in reproducing the standard errors with R, although they are very close. My code follows; what revisions do I need to make? Thanks, Mike L. # load estimates for verification pums_est <- read.csv("pums_estimates_12.csv") pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4])) # load PUMS data pums_p <- read.csv("ss12pnh.csv") # convert sex and age group to factors pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F")) pums_p$agegrp <- cut(pums_p$AGEP, c(0,5,10,15,20,25,35,45,55,60,65,75,85,100), right = FALSE) # create replicate-weight survey object library(survey) pums_p.rep <- svrepdesign(repweights = pums_p[207:286], weights = pums_p[,7], combined.weights = TRUE, type = "Fay", rho = 1 - 1/sqrt(4), scale = 1, rscales = 1, data = pums_p) # using type "JK1" with scale = 4/80 and rscales = rep(1,80) # seems to produce the same results # total population by sex with SE's by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE) round(by.sex[1,4:5]) # se1 se2 # 33 1606 1606 # compare results with Census pums_est[966:967, 5] #[1] 1610 1610 # total population by age group with SE's by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE) round((by.agegrp)[15:27]) # se1 se2 se3 se4 se5 se6 se7 se8 se9 se10 se11 se12 se13 # 33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200 880 1700 1678 # compare results with Census pums_est[968:980, 5] # [1] 874 2578 2613 1463 1399 1476 1493 1555 2191 2200 880 1702 1684
hi michael, you probably need options( "survey.replicates.mse" = TRUE ) i also think you don't want type = "Fay" and you do want scale = 4/80 and rscales = rep( 1 , 80 ) as well, but what you have might be equivalent (not sure) regardless, this blog post details how to precisely replicate the census bureau's estimates using the acs pums with R http://www.asdfree.com/search/label/american%20community%20survey%20%28acs%29 On Tue, Sep 30, 2014 at 9:17 AM, <Michael.Laviolette at dhhs.state.nh.us> wrote:> > I'm trying to reproduce some results from the American Community Survey > PUMS data using the "survey" package. I'm using the one-year 2012 estimates > for New Hampshire > (http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the > estimates for user verification from > > http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv > > Once the age groups are set up as specified in the verification estimates, > the following SAS code produces the correct estimated totals with standard > errors: > > proc surveyfreq data = acs2012 varmethod = jackknife; > weight pwgtp; > repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05; > table SEX agegroup; > run; > > I've not been successful in reproducing the standard errors with R, > although they are very close. My code follows; what revisions do I need to > make? > > Thanks, > Mike L. > > # load estimates for verification > pums_est <- read.csv("pums_estimates_12.csv") > pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4])) > > # load PUMS data > pums_p <- read.csv("ss12pnh.csv") > # convert sex and age group to factors > pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F")) > pums_p$agegrp <- cut(pums_p$AGEP, > c(0,5,10,15,20,25,35,45,55,60,65,75,85,100), > right = FALSE) > > # create replicate-weight survey object > library(survey) > pums_p.rep <- svrepdesign(repweights = pums_p[207:286], > weights = pums_p[,7], > combined.weights = TRUE, > type = "Fay", rho = 1 - 1/sqrt(4), > scale = 1, rscales = 1, > data = pums_p) > > # using type "JK1" with scale = 4/80 and rscales = rep(1,80) > # seems to produce the same results > > # total population by sex with SE's > by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE) > round(by.sex[1,4:5]) > # se1 se2 > # 33 1606 1606 > # compare results with Census > pums_est[966:967, 5] > #[1] 1610 1610 > > # total population by age group with SE's > by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE) > round((by.agegrp)[15:27]) > # se1 se2 se3 se4 se5 se6 se7 se8 se9 se10 se11 se12 se13 > # 33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200 880 1700 1678 > # compare results with Census > pums_est[968:980, 5] > # [1] 874 2578 2613 1463 1399 1476 1493 1555 2191 2200 880 1702 1684 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]