Dirk Enzmann
2007-Sep-05 22:55 UTC
[R] confidence intervals of proportions from complex surveys
This is partly an R question and partly a general statistics question. I'm trying to get confidence intervals for proportions (sometimes for subgroups) estimated from complex survey data. Because a function like prop.test() does not exist for the "survey" package, I tried the following:

1) Define a survey object (PSU of the clustered sample, population weights).

2) Use svyglm() from the "survey" package to estimate a binary logistic regression (family='binomial'). For the confidence interval of a single proportion, regress the binary dependent variable on a constant (1); for confidence intervals of that variable within subgroups, regress it on the grouping (factor) variable.

3) Use predict() to obtain the estimated logits and their standard errors (mod.dat specifying either the constant or the subgroups):

    pred = predict(model, mod.dat, type='link', se.fit=TRUE)

   and apply the following to obtain the proportion with its confidence interval (for example, for conf.level=.95):

    lo.e = pred[1:length(pred)] - qnorm((1+conf.level)/2)*SE(pred)
    hi.e = pred[1:length(pred)] + qnorm((1+conf.level)/2)*SE(pred)
    prop = 1/(1+exp(-pred[1:length(pred)]))
    lo   = 1/(1+exp(-lo.e))
    hi   = 1/(1+exp(-hi.e))

I think that in this way I get CIs based on asymptotic normality (on the logit scale), either for a single proportion or split up into subgroups.

Question: Is this a correct, or at least a defensible, procedure? Or should I use a different approach? Note that the approach should also allow estimating CIs for proportions of subgroups, taking the complex survey design into account.

TIA,
Dirk

********************************
R version 2.5.1 Patched (2007-08-10 r42469)
i386-pc-mingw32
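
For concreteness, the back-transformation in step 3 could be wrapped up roughly as follows. This is only a minimal base-R sketch: the helper name logit_ci() is hypothetical (it is not part of the "survey" package), and the logits and standard errors in the usage lines are made-up numbers standing in for what predict() would return.

```r
# Hypothetical helper: Wald interval on the logit scale, mapped back to
# the probability scale. plogis(x) is the same as 1/(1+exp(-x)).
logit_ci <- function(logit, se, conf.level = 0.95) {
  z <- qnorm((1 + conf.level) / 2)   # normal quantile, about 1.96 for 95%
  data.frame(
    prop = plogis(logit),            # point estimate of the proportion
    lo   = plogis(logit - z * se),   # lower confidence limit
    hi   = plogis(logit + z * se)    # upper confidence limit
  )
}

# Illustrative (invented) logits and standard errors for two subgroups
res <- logit_ci(logit = c(0, -1.2), se = c(0.5, 0.3))
print(res)
```

With a fitted model, the inputs would presumably be the predicted logits and their standard errors from step 3, e.g. logit_ci(pred[1:length(pred)], SE(pred)). Because the limits are computed on the logit scale, they always stay inside (0, 1) and are in general not symmetric around the estimated proportion.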