Dear WizaRds, dear Thomas, First of all, I want to tell you how grateful I am for all your support. I wish I will be able to help others along one day the same way you do. Thank you so much. I am struggling with a multistage sampling design: library(survey) multi3 <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4, 1,2,3, 1,2), nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), M=rep(23,9), y=c(23,33,77,25, 35,74,27, 37,72) ) dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3) svymean (~y, dmulti3) mean SE y 45.796 5.5483 svytotal(~y, dmulti3) total SE y 78999 13643 and I estimate the population total as N=M/m sum(Nl) = 23/3*(100+50+75)=1725. With this, my variance estimator is: y1<-mean(multi3$y[1:4]) # 39.5 y2<-mean(multi3$y[5:7]) # 45.33 y3<-mean(multi3$y[8:9]) # 54.5 yT1<-100*y1 # 3950 total cluster 1 yT2<-50*y2 # 2266.67 total cluster 2 yT3<-75*y3 # 4087.5 total cluster 3 ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722 s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1 s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2 s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3 var.yT <- 23^2*( 20/23*1/6*sum( (yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) + 1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517 but var.yT/1725^2 = 81.36157 SE = 9.02006, but it should be SE=13643/1725=7.90899 Is this calculation correct? I remember svytotal using a different variance estimator compared to svymean, and that svytotal gives the unbiased estimation. To solve the problem, I went ahead and tried to calibrate the design object, telling Survey the population total N=1725: dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725) svymean (~y, dmulti3.cal) mean SE y 45.796 5.5483 svytotal(~y, dmulti3.cal) total SE y 78999 9570.7 , which indeed gives me the computed svymean SE, but alas, I still don't know why my variance is so different. I think it might have sthg to do with a differently computed N and the fact that your estimator formula is a different one. Since I calculated the Taylor Series solution, i suppose there must be another approach? The calibration help page tells me to enter a list of population total vectors for each cluster, which would result in: dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75)) Error in regcalibrate.survey.design2(design, formula, population, aggregate.stage = aggregate.stage, : Population and sample totals are not the same length. I am very grateful for your help and wish you alle the best Yours mark
>From: Mark Hempelmann <e.rehak at t-online.de> >Date: Fri Jul 07 14:05:29 CDT 2006 >To: r-help at stat.math.ethz.ch >Subject: [R] Multistage Samplingi also find it an truly amazing group also. the general kindness and generosity of everyone is beyond belief. it will be a long time coming but i hope i can help some day also. unfortunately, i can't help you with your question either. mark>Dear WizaRds, dear Thomas, > > First of all, I want to tell you how grateful I am for all your >support. I wish I will be able to help others along one day the same way >you do. Thank you so much. I am struggling with a multistage sampling >design: > >library(survey) >multi3 <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4, >1,2,3, 1,2), >nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), >M=rep(23,9), >y=c(23,33,77,25, 35,74,27, 37,72) ) > >dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3) >svymean (~y, dmulti3) > mean SE >y 45.796 5.5483 > >svytotal(~y, dmulti3) > total SE >y 78999 13643 > >and I estimate the population total as N=M/m sum(Nl) = >23/3*(100+50+75)=1725. With this, my variance estimator is: >y1<-mean(multi3$y[1:4]) # 39.5 >y2<-mean(multi3$y[5:7]) # 45.33 >y3<-mean(multi3$y[8:9]) # 54.5 > >yT1<-100*y1 # 3950 total cluster 1 >yT2<-50*y2 # 2266.67 total cluster 2 >yT3<-75*y3 # 4087.5 total cluster 3 >ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722 >s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1 >s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2 >s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3 > >var.yT <- 23^2*( 20/23*1/6*sum( >(yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) + >1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517 > >but >var.yT/1725^2 = 81.36157 >SE = 9.02006, >but it should be SE=13643/1725=7.90899 > >Is this calculation correct? I remember svytotal using a different >variance estimator compared to svymean, and that svytotal gives the >unbiased estimation. To solve the problem, I went ahead and tried to >calibrate the design object, telling Survey the population total N=1725: > >dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725) >svymean (~y, dmulti3.cal) > mean SE >y 45.796 5.5483 > >svytotal(~y, dmulti3.cal) > total SE >y 78999 9570.7 > >, which indeed gives me the computed svymean SE, but alas, I still don't >know why my variance is so different. I think it might have sthg to do >with a differently computed N and the fact that your estimator formula >is a different one. Since I calculated the Taylor Series solution, i >suppose there must be another approach? The calibration help page tells >me to enter a list of population total vectors for each cluster, which >would result in: > >dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75)) >Error in regcalibrate.survey.design2(design, formula, population, >aggregate.stage = aggregate.stage, : >Population and sample totals are not the same length. > >I am very grateful for your help and wish you alle the best >Yours >mark > >______________________________________________ >R-help at stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
On Fri, 7 Jul 2006, Mark Hempelmann wrote:> library(survey) > multi3 <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4, > 1,2,3, 1,2), > nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), > M=rep(23,9), > y=c(23,33,77,25, 35,74,27, 37,72) ) > > dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3) > svymean (~y, dmulti3) > mean SE > y 45.796 5.5483 > > svytotal(~y, dmulti3) > total SE > y 78999 13643 > > and I estimate the population total as N=M/m sum(Nl) > 23/3*(100+50+75)=1725. With this, my variance estimator is: > y1<-mean(multi3$y[1:4]) # 39.5 > y2<-mean(multi3$y[5:7]) # 45.33 > y3<-mean(multi3$y[8:9]) # 54.5 > > yT1<-100*y1 # 3950 total cluster 1 > yT2<-50*y2 # 2266.67 total cluster 2 > yT3<-75*y3 # 4087.5 total cluster 3 > ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722 > s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1 > s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2 > s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3 > > var.yT <- 23^2*( 20/23*1/6*sum( > (yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) + > 1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517I don't have any of my reference books here today, but if you use var.yT <- 23^2*( 20/23*1/6*sum( (yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) + 1/69 * sum(100*96*s1/4, 50*47*s2/3, 75*73*s3/2) ) # 242 101 517 the results agrees with svytotal(), and with Stata, and with formulas in a couple of sets of lecture notes I found by Googling.> but > var.yT/1725^2 = 81.36157 > SE = 9.02006, > but it should be SE=13643/1725=7.90899 > > Is this calculation correct? I remember svytotal using a different > variance estimator compared to svymean, and that svytotal gives the > unbiased estimation.This calculation is not correct for the mean, since it ignores the uncertainty in the estimated population total. The correct standard error comes from treating the mean as a ratio of estimated total to estimated population size. In this case you have to do it that way since you don't know the population size, but R always does it this way. Because the estimated population size and total are correlated, taking into account the uncertainty in the denominator actually reduces the standard error. The easiest way to reproduce the result that R gets is to do it the same way that R does: compute the standard error of the mean as the standard error of the total of a suitable set of estimating functions. If you define a new variable (y-45.796*1)/1725 and estimate the standard error of the total of this variable it will give:> svytotal(~I((y-45.796)/1725),dmulti3)I((y - 45.796)/1725) 0.0002963 5.5482 which is what svymean() gives for the standard error of the mean of y. Using your formula for the variance of the total (with the corrections above) on this variable also gives> sqrt(var.yT)[1] 5.54824 -thomas Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle