Other people have explained that the issue is missing data. I just wanted
to note that the reason for using only the complete cases on all variables
is that svymean() computes the covariance matrix of all the means, and
this can't really be done sensibly when the means are based on different
subsets.
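
As a rough illustration of why the two sets of means differ, here is a minimal base-R sketch. The data frame `dat` and its columns `a` and `b` are made up for the example and are not from the original data; it only shows the effect of dropping incomplete rows, not what the survey package does internally.

```r
# Toy data (hypothetical): two items with missing values in different rows
dat <- data.frame(
  a = c(1, 0, 1, 1, NA, 0),
  b = c(1, NA, 0, 1, 1, NA)
)

# Per-variable means: each one uses every non-missing value of that variable
per_item <- sapply(dat, mean, na.rm = TRUE)

# Complete-case means: any row with an NA in either column is dropped first,
# so both means are computed on the same subset of rows
cc <- dat[complete.cases(dat), ]
listwise <- colMeans(cc)

print(per_item)
print(listwise)
```

The two results disagree whenever the rows dropped for one variable's missing values carry non-missing values of another. Calling svymean() on one variable at a time, as in the quoted message below, avoids this because only that variable's missing values are removed.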
-thomas
On Tue, 26 Aug 2008, Doran, Harold wrote:
> I have the following code which produces the output below it
>
> clus1 <- svydesign(ids = ~schid, data = lower_dat)
> items <- as.formula(paste(" ~ ", paste(lset, collapse="+")))
> rr1 <- svymean(items, clus1, deff='replace', na.rm=TRUE)
>
>> rr1
> mean SE DEff
> W525209 0.719748 0.015606 2.4932
> W525223 0.508228 0.027570 6.2802
> W525035 0.827202 0.014060 2.8561
> W525131 0.805421 0.015425 3.1350
> W525033 0.242982 0.020074 4.5239
> W525163 0.904647 0.013905 4.6289
> W525165 0.439981 0.020029 3.3620
> W525167 0.148112 0.013047 2.7860
> W525177 0.865924 0.014977 3.9898
> W525179 0.409003 0.020956 3.7515
> W525181 0.634076 0.022076 4.3372
> W525183 0.242498 0.019073 4.0894
> W525401 0.262343 0.021830 3.4354
> W525059 0.854792 0.016551 4.5576
> W525251 0.691191 0.025010 6.0512
> W525083 0.433204 0.017310 2.5200
> W525289 0.634560 0.012762 1.4504
> W524763 0.791868 0.014478 2.6265
> W524765 0.223621 0.019627 4.5818
> W524951 0.242982 0.016796 3.1669
> W524769 0.820910 0.016786 3.9579
> W524771 0.872701 0.015853 4.6712
> W524839 0.518877 0.026433 5.7794
> W525374 1.209584 0.043065 5.1572
> W524885 0.585673 0.027780 6.5674
> W525377 1.100678 0.050093 5.8851
> W524787 0.839303 0.012994 2.5852
> W524789 0.339787 0.019230 3.4041
> W524791 0.847047 0.012885 2.6461
> W524825 0.500968 0.021988 3.9935
> W524795 0.868345 0.014951 4.0377
> W524895 0.864472 0.013872 3.3917
> W524897 0.804937 0.020070 5.2977
> W524967 0.475799 0.032137 8.5511
> W525009 0.681994 0.018670 3.3188
>
> However, when I do the following:
>
> svymean(~W524787, clus1, deff='replace', na.rm=TRUE)
> mean SE DEff
> W524787 0.855547 0.011365 4.1158
>
> Compare this to the value in the row 9 up from the bottom to see it is
> different.
>
> Computing the mean of the item by itself with svymean agrees with the
> sample mean
>
>> mean(lower_dat$W524787, na.rm=T)
> [1] 0.8555471
>
> Now, I know that there is a covariance between the variables, but I was
> under the impression that the sample mean was still of pragmatic
> utility, but to account for sample design only the standard error is
> affected.
>
> In the work I am doing, it is important for the means of the items from
> svymean to be the same as the sample mean when it is computed by
> itself. It's a bit of a story as to why, and I can provide that info if
> relevant.
>
> I don't see an argument in svydesign or in svymean that would allow for
> me to treat the variables as being independent. But, maybe I am missing
> something else and would welcome any reactions.
>
> Harold
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle