On Thu, 22 Jul 2010, R user wrote:
> This message is for those familiar with the survey package. I need to fit a
> weighted Cox model to accommodate the sampling weights as I have a
> case-control study with controls sampled at random from a database in a
> ratio 2:1 to cases (whom were all sampled). I want to make sure I am using
> the right svydesign syntax to specify this sampling design. Can anyone
> please check if the statement below is appropriate for my design?
>
> #group represents the case (total of 132) vs control (253 out of the total
> of 853 controls) groups; prob is 1 for cases and 253/853 for controls and
> ssize=132 for cases and 853 otherwise;
>
> dstr=svydesign(id=~1, strata=~group, prob=~prob, fpc=~ssize, data=noNA)
>
This is technically correct, but probably not what you want. You probably
want
dstr=svydesign(id=~1, strata=~group, prob=~prob, data=noNA)
or
dstr = twophase(id=list(~1,~1), strata=list(NULL, ~group), data=noNA)
Your svydesign() call, through the fpc= argument, treats the database as the
full population. This could be correct, but usually people want estimates for
the 'superpopulation' from which the population was sampled. The first option
above is very slightly conservative; the second describes the two phases of
sampling that give first the whole database and then your subsample.
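
As a rough sketch of how the two options might be set up and passed to svycoxph(), here is a simulated example. Everything about the data is an assumption made for illustration: the data frame `full`, the covariate `x`, the `time` and `status` columns, and the indicator `insample` are hypothetical names, and the outcome is random noise. Only the counts (132 cases, 253 of 853 controls) come from the question. Note that twophase() takes the phase-one data (here the whole database) together with a `subset=` formula marking the rows that were sampled, so this sketch passes `full` rather than `noNA` in that call.

```r
library(survey)
library(survival)

set.seed(1)
n_case <- 132; n_ctrl <- 853; n_ctrl_sampled <- 253

# Hypothetical full database: every case plus every potential control
full <- data.frame(
  group = rep(c("case", "control"), c(n_case, n_ctrl)),
  x     = rnorm(n_case + n_ctrl)          # made-up covariate
)
full$status <- as.integer(full$group == "case")
full$time   <- rexp(nrow(full), rate = exp(0.3 * full$x))  # made-up follow-up times

# All cases are sampled; 253 controls are drawn at random
full$insample <- full$group == "case"
full$insample[sample(which(full$group == "control"), n_ctrl_sampled)] <- TRUE
full$prob <- ifelse(full$group == "case", 1, n_ctrl_sampled / n_ctrl)

# The analysis data set as described in the question
noNA <- full[full$insample, ]

# Option 1: stratified design without fpc (superpopulation inference)
d_super <- svydesign(id = ~1, strata = ~group, prob = ~prob, data = noNA)

# Option 2: two-phase design; phase one is the database, phase two the subsample
d_two <- twophase(id = list(~1, ~1), strata = list(NULL, ~group),
                  subset = ~insample, data = full)

# Weighted Cox models under each design
fit_super <- svycoxph(Surv(time, status) ~ x, design = d_super)
fit_two   <- svycoxph(Surv(time, status) ~ x, design = d_two)
```

With real data, comparing summary(fit_super) and summary(fit_two) should show similar coefficients, with any difference mainly in the standard errors.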
-thomas
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle