Hi,

I have been experimenting with the new survey package. Specifically, I was trying to use some of the functions on the public-use survey data from NHIS (2000 Sample Adult file).

Error 1) The first error I get is when I try to specify the complex survey design:

nhis.design <- svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df, check.strata=TRUE)
Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data = nhis.df, :
        Clusters not nested in strata

My data are sorted by strata, psu. Can someone tell me what the structure has to be for a stratified sample with clustering? Looking at the code, it appears to me that it does not allow more than one observation per PSU [i.e. any(sc > 1)].

Error 2) If I go ahead and specify check.strata=FALSE, then svydesign runs OK. I then tried using the svymean function. In the following example, if I specify na.rm=TRUE, I get the error below:

> svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
Error in rowsum.default(x, strata) : Incorrect length for 'group'

I traced this to the svyCprod call within svymean. svyCprod calls rowsum, and the group argument ("strata") appears to be the full length of that column rather than the subset with non-missing data.

Error 3) I then tried svymean on another variable with na.rm=FALSE. I got the following error:

> svymean(nhis.df$age, design=nhis.design)
Error in drop(rval) : names attribute must be the same length as the vector

I also traced this error to a call to rowsum within the function svyCprod. I'm not sure what names attribute this is referring to, because the arguments to rowsum and the rval object do not appear to have a names attribute. Does anyone know what the problem here might be?

Has anyone else used the survey package on public-use survey datasets like BRFSS or NHIS? Was there anything special you had to do to those datasets before specifying the survey design? I know that's a pretty vague question.
If any of you are SUDAAN users, I basically mean: does it have to be structured differently than what you pass into a SUDAAN procedure?

Thanks in advance for any suggestions! I am using R 1.6.2 on Windows 2000.

-Trevor
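The "Clusters not nested in strata" message is about whether each PSU id belongs to exactly one stratum. The following is a minimal base-R sketch of that condition, not the survey package's own check; the `toy` data frame and the `nested()` helper are hypothetical, and the data deliberately restart PSU numbering in every stratum, which is the likely cause suggested in the reply to this message.

```r
# Hypothetical toy data: PSU numbering restarts at 1 in every stratum.
toy <- data.frame(
  strata = rep(c(1, 2), each = 4),
  psu    = rep(c(1, 1, 2, 2), times = 2)
)

# PSU ids are nested in strata when every id value occurs in exactly one stratum.
nested <- function(psu, strata) {
  strata_per_psu <- tapply(strata, psu, function(s) length(unique(s)))
  all(strata_per_psu == 1)
}

print(nested(toy$psu, toy$strata))         # FALSE: PSU 1 appears in strata 1 and 2

# One fix: build globally unique PSU ids from the stratum/PSU pair.
toy$psu_unique <- interaction(toy$strata, toy$psu, drop = TRUE)
print(nested(toy$psu_unique, toy$strata))  # TRUE

# With the survey package, the alternative is to declare the nesting, e.g.
# svydesign(ids = ~psu, strata = ~strata, probs = ~probs, nest = TRUE, data = nhis.df)
```

Combining stratum and PSU with interaction() makes the ids unique across the whole file; passing nest = TRUE to svydesign() tells it to treat equal PSU numbers in different strata as different PSUs instead.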
On Wed, 12 Feb 2003, Thompson, Trevor wrote:

> Hi,
>
> I have been experimenting with the new survey package. Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
>
> Error 1) The first error I get is when I try to specify the complex survey
> design:
>
> nhis.design <- svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
> check.strata=TRUE)
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data = nhis.df, :
>         Clusters not nested in strata
>
> My data are sorted by strata, psu. Can someone tell me what the structure
> has to be for a stratified sample with clustering? Looking at the code, it
> appears to me that it does not allow more than one observation per PSU [i.e.
> any(sc > 1)].

The problem is probably that your id numbers for PSU start up again in each stratum (e.g. you have a PSU numbered 1 in each stratum). If so, you need the nest=TRUE option to tell svydesign() that all the PSUs numbered 1 in different strata are really different PSUs.

> Error 2) If I go ahead and specify check.strata=FALSE, then svydesign runs
> OK. I then tried using the svymean function. In the following example, if
> I specify na.rm=TRUE, I get the error below:

No, it doesn't run OK; it just doesn't report an error.

> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
>
> I traced this to the svyCprod call within svymean. svyCprod calls rowsum,
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.

With missing data you do need to use the data stored in the design object, not a separate data frame; otherwise it will get confused. That is, you want

  svymean(~crc10yr, design=nhis.design, na.rm=TRUE)

> Error 3) I then tried svymean on another variable with na.rm=FALSE.
> I got the following error:
>
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
>
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to, because the arguments
> to rowsum and the rval object do not appear to have a names attribute. Does
> anyone know what the problem here might be?

This might be the same problem, in which case

  svymean(~age, design=nhis.design)

should work. You should also make sure you have version 1.0 of `survey' rather than any of the 0.9-x versions that went up briefly on CRAN.

If you tell me where to find the NHIS data, I will look at them. There shouldn't be any special requirements on the format (other than using nest=TRUE if PSUs don't have globally unique ids). I've looked at data from some NCHS studies that are used as examples by Stata, and I don't have any of these problems.

Incidentally, you should try writing to the package maintainer first, rather than the list. In this case it doesn't matter, since I read the list frequently, but it might in other cases.

-thomas
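The explanation for Error 2 can be mimicked in base R: if NAs are dropped from a variable passed in from outside the design, the per-row stratum labels kept with the design no longer line up with it. This toy sketch uses made-up `strata` and `y` vectors to reproduce the kind of length check rowsum() performs; it is an illustration under those assumptions, not the code of svyCprod() itself.

```r
# Hypothetical toy data: one stratum label per row of the original data frame,
# as stored in a design object, and a survey variable with missing values.
strata <- c(1, 1, 2, 2, 2)
y      <- c(3, NA, 5, 7, NA)

# Dropping NAs from the variable alone leaves it shorter than 'strata',
# so rowsum() stops with an error about the length of 'group'.
y_ok <- y[!is.na(y)]
mismatch <- tryCatch(rowsum(y_ok, strata),
                     error = function(e) conditionMessage(e))
print(mismatch)

# Subsetting the variable and the stratum labels with the same index keeps
# them aligned; per-stratum totals are then 3 (stratum 1) and 12 (stratum 2).
keep <- !is.na(y)
by_stratum <- rowsum(y[keep], strata[keep])
print(by_stratum)
```

Using the formula interface (~crc10yr) lets svymean() look the variable up in the design's own data, so it can drop the same rows from the variable and from the design information together.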
Dr. Lumley,

Thanks for your response. I want to point out that I did try using the nest=TRUE option earlier and got the same error with svydesign. I checked, and I was using version 0.9-1. I have updated this to version 1.0, and I am no longer getting an error. Your other suggestions work too, of course.

Still, if you are interested in looking at the NHIS data, it is available at:

http://www.cdc.gov/nchs/nhis.htm

Thanks again for your help. I will first e-mail the package maintainer directly in the future.

-Trevor

-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Wednesday, February 12, 2003 8:49 PM
To: Thompson, Trevor
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Various Errors using Survey Package
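Since the errors went away after upgrading from 0.9-1 to 1.0, checking the installed version first can save time. A small sketch, assuming a recent R installation (it uses requireNamespace() and packageVersion(), which may not be available in R 1.6.2); survey_at_least() is a hypothetical helper, not part of the survey package.

```r
# survey_at_least() is a hypothetical helper, not part of the survey package.
# It returns TRUE only if 'survey' is installed at min_version or later.
survey_at_least <- function(min_version = "1.0") {
  if (!requireNamespace("survey", quietly = TRUE)) return(FALSE)
  installed <- as.character(utils::packageVersion("survey"))
  utils::compareVersion(installed, min_version) >= 0
}

# compareVersion() returns -1, 0 or 1; release 0.9-1 sorts before 1.0.
print(utils::compareVersion("0.9-1", "1.0"))  # -1
print(survey_at_least())
```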