thr3ads.net - R help - [R] Various Errors using Survey Package [Feb 2003]

Thompson, Trevor

2003-Feb-12 19:38 UTC

[R] Various Errors using Survey Package

Hi,

I have been experimenting with the new Survey package.  Specifically, I was
trying to use some of the functions on the public-use survey data from NHIS
(2000 Sample Adult file).  

Error 1):  The first error I get is when I try to specify the complex survey
design.

nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
check.strata=TRUE)
Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data = nhis.df,  :
        Clusters not nested in strata

My data are sorted by strata, psu.  Can someone tell me what the structure
has to be for a stratified sample with clustering?  Looking at the code, it
appears to me that it does not allow more than 1 observation per psu [i.e.
any(sc > 1)].

Error 2).  If I go ahead and specify check.strata=FALSE, then svydesign runs
ok.  I then tried using the svymean function.  In the following example, if
I specify na.rm=TRUE, I get the error below:
> svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
Error in rowsum.default(x, strata) : Incorrect length for 'group'

I traced this to the svyCprod call within svymean.   SvyCprod calls rowsum
and the group argument ("strata") appears to be the full length of that
column rather than the subset with non-missing data.

Error 3).  I then tried svymean on another variable with na.rm=FALSE.  I got
the following error:
> svymean(nhis.df$age, design=nhis.design)
Error in drop(rval) : names attribute must be the same length as the vector

I also traced this error to a call to rowsum within the function svyCprod.
I'm not sure what names attribute this is referring to because the arguments
to rowsum and the rval object do not appear to have a names attribute.  Does
anyone know what the problem here might be?

Has anyone else used the survey package on public-use survey datasets like
BRFSS or NHIS?  Was there anything special you had to do to those datasets
before specifying the survey design?  I know that's a pretty vague question.
If any of you are SUDAAN users, I basically mean: does the data have to be
structured differently than what you pass into a SUDAAN procedure?

Thanks in advance for any suggestions!  I am using R 1.6.2 on Windows 2000.

-Trevor

Thomas Lumley

2003-Feb-13 02:50 UTC

On Wed, 12 Feb 2003, Thompson, Trevor wrote:
> Hi,
>
> I have been experimenting with the new Survey package.  Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
>
> Error 1):  The first error I get is when I try to specify the complex survey
> design.
>
> nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
> check.strata=TRUE)
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data = nhis.df,  :
>         Clusters not nested in strata
>
> My data are sorted by strata, psu.  Can someone tell me what the structure
> has to be for a stratified sample with clustering?  Looking at the code, it
> appears to me that it does not allow more than 1 observation per psu [i.e.
> any(sc > 1)].
  The problem is probably that your id numbers for PSU start up again in
each stratum (eg you have a PSU numbered 1 in each stratum).  If so, you
need the nest=TRUE option to tell svydesign() that all the PSUs numbered 1
in different strata are really different PSUs
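
To make the point about restarting PSU numbers concrete, here is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration (they are not NHIS data), and the relabelling with `interaction()` only mimics the idea behind nest=TRUE; it is not the package's internal code. The svydesign() call at the end runs only if the survey package is installed.

```r
# Toy data (hypothetical): PSU labels restart at 1 within each stratum.
toy <- data.frame(
  strata = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu    = c(1, 1, 2, 2, 1, 1, 2, 2),
  probs  = rep(0.1, 8),
  age    = c(34, 51, 27, 63, 45, 38, 59, 22)
)

# For each PSU label, count how many distinct strata it appears in.
# A count above 1 means the label is reused across strata.
strata_per_psu <- tapply(toy$strata, toy$psu, function(s) length(unique(s)))
reused_labels <- any(strata_per_psu > 1)

# Combining stratum and PSU label gives one globally unique id per real PSU.
toy$psu_unique <- interaction(toy$strata, toy$psu, drop = TRUE)

# With the survey package available, nest=TRUE asks svydesign() to treat
# equally labelled PSUs in different strata as different PSUs.
if (requireNamespace("survey", quietly = TRUE)) {
  toy.design <- survey::svydesign(ids = ~psu, probs = ~probs, strata = ~strata,
                                  data = toy, nest = TRUE)
}
```

If `reused_labels` comes out TRUE for your own data, that is the situation in which nest=TRUE (or a relabelled PSU variable) is needed.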

> Error 2).  If I go ahead and specify check.strata=FALSE, then svydesign runs
> ok.  I then tried using the svymean function.  In the following example, if
> I specify na.rm=TRUE, I get the error below:
No, it doesn't run ok, it just doesn't report an error.
> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
>
> I traced this to the svyCprod call within svymean.   SvyCprod calls rowsum
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.
With missing data you do need to use the data stored in the design object,
not a separate data frame, otherwise it will get confused. That is, you
want
  svymean(~crc10yr, design=nhis.design, na.rm=TRUE)
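
The length mismatch described in the quoted text can be reproduced with base R's rowsum() alone. This is only a sketch of the mechanism, with made-up vectors `x` and `strata`; it does not reproduce the internals of svymean() or svyCprod. It shows that dropping missing values from the data but not from the grouping vector triggers the same kind of error, which is presumably why passing a formula (so that the design object subsets everything consistently) avoids it.

```r
# Hypothetical values: a variable with missing data and its stratum labels.
x      <- c(1, NA, 0, 1, NA, 0)
strata <- c(1, 1, 1, 2, 2, 2)
keep   <- !is.na(x)

# Mismatched lengths: 4 non-missing values but 6 group labels.
# rowsum() stops with an error; capture the message instead of failing.
mismatch <- tryCatch(rowsum(x[keep], strata),
                     error = function(e) conditionMessage(e))

# Matched lengths: subset the group labels the same way as the data.
totals <- rowsum(x[keep], strata[keep])
```

Here `mismatch` holds the error message from the first call, while `totals` is a one-column matrix of per-stratum sums.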

> Error 3).  I then tried svymean on another variable with na.rm=FALSE.  I got
> the following error:
>
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
>
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to because the arguments
> to rowsum and the rval object do not appear to have a names attribute.  Does
> anyone know what the problem here might be?
This might be the same problem, in which case
    svymean(~age, design=nhis.design)
should work.  You should also make sure you have version 1.0 of `survey'
rather than any of the 0.9-x versions that went up briefly on CRAN.

If you tell me where to find the NHIS data I will look at them. There
shouldn't be any special requirements on the format (other than using
nest=TRUE if PSUs don't have globally unique ids).  I've looked at data
from some NCHS studies that are used as examples by Stata, and I don't
have any of these problems.

Incidentally, you should try writing to the package maintainer first,
rather than the list. In this case it doesn't matter, since I read the
list frequently, but it might in other cases.

	-thomas

Thompson, Trevor

2003-Feb-13 14:13 UTC

Dr. Lumley,

Thanks for your response.  I want to point out that I did try using the
nest=TRUE option earlier and got the same error with svydesign.  I checked
and I was using version 0.9-1.  I have updated this to version 1.0 and I am
no longer getting an error.  

Your other suggestions work too, of course.  Still, if you are interested
in looking at the NHIS data, it is available at:

http://www.cdc.gov/nchs/nhis.htm 

Thanks again for your help.  I will first e-mail the package maintainer
directly in the future.

-Trevor
 