thr3ads.net - R help - [R] problem with svyby and NAs (survey package) [Apr 2012]

A.F.Fenton at lse.ac.uk

2012-Apr-13 17:44 UTC

[R] problem with svyby and NAs (survey package)

Hello

I'm trying to get the proportion "true" for a dichotomous variable for
various subgroups in a survey.

This works fine, but obviously doesn't give proportions directly:
svytable(~SurvYear+problem.vandal, seh.dsn, round=TRUE)
        problem.vandal
SurvYear FALSE  TRUE
    1995  8906   786
    1997 17164  2494
    1998 17890  1921
    1999 18322  1669
    2001 17623  2122
...

Note some years are missing - they are part of the dataset, but all
responses are NA (the question wasn't asked).
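(Editor's note.) The weighted counts above can be turned into within-year proportions with base R's `prop.table()`. A minimal sketch, assuming the counts are re-entered by hand from the `svytable()` output shown (first five years only); `counts` and `props` are illustrative names. These are point estimates only: they carry no standard errors and, with `round=TRUE`, are computed from rounded counts, which is why `svymean()`/`svyby()` is still the better tool.

```r
# Weighted counts copied from the svytable() output above (first five years only).
counts <- matrix(c(8906, 17164, 17890, 18322, 17623,   # FALSE column
                   786,  2494,  1921,  1669,  2122),   # TRUE column
                 ncol = 2,
                 dimnames = list(SurvYear = c("1995", "1997", "1998", "1999", "2001"),
                                 problem.vandal = c("FALSE", "TRUE")))

# margin = 1: divide each cell by its row total, giving FALSE/TRUE shares within each year.
props <- prop.table(counts, margin = 1)
round(props, 3)
```

Because `svytable()` returns a table-like object, applying `prop.table(..., 1)` directly to the `svytable()` result should give the same shares without retyping the counts.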

However, this gives an error, and I'd like to understand why - it works
for variables without missing years:

svyby(~problem.vandal, ~SurvYear, seh.dsn, svymean, na.rm=TRUE)
Error in tapply(1:NROW(x), list(factor(strata)), function(index) { : 
  arguments must have same length

The error only occurs when na.rm=TRUE and there are no observations in
one year.

Thanks
alex

Please access the attached hyperlink for an important electronic communications
disclaimer: lse.ac.uk/emailDisclaimer

A.F.Fenton at lse.ac.uk

2012-Apr-13 19:17 UTC

[R] problem with svyby and NAs (survey package)

> I'm trying to get the proportion "true" for a dichotomous variable for
> various subgroups in a survey.

Sorry, I'm new to the list and just saw the advice about minimal
reproducible code. Here goes:


library(survey)
set.seed(42)  # make the random example reproducible
foo <- data.frame(id       = 1:25,
                  weight   = runif(25),
                  year     = rep(2002:2006, 5),
                  problem  = rnorm(25) > 0)
foo.dsn <- svydesign(ids = ~id, weights = ~weight, data = foo)
svyby(~problem, ~year, foo.dsn, svymean, na.rm = TRUE)  # Fine

# One year is missing: every response for 2004 is NA
foo[foo$year == 2004, "problem"] <- NA
foo.dsn <- svydesign(ids = ~id, weights = ~weight, data = foo)
svyby(~problem, ~year, foo.dsn, svymean, na.rm = TRUE)  # Error


thanks
alex

Thomas Lumley

2012-Apr-14 21:28 UTC

[R] problem with svyby and NAs (survey package)

On Sat, Apr 14, 2012 at 5:44 AM, <A.F.Fenton at lse.ac.uk> wrote:
> Hello
>
> I'm trying to get the proportion "true" for a dichotomous variable for
> various subgroups in a survey.
>
> This works fine, but obviously doesn't give proportions directly:
> svytable(~SurvYear+problem.vandal, seh.dsn, round=TRUE)
>         problem.vandal
> SurvYear FALSE  TRUE
>     1995  8906   786
>     1997 17164  2494
>     1998 17890  1921
>     1999 18322  1669
>     2001 17623  2122
> ...
>
> Note some years are missing - they are part of the dataset, but all
> responses are NA (the question wasn't asked).
>
> However, this gives an error, and I'd like to understand why - it works
> for variables without missing years:
>
> svyby(~problem.vandal, ~SurvYear, seh.dsn, svymean, na.rm=TRUE)
> Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :
>   arguments must have same length
>
> The error only occurs when na.rm=TRUE and there are no observations in
> one year.

The error occurs because you are asking for the mean of a vector of
all NAs.   svyby() just calls svymean() on each subset of the data.
In your reproducible example,
   svymean(~problem, subset(foo.dsn, year==2004), na.rm=TRUE)
will give the same error, and a work-around is to use subset(foo.dsn,
year!=2004) in the call to svyby()
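(Editor's note.) The work-around can be generalised so that the all-NA years need not be listed by hand. A minimal sketch, assuming the `foo` data from the reproducible example above (with an arbitrary seed added): base R finds the years that have at least one non-missing response, and the `svyby()` call on the subsetted design runs only if the survey package is installed. The names `has_data` and `keep_years` are illustrative.

```r
set.seed(1)  # arbitrary seed so the random data are reproducible
foo <- data.frame(id      = 1:25,
                  weight  = runif(25),
                  year    = rep(2002:2006, 5),
                  problem = rnorm(25) > 0)
foo[foo$year == 2004, "problem"] <- NA   # 2004: question not asked

# For each year, is there at least one non-missing response?
has_data <- tapply(!is.na(foo$problem), foo$year, any)
keep_years <- as.integer(names(has_data)[has_data])
print(keep_years)

# Same idea as subset(foo.dsn, year != 2004), without hard-coding the year.
if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  foo.dsn <- svydesign(ids = ~id, weights = ~weight, data = foo)
  print(svyby(~problem, ~year, subset(foo.dsn, year %in% keep_years),
              svymean, na.rm = TRUE))
}
```

Subsetting the design object, rather than the data frame before calling `svydesign()`, keeps the full design information available for variance estimation.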

Now, svymean() is entitled to be a bit upset: you asked for the mean
of all the non-missing values, but you didn't give it any
non-missing values.  What should it do? It obviously can't return a
sensible proportion, because it got given no data.

It could just return NaN as the answer, as mean() does, but that
wouldn't help you here since svyby() is expecting a vector of two
proportions and a covariance matrix for them.
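(Editor's note.) Base R's `mean()` does return `NaN` when `na.rm=TRUE` leaves nothing, as described above. The sketch below also shows, with a hypothetical helper `wprop()` that is not part of the survey package, what a fixed-shape "empty-safe" result could look like: weighted shares of FALSE and TRUE that become `NaN` for an empty group, so per-group results can still be stacked. It computes point estimates only, with no covariance matrix, so it is not a substitute for what `svymean()` returns; the small data set is made up for illustration.

```r
# Base R: the mean of zero non-missing values is NaN, not an error.
mean(c(NA, NA), na.rm = TRUE)

# Hypothetical helper (NOT in the survey package): weighted shares of FALSE
# and TRUE, always returned as a named length-2 vector.
wprop <- function(x, w) {
  ok <- !is.na(x)
  x <- x[ok]
  w <- w[ok]
  total <- sum(w)  # 0 for an empty group, so both shares become 0/0 = NaN
  c(`FALSE` = sum(w[!x]) / total, `TRUE` = sum(w[x]) / total)
}

# Illustrative data: 2004 has only missing responses.
dat <- data.frame(year    = rep(2002:2004, each = 2),
                  weight  = c(1, 3, 2, 2, 1, 1),
                  problem = c(TRUE, FALSE, TRUE, TRUE, NA, NA))

# One row per year; the 2004 row is NaN instead of an error.
by_year <- t(sapply(split(dat, dat$year), function(d) wprop(d$problem, d$weight)))
by_year
```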

Obviously it would be possible to rewrite svymean() to handle empty
data, and I'll do that, but that doesn't solve the general problem of
what happens when svyby() asks for something impossible.   It would
also be possible for svyby() to trap errors and treat them as empty
results, but that would have the disadvantage of making debugging a
lot harder.
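(Editor's note.) The error-trapping alternative can be sketched in base R with `tryCatch()`. The helpers `apply_by_group_safely()` and `strict_mean()` below are hypothetical illustrations, not `svyby()` internals: each group's failure is turned into a row of NAs and re-signalled as a warning. That keeps the message visible but loses the traceback to where the error happened, which is the debugging cost described above.

```r
# Hypothetical per-group applier (NOT how svyby() works): run `f` on each
# group; if `f` throws an error, warn and substitute `width` NAs.
apply_by_group_safely <- function(x, g, f, width) {
  res <- lapply(split(x, g), function(xi) {
    tryCatch(f(xi), error = function(e) {
      warning("group set to NA: ", conditionMessage(e), call. = FALSE)
      rep(NA_real_, width)
    })
  })
  do.call(rbind, res)  # one row per group, named by group
}

# A deliberately strict estimator that refuses empty input, much as
# svymean() did in this thread (though with a different error message).
strict_mean <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) stop("no non-missing values")
  mean(x)
}

problem <- c(TRUE, FALSE, NA, TRUE, NA, NA)
year    <- c(2002, 2002, 2003, 2003, 2004, 2004)
res <- apply_by_group_safely(problem, year, strict_mean, width = 1)
res
```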

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
