thr3ads.net - R help - [R] problem with 'svyby' function from SURVEY package [Jun 2010]

If this information is useful, please help other people find it:
Share via:

Roni Kobrosly

2010-Jun-03 04:14 UTC

[R] problem with 'svyby' function from SURVEY package

Hello,

I'm using a complex survey dataset and my goal is to simply spit out a bunch
of probability-weighted outcome variable means for the different levels of
covariate. So I first define the structure of the study design (I'm using
the CDC's NHANES data):

dhanes <- svydesign(id=~PSU, strat=~STRATA, weight=~lab_weight, data=final,
nest=TRUE)

No problem there.
Now I use the "svyby" function as follows:

svyby(~outcome, ~covariate, design=dhanes, svymean, na.rm=T) -> haha
print(haha)

   covariate outcome   se.outcome
1        1   0.4961189 0.08828457
2        2   0.4474706 0.22214557
3        3   0.5157026 0.12076008
4        4   0.6773910 0.20605025
NA      NA   0.8728167 0.15622274

...and it works just fine. I get a nice table of the mean and standard error for
each level of the covariate. I started writing a custom function to automate
this and I had problems. Consider this really basic custom function (that does
not seem very different from the above code):

this_is_a_test <-function(outcome, covariate)
{

svyby(~outcome, ~covariate, design=dhanes, svymean, na.rm=T) -> haha
			
print(hah)


	}

I get the following output:

Error in `[.survey.design2`(design, nas == 0, ) : 
  (subscript) logical subscript too long


I really don't understand this because I haven't done anything too
different. Before I can even start writing up the rest of the function, I have
to settle this. Any ideas?

-Roni




	[[alternative HTML version deleted]]

Thomas Lumley

2010-Jun-03 16:05 UTC

head link

[R] problem with 'svyby' function from SURVEY package

On Thu, 3 Jun 2010, Roni Kobrosly wrote:
> Hello,
>
> I'm using a complex survey dataset and my goal is to simply spit out a
bunch of probability-weighted outcome variable means for the different levels of
covariate. So I first define the structure of the study design (I'm using
the CDC's NHANES data):
>
> dhanes <- svydesign(id=~PSU, strat=~STRATA, weight=~lab_weight,
data=final, nest=TRUE)
>
> No problem there.
> Now I use the "svyby" function as follows:
>
> svyby(~outcome, ~covariate, design=dhanes, svymean, na.rm=T) -> haha
> print(haha)
>
>   covariate outcome   se.outcome
> 1        1   0.4961189 0.08828457
> 2        2   0.4474706 0.22214557
> 3        3   0.5157026 0.12076008
> 4        4   0.6773910 0.20605025
> NA      NA   0.8728167 0.15622274
>
> ...and it works just fine. I get a nice table of the mean and standard
error for each level of the covariate. I started writing a custom function to
automate this and I had problems. Consider this really basic custom function
(that does not seem very different from the above code):
>
> this_is_a_test <-function(outcome, covariate)
> {
>
> svyby(~outcome, ~covariate, design=dhanes, svymean, na.rm=T) -> haha
>
> print(hah)
>
>
> 	}
>
You are asking for the mean of a variable called 'outcome', divided up
according to a variable called 'covariate'. Presumably you don't
have variables with either of those names, so R is getting confused.

Formulas don't work the way you want them to.  As a simpler example with
nothing to do with the survey package

this_is_a_simpler_example<-function(outcome){
    ~outcome
}
> this_is_a_simpler_example(test)~outcome

If you want to substitute a variable into a formula, you need to do it yourself.
In your case, you probably want to use make.formula(), from the survey package
> make.formula("test")
~test> make.formula(c("fred","barney","wilma"))~fred + barney + wilma

Presumably you want to do something like

approach_that_works <-function(outcome, covariate, design=dhanes,...)
svyby(make.formula(outcome),  make.formula(covariate), design,...)

some_outcomes <- colnames(dhanes)[47:63]

some_covariates <- colnames(dhanes)[83:95]

lapply( some_outcomes,
               function(an_outcome) lapply(some_covariates, 
approach_that_works, outcome=an_outcome)
            )


For another recent thread using another approach to a related question, see
tolstoy.newcastle.edu.au/R/e10/help/10/05/5676.html

     -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

R help - Jun 2010 - problem with 'svyby' function from SURVEY package

[R] problem with 'svyby' function from SURVEY package

[R] problem with 'svyby' function from SURVEY package