thr3ads.net - R help - [R] crosstable and regression for survey data (weighted) [Jun 2012]

haps

2012-Jun-21 11:02 UTC

[R] crosstable and regression for survey data (weighted)

I have survey data that I am working on. I need to make some multi-way tables
and regression analyses on the data. After attaching the data, this is the
code I use for tables for four variables (sweight is the weight variable):
> a <- xtabs(sweight~research.area + gender + a2n2 + age)
> tmp <- ftable(a)
Is this correct? I don't think I need to use the strata and cluster
variables, right? 
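
For the tables themselves, a minimal sketch on invented toy data may help. Only the column names come from the question; the values are made up. Weighted cell totals depend only on the weights, so strata and clusters do not change them; they start to matter once standard errors are wanted.

```r
# Toy data; the column names follow the question, the values are invented.
mydata <- data.frame(
  research.area = c("north", "north", "south", "south", "south"),
  gender        = c("f", "m", "f", "m", "f"),
  sweight       = c(1.5, 2.0, 0.5, 3.0, 1.0)
)

# Putting the weight on the left-hand side makes xtabs() sum the weights
# within each cell instead of counting rows.
a <- xtabs(sweight ~ research.area + gender, data = mydata)

# ftable() flattens the (possibly multi-way) table for printing.
print(ftable(a))

# The cell totals add up to the sum of the weights.
total <- sum(a)
```

Passing data = explicitly avoids relying on attach().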
 
And, below is the logistic regression code that I use for randomly sampled,
or unweighted, data:
> logit.1 <- glm(var4 ~ var3 + var2 + var1, family = binomial(link = "logit"))
> summary(logit.1)
But how can I do the same analyses for the weighted data? Here is some
additional info: There are four variables in the dataset that reflect the
sampling structure. These are
strat: stratum (urban or (sub-county) rural).
clust: batch of interviews that were part of the same random walk
vill_neigh_code: village or neighbourhood code
sweight: weights

--
View this message in context:
http://r.789695.n4.nabble.com/crosstable-and-regression-for-survey-data-weighted-tp4634083.html
Sent from the R help mailing list archive at Nabble.com.

Pablo Domínguez Vaselli

2012-Jun-22 16:10 UTC

Regarding regression models, there's a bit of discussion on whether or not
it is necessary to take the sample design into account (for instance, SPSS
doesn't), so you can run them just normally without much remorse. Or get
your life complicated (see below).

Your xtabs call seems OK to me. However, regarding tables and totals, you
can expand cases as SPSS and most software does (frequency weights) with
this code:

mydata.x <- mydata[rep(1:nrow(mydata),mydata$sweight),]

Once your dataframe is expanded this way, any totals and crosstabulations
will be right without setting any count variable on xtabs or other
functions, using just about any normal call you want (e.g. aggregate(),
table(), etc.). This approach is memory-intensive: the dataframe will be as
large as the target population.
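
A small sketch of the expansion on invented data, assuming the weights can reasonably be treated as frequencies. One caveat: rep() truncates non-integer counts towards zero, so fractional weights should be rounded explicitly first. The `times` column is a hypothetical helper, not part of the original data.

```r
mydata <- data.frame(
  gender  = c("f", "m", "f", "m"),
  sweight = c(2, 3, 1, 2.7)   # last weight is deliberately fractional
)

# rep() would silently truncate 2.7 to 2, so round explicitly.
mydata$times <- round(mydata$sweight)

# Repeat each row as many times as its (rounded) weight says.
mydata.x <- mydata[rep(seq_len(nrow(mydata)), mydata$times), ]

# Plain counts on the expanded data ...
counts <- table(mydata.x$gender)
# ... match the weighted totals computed from the rounded weights.
totals <- xtabs(times ~ gender, data = mydata)

print(counts)
print(totals)
```

The expanded dataframe gives correct totals, but its row count no longer reflects the real sample size, so standard errors from ordinary functions on it will be too small.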

However, in order to properly deal with complex sample data you need the
survey package (I think this is the only sound approach to your modelling
problem). This package will enable you to calculate design effects,
variance estimators and regression modelling taking the survey design into
account without hitting the RAM as above.

In that case, you must first feed the design variables to a survey design
object, using something like:
> library(survey)
> mydesign <- svydesign(ids=~vill_neigh_code+clust, strata=~strat, weights=~sweight, data=mydata)

Do check the survey package's vignette and help files; this is tricky. It
will also help to have the population of each village or neighbourhood. You
must also check the nesting: if cluster ids reuse names across strata, pass
nest=TRUE to svydesign().
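
As an illustration of the nesting point, here is a sketch on invented data where cluster ids 1 and 2 appear in both strata. It is deliberately simplified to a single sampling stage (ids = ~clust) rather than the two-stage ids suggested above, and the variable `y` is made up.

```r
library(survey)

# Toy data: cluster ids 1 and 2 are reused in both strata.
mydata <- data.frame(
  strat   = rep(c("urban", "rural"), each = 4),
  clust   = rep(c(1, 2, 1, 2), each = 2),
  sweight = c(10, 12, 9, 11, 20, 22, 19, 21),
  y       = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# nest = TRUE tells svydesign() to treat cluster ids as nested within
# strata, so "cluster 1, urban" and "cluster 1, rural" are different PSUs.
des <- svydesign(ids = ~clust, strata = ~strat, weights = ~sweight,
                 data = mydata, nest = TRUE)

summary(des)

# Weighted mean of y with a design-based standard error.
est <- svymean(~y, des)
print(est)
```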

Note the survey package has special functions for just about anything
(including getting your frequencies). All of them start with "svy", as in
"svytable", and they return variance estimators (note that your estimation
errors will vary from table to table in such a complex design). Survey
example:
> data(api)
> xtabs(~sch.wide+stype, data=apipop)
> dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
> summary(dclus1)
> (tbl <- svytable(~sch.wide+stype, dclus1))

Once you've specified your survey design, you can fit a design-conscious
glm model using:
> mymodel <- svyglm(var1~var2+var3, design=mydesign, family=quasibinomial())
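
Since mydesign is not available to readers, a runnable counterpart can be sketched with the api data that ships with the survey package. The choice of predictors (ell, meals) is only for illustration.

```r
library(survey)
data(api)

# One-stage cluster sample of school districts from the api data.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Design-based logistic regression. quasibinomial() avoids the warnings
# about non-integer counts that binomial() gives with sampling weights.
fit <- svyglm(sch.wide ~ ell + meals, design = dclus1,
              family = quasibinomial())

summary(fit)

# Coefficients are on the log-odds scale; exponentiate for odds ratios.
odds_ratios <- exp(coef(fit))
print(odds_ratios)
```

Note that svyglm() takes the data from the design object, so only the formula, design= and family= are needed here.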

If you're out of time just use normal xtabs and glm!

haps

2012-Jun-27 12:17 UTC

Thanks Pablo for your answer, it was very insightful, but I guess I got
something wrong. 

I formed a survey design as:
> library(survey)
> mydesign <- svydesign(ids=~vill_neigh_code+clust, strata=~strat,
    weights=~sweight, data=mydata)
where
strat: stratum (urban or (sub-county) rural).
clust: batch of interviews that were part of the same random walk
vill_neigh_code: village or neighbourhood code
sweight: probability weights
Then, I run a logistic regression as
> logit.1 <- svyglm(response~var1+var2+var3+var4+var5+var6,
    design=mydesign, data=mydata, nest=TRUE, family=quasibinomial())
And I get this error message:
Error in svyglm.survey.design(response ~ var1 + var2 + var3 + var4 +  : 
  all variables must be in design= argument
What should I change in the syntax in this case?

Pablo Domínguez Vaselli

2012-Jun-29 11:16 UTC

It seems the variable names you've put in the formula are not the same as
those in the design object.

"all variables must be in design= argument" means that every variable in
the formula must exist in the object you've assigned with
mydesign <- svydesign(ids=~vill_neigh_code+clust, strata=~strat,
weights=~sweight, data=mydata)

Check the spelling. Note that "mydesign" is *not* a dataframe. That means
that mydesign[,5] or mydesign$myvar won't work in the formula (of course,
neither will referring to the original dataframe "mydata"); you must use
the variable names alone.

for instance:

svyglm(api00~ell+meals+mobility, design=dstrat)

is correct, using only the var names, not dstrat[smth]~dstrat[smth]+
dstrat[smth]

If you write the names correctly it should work
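
One way to track down the offending name is to compare the variables in the formula with the available column names. The helper `missing_vars` below is hypothetical, written only for this sketch, and the toy data contains a deliberate capitalisation mismatch.

```r
# Hypothetical helper: which variables used in a formula are absent
# from a set of column names?
missing_vars <- function(formula, available) {
  setdiff(all.vars(formula), available)
}

# Toy data with a deliberate mismatch: the column is "Var2", not "var2".
mydata <- data.frame(response = c(0, 1), var1 = c(3, 4), Var2 = c(5, 6))

bad <- missing_vars(response ~ var1 + var2, names(mydata))
print(bad)

# For a survey design object, the same check could be run against the
# data stored inside it (assumption: the design keeps its data in the
# $variables component):
#   missing_vars(response ~ var1 + var2, names(mydesign$variables))
```

An empty result means every formula variable was found. Also worth noting: nest= is an argument of svydesign(), and svyglm() takes its data from design=, so data=mydata and nest=TRUE can be dropped from the svyglm() call.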

regards

pablo

haps

2012-Jun-30 03:45 UTC

Thanks Pablo,

There must be a spelling issue then although I can get the tables and other
stuff on the same variables. In this case, I will go for the glm below, and
hopefully this will not make the results too bad.

mylogit <- glm(response ~ var1 + var2 + var3 + var4 + var5 + var6, weights = sweight,
family = quasibinomial(link = "logit"))

