thr3ads.net - R help - [R] svy / weighted regression [Oct 2009]


Laust

2009-Oct-09 11:18 UTC

[R] svy / weighted regression

Dear list,

I am trying to set up a propensity-weighted regression using the
survey package. Most of my population is sampled with a sampling
probability of one (that is, I have the full population). However, for
a subset of the data I have only a 50% sample of the full population.
In previous work on the data, I analyzed these data using SAS and
STATA. In those packages I used a propensity weight of 1/[sampling
probability] in various generalized linear regression-procedures, but
I am having trouble setting this up. I bet the solution is simple, but
I'm an R newbie. Code to illustrate my problem below.

Thanks
Laust

# load the survey package
library(survey)

# create the example data
listc <- c("Denmark", "Finland", "Norway", "Sweden",
           "Denmark", "Finland", "Norway", "Sweden")
listw <- c(1, 2, 1, 1, 1, 1, 1, 1)  # weight = 1 / sampling probability
listd <- c(0, 0, 0, 0, 1000, 1000, 1000, 2000)
listt <- c(750000, 500000, 900000, 1900000, 5000, 5000, 5000, 10000)
country <- data.frame(country = listc, weight = listw,
                      deaths = listd, yrs_at_risk = listt)

# run a frequency-weighted regression to get the correct point
# estimates for comparison
fit.glm <- glm(deaths ~ country + offset(log(yrs_at_risk)),
               weights = weight, data = country, family = poisson())
summary(fit.glm)
regTermTest(fit.glm, ~ country)

# run the survey-weighted regression
svy <- svydesign(ids = ~0, data = country, weights = ~weight)
fit.svy <- svyglm(deaths ~ country + offset(log(yrs_at_risk)),
                  design = svy, family = poisson())
summary(fit.svy)
# point estimates are correct, but the standard errors are far too large
regTermTest(fit.svy, ~ country)
# the test indicates no country differences

Peter Dalgaard

2009-Oct-10 09:02 UTC

[R] svy / weighted regression

Sorry, forgot to "reply all"...

Laust wrote:
> Dear list,
> 
> I am trying to set up a propensity-weighted regression using the
> survey package. Most of my population is sampled with a sampling
> probability of one (that is, I have the full population). However, for
> a subset of the data I have only a 50% sample of the full population.
> In previous work on the data, I analyzed these data using SAS and
> STATA. In those packages I used a propensity weight of 1/[sampling
> probability] in various generalized linear regression-procedures, but
> I am having trouble setting this up. I bet the solution is simple, but
> I'm an R newbie. Code to illustrate my problem below.

Hi Laust,

You probably need the package author to explain fully, but as far as I
can see, the crux is that a dispersion parameter is being used, based on
Pearson residuals, even in the Poisson case (i.e. you effectively get
the same result as with quasipoisson()).

I don't know what the rationale is for this, but it is clear that with
your data, an estimated dispersion parameter is going to be large. E.g.
the data has both 0 cases in 750000 person-years and 1000 cases in 5000
person-years for Denmark, and in your model they are supposed to have
the same Poisson rate.
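
To put a number on that, here is a minimal base-R sketch that refits the frequency-weighted Poisson model from the original post and computes the Pearson-based dispersion estimate by hand. It does not call the survey package or reproduce its internals; the names `dat`, `fit_pois`, `pearson` and `phi_hat` are illustrative only.

```r
# Same data as in the original post
dat <- data.frame(
  country     = rep(c("Denmark", "Finland", "Norway", "Sweden"), times = 2),
  weight      = c(1, 2, 1, 1, 1, 1, 1, 1),
  deaths      = c(0, 0, 0, 0, 1000, 1000, 1000, 2000),
  yrs_at_risk = c(750000, 500000, 900000, 1900000, 5000, 5000, 5000, 10000)
)

# Frequency-weighted Poisson fit with one common rate per country
fit_pois <- glm(deaths ~ country + offset(log(yrs_at_risk)),
                weights = weight, data = dat, family = poisson())

# Observed versus fitted deaths: the two rows of each country are
# forced to share a rate, so neither row is fitted well
cbind(dat[, c("country", "deaths")], fitted = round(fitted(fit_pois), 1))

# Pearson dispersion estimate: sum of squared (weighted) Pearson
# residuals divided by the residual degrees of freedom
pearson <- residuals(fit_pois, type = "pearson")
phi_hat <- sum(pearson^2) / df.residual(fit_pois)
phi_hat
```

A value of `phi_hat` near 1 would be consistent with Poisson variation; here it comes out several orders of magnitude larger, because each country combines a row with no deaths and a row with a very high death rate.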

summary.svyglm starts off with

     est.disp <- TRUE

and AFAICS there is no way it can get set to FALSE.  Knowing Thomas,
there is probably a perfectly good reason not to just set the dispersion
to 1, but I don't get it either...
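
As a rough illustration of what an estimated dispersion does to the standard errors, the following base-R sketch compares `poisson()` (dispersion fixed at 1) with `quasipoisson()` (dispersion estimated from the Pearson residuals) on the data from the original post. This is plain `glm()`, not `svyglm()`, so it only mimics the effect described above and makes no claim about how the survey package computes its variances; the object names are illustrative.

```r
# Same data as in the original post
dat <- data.frame(
  country     = rep(c("Denmark", "Finland", "Norway", "Sweden"), times = 2),
  weight      = c(1, 2, 1, 1, 1, 1, 1, 1),
  deaths      = c(0, 0, 0, 0, 1000, 1000, 1000, 2000),
  yrs_at_risk = c(750000, 500000, 900000, 1900000, 5000, 5000, 5000, 10000)
)

model <- deaths ~ country + offset(log(yrs_at_risk))

# Dispersion fixed at 1
fit_pois  <- glm(model, weights = weight, data = dat, family = poisson())
# Dispersion estimated from the Pearson residuals
fit_quasi <- glm(model, weights = weight, data = dat, family = quasipoisson())

se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
phi      <- summary(fit_quasi)$dispersion

# Point estimates agree; the quasipoisson standard errors are the
# Poisson ones multiplied by sqrt(phi)
cbind(estimate = coef(fit_pois), se_pois, se_quasi,
      ratio = se_quasi / se_pois, sqrt_phi = sqrt(phi))
```

With a dispersion estimate this large, the inflated standard errors are enough on their own to wipe out any evidence of country differences in a Wald-type test.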

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
