thr3ads.net - R help - [R] weighted Cox proportional hazards regression [Dec 2007]

If this information is useful, please help other people find it:
Share via:

Scott Bartell

2007-Dec-04 13:00 UTC

[R] weighted Cox proportional hazards regression

I'm getting unexpected results from the coxph function when using
weights from counter-matching.  For example, the following code
produces a parameter estimate of -1.59 where I expect 0.63:

d2 = structure(list(x = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
1, 0, 0, 1, 0, 1, 0, 1, 0, 1), wt = c(5, 42, 40, 4, 43, 4, 42,
4, 44, 5, 38, 4, 39, 4, 4, 37, 40, 4, 44, 5, 45, 5, 44, 5), riskset = c(1L,
1L, 4L, 4L, 6L, 6L, 12L, 12L, 13L, 13L, 19L, 19L, 23L, 23L, 31L,
31L, 42L, 42L, 45L, 45L, 70L, 70L, 93L, 93L), cc = c(1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
), pseudotime = rep(1,24)), .Names = c("x", "wt",
"riskset",
"cc", "pseudotime"), class = "data.frame",
row.names=1:24)

coxph( Surv(pseudotime, cc) ~ x + strata(riskset), weights=wt,
robust=T, method="breslow",data=d2)

I'm expecting a value of about 0.63 to 0.64 based on the data source
(simulated) and the following hand-coded MLE:

negloglik = function(beta,dat) {
  dat$wexb = dat$wt * exp(dat$x * beta)
  agged = aggregate(dat$wexb,list(riskset=dat$riskset),sum)
  names(agged)[2] = "denom"
  dat = merge(dat[dat$cc==1,],agged,by="riskset")
  -sum(log(dat$wexb)-log(dat$denom))
  }
nlm(negloglik,0,hessian=T,dat=d2)

Am I misunderstanding the meaning of case weights in the coxph
function?  The help file is pretty terse.

Scott Bartell, PhD
Assistant Professor
Department of Epidemiology
University of California, Irvine

Thomas Lumley

2007-Dec-04 22:28 UTC

head link

[R] weighted Cox proportional hazards regression

On Tue, 4 Dec 2007, Scott Bartell wrote:
> I'm getting unexpected results from the coxph function when using
> weights from counter-matching.  For example, the following code
> produces a parameter estimate of -1.59 where I expect 0.63:
You can get the answer you want with

coxph(Surv(pseudotime, cc)~x+strata(riskset)+offset(log(wt)), data=d2,
robust=TRUE)

which is how countermatching usually seems to be done, and is what the 
original paper by Langholz & Borgan recommends.

I think it's right that weight=wt doesn't do the same thing.  The
weights
are not simple inverse-probability sampling weights, because the sampling 
units in this design are pairs, not individuals.  If we assume the 
distribution of x is the same across risk sets (which looks approximately 
true in your data) then the sampling weight for a pair is proportional to 
the number of eligible controls: ie, just your control weight.  Using 
these as weights for the pairs in the weight= argument I get 0.585 as the 
hazard ratio, reasonably close to your 0.63 given that this is a different 
estimator and given the assumptions.

 	-thomas

> d2 = structure(list(x = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
> 1, 0, 0, 1, 0, 1, 0, 1, 0, 1), wt = c(5, 42, 40, 4, 43, 4, 42,
> 4, 44, 5, 38, 4, 39, 4, 4, 37, 40, 4, 44, 5, 45, 5, 44, 5), riskset = c(1L,
> 1L, 4L, 4L, 6L, 6L, 12L, 12L, 13L, 13L, 19L, 19L, 23L, 23L, 31L,
> 31L, 42L, 42L, 45L, 45L, 70L, 70L, 93L, 93L), cc = c(1, 0, 1,
> 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
> ), pseudotime = rep(1,24)), .Names = c("x", "wt",
"riskset",
> "cc", "pseudotime"), class = "data.frame",
row.names=1:24)
>
> coxph( Surv(pseudotime, cc) ~ x + strata(riskset), weights=wt,
> robust=T, method="breslow",data=d2)
>
> I'm expecting a value of about 0.63 to 0.64 based on the data source
> (simulated) and the following hand-coded MLE:
>
> negloglik = function(beta,dat) {
>  dat$wexb = dat$wt * exp(dat$x * beta)
>  agged = aggregate(dat$wexb,list(riskset=dat$riskset),sum)
>  names(agged)[2] = "denom"
>  dat = merge(dat[dat$cc==1,],agged,by="riskset")
>  -sum(log(dat$wexb)-log(dat$denom))
>  }
> nlm(negloglik,0,hessian=T,dat=d2)
>
> Am I misunderstanding the meaning of case weights in the coxph
> function?  The help file is pretty terse.
>
> Scott Bartell, PhD
> Assistant Professor
> Department of Epidemiology
> University of California, Irvine
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

Terry Therneau

2007-Dec-05 13:22 UTC

head link

[R] weighted Cox proportional hazards regression

> I'm getting unexpected results from the coxph function when using
> weights from counter-matching.  For example, the following code
> produces a parameter estimate of -1.59 where I expect 0.63:
  I agree with Thomas' answer wrt using offset instead of weight.  One way
to
understand this is to look at the score equation for the Cox model, which is
  	sum over the deaths of (x[i] - xbar[i])
x[i] is the covariate vector of the ith death
xbar[i] is the average of all the subjects who were at risk at the time of the 
ith death.

  In situations where one samples selected controls, the score equation will be 
correct if one fixes up xbar so that it is an estimate of the population mean 
(all those in the population that were at risk for a death) rather than being 
the mean of just those in the sample.  Use of an offset statement allows one to 
reweight xbar without changing the rest of the score equation.  It's kind of
a
trick, see Therneau and Li, Lifetime Data Analysis, 1999, p99-112 for a simple 
example of how it works.  Langholz and Borgan give details on exactly how to 
correctly reweight using some old results from sampling theory - it is just a 
little bit more subtle than one would guess, but not too different from the 
obvious.
  
  	Terry Therneau

Reasonably Related Threads

Search for more reasonably related threads

R help - Dec 2007 - weighted Cox proportional hazards regression

[R] weighted Cox proportional hazards regression

[R] weighted Cox proportional hazards regression

[R] weighted Cox proportional hazards regression

Reasonably Related Threads