thr3ads.net - R help - [R] Discrepant lm() and survreg() standard errors with weighted fits [Feb 2014]

If this information is useful, please help other people find it:
Share via:

Kyle Penner

2014-Feb-24 22:20 UTC

[R] Discrepant lm() and survreg() standard errors with weighted fits

Hi,

I have some measurements and their uncertainties.  I'm using an
uncensored subset of the data for a weighted fit (for now---I'll do a
fit to the full, censored, dataset when I understand the results).

survreg() reports a much smaller standard error for the model
parameter than lm(), but only when I use weights.  Am I missing
something?  Here is what I'm doing:

atten = read.table('http://hobo.as.arizona.edu/~kpenner/temp/r_help')

uvlog = atten[,1]
halog = atten[,2]
haerrlog = atten[,3]

eventcode_det = seq(1,1,length=length(halog))

# do 2 basic unweighted fits, one using lm(), one using survreg()

basic = lm(halog~uvlog-1)
surv_basic = survreg(Surv(time=halog, event=eventcode_det)~uvlog-1,
dist='gaussian', init=c(3.33*1.8/9.97))

summary(basic)$coef[2]
summary(surv_basic)$table[1,2]

# hey look they agree

basic_weight = lm(halog~uvlog-1, weights=1/(haerrlog^2))
surv_basic_weight = survreg(Surv(time=halog,
event=eventcode_det)~uvlog-1, dist='gaussian', init=c(3.33*1.8/9.97),
weights=1/(haerrlog^2), robust=T)

summary(basic_weight)$coef[2]
summary(surv_basic_weight)$table[1,2]

# if I leave off robust=T, survreg() SE is still much smaller

Thanks,

Kyle

Kyle Penner

2014-Feb-25 20:29 UTC

head link

[R] Discrepant lm() and survreg() standard errors with weighted fits

> Survreg treats weights as case weights, and lm treats them as sampling
weights.
> Here is a simple example.  Data set test2 has two copies of every obs in
data set test.
>
> > test <- data.frame(x=1:6, y=c(1,3,2,4,6,5))
> > test2 <- test[c(1:6, 1:6),]
>
> > summary(lm( y ~ x, data=test))$coef
>               Estimate Std. Error   t value   Pr(>|t|)
> (Intercept) 0.4000000  0.9039595 0.4424977 0.68100354
> x           0.8857143  0.2321154 3.8158362 0.01884548
>
> > summary(lm( y~x, data=test2))$coef
>               Estimate Std. Error   t value    Pr(>|t|)
> (Intercept) 0.4000000  0.5717142 0.6996503 0.500096805
> x           0.8857143  0.1468027 6.0333668 0.000126369
>
> As expected, the standard error has decreased by a factor of sqrt(2)
> Now fit the model using case weights:
>
> > summary(lm( y~x, data=test, weight= rep(2,6)))$coef
>               Estimate Std. Error   t value   Pr(>|t|)
> (Intercept) 0.4000000  0.9039595 0.4424977 0.68100354
>
>  Notice that the answer matches the first run with data set test.  Repeat
this experiment
> with survreg, and you will find that the weighted run matches data test2. 
When using the
> robust variance, survreg treats weights as sampling weights, not case
weights.
When I use robust=F, I now understand the results:
> summary(survreg(Surv(y)~x, dist='gaussian', data=test,
weights=rep(2,6)))$table                 Value Std. Error          z            p
(Intercept)  0.4000000  0.5219013  0.7664285 4.434214e-01
x            0.8857143  0.1340119  6.6092222 3.863445e-11
Log(scale)  -0.2321528  0.2041241 -1.1373118 2.554080e-01

When I use robust=T, I do not understand how survreg treats the
weights as sampling weights and arrives at a different standard error
from lm:
> summary(survreg(Surv(y)~x, dist='gaussian', data=test,
weights=rep(2,6), robust=T))$table                 Value   Std. Err (Naive SE)        z            p
(Intercept)  0.4000000 0.29426260  0.5219013  1.35933 1.740420e-01
x            0.8857143 0.08384353  0.1340119 10.56390 4.380958e-26
Log(scale)  -0.2321528 0.08117684  0.2041241 -2.85984 4.238543e-03

R help - Feb 2014 - Discrepant lm() and survreg() standard errors with weighted fits

[R] Discrepant lm() and survreg() standard errors with weighted fits

[R] Discrepant lm() and survreg() standard errors with weighted fits