Kyle Penner
2014-Feb-24 22:20 UTC
[R] Discrepant lm() and survreg() standard errors with weighted fits
Hi, I have some measurements and their uncertainties. I'm using an uncensored subset of the data for a weighted fit (for now---I'll do a fit to the full, censored, dataset when I understand the results). survreg() reports a much smaller standard error for the model parameter than lm(), but only when I use weights. Am I missing something? Here is what I'm doing: atten = read.table('http://hobo.as.arizona.edu/~kpenner/temp/r_help') uvlog = atten[,1] halog = atten[,2] haerrlog = atten[,3] eventcode_det = seq(1,1,length=length(halog)) # do 2 basic unweighted fits, one using lm(), one using survreg() basic = lm(halog~uvlog-1) surv_basic = survreg(Surv(time=halog, event=eventcode_det)~uvlog-1, dist='gaussian', init=c(3.33*1.8/9.97)) summary(basic)$coef[2] summary(surv_basic)$table[1,2] # hey look they agree basic_weight = lm(halog~uvlog-1, weights=1/(haerrlog^2)) surv_basic_weight = survreg(Surv(time=halog, event=eventcode_det)~uvlog-1, dist='gaussian', init=c(3.33*1.8/9.97), weights=1/(haerrlog^2), robust=T) summary(basic_weight)$coef[2] summary(surv_basic_weight)$table[1,2] # if I leave off robust=T, survreg() SE is still much smaller Thanks, Kyle
Kyle Penner
2014-Feb-25 20:29 UTC
[R] Discrepant lm() and survreg() standard errors with weighted fits
> Survreg treats weights as case weights, and lm treats them as sampling weights. > Here is a simple example. Data set test2 has two copies of every obs in data set test. > > > test <- data.frame(x=1:6, y=c(1,3,2,4,6,5)) > > test2 <- test[c(1:6, 1:6),] > > > summary(lm( y ~ x, data=test))$coef > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.4000000 0.9039595 0.4424977 0.68100354 > x 0.8857143 0.2321154 3.8158362 0.01884548 > > > summary(lm( y~x, data=test2))$coef > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.4000000 0.5717142 0.6996503 0.500096805 > x 0.8857143 0.1468027 6.0333668 0.000126369 > > As expected, the standard error has decreased by a factor of sqrt(2) > Now fit the model using case weights: > > > summary(lm( y~x, data=test, weight= rep(2,6)))$coef > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.4000000 0.9039595 0.4424977 0.68100354 > > Notice that the answer matches the first run with data set test. Repeat this experiment > with survreg, and you will find that the weighted run matches data test2. When using the > robust variance, survreg treats weights as sampling weights, not case weights.When I use robust=F, I now understand the results:> summary(survreg(Surv(y)~x, dist='gaussian', data=test, weights=rep(2,6)))$tableValue Std. Error z p (Intercept) 0.4000000 0.5219013 0.7664285 4.434214e-01 x 0.8857143 0.1340119 6.6092222 3.863445e-11 Log(scale) -0.2321528 0.2041241 -1.1373118 2.554080e-01 When I use robust=T, I do not understand how survreg treats the weights as sampling weights and arrives at a different standard error from lm:> summary(survreg(Surv(y)~x, dist='gaussian', data=test, weights=rep(2,6), robust=T))$tableValue Std. Err (Naive SE) z p (Intercept) 0.4000000 0.29426260 0.5219013 1.35933 1.740420e-01 x 0.8857143 0.08384353 0.1340119 10.56390 4.380958e-26 Log(scale) -0.2321528 0.08117684 0.2041241 -2.85984 4.238543e-03