On Jun 21, 2010, at 10:27 AM, David Riebel wrote:
> I am using the lm function in R to fit several linear models to a
> fair-sized dataset (~160 collections of ~1000 data points each). My
> data have intrinsic, systematic uncertainty much greater than the
> measurement errors on any individual point. My thought is to use the
> residuals of my linear fits to quantify this intrinsic uncertainty, but
> I am puzzled over the correct interpretation of R's output.
>
> I have attached plots of the fit and the residuals to one of my
> sub-groups, for illustration. By eye, the overwhelming majority of the
> residuals are within +- 0.4, and I would therefore expect the standard
> error of the residuals to be ~0.2. However, the output from lm does not
> show this:
Crack open a basic regression text. The residual standard error (in many
texts, the standard error of the estimate) is an estimate of the error
scale parameter, sigma, not the sample standard deviation of the
residuals. It is sqrt(SS(resid)/(n - p)), where p is the number of
estimated coefficients, so it divides by the residual degrees of freedom
rather than by n. Furthermore, you have complicated matters by adding a
weights term, which will affect your estimates in a manner that we cannot
predict since you did not provide the full data.
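For an unweighted fit, the figure summary() labels "Residual standard error" can be reproduced by hand as sqrt(RSS / df.residual). A minimal sketch on simulated data (x and y below are invented for illustration, not the poster's data) compares it with the sigma component of summary() and with sd() of the residuals, which divides by n - 1 instead of n - p:

```r
# Simulated data for illustration only (not the poster's data)
set.seed(1)
x <- runif(200)
y <- 19.5 - 4.25 * x + rnorm(200, sd = 0.15)

fit <- lm(y ~ x)

# Residual standard error by hand: sqrt(RSS / residual degrees of freedom)
rse_by_hand <- sqrt(sum(resid(fit)^2) / df.residual(fit))

# summary.lm stores the same quantity as its 'sigma' component
rse_reported <- summary(fit)$sigma

# sd() divides by n - 1 rather than n - p, so it is close but slightly smaller
sd_resid <- sd(resid(fit))

print(c(by_hand = rse_by_hand, reported = rse_reported, sd = sd_resid))
```

Without weights the two quantities differ only by the degrees-of-freedom factor, so a large gap like the one above points elsewhere.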
>
>> summary(ofit)
>
> Call:
> lm(formula = omag ~ oper, weights = (1/oerr))
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -3.32185 -0.41181  0.03983  0.40041  2.52971
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 19.52847    0.03979   490.8   <2e-16 ***
> oper        -4.25297    0.02101  -202.4   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.6705 on 2287 degrees of freedom
> Multiple R-squared: 0.9471, Adjusted R-squared: 0.9471
> F-statistic: 4.097e+04 on 1 and 2287 DF, p-value: < 2.2e-16
>
> The plot thickens when I examine the residuals themselves:
>> summary(resid(ofit))
>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
> -0.611800 -0.095720  0.010200  0.005954  0.101100  0.680700
>> sd(resid(ofit))
> [1] 0.1533568
>
> These numbers are much closer to what I see by eye. There really aren't
> any residuals outside ~0.6, certainly nothing as large as 3.3! The help
> page for lm tells me that the residuals are "the residuals, that is
> response minus fitted values." Exactly what I would expect. As an
> astronomer, my knowledge of statistics is rather "workman-like" if you
> will, but to me, "Residual standard error" means "the standard deviation
> of the residuals," and the lm output doesn't seem to agree with this.
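Probably because you added the weights argument. With weights, summary()
prints and works with weighted residuals, sqrt(weight) times (response
minus fitted), whereas resid() returns the unweighted ones.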
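To see the effect of the weights, here is a sketch on simulated data. The names oper, omag and oerr mimic the call above, but the values (including the assumed range of oerr) are made up. weighted.residuals() returns sqrt(w) times the ordinary residuals, and the sigma reported by summary() matches the root of the weighted RSS over the residual degrees of freedom:

```r
# Simulated data; oerr values are a made-up assumption, not the poster's
set.seed(2)
n <- 500
oper <- runif(n, 0, 2)
oerr <- runif(n, 0.02, 0.08)                # hypothetical small per-point errors
omag <- 19.5 - 4.25 * oper + rnorm(n, sd = 0.15)

ofit <- lm(omag ~ oper, weights = 1 / oerr)

raw  <- resid(ofit)                 # response minus fitted values (unweighted)
wres <- weighted.residuals(ofit)    # sqrt(w) * raw, what summary() prints

# Weighted residual standard error by hand
rse <- sqrt(sum(wres^2) / df.residual(ofit))

print(summary(raw))
print(summary(wres))
print(c(by_hand = rse, reported = summary(ofit)$sigma, sd_raw = sd(raw)))
```

Comparing summary(raw) with summary(wres) shows the same kind of inflation seen in the output above; its size here depends entirely on the invented oerr values.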
>
> I'd appreciate it if someone could clarify what's being output by the
> summary function acting on an lm object.
>
> Replies by e-mail preferred.
>
> Thanks,
>
>
> David Riebel
> Graduate Research Assistant
> Johns Hopkins University
> Department of Physics and Astronomy
David Winsemius, MD
West Hartford, CT