On Nov 11, 2009, at 7:45 PM, Mauricio Calvao wrote:
> Hi there
>
> Sorry for what may be a naive or dumb question.
>
> I have the following data:
>
> > x <- c(1,2,3,4) # predictor vector
>
> > y <- c(2,4,6,8) # response vector. Notice that it is an exact,
> perfect straight line through the origin and slope equal to 2
>
> > error <- c(0.3,0.3,0.3,0.3) # I have (equal) ``errors'', for
> instance, in the measured responses
Which means those x, y, and "error" figures did not come from an
experiment, but rather from theory???
>
> Of course the best fit coefficients should be 0 for the intercept
> and 2 for the slope. Furthermore, it seems completely plausible (or
> not?) that, since the y_i have associated non-vanishing
> ``errors'' (dispersions), there should be corresponding non-
> vanishing ``errors'' associated to the best fit coefficients, right?
>
> When I try:
>
> > fit_mod <- lm(y~x,weigths=1/error^2)
>
> I get
>
> Warning message:
> In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
> extra arguments weigths are just disregarded.
(Actually the weights are for adjusting for sampling, and I do not
see any sampling in your "design".)
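Note also what the warning is actually saying: lm() has no argument
named "weigths", so the misspelled argument was passed down to lm.fit()
and dropped, and what you got back was an ordinary, unweighted fit.
With the standard spelling it is accepted -- an untested sketch:

fit_mod <- lm(y ~ x, weights = 1/error^2)  # note the spelling: weights
summary(fit_mod)                           # coefficients still exactly 0 and 2;
                                           # residuals ~0, so the Std. Errors stay ~0

(With your exact data the weighted and unweighted fits give identical
coefficients anyway.)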
>
> Keeping on, despite the warning message, which I did not quite
> understand, when I type:
>
> > summary(fit_mod)
>
> I get
>
> Call:
> lm(formula = y ~ x, weigths = 1/error^2)
>
> Residuals:
> 1 2 3 4
> -5.067e-17 8.445e-17 -1.689e-17 -1.689e-17
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.000e+00 8.776e-17 0.000e+00 1
> x 2.000e+00 3.205e-17 6.241e+16 <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 7.166e-17 on 2 degrees of freedom
> Multiple R-squared: 1, Adjusted R-squared: 1
> F-statistic: 3.895e+33 on 1 and 2 DF, p-value: < 2.2e-16
>
>
> Naively, should not the column Std. Error be different from zero??
> What I have in mind, which surely is not what Std. Error means, is that
> if I carried out a large simulation, taking each response y_i to be a
> Gaussian random variable with mean y_i and standard deviation
> 2*error=0.6, and then did an ordinary least squares fit of
> the slope and intercept, I would end up with a mean for these
> simulated coefficients which should be 2 and 0, respectively,
Well, not precisely 2 and 0, but rather something very close ... i.e.,
within "experimental error". Please note that numbers on the order of
1e-17 are effectively zero from a numerical analysis perspective.
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
> .Machine$double.eps ^ 0.5
[1] 1.490116e-08
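So a residual of a few times 1e-17 is far below that tolerance. A quick
untested illustration:

(2 + 1e-17) == 2             # TRUE: 1e-17 is far below the spacing of doubles near 2
isTRUE(all.equal(3e-17, 0))  # TRUE: below all.equal()'s default tolerance (~1.5e-8)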
> and, that's the point, a non-vanishing standard deviation for these
> fitted coefficients, right?? This, roughly, is what I expected to
> be an estimate, or at least a good indicator, of the degree of
> uncertainty I should assign to the fitted coefficients; it
> seems to me these deviations, calculated as a result of the
> simulation, will certainly not be zero (or 3e-17, for that matter).
> So this Std. Error does not provide what I, naively, think should be
> given as a measure of the uncertainties or errors in the fitted
> coefficients...
You are trying to impose an error structure on a data situation that
you constructed artificially to be perfect.
>
> What am I not getting right??
That if you input "perfection" into R's linear regression program, you
get appropriate warnings?
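If you actually add noise and refit, you get the non-zero spread you
have in mind, and the Std. Error column of a single fit then estimates
it. A rough, untested sketch of the simulation you describe (I use
sd = error per observation; substitute 2*error if that is really what
you want):

x <- c(1, 2, 3, 4)
error <- c(0.3, 0.3, 0.3, 0.3)
set.seed(1)
sims <- replicate(10000, {
  y_noisy <- 2 * x + rnorm(length(x), mean = 0, sd = error)  # true line plus Gaussian noise
  coef(lm(y_noisy ~ x))                                      # refit intercept and slope
})
rowMeans(sims)      # means close to 0 and 2, but not exactly
apply(sims, 1, sd)  # non-vanishing spread of the simulated coefficients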
>
> Thanks and sorry for the naive and non-expert question!
You are a Professor of physics, right? You do experiments, right? You
replicate them. So perhaps I'm the one who should be puzzled.
> --
> #######################################
> Prof. Mauricio Ortiz Calvao
> Federal University of Rio de Janeiro
> Institute of Physics, P O Box 68528
> CEP 21941-972 Rio de Janeiro, RJ
> Brazil
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT