Hello everybody,

I have encountered the following problem with lm():

When running lm() with a regressor close to zero - of the order of 1e-10 -
the estimate has a huge absolute value, of the order of millions.

However, if I write the formula of the OLS estimator in matrix notation,
pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct, meaning
the estimate has value 0.

Here is the code:

y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.00000000000001235657, 0.000000000000000045)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE

bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS

Note: I am applying a deviation-from-the-mean projection to the data; that
is why some regressors have such small values.

Thank you for any help!

Raluca

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185.html
Sent from the R help mailing list archive at Nabble.com.
peter dalgaard
2015-Mar-28 15:33 UTC
[R] Error in lm() with very small (close to zero) regressor
> On 28 Mar 2015, at 00:32, RiGui <raluca.gui at business.uzh.ch> wrote:
>
> Hello everybody,
>
> I have encountered the following problem with lm():
>
> When running lm() with a regressor close to zero - of the order of 1e-10 -
> the estimate has a huge absolute value, of the order of millions.
>
> However, if I write the formula of the OLS estimator in matrix notation,
> pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct,
> meaning the estimate has value 0.
>
> here is the code:
>
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
> bFE <- lm(y ~ x1 + x2)
> bFE
>
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
> bOLS
>
> Note: I am applying a deviation from the mean projection to the data,
> that is why I have some regressors with such small values.
>
> Thank you for any help!
>
> Raluca

Example not reproducible:

> y <- rnorm(n_obs, 10,2.89)
Error in rnorm(n_obs, 10, 2.89) : object 'n_obs' not found
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
Error in rnorm(n_obs, 1.235657e-14, 4.5e-17) : object 'n_obs' not found
> x2 <- rnorm(n_obs, 10,3.21)
Error in rnorm(n_obs, 10, 3.21) : object 'n_obs' not found
> X <- cbind(x1,x2)
Error in cbind(x1, x2) : object 'x1' not found
> bFE <- lm(y ~ x1 + x2)
Error in eval(expr, envir, enclos) : object 'y' not found
> bFE
Error: object 'bFE' not found
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
Error: could not find function "pseudoinverse"
> bOLS
Error: object 'bOLS' not found

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com
Ben Bolker
2015-Mar-28 17:28 UTC
[R] Error in lm() with very small (close to zero) regressor
peter dalgaard <pdalgd <at> gmail.com> writes:

>> On 28 Mar 2015, at 00:32, RiGui <raluca.gui <at> business.uzh.ch> wrote:
>>
>> Hello everybody,
>>
>> I have encountered the following problem with lm():
>>
>> When running lm() with a regressor close to zero - of the order of 1e-10 -
>> the estimate has a huge absolute value, of the order of millions.
>>
>> However, if I write the formula of the OLS estimator in matrix notation,
>> pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct,
>> meaning the estimate has value 0.

How do you know this answer is "correct"?

>> here is the code:
>>
>> y <- rnorm(n_obs, 10,2.89)
>> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
>> x2 <- rnorm(n_obs, 10,3.21)
>> X <- cbind(x1,x2)
>>
>> bFE <- lm(y ~ x1 + x2)
>> bFE
>>
>> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
>> bOLS
>>
>> Note: I am applying a deviation from the mean projection to the data,
>> that is why I have some regressors with such small values.
>>
>> Thank you for any help!
>>
>> Raluca

Is there a reason you can't scale your regressors?

> Example not reproducible:

I agree that the OP's question was not reproducible, but it's not too hard
to make it reproducible.  I bothered to use

  library("sos"); findFn("pseudoinverse")

to find pseudoinverse() in corpcor:

set.seed(101)
n_obs <- 1000
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean = 1.235657e-14, sd = 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)
bFE <- lm(y ~ x1 + x2)
bFE
coef(summary(bFE))

It is true that we get estimates with very large magnitudes, but their
standard errors are correspondingly large, so the t statistics are tiny:

                 Estimate   Std. Error     t value  Pr(>|t|)
(Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
x1          -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
x2           3.797342e-02 2.813000e-02  1.34992593 0.1773461

library("corpcor")
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS
             [,1]
[1,] 9.807664e-16
[2,] 8.880273e-01

And if we scale the predictor:

bFE2 <- lm(y ~ I(1e14*x1) + x2)
coef(summary(bFE2))
                 Estimate Std. Error     t value  Pr(>|t|)
(Intercept)   11.55958731   23.12956  0.49977541 0.6173435
I(1e+14 * x1) -1.65842037   18.72598 -0.08856254 0.9294474
x2             0.03797342    0.02813  1.34992593 0.1773461

bOLS stays constant.

To be honest, I haven't thought about this enough to see which answer is
actually correct, although I suspect the problem is in bOLS: the numerical
methods behind lm (unlike the brute-force pseudoinverse method given here)
have been carefully considered for numerical stability.
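[A sketch of why the pseudoinverse route is the fragile one - not from the
thread, but using Ben's simulated data and base R's kappa() helper: forming
t(X) %*% X roughly squares the condition number of the design matrix, and
once that exceeds 1/.Machine$double.eps (about 4.5e15), the smallest
singular values of X'X are numerically indistinguishable from zero, so an
SVD-based pseudoinverse silently truncates them.]

```r
## Sketch: the normal equations square the condition number.
## kappa(crossprod(X)) is roughly kappa(X)^2; compare both against
## 1/.Machine$double.eps to see which computation is still trustworthy.
set.seed(101)
n_obs <- 1000
x1 <- rnorm(n_obs, mean = 1.235657e-14, sd = 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(1, x1, x2)      # include the intercept column, as lm() does

kappa(X)                    # large, but still representable
kappa(crossprod(X))         # roughly kappa(X)^2: past double precision
1 / .Machine$double.eps     # ~4.5e15, the usable dynamic range
```

The QR-based fit in lm() works on X directly, so it only has to cope with
kappa(X), not kappa(X)^2.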
I found a fix for my problem using fastLm() from the RcppEigen package,
with either the Jacobi singular value decomposition (SVD, method = 4) or
the method based on the eigendecomposition of X'X (method = 5) of the
fastLm() function:

install.packages("RcppEigen")
library(RcppEigen)

n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.00000000000001235657, 0.000000000000000045)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

bFE <- fastLm(y ~ x1 + x2, method = 4)
bFE

Call:
fastLm.formula(formula = y ~ x1 + x2, method = 4)

Coefficients:
        (Intercept)                  x1                  x2
9.94832839474159414 0.00000000000012293 0.00440078989949841

Best,

Raluca

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
Sent from the R help mailing list archive at Nabble.com.
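[A quick sanity check on any of these fits - a sketch, not from the thread:
rescale x1 to order one, refit with plain lm(), and undo the scaling
afterwards. A scale-equivariant estimator must give the same coefficients
either way, which makes it easy to see which answer to trust; Ben Bolker's
output above shows lm() passing exactly this check.]

```r
## Rescaling check: fit on s*x1, then convert the coefficient back
## to the original x1 scale by multiplying by s.
set.seed(101)
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 1.235657e-14, 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)

s <- 1e14                               # any scale factor should do
b_scaled <- coef(lm(y ~ I(s * x1) + x2))
b_scaled * c(1, s, 1)                   # back on the original x1 scale
coef(lm(y ~ x1 + x2))                   # should agree with the line above
```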
William Dunlap
2015-Mar-31 16:31 UTC
[R] Error in lm() with very small (close to zero) regressor
If you really want your coefficient estimates to be scale-equivariant, you
should test those methods for it. E.g., here are functions that let you
check how scaling one predictor affects the estimated coefficients - they
should give the same results for any scale factor.

f <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n)))) {
    cf <- coef(lm(data = data, Y ~ X1 + I(X2/scale)))
    cf * c(1, 1, 1/scale)
}

g <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n)))) {
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 4))
    cf * c(1, 1, 1/scale)
}

h <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n)))) {
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 5))
    cf * c(1, 1, 1/scale)
}

See how they compare for scale factors between 10^-15 and 10^15.
lm() is looking pretty good.

> options(digits = 4)
> scale <- 10 ^ seq(-15, 15, by = 5)
> sapply(scale, f)
               [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
(Intercept)  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393
X1           19.955  19.955  19.955  19.955  19.955  19.955  19.955
I(X2/scale) -20.372 -20.372 -20.372 -20.372 -20.372 -20.372 -20.372
> sapply(scale, g)
                 [,1]    [,2]    [,3]    [,4]    [,5]    [,6]       [,7]
(Intercept) 0.000e+00  -9.393  -9.393  -9.393  -9.393  -9.393 -3.126e+01
X1          2.772e-29  19.955  19.955  19.955  19.955  19.955  1.218e+01
I(X2/scale) 1.474e+01 -20.372 -20.372 -20.372 -20.372 -20.372 -2.892e-29
> sapply(scale, h)
                 [,1]      [,2]    [,3]    [,4]    [,5]       [,6]       [,7]
(Intercept) 0.000e+00 3.807e-20  -9.395  -9.393  -9.393 -3.126e+01 -3.126e+01
X1          2.945e-29 2.772e-19  19.954  19.955  19.955  1.218e+01  1.218e+01
I(X2/scale) 1.474e+01 1.474e+01 -20.369 -20.372 -20.372 -2.892e-19  6.596e-30

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 5:10 AM, RiGui <raluca.gui at business.uzh.ch> wrote:
> I found a fix for my problem using fastLm() from the RcppEigen package,
> with either the Jacobi singular value decomposition (SVD, method = 4) or
> the method based on the eigendecomposition of X'X (method = 5) of the
> fastLm() function:
>
> install.packages("RcppEigen")
> library(RcppEigen)
>
> n_obs <- 1500
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
> bFE <- fastLm(y ~ x1 + x2, method = 4)
> bFE
>
> Call:
> fastLm.formula(formula = y ~ x1 + x2, method = 4)
>
> Coefficients:
>         (Intercept)                  x1                  x2
> 9.94832839474159414 0.00000000000012293 0.00440078989949841
>
> Best,
>
> Raluca
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
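[Editorial context, not from the thread: lm()'s good behaviour in the table
above comes from the algorithm it uses - a pivoted Householder QR
decomposition of X (a modified LINPACK dqrdc2 routine), which solves the
least-squares problem without ever forming X'X, so the condition number is
never squared. A sketch reproducing the same fit through base R's qr():]

```r
## Sketch: solve the least-squares problem via QR, the route lm()
## takes internally, instead of via the normal equations.
n <- 100
d <- data.frame(Y  = seq_len(n),
                X1 = sqrt(seq_len(n)),
                X2 = log(seq_len(n)) * 1e-15)   # deliberately tiny scale
X <- cbind("(Intercept)" = 1, X1 = d$X1, X2 = d$X2)

qr.coef(qr(X), d$Y)    # should agree with coef(lm(Y ~ X1 + X2, data = d))
```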