Hello everybody,

I have encountered the following problem with lm(): when running lm() with a
regressor close to zero (of the order 1e-10), the estimate has a huge
absolute value, of the order of millions.

However, if I write out the OLS estimator in matrix notation,
pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct, meaning
the estimate has value 0.

Here is the code:

y <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.00000000000001235657, 0.000000000000000045)
x2 <- rnorm(n_obs, 10, 3.21)
X <- cbind(x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE

bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS

Note: I am applying a deviation-from-the-mean projection to the data, which
is why some regressors have such small values.

Thank you for any help!

Raluca

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185.html
Sent from the R help mailing list archive at Nabble.com.
peter dalgaard
2015-Mar-28 15:33 UTC
[R] Error in lm() with very small (close to zero) regressor
> On 28 Mar 2015, at 00:32 , RiGui <raluca.gui at business.uzh.ch> wrote:
>
> Hello everybody,
>
> I have encountered the following problem with lm():
>
> When running lm() with a regressor close to zero - of the order e-10, the
> value of the estimate is of huge absolute value, of order millions.
>
> However, if I write the formula of the OLS estimator, in matrix notation:
> pseudoinverse(t(X)*X) * t(X) * y, the results are correct, meaning the
> estimate has value 0.
>
> here is the code:
>
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
> bFE <- lm(y ~ x1 + x2)
> bFE
>
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
> bOLS
>
> Note: I am applying a deviation from the mean projection to the data, that
> is why I have some regressors with such small values.
>
> Thank you for any help!
>
> Raluca

Example not reproducible:

> y <- rnorm(n_obs, 10,2.89)
Error in rnorm(n_obs, 10, 2.89) : object 'n_obs' not found
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
Error in rnorm(n_obs, 1.235657e-14, 4.5e-17) : object 'n_obs' not found
> x2 <- rnorm(n_obs, 10,3.21)
Error in rnorm(n_obs, 10, 3.21) : object 'n_obs' not found
> X <- cbind(x1,x2)
Error in cbind(x1, x2) : object 'x1' not found
> bFE <- lm(y ~ x1 + x2)
Error in eval(expr, envir, enclos) : object 'y' not found
> bFE
Error: object 'bFE' not found
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
Error: could not find function "pseudoinverse"
> bOLS
Error: object 'bOLS' not found

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
Ben Bolker
2015-Mar-28 17:28 UTC
[R] Error in lm() with very small (close to zero) regressor
peter dalgaard <pdalgd <at> gmail.com> writes:

>> On 28 Mar 2015, at 00:32 , RiGui <raluca.gui <at> business.uzh.ch> wrote:
>>
>> Hello everybody,
>>
>> I have encountered the following problem with lm():
>>
>> When running lm() with a regressor close to zero - of the order e-10, the
>> value of the estimate is of huge absolute value, of order millions.
>>
>> However, if I write the formula of the OLS estimator, in matrix notation:
>> pseudoinverse(t(X)*X) * t(X) * y, the results are correct, meaning the
>> estimate has value 0.

How do you know this answer is "correct"?

>> here is the code:
>>
>> y <- rnorm(n_obs, 10,2.89)
>> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
>> x2 <- rnorm(n_obs, 10,3.21)
>> X <- cbind(x1,x2)
>>
>> bFE <- lm(y ~ x1 + x2)
>> bFE
>>
>> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
>> bOLS
>>
>> Note: I am applying a deviation from the mean projection to the data, that
>> is why I have some regressors with such small values.

Is there a reason you can't scale your regressors?

> Example not reproducible:

I agree that the OP's question was not reproducible, but it's not too hard
to make it reproducible.  I bothered to use

  library("sos"); findFn("pseudoinverse")

to find pseudoinverse() in corpcor:

set.seed(101)
n_obs <- 1000
y <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14, sd=4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X <- cbind(x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE
coef(summary(bFE))
                 Estimate   Std. Error     t value  Pr(>|t|)
(Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
x1          -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
x2           3.797342e-02 2.813000e-02  1.34992593 0.1773461

It is true that we get estimates with very large magnitudes, but their
standard errors are correspondingly huge.

library("corpcor")
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS
             [,1]
[1,] 9.807664e-16
[2,] 8.880273e-01

And if we scale the predictor:

bFE2 <- lm(y ~ I(1e14*x1) + x2)
coef(summary(bFE2))
                 Estimate Std. Error     t value  Pr(>|t|)
(Intercept)   11.55958731   23.12956  0.49977541 0.6173435
I(1e+14 * x1) -1.65842037   18.72598 -0.08856254 0.9294474
x2             0.03797342    0.02813  1.34992593 0.1773461

bOLS stays constant.  To be honest, I haven't thought about this enough to
see which answer is actually correct, although I suspect the problem is in
bOLS, since the numerical methods behind lm() (unlike the brute-force
pseudoinverse method given here) have been carefully considered for
numerical stability.
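[Editorial note, not part of the original thread.] The ill-conditioning at
issue here can be checked directly. With x1 of order 1e-14 and x2 of order
10, the entries of t(X) %*% X span roughly 30 orders of magnitude, so any
method that explicitly forms and inverts that matrix has already lost the
answer in double precision; lm() instead QR-decomposes X itself, which in
exact arithmetic has only the square root of that condition number. A
minimal sketch using base R's kappa(), regenerating the same simulated data
as above:

```r
set.seed(101)
n_obs <- 1000
x1 <- rnorm(n_obs, mean = 1.235657e-14, sd = 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(1, x1, x2)   # include the intercept column, as lm() does

# Condition number (ratio of largest to smallest singular value) of the
# cross-product matrix that the pseudoinverse route has to invert.
# Anything near or above 1/.Machine$double.eps (~1e16) means t(X) %*% X
# is numerically singular in double precision.
kappa(crossprod(X), exact = TRUE)

# lm() never forms t(X) %*% X; it works with a QR decomposition of X.
# In exact arithmetic cond(X) is the square root of cond(t(X) %*% X),
# so the QR route solves a far better-conditioned problem.
kappa(X, exact = TRUE)
```

This would also explain the bOLS result: corpcor's pseudoinverse() truncates
singular values below a relative tolerance, so it treats the x1 direction as
part of the null space and returns a near-zero coefficient for it rather
than failing loudly.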
I found a fix to my problem using fastLm() from package RcppEigen, with
either the Jacobi singular value decomposition (SVD) (method 4) or a method
based on the eigenvalue-eigenvector decomposition of X'X (method 5 of the
fastLm function):
install.packages("RcppEigen")
library(RcppEigen)
n_obs <- 1500
y <- rnorm(n_obs, 10,2.89)
x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
x2 <- rnorm(n_obs, 10,3.21)
X <- cbind(x1,x2)
bFE <- fastLm(y ~ x1 + x2, method = 4)
bFE
Call:
fastLm.formula(formula = y ~ x1 + x2, method = 4)
Coefficients:
        (Intercept)                  x1                  x2
9.94832839474159414 0.00000000000012293 0.00440078989949841
Best,
Raluca
--
View this message in context:
http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
Sent from the R help mailing list archive at Nabble.com.
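[Editorial note, not part of the original thread.] Besides switching to
fastLm(), Ben Bolker's earlier suggestion to scale the regressors also
resolves the problem while keeping plain lm(): rescale the tiny predictor
into a well-conditioned range before fitting, then undo the scaling on the
estimated coefficient. A minimal sketch (the factor 1e14 is an illustrative
choice matched to x1's magnitude, not anything prescribed in the thread):

```r
set.seed(101)
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean = 1.235657e-14, sd = 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)

s <- 1e14                      # scale factor: brings x1 up to order 1
fit <- lm(y ~ I(s * x1) + x2)  # fit on the rescaled predictor

# If b_star is the coefficient of s*x1, then b_star*(s*x1) = (b_star*s)*x1,
# so the coefficient of x1 on the original scale is s * b_star.
b_x1 <- s * coef(fit)[["I(s * x1)"]]
b_x1
```

Unlike the pseudoinverse route, this changes nothing about the model; it
only moves the predictor into a range where the QR decomposition inside
lm() is comfortably well conditioned.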
William Dunlap
2015-Mar-31 16:31 UTC
[R] Error in lm() with very small (close to zero) regressor
If you really want your coefficient estimates to be scale-equivariant you
should test those methods for such a thing. E.g., here are functions that
let you check how scaling one predictor affects the estimated coefficients
- they should give the same results for any scale factor.
f <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n))))
{
    cf <- coef(lm(data = data, Y ~ X1 + I(X2/scale)))
    cf * c(1, 1, 1/scale)
}

g <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n))))
{
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 4))
    cf * c(1, 1, 1/scale)
}

h <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n))))
{
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 5))
    cf * c(1, 1, 1/scale)
}
See how they compare for scale factors between 10^-15 and 10^15.  lm() is
looking pretty good.

> options(digits=4)
> scale <- 10 ^ seq(-15,15,by=5)
> sapply(scale, f)
               [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
(Intercept)  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393
X1           19.955  19.955  19.955  19.955  19.955  19.955  19.955
I(X2/scale) -20.372 -20.372 -20.372 -20.372 -20.372 -20.372 -20.372
> sapply(scale, g)
                 [,1]    [,2]    [,3]    [,4]    [,5]    [,6]       [,7]
(Intercept) 0.000e+00  -9.393  -9.393  -9.393  -9.393  -9.393 -3.126e+01
X1          2.772e-29  19.955  19.955  19.955  19.955  19.955  1.218e+01
I(X2/scale) 1.474e+01 -20.372 -20.372 -20.372 -20.372 -20.372 -2.892e-29
> sapply(scale, h)
                 [,1]      [,2]    [,3]    [,4]    [,5]       [,6]       [,7]
(Intercept) 0.000e+00 3.807e-20  -9.395  -9.393  -9.393 -3.126e+01 -3.126e+01
X1          2.945e-29 2.772e-19  19.954  19.955  19.955  1.218e+01  1.218e+01
I(X2/scale) 1.474e+01 1.474e+01 -20.369 -20.372 -20.372 -2.892e-19  6.596e-30
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Tue, Mar 31, 2015 at 5:10 AM, RiGui <raluca.gui at business.uzh.ch> wrote:
> I found a fix to my problem using the fastLm() from package RcppEigen,
> using
> the Jacobi singular value decomposition (SVD) (method 4) or a method based
> on the eigenvalue-eigenvector decomposition of X'X - method 5 of the fastLm
> function
>
>
>
> install.packages("RcppEigen")
> library(RcppEigen)
>
> n_obs <- 1500
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
>
>
> bFE <- fastLm(y ~ x1 + x2, method =4)
> bFE
>
> Call:
> fastLm.formula(formula = y ~ x1 + x2, method = 4)
>
> Coefficients:
> (Intercept) x1 x2
> 9.94832839474159414 0.00000000000012293 0.00440078989949841
>
>
> Best,
>
> Raluca
>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>