Hello everybody,

I have encountered the following problem with lm():

When running lm() with a regressor close to zero - of the order of 1e-10 -
the estimate has a huge absolute value, of the order of millions.

However, if I write the formula of the OLS estimator in matrix notation,
pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct, meaning
the estimate has value 0.

Here is the code:

y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.00000000000001235657, 0.000000000000000045)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE

bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS

Note: I am applying a deviation-from-the-mean projection to the data; that
is why some regressors have such small values.

Thank you for any help!

Raluca

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185.html
Sent from the R help mailing list archive at Nabble.com.
peter dalgaard
2015-Mar-28 15:33 UTC
[R] Error in lm() with very small (close to zero) regressor
> On 28 Mar 2015, at 00:32, RiGui <raluca.gui at business.uzh.ch> wrote:
>
> Hello everybody,
>
> I have encountered the following problem with lm():
>
> When running lm() with a regressor close to zero - of the order of 1e-10 -
> the estimate has a huge absolute value, of the order of millions.
>
> However, if I write the formula of the OLS estimator in matrix notation,
> pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct,
> meaning the estimate has value 0.
>
> here is the code:
>
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
> bFE <- lm(y ~ x1 + x2)
> bFE
>
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
> bOLS
>
> Note: I am applying a deviation from the mean projection to the data,
> that is why I have some regressors with such small values.
>
> Thank you for any help!
>
> Raluca

Example not reproducible:

> y <- rnorm(n_obs, 10,2.89)
Error in rnorm(n_obs, 10, 2.89) : object 'n_obs' not found
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
Error in rnorm(n_obs, 1.235657e-14, 4.5e-17) : object 'n_obs' not found
> x2 <- rnorm(n_obs, 10,3.21)
Error in rnorm(n_obs, 10, 3.21) : object 'n_obs' not found
> X <- cbind(x1,x2)
Error in cbind(x1, x2) : object 'x1' not found
> bFE <- lm(y ~ x1 + x2)
Error in eval(expr, envir, enclos) : object 'y' not found
> bFE
Error: object 'bFE' not found
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
Error: could not find function "pseudoinverse"
> bOLS
Error: object 'bOLS' not found

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com
Ben Bolker
2015-Mar-28 17:28 UTC
[R] Error in lm() with very small (close to zero) regressor
peter dalgaard <pdalgd <at> gmail.com> writes:

>> On 28 Mar 2015, at 00:32, RiGui <raluca.gui <at> business.uzh.ch> wrote:
>>
>> Hello everybody,
>>
>> I have encountered the following problem with lm():
>>
>> When running lm() with a regressor close to zero - of the order of 1e-10 -
>> the estimate has a huge absolute value, of the order of millions.
>>
>> However, if I write the formula of the OLS estimator in matrix notation,
>> pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct,
>> meaning the estimate has value 0.

How do you know this answer is "correct"?

>> here is the code:
>>
>> y <- rnorm(n_obs, 10,2.89)
>> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
>> x2 <- rnorm(n_obs, 10,3.21)
>> X <- cbind(x1,x2)
>>
>> bFE <- lm(y ~ x1 + x2)
>> bFE
>>
>> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
>> bOLS
>>
>> Note: I am applying a deviation from the mean projection to the data,
>> that is why I have some regressors with such small values.
>>
>> Thank you for any help!
>>
>> Raluca

Is there a reason you can't scale your regressors?

> Example not reproducible:

I agree that the OP's question was not reproducible, but it's not too hard
to make it reproducible.  I bothered to use

  library("sos"); findFn("pseudoinverse")

to find pseudoinverse() in corpcor:

set.seed(101)
n_obs <- 1000
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean = 1.235657e-14, sd = 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)
bFE <- lm(y ~ x1 + x2)
bFE
coef(summary(bFE))

It is true that we get estimates with very large magnitudes, but their
standard errors are correspondingly large, so the t statistics are tiny:

                 Estimate   Std. Error     t value  Pr(>|t|)
(Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
x1          -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
x2           3.797342e-02 2.813000e-02  1.34992593 0.1773461

library("corpcor")
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS
             [,1]
[1,] 9.807664e-16
[2,] 8.880273e-01

And if we scale the predictor:

bFE2 <- lm(y ~ I(1e14*x1) + x2)
coef(summary(bFE2))
                 Estimate Std. Error     t value  Pr(>|t|)
(Intercept)   11.55958731   23.12956  0.49977541 0.6173435
I(1e+14 * x1) -1.65842037   18.72598 -0.08856254 0.9294474
x2             0.03797342    0.02813  1.34992593 0.1773461

bOLS stays constant.

To be honest, I haven't thought about this enough to see which answer is
actually correct, although I suspect the problem is in bOLS: the numerical
methods behind lm (unlike the brute-force pseudoinverse method given here)
have been carefully considered for numerical stability.
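[A sketch of why the pseudoinverse route is the fragile one - not from the
thread, but using Ben's simulated data and base R's kappa() helper: forming
t(X) %*% X roughly squares the condition number of the design matrix, and
once that exceeds 1/.Machine$double.eps (about 4.5e15), the smallest
singular values of X'X are numerically indistinguishable from zero, so an
SVD-based pseudoinverse silently truncates them.]

```r
## Sketch: the normal equations square the condition number.
## kappa(crossprod(X)) is roughly kappa(X)^2; compare both against
## 1/.Machine$double.eps to see which computation is still trustworthy.
set.seed(101)
n_obs <- 1000
x1 <- rnorm(n_obs, mean = 1.235657e-14, sd = 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(1, x1, x2)      # include the intercept column, as lm() does

kappa(X)                    # large, but still representable
kappa(crossprod(X))         # roughly kappa(X)^2: past double precision
1 / .Machine$double.eps     # ~4.5e15, the usable dynamic range
```

The QR-based fit in lm() works on X directly, so it only has to cope with
kappa(X), not kappa(X)^2.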
I found a fix for my problem using fastLm() from the RcppEigen package,
with either the Jacobi singular value decomposition (SVD, method = 4) or
the method based on the eigendecomposition of X'X (method = 5) of the
fastLm() function:

install.packages("RcppEigen")
library(RcppEigen)

n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.00000000000001235657, 0.000000000000000045)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

bFE <- fastLm(y ~ x1 + x2, method = 4)
bFE

Call:
fastLm.formula(formula = y ~ x1 + x2, method = 4)

Coefficients:
        (Intercept)                  x1                  x2
9.94832839474159414 0.00000000000012293 0.00440078989949841

Best,

Raluca

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
Sent from the R help mailing list archive at Nabble.com.
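[A quick sanity check on any of these fits - a sketch, not from the thread:
rescale x1 to order one, refit with plain lm(), and undo the scaling
afterwards. A scale-equivariant estimator must give the same coefficients
either way, which makes it easy to see which answer to trust; Ben Bolker's
output above shows lm() passing exactly this check.]

```r
## Rescaling check: fit on s*x1, then convert the coefficient back
## to the original x1 scale by multiplying by s.
set.seed(101)
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 1.235657e-14, 4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)

s <- 1e14                               # any scale factor should do
b_scaled <- coef(lm(y ~ I(s * x1) + x2))
b_scaled * c(1, s, 1)                   # back on the original x1 scale
coef(lm(y ~ x1 + x2))                   # should agree with the line above
```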
William Dunlap
2015-Mar-31 16:31 UTC
[R] Error in lm() with very small (close to zero) regressor
If you really want your coefficient estimates to be scale-equivariant, you
should test those methods for it. E.g., here are functions that let you
check how scaling one predictor affects the estimated coefficients - they
should give the same results for any scale factor.

f <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n)))) {
    cf <- coef(lm(data = data, Y ~ X1 + I(X2/scale)))
    cf * c(1, 1, 1/scale)
}

g <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n)))) {
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 4))
    cf * c(1, 1, 1/scale)
}

h <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)),
                                X2 = log(seq_len(n)))) {
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 5))
    cf * c(1, 1, 1/scale)
}

See how they compare for scale factors between 10^-15 and 10^15.
lm() is looking pretty good.

> options(digits = 4)
> scale <- 10 ^ seq(-15, 15, by = 5)
> sapply(scale, f)
               [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
(Intercept)  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393
X1           19.955  19.955  19.955  19.955  19.955  19.955  19.955
I(X2/scale) -20.372 -20.372 -20.372 -20.372 -20.372 -20.372 -20.372
> sapply(scale, g)
                 [,1]    [,2]    [,3]    [,4]    [,5]    [,6]       [,7]
(Intercept) 0.000e+00  -9.393  -9.393  -9.393  -9.393  -9.393 -3.126e+01
X1          2.772e-29  19.955  19.955  19.955  19.955  19.955  1.218e+01
I(X2/scale) 1.474e+01 -20.372 -20.372 -20.372 -20.372 -20.372 -2.892e-29
> sapply(scale, h)
                 [,1]      [,2]    [,3]    [,4]    [,5]       [,6]       [,7]
(Intercept) 0.000e+00 3.807e-20  -9.395  -9.393  -9.393 -3.126e+01 -3.126e+01
X1          2.945e-29 2.772e-19  19.954  19.955  19.955  1.218e+01  1.218e+01
I(X2/scale) 1.474e+01 1.474e+01 -20.369 -20.372 -20.372 -2.892e-19  6.596e-30

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 5:10 AM, RiGui <raluca.gui at business.uzh.ch> wrote:
> I found a fix for my problem using fastLm() from the RcppEigen package,
> with either the Jacobi singular value decomposition (SVD, method = 4) or
> the method based on the eigendecomposition of X'X (method = 5) of the
> fastLm() function:
>
> install.packages("RcppEigen")
> library(RcppEigen)
>
> n_obs <- 1500
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
> bFE <- fastLm(y ~ x1 + x2, method = 4)
> bFE
>
> Call:
> fastLm.formula(formula = y ~ x1 + x2, method = 4)
>
> Coefficients:
>         (Intercept)                  x1                  x2
> 9.94832839474159414 0.00000000000012293 0.00440078989949841
>
> Best,
>
> Raluca
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
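[Editorial context, not from the thread: lm()'s good behaviour in the table
above comes from the algorithm it uses - a pivoted Householder QR
decomposition of X (a modified LINPACK dqrdc2 routine), which solves the
least-squares problem without ever forming X'X, so the condition number is
never squared. A sketch reproducing the same fit through base R's qr():]

```r
## Sketch: solve the least-squares problem via QR, the route lm()
## takes internally, instead of via the normal equations.
n <- 100
d <- data.frame(Y  = seq_len(n),
                X1 = sqrt(seq_len(n)),
                X2 = log(seq_len(n)) * 1e-15)   # deliberately tiny scale
X <- cbind("(Intercept)" = 1, X1 = d$X1, X2 = d$X2)

qr.coef(qr(X), d$Y)    # should agree with coef(lm(Y ~ X1 + X2, data = d))
```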