Tal Galili
2009-Sep-13 17:19 UTC
[R] How can I get "predict.lm" results with manual calculations ? (a floating point problem)
Hello dear r-help group I am turning for you for help with FAQ number 7.31: "Why doesn't R think these numbers are equal?" http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f *My story* is this: I wish to run many lm predictions and need to have them run fast. Using predict.lm is relatively slow, so I tried having it run faster by doing the prediction calculations manually. But doing that gave me problematic results (I won't go into the details of how I found that out). I then discovered that the problem was that the manual calculations I used for the lm predictions yielded different results than that of predict.lm, *here is an example*: predict.lm.diff.from.manual.compute <- function(sample.size = 100) { x <- rnorm(sample.size) y <- x + rnorm(sample.size) new <- data.frame(x = seq(-3, 3, length.out = sample.size)) aa <- lm(y ~ x) predict.lm.result <- sum(predict(aa, new, se.fit = F)) manual.lm.compute.result <- sum(aa$coeff[1]+ new * aa$coeff[2]) # manual.lm.compute.result == predict.lm.result return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0)) } # and here are the results of running the code several times:> predict.lm.diff.from.manual.compute(100)[1] "Mean relative difference: 1.046407e-15"> predict.lm.diff.from.manual.compute(1000)[1] "Mean relative difference: 4.113951e-16"> predict.lm.diff.from.manual.compute(10000)[1] "Mean relative difference: 2.047455e-14"> predict.lm.diff.from.manual.compute(100000)[1] "Mean relative difference: 1.294251e-14"> predict.lm.diff.from.manual.compute(1000000)[1] "Mean relative difference: 5.508314e-13">And that leaves me with *the question*: Can I reproduce more accurate results from the manual calculations (as the ones I might have gotten from predict.lm) ? Maybe some parameter to increase the precision of the computation ? Many thanks, Tal -- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]]
David Winsemius
2009-Sep-13 17:34 UTC
[R] How can I get "predict.lm" results with manual calculations ? (a floating point problem)
On Sep 13, 2009, at 1:19 PM, Tal Galili wrote:> Hello dear r-help group > > I am turning for you for help with FAQ number 7.31: "Why doesn't R > think > these numbers are equal?" > http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f > > > *My story* is this: > I wish to run many lm predictions and need to have them run fast. > Using predict.lm is relatively slow, so I tried having it run faster > by > doing the prediction calculations manually. > But doing that gave me problematic results (I won't go into the > details of > how I found that out). > > I then discovered that the problem was that the manual calculations > I used > for the lm predictions yielded different results than that of > predict.lm, > > *here is an example*: > > predict.lm.diff.from.manual.compute <- function(sample.size = 100) > { > x <- rnorm(sample.size) > y <- x + rnorm(sample.size) > new <- data.frame(x = seq(-3, 3, length.out = sample.size)) > aa <- lm(y ~ x) > > predict.lm.result <- sum(predict(aa, new, se.fit = F)) > manual.lm.compute.result <- sum(aa$coeff[1]+ new * aa$coeff[2]) > > # manual.lm.compute.result == predict.lm.result > return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0)) > } > > # and here are the results of running the code several times: > >> predict.lm.diff.from.manual.compute(100) > [1] "Mean relative difference: 1.046407e-15" >> predict.lm.diff.from.manual.compute(1000) > [1] "Mean relative difference: 4.113951e-16" >> predict.lm.diff.from.manual.compute(10000) > [1] "Mean relative difference: 2.047455e-14" >> predict.lm.diff.from.manual.compute(100000) > [1] "Mean relative difference: 1.294251e-14" >> predict.lm.diff.from.manual.compute(1000000) > [1] "Mean relative difference: 5.508314e-13"Are these meaningfully different? You are using different datasets and didn't even set a seed for your RNG. They seem really quite close. Why should we care? > .Machine$double.eps [1] 2.220446e-16>> > And that leaves me with *the question*: > Can I reproduce more accurate results from the manual calculations > (as the > ones I might have gotten from predict.lm) ? > Maybe some parameter to increase the precision of the computation ? > > Many thanks, > Tal-- David Winsemius, MD Heritage Laboratories West Hartford, CT
Tony Plate
2009-Sep-14 17:31 UTC
[R] How can I get "predict.lm" results with manual calculations ? (a floating point problem)
Results can be slightly different when matrix algebra routines are called. Here's your example again. When the prediction is computed directly using matrix multiplication, the result is the same as 'predict' produces (at least in this case.)> set.seed(1) > n <- 100 > x <- rnorm(n) > y <- rnorm(n) > aa <- lm(y ~ x) > all.equal(as.numeric(predict(aa, new)), as.numeric(aa$coef[1] + aa$coef[2] * new$x), tol=0)[1] "Mean relative difference: 1.840916e-16"> all.equal(as.numeric(predict(aa, new)), as.numeric(cbind(1, new$x) %*% aa$coef), tol=0)[1] TRUE>These types of small differences are often not indicative of lower precision in one method, but rather just random floating-point inaccuracies that can depend on things like the order numbers are summed in (e.g., ((bigNegNum + bigPosNum) + smallPosNum) will often be slightly different to ((bigPosNum + smallPosNum) + bigNegNum). They can also depend on whether intermediate results are kept in CPU registers, which sometimes have higher precision than 64 bits. Usually, they're nothing to worry about, which is one of the major reasons that all.equal() has a non-zero default for the tol= argument. -- Tony Plate Tal Galili wrote:> Hello dear r-help group > > I am turning for you for help with FAQ number 7.31: "Why doesn't R think > these numbers are equal?" > http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f > > > *My story* is this: > I wish to run many lm predictions and need to have them run fast. > Using predict.lm is relatively slow, so I tried having it run faster by > doing the prediction calculations manually. > But doing that gave me problematic results (I won't go into the details of > how I found that out). > > I then discovered that the problem was that the manual calculations I used > for the lm predictions yielded different results than that of predict.lm, > > *here is an example*: > > predict.lm.diff.from.manual.compute <- function(sample.size = 100) > { > x <- rnorm(sample.size) > y <- x + rnorm(sample.size) > new <- data.frame(x = seq(-3, 3, length.out = sample.size)) > aa <- lm(y ~ x) > > predict.lm.result <- sum(predict(aa, new, se.fit = F)) > manual.lm.compute.result <- sum(aa$coeff[1]+ new * aa$coeff[2]) > > # manual.lm.compute.result == predict.lm.result > return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0)) > } > > # and here are the results of running the code several times: > >> predict.lm.diff.from.manual.compute(100) > [1] "Mean relative difference: 1.046407e-15" >> predict.lm.diff.from.manual.compute(1000) > [1] "Mean relative difference: 4.113951e-16" >> predict.lm.diff.from.manual.compute(10000) > [1] "Mean relative difference: 2.047455e-14" >> predict.lm.diff.from.manual.compute(100000) > [1] "Mean relative difference: 1.294251e-14" >> predict.lm.diff.from.manual.compute(1000000) > [1] "Mean relative difference: 5.508314e-13" > > > > And that leaves me with *the question*: > Can I reproduce more accurate results from the manual calculations (as the > ones I might have gotten from predict.lm) ? > Maybe some parameter to increase the precision of the computation ? > > Many thanks, > Tal > > > > > > > > >