Paul Johnson
2013-Apr-01 17:20 UTC
[R] example to demonstrate benefits of poly in regression?
Here's my little discussion example for a quadratic regression:

http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R

Students press me to know the benefits of poly() over the more obvious
regression formulas.

I think I understand the theory on why poly() should be more numerically
stable, but I'm having trouble writing down an example that demonstrates
the benefit.

I am beginning to suspect that because R's lm() uses a QR decomposition
under the hood, the instability we used to see when we inverted X'X is no
longer as serious as it used to be.  When the columns in X are

    x1  x2  x3  x3squared

the QR decomposition is doing about the same thing we would do if we
replaced x3 and x3squared by the poly(x3, 2) results.

Or not.  I'm not arguing that; I'm thinking out loud, hoping you'll
correct me.

pj

--
Paul E. Johnson
Professor, Political Science       Assoc. Director
1541 Lilac Lane, Room 504          Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org          http://quant.ku.edu
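A minimal sketch of the instability in question (the offset of 1000 below
is an arbitrary choice for illustration, not a value from the thread): the
raw-power design matrix is far worse conditioned than the orthogonal one,
so even a QR-based solver has less precision to work with.

    # Compare the conditioning of raw-power vs. orthogonal-polynomial
    # design matrices.  The offset mimics a predictor whose mean is
    # large relative to its spread.
    x <- 1000 + 1:100
    Xraw  <- model.matrix(~ x + I(x^2))   # columns: 1, x, x^2
    Xorth <- model.matrix(~ poly(x, 2))   # intercept + orthogonal columns
    kappa(Xraw,  exact = TRUE)            # enormous condition number
    kappa(Xorth, exact = TRUE)            # small condition number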
Gabor Grothendieck
2013-Apr-01 17:42 UTC
[R] example to demonstrate benefits of poly in regression?
On Mon, Apr 1, 2013 at 1:20 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> Students press me to know the benefits of poly() over the more obvious
> regression formulas.
> [...]

# 1. One benefit: if you fit a higher-degree polynomial using poly(x, n),
# the lower-degree coefficients will agree with the fit using a lower n.
# For example, compare these two:

set.seed(123)
y <- rnorm(10); x <- 1:10
lm(y ~ poly(x, 2))
lm(y ~ poly(x, 3))

# However, that is not true if you don't use orthogonal polynomials.
# Compare these two:

lm(y ~ poly(x, 2, raw = TRUE))
lm(y ~ poly(x, 3, raw = TRUE))

# 2. A second advantage: the t statistics are invariant under a shift
# of x if you use orthogonal polynomials.  Compare these two:

summary(lm(y ~ poly(x, 2)))
summary(lm(y ~ poly(x + 1, 2)))

# However, that is not true if you don't use orthogonal polynomials.
# Compare these two:

summary(lm(y ~ poly(x, 2, raw = TRUE)))
summary(lm(y ~ poly(x + 1, 2, raw = TRUE)))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
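A point implicit in both of these comparisons, worth checking directly:
the two parameterizations describe the same fitted curve, so the choice of
basis affects the coefficients and their tests, not the predictions.  A
minimal sketch, reusing the same simulated data:

    # Orthogonal and raw bases span the same column space, so the
    # fitted values agree even though the coefficients differ.
    set.seed(123)
    y <- rnorm(10); x <- 1:10
    f.orth <- lm(y ~ poly(x, 2))
    f.raw  <- lm(y ~ poly(x, 2, raw = TRUE))
    all.equal(fitted(f.orth), fitted(f.raw))   # TRUE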
William Dunlap
2013-Apr-01 18:09 UTC
[R] example to demonstrate benefits of poly in regression?
Here is an example of the numerical instability that poly() can help with.
Imagine a 100-minute experiment where you chose to record the times as
POSIXct objects (stored as seconds since 1970).  Using poly() gives the
right predicted values, but using I(x^2) does not:

> d <- data.frame(Date = as.POSIXct("2013-04-01") + 60*(1:100),
+                 yQuadratic = (function(x) 5*x - 4*x^2)((1:100)/50))
> d$NumDate <- as.numeric(d$Date)
> newD <- d[c(10, 80, 95), ]
> cbind(newD, PQuadratic = predict(lm(yQuadratic ~ NumDate + I(NumDate^2),
+                                     data = d), newdata = newD))
                  Date yQuadratic    NumDate PQuadratic
10 2013-04-01 00:10:00       0.84 1364800200     2.1312
80 2013-04-01 01:20:00      -2.24 1364804400    -2.1808
95 2013-04-01 01:35:00      -4.94 1364805300    -3.1048
Warning message:
In predict.lm(lm(yQuadratic ~ NumDate + I(NumDate^2), data = d),  :
  prediction from a rank-deficient fit may be misleading
> cbind(newD, PQuadratic = predict(lm(yQuadratic ~ poly(NumDate, 2),
+                                     data = d), newdata = newD))
                  Date yQuadratic    NumDate PQuadratic
10 2013-04-01 00:10:00       0.84 1364800200       0.84
80 2013-04-01 01:20:00      -2.24 1364804400      -2.24
95 2013-04-01 01:35:00      -4.94 1364805300      -4.94

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
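The failure above is one of scale, not of the raw-power idea as such:
centering the time variable before squaring also rescues the I(x^2) fit.
A sketch of that alternative (the variable name CDate is made up here, not
part of the code above; the data frame is rebuilt so the snippet runs on
its own):

    # Centering NumDate before squaring avoids the rank deficiency.
    d <- data.frame(NumDate = as.numeric(as.POSIXct("2013-04-01") + 60*(1:100)),
                    yQuadratic = (function(x) 5*x - 4*x^2)((1:100)/50))
    d$CDate <- d$NumDate - mean(d$NumDate)
    newD <- d[c(10, 80, 95), ]
    cbind(newD["yQuadratic"],
          PQuadratic = predict(lm(yQuadratic ~ CDate + I(CDate^2), data = d),
                               newdata = newD))   # recovers 0.84, -2.24, -4.94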