Dear All, I have created a time trend by doing x<-1:93 because I have a time series with 93 data points. Next I did :- y = lm(series ~ poly(x,4))$residuals to detrend series. I choose this 4 as the order of my polynomial using cross validation/ checking the absence of trend in the residuals so I think I have not overfit this series. I wish to document the formula of poly(x,4). I am not able to find it in ?poly Can someone please tell me what the formula for the orthogonal polynomial used by R is ? Thank you, Ashim [[alternative HTML version deleted]]
You could get answer quickly by searching net. https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-p olynomials-how-to-understand-the-coefs-ret/39051154#39051154 Cheers Petr> -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Ashim Kapoor > Sent: Wednesday, November 27, 2019 12:55 PM > To: R Help <r-help at r-project.org> > Subject: [R] Orthogonal polynomials used by R > > Dear All, > > I have created a time trend by doing x<-1:93 because I have a time series > with 93 data points. Next I did :- > > y = lm(series ~ poly(x,4))$residuals > > to detrend series. > > I choose this 4 as the order of my polynomial using cross validation/ > checking the absence of trend in the residuals so I think I have notoverfit> this series. > > I wish to document the formula of poly(x,4). I am not able to find it in?poly> > Can someone please tell me what the formula for the orthogonal > polynomial used by R is ? > > Thank you, > Ashim > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
Dear Petr, Many thanks for the quick response. I also read this:- https://en.wikipedia.org/wiki/Discrete_orthogonal_polynomials Also I read in ?poly:- The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343-4), and used in the ?predict? part of the code. I don't have access to the mentioned book. Out of curiosity, what is the name of the discrete orthogonal polynomial used by R ? What discrete measure is it orthogonal with respect to ? Many thanks, Ashim On Wed, Nov 27, 2019 at 6:11 PM PIKAL Petr <petr.pikal at precheza.cz> wrote:> You could get answer quickly by searching net. > > > https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-p > olynomials-how-to-understand-the-coefs-ret/39051154#39051154 > <https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-polynomials-how-to-understand-the-coefs-ret/39051154#39051154> > > Cheers > Petr > > > -----Original Message----- > > From: R-help <r-help-bounces at r-project.org> On Behalf Of Ashim Kapoor > > Sent: Wednesday, November 27, 2019 12:55 PM > > To: R Help <r-help at r-project.org> > > Subject: [R] Orthogonal polynomials used by R > > > > Dear All, > > > > I have created a time trend by doing x<-1:93 because I have a time series > > with 93 data points. Next I did :- > > > > y = lm(series ~ poly(x,4))$residuals > > > > to detrend series. > > > > I choose this 4 as the order of my polynomial using cross validation/ > > checking the absence of trend in the residuals so I think I have not > overfit > > this series. > > > > I wish to document the formula of poly(x,4). I am not able to find it in > ?poly > > > > Can someone please tell me what the formula for the orthogonal > > polynomial used by R is ? > > > > Thank you, > > Ashim > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Dear Ashim, Orthogonal polynomials are used because they tend to produce more accurate numerical computations, not because their coefficients are interpretable, so I wonder why you're interested in the coefficients. The regressors produced are orthogonal to the constant regressor and are orthogonal to each other (and in fact are orthonormal), as it's simple to demonstrate: ------- snip -------> x <- 1:93 > y <- 1 + x + x^2 + x^3 + x^4 + rnorm(93) > (m <- lm(y ~ poly(x, 4)))Call: lm(formula = y ~ poly(x, 4)) Coefficients: (Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4 15574516 172715069 94769949 27683528 3429259> X <- model.matrix(m) > head(X)(Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4 1 1 -0.1776843 0.2245083 -0.2572066 0.27935949 2 1 -0.1738216 0.2098665 -0.2236579 0.21862917 3 1 -0.1699589 0.1955464 -0.1919525 0.16390514 4 1 -0.1660962 0.1815482 -0.1620496 0.11487597 5 1 -0.1622335 0.1678717 -0.1339080 0.07123722 6 1 -0.1583708 0.1545171 -0.1074869 0.03269145> zapsmall(crossprod(X))# X'X(Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4 (Intercept) 93 0 0 0 0 poly(x, 4)1 0 1 0 0 0 poly(x, 4)2 0 0 1 0 0 poly(x, 4)3 0 0 0 1 0 poly(x, 4)4 0 0 0 0 1 ------- snip ------- If for some not immediately obvious reason you're interested in the regression coefficients, why not just use a "raw" polynomial: ------- snip -------> (m1 <- lm(y ~ poly(x, 4, raw=TRUE)))Call: lm(formula = y ~ poly(x, 4, raw = TRUE)) Coefficients: (Intercept) poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2 poly(x, 4, raw = TRUE)3 1.5640 0.8985 1.0037 1.0000 poly(x, 4, raw = TRUE)4 1.0000 ------- snip ------- These coefficients are simply interpretable but the model matrix is more poorly conditioned: ------- snip -------> head(X1)(Intercept) poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2 poly(x, 4, raw = TRUE)3 1 1 1 1 1 2 1 2 4 8 3 1 3 9 27 4 1 4 16 64 5 1 5 25 125 6 1 6 36 216 poly(x, 4, raw = TRUE)4 1 1 2 16 3 81 4 256 5 625 6 1296> round(cor(X1[, -1]), 2)poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2 poly(x, 4, raw = TRUE)3 poly(x, 4, raw = TRUE)1 1.00 0.97 0.92 poly(x, 4, raw = TRUE)2 0.97 1.00 0.99 poly(x, 4, raw = TRUE)3 0.92 0.99 1.00 poly(x, 4, raw = TRUE)4 0.87 0.96 0.99 poly(x, 4, raw = TRUE)4 poly(x, 4, raw = TRUE)1 0.87 poly(x, 4, raw = TRUE)2 0.96 poly(x, 4, raw = TRUE)3 0.99 poly(x, 4, raw = TRUE)4 1.00 ------- snip ------- The two parametrizations are equivalent, however, in that they represent the same regression surface, and so, e.g., produce the same fitted values: ------- snip -------> all.equal(fitted(m), fitted(m1))[1] TRUE ------- snip ------- Because one is usually not interested in the individual coefficients of a polynomial there usually isn't a reason to prefer one parametrization to the other on the grounds of interpretability, so why do you need to interpret the regression equation? I hope this helps, John ----------------------------- John Fox, Professor Emeritus McMaster University Hamilton, Ontario, Canada Web: http::/socserv.mcmaster.ca/jfox> On Nov 27, 2019, at 10:17 AM, Ashim Kapoor <ashimkapoor at gmail.com> wrote: > > Dear Petr, > > Many thanks for the quick response. > > I also read this:- > https://en.wikipedia.org/wiki/Discrete_orthogonal_polynomials > > Also I read in ?poly:- > The orthogonal polynomial is summarized by the coefficients, which > can be used to evaluate it via the three-term recursion given in > Kennedy & Gentle (1980, pp. 343-4), and used in the ?predict? part > of the code. > > I don't have access to the mentioned book. > > Out of curiosity, what is the name of the discrete orthogonal polynomial > used by R ? > What discrete measure is it orthogonal with respect to ? > > Many thanks, > Ashim > > > > > On Wed, Nov 27, 2019 at 6:11 PM PIKAL Petr <petr.pikal at precheza.cz> wrote: > >> You could get answer quickly by searching net. >> >> >> https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-p >> olynomials-how-to-understand-the-coefs-ret/39051154#39051154 >> <https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-polynomials-how-to-understand-the-coefs-ret/39051154#39051154> >> >> Cheers >> Petr >> >>> -----Original Message----- >>> From: R-help <r-help-bounces at r-project.org> On Behalf Of Ashim Kapoor >>> Sent: Wednesday, November 27, 2019 12:55 PM >>> To: R Help <r-help at r-project.org> >>> Subject: [R] Orthogonal polynomials used by R >>> >>> Dear All, >>> >>> I have created a time trend by doing x<-1:93 because I have a time series >>> with 93 data points. Next I did :- >>> >>> y = lm(series ~ poly(x,4))$residuals >>> >>> to detrend series. >>> >>> I choose this 4 as the order of my polynomial using cross validation/ >>> checking the absence of trend in the residuals so I think I have not >> overfit >>> this series. >>> >>> I wish to document the formula of poly(x,4). I am not able to find it in >> ?poly >>> >>> Can someone please tell me what the formula for the orthogonal >>> polynomial used by R is ? >>> >>> Thank you, >>> Ashim >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting- >>> guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Dear Peter and John, Many thanks for your prompt replies. Here is what I was trying to do. I was trying to build a statistical model of a given time series using Box Jenkins methodology. The series has 93 data points. Before I analyse the ACF and PACF, I am required to de-trend the series. The series seems to have an upward trend. I wanted to find out what order polynomial should I fit the series without overfitting. For this I want to use orthogonal polynomials(I think someone on the internet was talking about preventing overfitting by using orthogonal polynomials) . This seems to me as a poor man's cross validation. So my plan is to keep increasing the degree of the orthogonal polynomials till the coefficient of the last orthogonal polynomial becomes insignificant. Note : If I do NOT use orthogonal polynomials, I will overfit the data set and I don't think that is a good way to detect the true order of the polynomial. Also now that I have detrended the series and built an ARIMA model of the residuals, now I want to forecast. For this I need to use the original polynomials and their coefficients. I hope I was clear and that my methodology is ok. I have another query here :- Note : If I used cross-validation to determine the order of the polynomial, I don't get a clear answer. See here :- library(boot) mydf = data.frame(cbind(gdp,x)) d<-(c( cv.glm(data = mydf,glm(gdp~x),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~poly(x,2)),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~poly(x,3)),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~poly(x,4)),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~poly(x,5)),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~poly(x,6)),K=10)$delta[1])) print(d) ## [1] 2.178574e+13 7.303031e+11 5.994783e+11 4.943586e+11 4.596648e+11 ## [6] 4.980159e+11 # Here it chooses 5. (but 4 and 5 are kind of similar). d1 <- (c( cv.glm(data = mydf,glm(gdp~1+x),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~1+x+x^2),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~1+x+x^2+x^3),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~1+x+x^2+x^3+x^4),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~1+x+x^2+x^3+x^4+x^5),K=10)$delta[1], cv.glm(data = mydf,glm(gdp~1+x+x^2+x^3+x^4+x^5+x^6),K=10)$delta[1])) print(d1) ## [1] 2.149647e+13 2.253999e+13 2.182175e+13 2.177170e+13 2.198675e+13 ## [6] 2.145754e+13 # here it chooses 1 or 6 Query : Why does it choose 1? Notice : Is this just round off noise / noise due to sampling error created by Cross Validation when it creates the K folds? Is this due to the ill conditioned model matrix? Best Regards, Ashim. On Wed, Nov 27, 2019 at 10:41 PM Fox, John <jfox at mcmaster.ca> wrote:> Dear Ashim, > > Orthogonal polynomials are used because they tend to produce more accurate > numerical computations, not because their coefficients are interpretable, > so I wonder why you're interested in the coefficients. > > The regressors produced are orthogonal to the constant regressor and are > orthogonal to each other (and in fact are orthonormal), as it's simple to > demonstrate: > > ------- snip ------- > > > x <- 1:93 > > y <- 1 + x + x^2 + x^3 + x^4 + rnorm(93) > > (m <- lm(y ~ poly(x, 4))) > > Call: > lm(formula = y ~ poly(x, 4)) > > Coefficients: > (Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4 > 15574516 172715069 94769949 27683528 3429259 > > > X <- model.matrix(m) > > head(X) > (Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4 > 1 1 -0.1776843 0.2245083 -0.2572066 0.27935949 > 2 1 -0.1738216 0.2098665 -0.2236579 0.21862917 > 3 1 -0.1699589 0.1955464 -0.1919525 0.16390514 > 4 1 -0.1660962 0.1815482 -0.1620496 0.11487597 > 5 1 -0.1622335 0.1678717 -0.1339080 0.07123722 > 6 1 -0.1583708 0.1545171 -0.1074869 0.03269145 > > > zapsmall(crossprod(X))# X'X > (Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4 > (Intercept) 93 0 0 0 0 > poly(x, 4)1 0 1 0 0 0 > poly(x, 4)2 0 0 1 0 0 > poly(x, 4)3 0 0 0 1 0 > poly(x, 4)4 0 0 0 0 1 > > ------- snip ------- > > If for some not immediately obvious reason you're interested in the > regression coefficients, why not just use a "raw" polynomial: > > ------- snip ------- > > > (m1 <- lm(y ~ poly(x, 4, raw=TRUE))) > > Call: > lm(formula = y ~ poly(x, 4, raw = TRUE)) > > Coefficients: > (Intercept) poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2 > poly(x, 4, raw = TRUE)3 > 1.5640 0.8985 1.0037 > 1.0000 > poly(x, 4, raw = TRUE)4 > 1.0000 > > ------- snip ------- > > These coefficients are simply interpretable but the model matrix is more > poorly conditioned: > > ------- snip ------- > > > head(X1) > (Intercept) poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2 poly(x, 4, > raw = TRUE)3 > 1 1 1 1 > 1 > 2 1 2 4 > 8 > 3 1 3 9 > 27 > 4 1 4 16 > 64 > 5 1 5 25 > 125 > 6 1 6 36 > 216 > poly(x, 4, raw = TRUE)4 > 1 1 > 2 16 > 3 81 > 4 256 > 5 625 > 6 1296 > > round(cor(X1[, -1]), 2) > poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2 > poly(x, 4, raw = TRUE)3 > poly(x, 4, raw = TRUE)1 1.00 0.97 > 0.92 > poly(x, 4, raw = TRUE)2 0.97 1.00 > 0.99 > poly(x, 4, raw = TRUE)3 0.92 0.99 > 1.00 > poly(x, 4, raw = TRUE)4 0.87 0.96 > 0.99 > poly(x, 4, raw = TRUE)4 > poly(x, 4, raw = TRUE)1 0.87 > poly(x, 4, raw = TRUE)2 0.96 > poly(x, 4, raw = TRUE)3 0.99 > poly(x, 4, raw = TRUE)4 1.00 > > ------- snip ------- > > The two parametrizations are equivalent, however, in that they represent > the same regression surface, and so, e.g., produce the same fitted values: > > ------- snip ------- > > > all.equal(fitted(m), fitted(m1)) > [1] TRUE > > ------- snip ------- > > Because one is usually not interested in the individual coefficients of a > polynomial there usually isn't a reason to prefer one parametrization to > the other on the grounds of interpretability, so why do you need to > interpret the regression equation? > > I hope this helps, > John > > ----------------------------- > John Fox, Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > Web: http::/socserv.mcmaster.ca/jfox > > > On Nov 27, 2019, at 10:17 AM, Ashim Kapoor <ashimkapoor at gmail.com> > wrote: > > > > Dear Petr, > > > > Many thanks for the quick response. > > > > I also read this:- > > https://en.wikipedia.org/wiki/Discrete_orthogonal_polynomials > > > > Also I read in ?poly:- > > The orthogonal polynomial is summarized by the coefficients, which > > can be used to evaluate it via the three-term recursion given in > > Kennedy & Gentle (1980, pp. 343-4), and used in the ?predict? part > > of the code. > > > > I don't have access to the mentioned book. > > > > Out of curiosity, what is the name of the discrete orthogonal polynomial > > used by R ? > > What discrete measure is it orthogonal with respect to ? > > > > Many thanks, > > Ashim > > > > > > > > > > On Wed, Nov 27, 2019 at 6:11 PM PIKAL Petr <petr.pikal at precheza.cz> > wrote: > > > >> You could get answer quickly by searching net. > >> > >> > >> > https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-p > >> olynomials-how-to-understand-the-coefs-ret/39051154#39051154 > >> < > https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-polynomials-how-to-understand-the-coefs-ret/39051154#39051154 > > > >> > >> Cheers > >> Petr > >> > >>> -----Original Message----- > >>> From: R-help <r-help-bounces at r-project.org> On Behalf Of Ashim Kapoor > >>> Sent: Wednesday, November 27, 2019 12:55 PM > >>> To: R Help <r-help at r-project.org> > >>> Subject: [R] Orthogonal polynomials used by R > >>> > >>> Dear All, > >>> > >>> I have created a time trend by doing x<-1:93 because I have a time > series > >>> with 93 data points. Next I did :- > >>> > >>> y = lm(series ~ poly(x,4))$residuals > >>> > >>> to detrend series. > >>> > >>> I choose this 4 as the order of my polynomial using cross validation/ > >>> checking the absence of trend in the residuals so I think I have not > >> overfit > >>> this series. > >>> > >>> I wish to document the formula of poly(x,4). I am not able to find it > in > >> ?poly > >>> > >>> Can someone please tell me what the formula for the orthogonal > >>> polynomial used by R is ? > >>> > >>> Thank you, > >>> Ashim > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> ______________________________________________ > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide http://www.R-project.org/posting- > >>> guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > >[[alternative HTML version deleted]]