Martin Maechler
2015-Jul-17 16:00 UTC
[Rd] Improvements (?) in stats::poly and stats::polym.
Dear Keith,>>>>> <Keith.Jewell at campdenbri.co.uk> >>>>> on Thu, 16 Jul 2015 08:58:11 +0000 writes:> Dear R Core Team, > Last week I made a post to the R-help mailing list > ?predict.poly for multivariate data? > <https://stat.ethz.ch/pipermail/r-help/2015-July/430311.html> > but it has had no responses so I?m sending this to the > email address of package:stats maintainer. Please feel > free to tell me that this is inappropriate. Asking R Core in your case is ok ... { though still slightly "sub optimal" (but *not* "inappropriate"!): Ideallly you'd have followed the posting guide (http://www.r-project.org/posting-guide.html) here, namely to send your original post to R-devel instead of R-help. Then it would have been noticed by me and most probably several other R core members ... } > IMHO the reproducible code I presented in that post: > ############# > library(datasets) > alm <- lm(stack.loss ~ poly(Air.Flow, Water.Temp, degree=3), stackloss) > alm$fitted.values[1:10] # "correct" prediction values [1:10] > predict(alm, stackloss)[1:10] # gives correct values > predict(alm, stackloss[1:10,]) # gives wrong values > ######### > ... clearly demonstrates something wrong, the two predicts should not differ. > I hesitate to call it a bug, it might be viewed as inappropriate usage. But it's easy to get wrong answers, fairly small changes to poly and polym correct the wrongness, and I think the changes are backwards compatible. Perhaps appending the altered codes made the R-help post too long for easy comprehension, I attach them to this email. Thank you! I had started to look at your R-help post and noticed that you changed the *printout* of the R functions, instead of the source The current development version of that part of the R source code is always at https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R and if you look carefully, you see that there are comments in the sources that are lost in the process (of parsing, byte-compiling, saving in binary, ....), but never mind: you've marked your changes well and I can use your version to modify the sources.>From what I've understood, the changes make much sense and lookgood; and if no problem surfaces should make it into R - with an acknowledgement to you, of course. Thank you, indeed, Martin ------- Part 2: > While I'm writing, for didactic purposes I've sometimes > wanted to express the orthogonal polynomials in the > conventional form: > b0.x^0 + b1.x^1 + b2.x^2 + ... > ... rather than in terms of " the centering and > normalization constants used in constructing the > orthogonal polynomials". > Over the years in the various R forums others have made > similar requests to be told that one can: > (a) calculate the polynomials using poly.predict(), so the > conventional form coefficients aren't needed; > (b) see the algorithm in the code and/or Kennedy & Gentle (1980, pp. 343?4). > (a) doesn't meet my didactic needs. > With respect to (b) I could see how to calculate polynomials for given x value but I just got lost in the algebra trying to deduce the conventional form coefficients :-{ > Kennedy & Gentle refer to "solving for x in p(x)" which to > my simple mind suggested `lm' and led to the truly > horrible approach implemented in the attached > poly-orth2raw.R. > I fully accept that there must be a better, more direct, > way to deduce the conventional form coefficients from the > centering and normalisation constants but I can't work it > out, and this approach seems to work. I'm not really commenting on the above issue, and notably your orth2raw implementation at all. My vague recollection would be that indeed, it has been sometimes desirable, if only for didactical reasons, to better explain the meaning of the orthogonal polynomial basis. OTOH, notably if you think of high degree polynomials(10 already may be "high" here), it can even be "dangerous" to hand down the coefficients wrt to the (1, x, x^2, .., x^p) basis to the end user, because even using them in prediction may be numerical quite unstable {but then one, me included, would argue that using degree 10 polynomials is typically nonsense and should be replaced by using regression/smoothing splines}... > With best regards, > Keith Jewell ? Head of Statistics Group > Campden BRI Group > Tel + 44(0)1386 842055 > Email keith.jewell at campdenbri.co.uk > Web www.campdenbri.co.uk > Site Station Road, Chipping Campden, Gloucestershire, GL55 6LD, UK > [............ ca 20 lines of legal gibberish + anti-virus ad .. ] > ____________________________________________________________ > x[DELETED ATTACHMENT poly.R, Untyped binary data] > x[DELETED ATTACHMENT polym.R, Untyped binary data] > x[DELETED ATTACHMENT orth2raw.R, Untyped binary data]
Martin Maechler
2015-Jul-23 14:25 UTC
[Rd] Improvements (?) in stats::poly and stats::polym.
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch> >>>>> on Fri, 17 Jul 2015 18:00:28 +0200 writes:MM> Dear Keith,>>>>> <Keith.Jewell at campdenbri.co.uk> >>>>> on Thu, 16 Jul 2015 08:58:11 +0000 writes:>> Dear R Core Team, >> Last week I made a post to the R-help mailing list >> ?predict.poly for multivariate data? >> <https://stat.ethz.ch/pipermail/r-help/2015-July/430311.html> >> but it has had no responses so I?m sending this to the >> email address of package:stats maintainer. Please feel >> free to tell me that this is inappropriate. MM> Asking R Core in your case is ok ... MM> { though still slightly "sub optimal" (but *not* "inappropriate"!): MM> Ideallly you'd have followed the posting guide MM> (http://www.r-project.org/posting-guide.html) here, MM> namely to send your original post to R-devel instead of R-help. MM> Then it would have been noticed by me and most probably MM> several other R core members ... MM> } >> IMHO the reproducible code I presented in that post: >> ############# >> library(datasets) >> alm <- lm(stack.loss ~ poly(Air.Flow, Water.Temp, degree=3), stackloss) >> alm$fitted.values[1:10] # "correct" prediction values [1:10] >> predict(alm, stackloss)[1:10] # gives correct values >> predict(alm, stackloss[1:10,]) # gives wrong values >> ######### >> ... clearly demonstrates something wrong, the two predicts should not differ. >> I hesitate to call it a bug, it might be viewed as inappropriate usage. But it's easy to get wrong answers, fairly small changes to poly and polym correct the wrongness, and I think the changes are backwards compatible. Perhaps appending the altered codes made the R-help post too long for easy comprehension, I attach them to this email. MM> Thank you! MM> I had started to look at your R-help post and noticed that you MM> changed the *printout* of the R functions, instead of the MM> source MM> The current development version of that part of the R MM> source code is always at MM> https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R MM> and if you look carefully, you see that there are comments in MM> the sources that are lost in the process (of parsing, MM> byte-compiling, saving in binary, ....), MM> but never mind: MM> you've marked your changes well and I can use your version to MM> modify the sources. >> From what I've understood, the changes make much sense and look MM> good; and if no problem surfaces should make it into R - with an MM> acknowledgement to you, of course. I've now committed corresponding changes to R-devel, changes which indeed have evolved from your (Keith) contributions, thank you very much. My additional changes were trying to slightly simplify the code logic, (and a new argument 'simple' to gain some speed). If the changes do not have visible negative effects on existing CRAN/Bioconductor code (which *is* possible, after all, the results now sometimes are different in the attributes), we may consider porting the changes to 'R 3.2.1 patched' which will become R 3.2.2 in three weeks. Thank you again, Martin Maechler > [............................]