Full_Name: Russell Lenth
Version: 2.6.2
OS: Windows XP Pro
Submission from: (NULL) (128.255.132.36)
The poly() function allows a higher-degree polynomial than it should, when
raw=FALSE.
For example, consider 5 distinct 'x' values, each repeated twice. we
can fit a
polynomial of degree 8:
====R> x = rep(1:5, 2)
R> y = rnorm(10)
R> lm(y ~ poly(x, 8))
Call:
lm(formula = y ~ poly(x, 8))
Coefficients:
(Intercept) poly(x, 8)1 poly(x, 8)2 poly(x, 8)3 poly(x, 8)4 poly(x, 8)5
-0.07790 0.58768 0.30147 -0.18237 0.64779 -0.04444
poly(x, 8)6 poly(x, 8)7 poly(x, 8)8
-0.22694 -0.07500 0.17235
====
If we specify 'raw=TRUE' in the same example, we are limited (as we
should) to a
degree-4 polynomial:
====R> lm(y ~ poly(x, 8, raw=TRUE))
Call:
lm(formula = y ~ poly(x, 8, raw = TRUE))
Coefficients:
(Intercept) poly(x, 8, raw = TRUE)1 poly(x, 8, raw = TRUE)2
7.3958 -14.0150 8.2784
poly(x, 8, raw = TRUE)3 poly(x, 8, raw = TRUE)4 poly(x, 8, raw = TRUE)5
-1.9502 0.1597 NA
poly(x, 8, raw = TRUE)6 poly(x, 8, raw = TRUE)7 poly(x, 8, raw = TRUE)8
NA NA NA
====
We do get an error with an even higher degree:
====R> lm(y ~ poly(x, 10))
Error in poly(x, 10) : 'degree' must be less than number of points
====
Looking at the code for poly(), I think the problem is that it should check the
'rank' result from its call to qr().
[The above results are using Windows XP Pro, but I verified this bug under Linux
as well (x86_64, dual core). It seems pretty obvious to me that this bug is not
platform-dependent.]
I don't understand why this is a bug in usage. Is it because the 2nd
argument is not named? I get the same behavior if I do name it:
====[R version 2.6.2 (2008-02-08), Windows XP Pro]
R> x = rep(1:4,3)
R> y = (1:12)^1.5
R> lm(y ~ poly(x, degree=10))
Call:
lm(formula = y ~ poly(x, degree = 10))
Coefficients:
(Intercept) poly(x, degree = 10)1 poly(x, degree = 10)2
18.39370 14.21385 0.58588
poly(x, degree = 10)3 poly(x, degree = 10)4 poly(x, degree = 10)5
-0.01770 3.34767 -11.46388
poly(x, degree = 10)6 poly(x, degree = 10)7 poly(x, degree = 10)8
0.51178 0.44296 12.47199
poly(x, degree = 10)9 poly(x, degree = 10)10
-28.38972 18.47439
====
Is there a case where we *would* want a 10th degree polynomial fitted to
only 4 distinct x values? A simple modification [changing 'length(x)'
to 'length(unique(x))' in 2 places] seems to fix this:
====R> mypoly
...
if (raw) {
if (degree >= length(unique(x)))
stop("'degree' must be less than number of
points")
...
if (is.null(coefs)) {
if (degree >= length(unique(x)))
stop("'degree' must be less than number of
points")
...
R> lm(y ~ mypoly(x, degree=10))
Error in mypoly(x, degree = 10) :
'degree' must be less than number of points
====
Russ
--
Russell V. Lenth, Professor
Department of Statistics
& Actuarial Science (319)335-0814 FAX (319)335-3017
The University of Iowa russell-lenth at uiowa.edu
Iowa City, IA 52242 USA http://www.stat.uiowa.edu/~rlenth/