Full_Name: Russell Lenth
Version: 2.6.2
OS: Windows XP Pro
Submission from: (NULL) (128.255.132.36)
The poly() function allows a higher-degree polynomial than it should, when
raw=FALSE.
For example, consider 5 distinct 'x' values, each repeated twice.  we
can fit a
polynomial of degree 8:
====R> x = rep(1:5, 2)
R> y = rnorm(10)
R> lm(y ~ poly(x, 8))
Call:
lm(formula = y ~ poly(x, 8))
Coefficients:
(Intercept)  poly(x, 8)1  poly(x, 8)2  poly(x, 8)3  poly(x, 8)4  poly(x, 8)5  
   -0.07790      0.58768      0.30147     -0.18237      0.64779     -0.04444  
poly(x, 8)6  poly(x, 8)7  poly(x, 8)8  
   -0.22694     -0.07500      0.17235  
====
If we specify 'raw=TRUE' in the same example, we are limited (as we
should) to a
degree-4 polynomial:
====R> lm(y ~ poly(x, 8, raw=TRUE))
Call:
lm(formula = y ~ poly(x, 8, raw = TRUE))
Coefficients:
            (Intercept)  poly(x, 8, raw = TRUE)1  poly(x, 8, raw = TRUE)2  
                 7.3958                 -14.0150                   8.2784  
poly(x, 8, raw = TRUE)3  poly(x, 8, raw = TRUE)4  poly(x, 8, raw = TRUE)5  
                -1.9502                   0.1597                       NA  
poly(x, 8, raw = TRUE)6  poly(x, 8, raw = TRUE)7  poly(x, 8, raw = TRUE)8  
                     NA                       NA                       NA  
====
We do get an error with an even higher degree:
====R> lm(y ~ poly(x, 10))
Error in poly(x, 10) : 'degree' must be less than number of points
====
Looking at the code for poly(), I think the problem is that it should check the
'rank' result from its call to qr().
[The above results are using Windows XP Pro, but I verified this bug under Linux
as well (x86_64, dual core).  It seems pretty obvious to me that this bug is not
platform-dependent.]
I don't understand why this is a bug in usage.  Is it because the 2nd 
argument is not named?  I get the same behavior if I do name it:
====[R version 2.6.2 (2008-02-08), Windows XP Pro]
R> x = rep(1:4,3)
R> y = (1:12)^1.5
R> lm(y ~ poly(x, degree=10))
Call:
lm(formula = y ~ poly(x, degree = 10))
Coefficients:
            (Intercept)   poly(x, degree = 10)1   poly(x, degree = 10)2
               18.39370                14.21385                 0.58588
  poly(x, degree = 10)3   poly(x, degree = 10)4   poly(x, degree = 10)5
               -0.01770                 3.34767               -11.46388
  poly(x, degree = 10)6   poly(x, degree = 10)7   poly(x, degree = 10)8
                0.51178                 0.44296                12.47199
  poly(x, degree = 10)9  poly(x, degree = 10)10
              -28.38972                18.47439
====
Is there a case where we *would* want a 10th degree polynomial fitted to 
only 4 distinct x values?  A simple modification [changing 'length(x)' 
to 'length(unique(x))' in 2 places] seems to fix this:
====R> mypoly
    ...
     if (raw) {
         if (degree >= length(unique(x)))
             stop("'degree' must be less than number of
points")
    ...
     if (is.null(coefs)) {
         if (degree >= length(unique(x)))
             stop("'degree' must be less than number of
points")
    ...
R> lm(y ~ mypoly(x, degree=10))
Error in mypoly(x, degree = 10) :
   'degree' must be less than number of points
====
Russ
-- 
Russell V. Lenth, Professor
Department of Statistics
   & Actuarial Science            (319)335-0814    FAX (319)335-3017
The University of Iowa           russell-lenth at uiowa.edu
Iowa City, IA 52242  USA         http://www.stat.uiowa.edu/~rlenth/