Dear r-devel list members,
I've recently encountered the following problem using predict() with a model
that has raw-polynomial terms. (Actually, I encountered the problem using
model.frame(), but the source of the error is the same.) The problem is
technical and concerns the design of poly(), which is why I'm sending this
message to r-devel rather than r-help.
To illustrate:
------------ snip -------------
> mod.pres <- lm(prestige ~ log(income, 10) + poly(education, 3) +
poly(women, 2),
+ data=Prestige) # Prestige data from car package
> predict(mod.pres, newdata=data.frame(education=10, income=6000, women=30))
# works
1
39.66414
> model.frame(delete.response(terms(mod.pres)), data.frame(education=10,
income=6000, women=30))
log(income, 10) poly(education, 3).1 poly(education, 3).2 poly(education,
3).3 poly(women, 2).1 poly(women, 2).2
1 3.778151 -0.02691558 -0.08720086
0.07199804 0.003202256 -0.138777837
> mod.pres.raw <- lm(prestige ~ log(income, 10) + poly(education, 3,
raw=TRUE) + poly(women, 2, raw=TRUE),
+ data=Prestige)
> predict(mod.pres.raw, newdata=data.frame(education=10, income=6000,
women=30)) # doesn't work
Error in poly(education, 3, raw = TRUE) :
'degree' must be less than number of unique points
> model.frame(delete.response(terms(mod.pres.raw)), data.frame(education=10,
income=6000, women=30))
Error in poly(education, 3, raw = TRUE) :
'degree' must be less than number of unique points
------------ snip -------------
I understand the source of the error, but what's unclear to me is why
it's
necessary for poly() to check the degree of the polynomial against the
number of unique supplied points when raw=TRUE. For example, if I simply
remove this check from poly(), then
------------ snip -------------
> predict(mod.pres.raw, newdata=data.frame(education=10, income=6000,
women=30))
1
39.66414
> model.frame(delete.response(terms(mod.pres.raw)), data.frame(education=10,
income=6000, women=30))
log(income, 10) poly(education, 3, raw = TRUE).1 poly(education, 3, raw
TRUE).2 poly(education, 3, raw = TRUE).3 poly(women, 2, raw = TRUE).1
poly(women, 2, raw = TRUE).2
1 3.778151 10
100 1000 30
900
------------ snip -------------
Of course, if one then used the modified poly() in a regression with fewer
unique Xs than the degree of the polynomial, the model matrix would be
singular; but why not just let the error appear at that stage?
I could solve my problem by maintaining a local version of poly(), but why
should it not be possible to get predictions under these circumstances?
Best,
John
--------------------------------
John Fox
Senator William McMaster
Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox