There is a bug in `predict' whereby the order of variables sometimes gets
re-arranged compared to the original fit, and then disaster results.
Specifically, the 'variables' and 'predvars' attributes of a
'terms' object
get out of synch. This only happens when the terms in the original formula
get re-ordered during fitting:
test> scrunge.data_ data.frame( contin=1:10, discrete=factor( rep( c(
'cat',
'dog'), 5)), resp=runif( 10))
test> lm.ok_ lm( resp ~ discrete + contin %in% discrete, data=scrunge.data)
test> predict( lm.ok, scrunge.data) # no problemo
1 2 3 4 5 6 7
8 9 10
0.29663793 0.04572655 0.42661779 0.31668732 0.55659764 0.58764809 0.68657750
0.85860886 0.81655736 1.12956963
test> lm.bug_ lm( resp ~ contin %in% discrete + discrete, data=scrunge.data)
# terms will be re-ordered
test> predict( lm.bug, scrunge.data)
Error in "contrasts<-"(*tmp*, value = "contr.treatment")
:
contrasts apply only to factors
In addition: Warning message:
variable discrete is not a factor in: model.frame.default(object, data, xlev
= xlev)
This actually turns out to be a bug in `model.frame.default', to do with an
inconsistency between `predvars' and `vars' when
`model.frame.default' is
called inside `predict'. AFAICS it can be fixed by including the commented
line below in `model.frame.default':
<<...>>
vars <- attr(formula, "variables")
predvars <- attr(formula, "predvars")
if (is.null(predvars))
predvars <- vars
varnames <- as.character(predvars[-1]) # MVB: was vars[-1] not
predvars[-1]
variables <- eval(predvars, data, env)
<<...>>
This has the side-effect that there are some ugly column names in the
model.frame if e.g. a `poly' term is used, but doesn't actually seem to
hurt
the prediction.
However, that doesn't fix it all. There is still a bug in `predict',
even
after replacing `model.frame.default' with the above:
test> predict( lm.bug, scrunge.data)
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
subscript out of bounds
test> # wot???
This time, the bug is in `delete.response', which call `terms' to set
most
of the attributes including `variables', but adjusts the original
`predvars'
by hand. Because `terms' returns the variables in a different order when
it's called by `predict.lm' to when it was originally called by
`lm', things
get out of synch.
This is a slightly tricky bug to fix, because `predvars' and `variables'
can
look a bit different e.g. if there are `poly' terms, but I think the
following change near the end of `delete.response' does the trick:
<<...>>
if (length(formula(termobj)) == 3) {
# Old code, reliant on maintaining the order of terms: attr(tt,
"predvars") <- attr(termobj, "predvars")[-2]
reorder <- match( sapply( attr( tt, 'variables'), deparse),
sapply(
attr( termobj, 'variables'), deparse)) # MVB
attr( tt, 'predvars') <- attr( termobj, 'predvars')[
reorder] # MVB
}
<<...>>
cheers
Mark
*******************************
Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001
phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington@csiro.au
You forgot to give the R version, platform etc. This is already fixed in R-devel, and your example works there provided a valid assignment operator is used. It is the same PR#2206, and is marked as fixed in R-bugs. On Wed, 26 Mar 2003 Mark.Bravington@csiro.au wrote:> There is a bug in `predict' whereby the order of variables sometimes gets > re-arranged compared to the original fit, and then disaster results. > Specifically, the 'variables' and 'predvars' attributes of a 'terms' object > get out of synch. This only happens when the terms in the original formula > get re-ordered during fitting: > > test> scrunge.data_ data.frame( contin=1:10, discrete=factor( rep( c( 'cat', > 'dog'), 5)), resp=runif( 10)) > test> lm.ok_ lm( resp ~ discrete + contin %in% discrete, data=scrunge.data) > test> predict( lm.ok, scrunge.data) # no problemo > 1 2 3 4 5 6 7 > 8 9 10 > 0.29663793 0.04572655 0.42661779 0.31668732 0.55659764 0.58764809 0.68657750 > 0.85860886 0.81655736 1.12956963 > > test> lm.bug_ lm( resp ~ contin %in% discrete + discrete, data=scrunge.data) > # terms will be re-ordered > test> predict( lm.bug, scrunge.data) > Error in "contrasts<-"(*tmp*, value = "contr.treatment") : > contrasts apply only to factors > In addition: Warning message: > variable discrete is not a factor in: model.frame.default(object, data, xlev > = xlev) > > This actually turns out to be a bug in `model.frame.default', to do with an > inconsistency between `predvars' and `vars' when `model.frame.default' is > called inside `predict'. AFAICS it can be fixed by including the commented > line below in `model.frame.default': > > <<...>> > vars <- attr(formula, "variables") > predvars <- attr(formula, "predvars") > if (is.null(predvars)) > predvars <- vars > varnames <- as.character(predvars[-1]) # MVB: was vars[-1] not > predvars[-1] > variables <- eval(predvars, data, env) > <<...>> > > This has the side-effect that there are some ugly column names in the > model.frame if e.g. a `poly' term is used, but doesn't actually seem to hurt > the prediction. > > However, that doesn't fix it all. There is still a bug in `predict', even > after replacing `model.frame.default' with the above: > > test> predict( lm.bug, scrunge.data) > Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : > subscript out of bounds > test> # wot??? > > This time, the bug is in `delete.response', which call `terms' to set most > of the attributes including `variables', but adjusts the original `predvars' > by hand. Because `terms' returns the variables in a different order when > it's called by `predict.lm' to when it was originally called by `lm', things > get out of synch. > > This is a slightly tricky bug to fix, because `predvars' and `variables' can > look a bit different e.g. if there are `poly' terms, but I think the > following change near the end of `delete.response' does the trick: > > <<...>> > if (length(formula(termobj)) == 3) { > # Old code, reliant on maintaining the order of terms: attr(tt, > "predvars") <- attr(termobj, "predvars")[-2] > reorder <- match( sapply( attr( tt, 'variables'), deparse), sapply( > attr( termobj, 'variables'), deparse)) # MVB > attr( tt, 'predvars') <- attr( termobj, 'predvars')[ reorder] # MVB > } > <<...>> > > cheers > Mark > > ******************************* > > Mark Bravington > CSIRO (CMIS) > PO Box 1538 > Castray Esplanade > Hobart > TAS 7001 > > phone (61) 3 6232 5118 > fax (61) 3 6232 5012 > Mark.Bravington@csiro.au > > ______________________________________________ > R-devel@stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-devel >-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
( summary: predict now gets round problems caused by re-ordering of terms )
Sorry-- glad it's fixed & surprised I left the info off-- it's in
yesterday's other bug reports tho' :).
I know that I searched the bug tracking page yesterday to see whether the
bugs were already reported before sending them. Didn't find anything
yesterday, but there they are today in black & white. Must have done
something daft.
cheers
Mark
*******************************
Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001
phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington@csiro.au
#-----Original Message-----
#From: ripley@stats.ox.ac.uk [mailto:ripley@stats.ox.ac.uk]
#Sent: Wednesday, 26 March 2003 6:09 PM
#To: Mark.Bravington@csiro.au
#Cc: r-devel@stat.math.ethz.ch; R-bugs@biostat.ku.dk
#Subject: Re: [Rd] predict (PR#2685)
#
#
#You forgot to give the R version, platform etc.
#
#This is already fixed in R-devel, and your example works there
#provided a
#valid assignment operator is used.
#
#It is the same PR#2206, and is marked as fixed in R-bugs.
#
#On Wed, 26 Mar 2003 Mark.Bravington@csiro.au wrote:
#
#> There is a bug in `predict' whereby the order of variables
#sometimes gets
#> re-arranged compared to the original fit, and then disaster results.
#> Specifically, the 'variables' and 'predvars' attributes of
a
#'terms' object
#> get out of synch. This only happens when the terms in the
#original formula
#> get re-ordered during fitting:
#>
#> test> scrunge.data_ data.frame( contin=1:10,
#discrete=factor( rep( c( 'cat',
#> 'dog'), 5)), resp=runif( 10))
#> test> lm.ok_ lm( resp ~ discrete + contin %in% discrete,
#data=scrunge.data)
#> test> predict( lm.ok, scrunge.data) # no problemo
#> 1 2 3 4 5
# 6 7
#> 8 9 10
#> 0.29663793 0.04572655 0.42661779 0.31668732 0.55659764
#0.58764809 0.68657750
#> 0.85860886 0.81655736 1.12956963
#>
#> test> lm.bug_ lm( resp ~ contin %in% discrete + discrete,
#data=scrunge.data)
#> # terms will be re-ordered
#> test> predict( lm.bug, scrunge.data)
#> Error in "contrasts<-"(*tmp*, value =
"contr.treatment") :
#> contrasts apply only to factors
#> In addition: Warning message:
#> variable discrete is not a factor in:
#model.frame.default(object, data, xlev
#> = xlev)
#>
#> This actually turns out to be a bug in
#`model.frame.default', to do with an
#> inconsistency between `predvars' and `vars' when
#`model.frame.default' is
#> called inside `predict'. AFAICS it can be fixed by including
#the commented
#> line below in `model.frame.default':
#>
#> <<...>>
#> vars <- attr(formula, "variables")
#> predvars <- attr(formula, "predvars")
#> if (is.null(predvars))
#> predvars <- vars
#> varnames <- as.character(predvars[-1]) # MVB: was vars[-1] not
#> predvars[-1]
#> variables <- eval(predvars, data, env)
#> <<...>>
#>
#> This has the side-effect that there are some ugly column names in the
#> model.frame if e.g. a `poly' term is used, but doesn't
#actually seem to hurt
#> the prediction.
#>
#> However, that doesn't fix it all. There is still a bug in
#`predict', even
#> after replacing `model.frame.default' with the above:
#>
#> test> predict( lm.bug, scrunge.data)
#> Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
#> subscript out of bounds
#> test> # wot???
#>
#> This time, the bug is in `delete.response', which call
#`terms' to set most
#> of the attributes including `variables', but adjusts the
#original `predvars'
#> by hand. Because `terms' returns the variables in a
#different order when
#> it's called by `predict.lm' to when it was originally called
#by `lm', things
#> get out of synch.
#>
#> This is a slightly tricky bug to fix, because `predvars' and
#`variables' can
#> look a bit different e.g. if there are `poly' terms, but I think the
#> following change near the end of `delete.response' does the trick:
#>
#> <<...>>
#> if (length(formula(termobj)) == 3) {
#> # Old code, reliant on maintaining the order of terms: attr(tt,
#> "predvars") <- attr(termobj, "predvars")[-2]
#> reorder <- match( sapply( attr( tt, 'variables'),
#deparse), sapply(
#> attr( termobj, 'variables'), deparse)) # MVB
#> attr( tt, 'predvars') <- attr( termobj,
'predvars')[
#reorder] # MVB
#> }
#> <<...>>
#>
#> cheers
#> Mark
#>
#> *******************************
#>
#> Mark Bravington
#> CSIRO (CMIS)
#> PO Box 1538
#> Castray Esplanade
#> Hobart
#> TAS 7001
#>
#> phone (61) 3 6232 5118
#> fax (61) 3 6232 5012
#> Mark.Bravington@csiro.au
#>
#> ______________________________________________
#> R-devel@stat.math.ethz.ch mailing list
#> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
#>
#
#--
#Brian D. Ripley, ripley@stats.ox.ac.uk
#Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
#University of Oxford, Tel: +44 1865 272861 (self)
#1 South Parks Road, +44 1865 272866 (PA)
#Oxford OX1 3TG, UK Fax: +44 1865 272595
#