John Maindonald
2007-May-01 04:26 UTC
[Rd] Levels attribute in integer columns created by model.frame()
The following is evidence of what is surely an undesirable feature. The issue is the handling, in calls to model.frame(), of an explanatory variable that has been derived as an unclassed factor. (Ross Darnell drew this to my attention.)

## Data are slightly modified from p.191 of MASS
> worms <- data.frame(sex=gl(2,6), Dose=factor(rep(2^(0:5),2)),
+                     deaths=c(1,4,9,13,18,20,0,2,6,10,12,16))
> worms$doselin <- unclass(worms$Dose)
> class(worms$doselin)
[1] "integer"
> attributes(worms$doselin)
$levels
[1] "1"  "2"  "4"  "8"  "16" "32"

> worms.glm <- glm(cbind(deaths, (20-deaths)) ~ sex+ doselin,
+                  data=worms, family=binomial)
> predict(worms.glm, new=data.frame(sex="1", doselin=6))
Error: variable 'doselin' was fitted with class "other" but class "numeric" was supplied
In addition: Warning message:
variable 'doselin' is not a factor in: model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)

The error is reported in the call to model.frame() from predict.lm(), which is called by predict.glm(). It is not clear to me why this call to model.frame identifies the class that should be expected as "other".

The problem might be fixed by stripping the levels attribute from any column created by model.frame() that is integer or numeric.

> ###################################################
>
> ## Note the following
> mframe <- model.frame(cbind(deaths, (20-deaths)) ~ sex+ doselin,
+                       data=worms)
> class(mframe$doselin)
[1] "integer"
> attributes(mframe$doselin)
$levels
[1] "1"  "2"  "4"  "8"  "16" "32"

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
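The class that was recorded for each variable at fit time can be inspected directly. The following is only a sketch that rebuilds the data from the post; the name data_classes is introduced here for illustration, and the value recorded for 'doselin' is deliberately not shown because it may differ between R versions.

```r
# Rebuild the data and the fit from the post above.
worms <- data.frame(sex = gl(2, 6), Dose = factor(rep(2^(0:5), 2)),
                    deaths = c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16))
worms$doselin <- unclass(worms$Dose)   # integer codes; 'levels' is kept
worms.glm <- glm(cbind(deaths, 20 - deaths) ~ sex + doselin,
                 data = worms, family = binomial)

# Classes recorded for each model-frame variable when the model was fitted.
# The entry for 'doselin' depends on the R version in use.
data_classes <- attr(terms(worms.glm), "dataClasses")
print(data_classes)

# The attribute that survives unclass() and travels into the model frame.
print(attributes(worms$doselin))
```

Comparing the 'doselin' entry with the class of the column supplied in newdata shows what predict() is checking.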
Prof Brian Ripley
2007-May-01 06:33 UTC
[Rd] Levels attribute in integer columns created by model.frame()
Stripping attributes from a column in model.frame would be highly undesirable. The mistake was using 'unclass' when the intention was to remove the levels (I presume). The new variable given is correctly reported as not matching that used during fitting.

Use of traceback() would have shown that the error is not reported from model.frame (as claimed) but from

4: .checkMFClasses(cl, m)
3: predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == "link", "response", type), terms = terms, na.action = na.action)
2: predict.glm(worms.glm, new = data.frame(sex = "1", doselin = 6))
1: predict(worms.glm, new = data.frame(sex = "1", doselin = 6))

The reason the class is reported as "other" is clear from attr(terms(worms.glm), "dataClasses"). This comes from .MFclass.

On Tue, 1 May 2007, John Maindonald wrote:

> The following is evidence of what is surely an undesirable feature.
> The issue is the handling, in calls to model.frame(), of an
> explanatory variable that has been derived as an unclassed
> factor. (Ross Darnell drew this to my attention.)

He has already filed a bug report on it, without saying what he thinks the bug is.

> ## Data are slightly modified from p.191 of MASS
> > worms <- data.frame(sex=gl(2,6), Dose=factor(rep(2^(0:5),2)),
> +                     deaths=c(1,4,9,13,18,20,0,2,6,10,12,16))
> > worms$doselin <- unclass(worms$Dose)
> > class(worms$doselin)
> [1] "integer"
> > attributes(worms$doselin)
> $levels
> [1] "1"  "2"  "4"  "8"  "16" "32"
>
> > worms.glm <- glm(cbind(deaths, (20-deaths)) ~ sex+ doselin,
> +                  data=worms, family=binomial)
> > predict(worms.glm, new=data.frame(sex="1", doselin=6))
> Error: variable 'doselin' was fitted with class "other" but class
> "numeric" was supplied
> In addition: Warning message:
> variable 'doselin' is not a factor in: model.frame.default(Terms,
> newdata, na.action = na.action, xlev = object$xlevels)
>
> The error is reported in the call to model.frame() from predict.lm()
> which is called by predict.glm(). It is not clear to me why this call to
> model.frame identifies the class that should be expected as "other".
>
> The problem might be fixed by stripping the levels attribute from
> any column created by model.frame() that is integer or numeric.
>
> > ###################################################
> >
> > ## Note the following
> > mframe <- model.frame(cbind(deaths, (20-deaths)) ~ sex+ doselin,
> +                       data=worms)
> > class(mframe$doselin)
> [1] "integer"
> > attributes(mframe$doselin)
> $levels
> [1] "1"  "2"  "4"  "8"  "16" "32"
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
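If the intention was a plain numeric covariate without levels, one way to get it is to convert the factor rather than unclass it. This is a minimal sketch, not part of either message: it assumes as.integer() is an acceptable substitute for unclass() here, and the names worms2, fit2 and pred are introduced for illustration only.

```r
# Same data as in the original post.
worms2 <- data.frame(sex = gl(2, 6), Dose = factor(rep(2^(0:5), 2)),
                     deaths = c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16))

# as.integer() returns the factor codes and drops the 'levels' attribute,
# so the covariate is an ordinary integer vector.
worms2$doselin <- as.integer(worms2$Dose)
stopifnot(is.null(attributes(worms2$doselin)))

# Alternative, if the dose values themselves are wanted instead of the codes:
# worms2$dose <- as.numeric(as.character(worms2$Dose))

fit2 <- glm(cbind(deaths, 20 - deaths) ~ sex + doselin,
            data = worms2, family = binomial)

# New data with a plain numeric 'doselin' now matches what was fitted.
pred <- predict(fit2,
                newdata = data.frame(sex = factor("1", levels = levels(worms2$sex)),
                                     doselin = 6))
print(pred)
```

With the levels attribute gone, the variable supplied in newdata has the same kind of class as the one used for fitting, so the class check in predict() has nothing to object to.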