ripley at stats.ox.ac.uk
2007-May-01 13:33 UTC
[Rd] [Fwd: Re: [R-downunder] Beware unclass(factor)] (PR#9641)
It really is unclear what is claimed to be a bug here. But see
https://stat.ethz.ch/pipermail/r-devel/2007-May/045592.html
for why the bug is not in R: your old and new data do not match.
Your fit is to a category.

[The problem with the web interface to R-bugs was reported last week:
it is being worked on.]

On Mon, 30 Apr 2007, r.darnell at uq.edu.au wrote:

> The following "issue" was found using
>
> > version
>                _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          4.1
> year           2006
> month          12
> day            18
> svn rev        40228
> language       R
> version.string R version 2.4.1 (2006-12-18)
>
> and discussed on the R-downunder mailing list.
>
> I hope I have provided enough info. I tried to look at the Bugs
> Tracking page but got---
>
> The system encountered a fatal error
>
>   * cannot open config file /home/sfe/r-bugs/jitterbug/R : No such file or directory
>   * The last error code was: No such file or directory
>
> uid/gid=30/8
>
> Regards
>
> Ross Darnell
>
> Date: Mon, 30 Apr 2007 19:25:09 +1000
> From: John Maindonald <john.maindonald at anu.edu.au>
> Subject: Re: [R-downunder] Beware unclass(factor)
> To: Ross Darnell <r.darnell at uq.edu.au>
> Cc: r-downunder at stat.auckland.ac.nz
>
> Observe the following
>
> > z <- model.frame(cbind(moths,(20-moths)) ~sex+ doselin,data=worms)
> > class(z$doselin)
> [1] "other"
> > levels(z$doselin)
> [1] "1"  "2"  "4"  "8"  "16" "32"
> > attributes(z$doselin)
> $levels
> [1] "1"  "2"  "4"  "8"  "16" "32"
>
> $class
> [1] "other"
>
> The problem surfaces in the call for model.frame() from predict.lm()
> when it is called by predict.glm().
> This call is jumping to conclusions
> when it uses the presence of a levels attribute as an indication that
> doselin is a factor, ironic as it was the call that was initiated by glm
> that seems to have given the column doselin of the object returned
> by model.frame() the class "other".
>
> This seems to me to be a bug. The call to unclass() does not
> strip the levels attribute from doselin. (This is not, I think, the
> bug; rather the problem is in the model matrix that is created.)
> The column worms$doselin does though have class "integer",
> at least as far as the function class() is concerned.
>
> You can fix the problem by setting:
>
> worms$doselin <- as.vector(unclass(worms$Dose))
>
> This strips off the levels attribute.
>
> In my view model.frame ought to have stripped the levels
> attribute from the column doselin in the object that it
> returned.
>
> I consider that this should be reported as a bug, or at least
> as an undesirable feature.
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
> On 30 Apr 2007, at 4:57 PM, Ross Darnell wrote:
>
>> Just an observation about the use of unclass() to generate codes
>> for factors.
>>
>> As an example take the dataset from the MASS4 book
>>
>> > worms <- data.frame(sex=gl(2,6),Dose=factor(rep(2^(0:5),2)),moths=c(1,4,9,13,18,20,0,2,6,10,12,16))
>>
>> > worms$doselin <- unclass(worms$Dose)
>>
>> > worms.glm <- glm(cbind(moths,(20-moths)) ~sex+doselin,data=worms,family=binomial)
>>
>> > predict(worms.glm,new=data.frame(sex="1",doselin=6))
>> Error: variable 'doselin' was fitted with class "other" but class
>> "numeric" was supplied
>> In addition: Warning message:
>> variable 'doselin' is not a factor in: model.frame.default(Terms,
>> newdata, na.action = na.action, xlev = object$xlevels)
>>
>> The /doselin/ vector is "atomic" --- good enough for the glm()
>> function but not acceptable by predict()
>>
>> > str(worms$doselin)
>>  atomic [1:12] 1 2 3 4 5 6 1 2 3 4 ...
>>  - attr(*, "levels")= chr [1:6] "1" "2" "4" "8" ...
>>
>> Cheers
>>
>> Ross Darnell

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
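
For readers following the thread, here is a minimal sketch of the point about attributes. It uses the worms data from the original post and the as.vector(unclass(...)) workaround suggested in the quoted message. The names codes, plain, dose_value, fit, newdata and pred are illustrative only and do not appear in the thread. Supplying sex in newdata as a factor with the fitted levels is an assumption made here to keep the old and new data matched. What current versions of R do with the unclass()ed column itself is not shown.

```r
# Data from the original post.
worms <- data.frame(sex = gl(2, 6),
                    Dose = factor(rep(2^(0:5), 2)),
                    moths = c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16))

# unclass() drops the class attribute but keeps "levels",
# so the result is integer codes that still carry factor metadata.
codes <- unclass(worms$Dose)
attributes(codes)

# as.vector() strips the remaining attributes, leaving plain integer codes.
plain <- as.vector(codes)
attributes(plain)

# If the doses themselves are wanted rather than the codes,
# convert via the level labels (illustrative alternative, not from the thread).
dose_value <- as.numeric(as.character(worms$Dose))

# Fit on the attribute-free codes, then predict with matching new data.
worms$doselin <- plain
fit <- glm(cbind(moths, 20 - moths) ~ sex + doselin,
           data = worms, family = binomial)
newdata <- data.frame(sex = factor("1", levels = levels(worms$sex)),
                      doselin = 6)
pred <- predict(fit, newdata = newdata)
```

The design choice is simply to hand glm() and predict() a column with no leftover factor attributes, so that the class recorded at fit time and the class supplied at prediction time are both plain numeric.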