Mark.Bravington@csiro.au
2004-Mar-09 06:52 UTC
[Rd] update forgets about offset() (PR#6656)
In R 1.7 and above (including R 1.9 alpha), 'update.formula' forgets to copy any offset(...) term in the original '.' formula:

test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
test> fit1 <- glm( y~offset(x)+z, data=df)
test> fit1$call
glm(formula = y ~ offset(x) + z, data = df)

test> fit1u <- update( fit1, ~.)
test> fit1u$call
glm(formula = y ~ z, data = df)

The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via

tmp <- attr(terms(object), "term.labels")

but this omits any offsets. Replacing that line with the following, which I think pulls in everything except the response, *seems* to fix the problem without disrupting the guts of 'terms' itself:

tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms, 'response')]

The suggested line might be simpler than checking the 'offset' component of 'terms(object)', which won't always exist.

Footnote: strange things happen when there is more than one offset (OK, there probably shouldn't be, but I thought I'd experiment):

test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
test> fit2$call
glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)

test> fit2u <- update( fit2, ~.)
test> fit2u$call
glm(formula = y ~ offset(log(x)) + z, data = df)

Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.
*******************************

Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001

phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington@csiro.au

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 1
 minor = 8.1
 year = 2003
 month = 11
 day = 21
 language = R

Windows 2000 Professional (build 2195) Service Pack 4.0

Search Path:
 .GlobalEnv, ROOT, package:methods, package:ctest, package:mva, package:modreg, package:nls, package:ts, package:chstuff, package:handy2, package:handy, package:debug, mvb.session.info, package:mvbutils, package:tcltk, Autoloads, package:base
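For readers following along, the difference between 'term.labels' and the full variable list described in the report can be inspected directly. The following is a minimal sketch using only terms() and attr() from base R on the formula from the report; the names f and tt are introduced here for illustration. It shows how the attributes are laid out and makes no claim about how 'fixFormulaObject' is written internally.

```r
# Formula from the report: one offset plus one ordinary term.
f  <- y ~ offset(x) + z
tt <- terms(f)

# Only the ordinary term is listed here; the offset is absent.
attr(tt, "term.labels")          # "z"

# The offset is recorded separately, as an index into the variables
# list (which counts the response as position 1).
attr(tt, "offset")               # 2

# The row names of the factors matrix list every variable,
# including the response and the offset.
rownames(attr(tt, "factors"))

# With no offset in the formula the attribute does not exist at all,
# which is the "won't always exist" case mentioned in the report.
is.null(attr(terms(y ~ z), "offset"))
```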
On Tue, 9 Mar 2004 Mark.Bravington@csiro.au wrote:

> In R 1.7 and above (including R 1.9 alpha), 'update.formula' forgets to copy any offset(...) term in the original '.' formula:
>
> test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
> test> fit1 <- glm( y~offset(x)+z, data=df)
> test> fit1$call
> glm(formula = y ~ offset(x) + z, data = df)
>
> test> fit1u <- update( fit1, ~.)
> test> fit1u$call
> glm(formula = y ~ z, data = df)
>
> The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via
>
> tmp <- attr(terms(object), "term.labels")
>
> but this omits any offsets. Replacing that line with the following,
> which I think pulls in everything except the response, *seems* to fix
> the problem without disrupting the guts of 'terms' itself:
>
> tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms, 'response')]
>
> The suggested line might be simpler than checking the 'offset' component
> of 'terms(object)', which won't always exist.

Sorry, but that is a common programming error. The possible values of attr(terms, "response") are 0 or 1 (although code should not rely on the non-existence of 2, 3, ...). foo[-0] == foo[0] is a length-0 vector.

Also, in R please use rownames(): it is easier to read and safer.

> Footnote: strange things happen when there is more than one offset (OK,
> there probably shouldn't be, but I thought I'd experiment):

That is allowed, and works in general.

> test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
> test> fit2$call
> glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)
>
> test> fit2u <- update( fit2, ~.)
> test> fit2u$call
> glm(formula = y ~ offset(log(x)) + z, data = df)
>
> Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.

The issue here is the code to remove offset terms fails if two successive terms are offsets, but not otherwise.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
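The indexing pitfall named in the reply is easy to reproduce. The sketch below shows that a negative index of zero selects nothing, and then one way to guard against it. The helper drop_response() is hypothetical, written here only to illustrate the guard; it is not the patch that went into R, and it does not address any other problem with rebuilding a formula from row names.

```r
foo <- c("y", "offset(x)", "z")

# With a response at position 1 the negative index works as intended.
foo[-1]                          # "offset(x)" "z"

# With no response the index is 0, and foo[-0] is the same as foo[0]:
# a zero-length vector, so every term is lost.
foo[-0]                          # character(0)

# Hypothetical helper: drop the response row only when there is one.
drop_response <- function(tt) {
  vars <- rownames(attr(tt, "factors"))
  resp <- attr(tt, "response")   # 0 when the formula is one-sided
  if (resp > 0L) vars[-resp] else vars
}

drop_response(terms(y ~ offset(x) + z))   # two-sided formula
drop_response(terms(~ offset(x) + z))     # one-sided formula, response is 0
```

Both calls to drop_response() should return the same two variable names, whereas the unguarded version would return an empty vector for the one-sided formula.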
Mark.Bravington@csiro.au
2004-Mar-15 01:46 UTC
[Rd] update forgets about offset() (PR#6656)
Thanks (I hadn't realized 'response' could be 0). However, there's now a problem (in R 1.9.0 alpha) with *removing* offsets via 'update':

> fit2 <- glm( y ~ z + offset(x), data=df)
> fit2$call
glm(formula = y ~ z + offset(x), data = df)

> update( fit2, ~.-offset(x))$call
glm(formula = y ~ z + offset(x), data = df)

# the offset wasn't removed even though it should have been

> update( fit2, ~.-z)$call
glm(formula = y ~ 1, <<...>>

# now the offset has been removed even though it should have stayed!

Mark

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = alpha
 major = 1
 minor = 9.0
 year = 2004
 month = 03
 day = 14
 language = R

Windows 2000 Professional (build 2195) Service Pack 4.0

Search Path:
 .GlobalEnv, package:methods, package:stats, package:graphics, package:utils, Autoloads, package:base

#-----Original Message-----
#From: Prof Brian Ripley [mailto:ripley@stats.ox.ac.uk]
#Sent: Thursday, 11 March 2004 3:10 AM
#To: Bravington, Mark (CMIS, Hobart)
#Cc: r-devel@stat.math.ethz.ch; R-bugs@biostat.ku.dk
#Subject: Re: [Rd] update forgets about offset() (PR#6656)
#
#On Wed, 10 Mar 2004, Prof Brian Ripley wrote:
#
#> Sorry, but that is a common programming error. The possible values of
#> attr(terms, "response") are 0 or 1 (although code should not rely on the
#> non-existence of 2, 3, ...). foo[-0] == foo[0] is a length-0 vector.
#>
#> Also, in R please use rownames(): it is easier to read and safer.
#
#There is a second level of problems. The rownames include all terms, even
#those with - signs, so that code would collapse
#
#y ~ x + z - z
#
#to y ~ x + z!
#
#> The issue here is the code to remove offset terms fails if two successive
#> terms are offsets, but not otherwise.
#
#In fact, only if the two successive offsets were first or last for two
#separate reasons, which made it hard to track down.
#
#I have now committed patches for both problems.
#
#--
#Brian D. Ripley, ripley@stats.ox.ac.uk
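The "second level of problems" in the quoted message can be checked with the same formula it uses. This is a small sketch with base R's terms() only; tt is a name introduced here. It compares the simplified term labels with the row names of the factors matrix, which is the quantity the originally proposed replacement line would have used.

```r
# Formula from the quoted message: z is added and then removed.
tt <- terms(y ~ x + z - z)

# The term labels reflect the subtraction: only x remains.
attr(tt, "term.labels")          # "x"

# The row names of the factors matrix are built from the variables
# mentioned anywhere in the formula, so a subtracted variable can
# still be listed. Rebuilding the right-hand side from these names
# would therefore bring z back.
rownames(attr(tt, "factors"))
```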
On Mon, 15 Mar 2004 Mark.Bravington@csiro.au wrote:

> Thanks (I hadn't realized 'response' could be 0). However, there's now a problem (in R 1.9.0 alpha) with *removing* offsets via 'update':
>
> > fit2 <- glm( y ~ z + offset(x), data=df)
> > fit2$call
> glm(formula = y ~ z + offset(x), data = df)
>
> > update( fit2, ~.-offset(x))$call
> glm(formula = y ~ z + offset(x), data = df)
>
> # the offset wasn't removed even though it should have been

No, it should not have been. In neither R nor S does - offset remove an offset term. For specials, - is equivalent to +.

> glm(y ~ offset(x) + z - offset(x), data=df)

Call:  glm(formula = y ~ offset(x) + z - offset(x), data = df)

Coefficients:
(Intercept)            z
    -1.3660       0.1610

is the same as fit1.

-- 
Brian D. Ripley, ripley@stats.ox.ac.uk
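The point about "- offset" can be checked on the data frame from the original report. The sketch below refits the two formulas and compares them instead of assuming they agree, since the handling of "- offset(...)" is described here for the R and S versions of the time and may not be identical in every later release. The names fit_plus, fit_minus and fit_none are introduced here. The last step shows one way to obtain a fit without the offset, by passing update() a complete formula that has no '.' in it, so that nothing is inherited from the old right-hand side.

```r
# Data frame from the original report.
df <- data.frame(x = 1:4, y = sqrt(1:4), z = c(2:4, 1))

# fit1 from the report, and the formula with "- offset(x)" appended.
fit_plus  <- glm(y ~ offset(x) + z, data = df)
fit_minus <- glm(y ~ offset(x) + z - offset(x), data = df)

coef(fit_plus)                   # (Intercept) about -1.366, z about 0.161

# TRUE here means "- offset(x)" did not remove the offset,
# as described in the reply above.
isTRUE(all.equal(coef(fit_plus), coef(fit_minus)))

# To drop the offset, supply the whole formula explicitly.
fit_none <- update(fit_plus, formula. = y ~ z)
formula(fit_none)
coef(fit_none)
```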