Ben Bolker
2010-Aug-15 18:38 UTC
[Rd] adding a built-in drop.levels option for subset() in 2.12 ?
With the approach of R 2.12.0:

with mild apologies for re-opening this perennial issue: is there any hope, if appropriate patches are submitted, of adding a drop.levels argument (with default equal to FALSE to preserve backward compatibility/efficiency) to the subset function ...? If not, would a patch to the documentation and/or the R FAQ be accepted?

This does seem to be a continuing source of confusion/frustration (it certainly is among my students, and here is some documentation from r-help over the years). Note that some of the earliest threads here refer to the problem (now fixed) that the subset() documentation failed to note that the existing 'drop' argument would *not* (confusingly) drop unused levels.

http://finzi.psych.upenn.edu/Rhelp10/2008-April/158566.html
http://finzi.psych.upenn.edu/R/Rhelp02/archive/42976.html
http://finzi.psych.upenn.edu/R/Rhelp02/archive/36961.html
http://finzi.psych.upenn.edu/Rhelp10/2009-November/217878.html
http://article.gmane.org/gmane.comp.lang.r.general/200395

This suggestion is milder and less wide-ranging than a global drop.unused.levels option, or than convincing everyone to use strings rather than factors most of the time ...

cheers
Ben Bolker

--
Ben Bolker
bbolker at gmail.com, bolker at mcmaster.ca
http://www.math.mcmaster.ca/~bolker
GPG key: http://www.math.mcmaster.ca/~bolker/benbolker-publickey.asc
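A minimal sketch of the behaviour Ben describes, using a small made-up data frame (`dat` and its columns are hypothetical, not taken from any of the linked threads). Subsetting on a factor keeps that factor's full level set, and the existing `drop` argument of `subset()` does not change that.

```r
# Hypothetical toy data: a two-level factor and a numeric column
dat <- data.frame(sex = factor(c("M", "F", "M", "F")),
                  height = c(190, 165, 182, 170))

# Subset on the factor itself
males <- subset(dat, sex == "M")

# The unused level "F" is still part of the level set ...
levels(males$sex)        # "F" "M"

# ... so a frequency table still reports a zero count for it
table(males$sex)         # counts: F 0, M 2

# subset()'s 'drop' argument concerns dimensions, not factor levels
levels(subset(dat, sex == "M", drop = TRUE)$sex)   # still "F" "M"
```

The zero count for "F" is exactly the point of contention in the replies: useful when the level set is meaningful, surprising when the subset was taken on that very factor.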
Peter Dalgaard
2010-Aug-15 23:23 UTC
Ben Bolker wrote:
> With the approach of R 2.12.0:
>
> with mild apologies for re-opening this perennial issue:
> is there any hope, if appropriate patches are submitted, of adding a
> drop.levels argument (with default equal to FALSE to preserve backward
> compatibility/efficiency) to the subset function ... ?
> If not, would a patch to the documentation and/or the R FAQ be accepted?

I don't think it is desirable (I probably said so before). As far as I'm concerned, factors should NOT change their level set from subsetting, and if you want them to lose unused levels, f <- factor(f) gets you there soon enough.

>
> This does seem to be a continuing source of confusion/frustration (it
> certainly is among my students, and here is some documentation from
> r-help over the years).

Well, if you don't give students tasks where it is important to preserve the level set, then I can believe that they might be frustrated that empty levels are retained. However, if you have a data set with 50-odd responses of (say) good-medium-poor, they would get equally frustrated by having to reinstate the three factor levels after subsetting. (Perhaps you need to have been exposed to SAS PROC FREQ's notorious inability to generate zero counts, or SPSS barplots labeled 4-6-7-9-10, to see the point.)

As far as I can see, the confusion mainly arises when the factor itself is used in subsetting: "I selected sex=='M', but 'F' is still listed as a level". One has to ask whether the same reaction would have been triggered by (say) selecting everyone over 6 foot 2, which just happened to be an all-male population.
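Both halves of Peter's argument can be seen in a short sketch: keeping the level set preserves zero counts for a fixed response scale, and `f <- factor(f)` discards unused levels when that is what you want. The `rating` vector is hypothetical, a stand-in for his good-medium-poor example.

```r
# Hypothetical responses on a fixed three-level scale
rating <- factor(c("good", "medium", "good", "poor", "medium"),
                 levels = c("good", "medium", "poor"))

# Keep only the answers that are not "poor"
sub <- rating[rating != "poor"]

# The level set survives subsetting, so "poor" appears with a zero count
table(sub)          # good 2, medium 2, poor 0

# Re-applying factor() rebuilds the level set from the values present
sub2 <- factor(sub)
levels(sub2)        # "good" "medium"
```

Keeping levels by default is what lets `table()` report the empty category; dropping them is one explicit call away.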
I suggest that this would be taken as completely uncontroversial:

    > library(ISwR)  # juul2 ships with the ISwR package
    > data(juul2)
    > juul2 <- transform(juul2, sex=factor(sex,labels=c("M","F")))
    > with(subset(juul2, height > 187), table(sex))
    sex
     M  F
    23  0

If the selection is explicitly on sex, it may _feel_ like a contradiction if the other sex is "still present", but to R, a subset is a subset, and R cannot reasonably treat the two cases differently.

> Note that some of the earliest threads here
> refer to the problem (now fixed) that the subset() documentation failed
> to note that the existing 'drop' argument would *not* (confusingly) drop
> unused levels.

(Was that actually misdocumented at the time? Otherwise, I honestly don't know where that confusion came from: drop=TRUE drops single-level dimensions, which it also does in matrix and data frame indexing.)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
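The distinction Peter draws about `drop` can be checked directly. A small sketch with a made-up matrix `m` and data frame `df` (both hypothetical): `drop = TRUE`, the default in these cases, removes dimensions of extent one, while `drop = FALSE` keeps the shape. Factor levels play no part in it.

```r
# A hypothetical 2 x 3 matrix
m <- matrix(1:6, nrow = 2)

dim(m[1, ])                  # NULL: the row dimension of extent 1 is dropped
dim(m[1, , drop = FALSE])    # 1 3: the matrix shape is kept

# The same idea for data frames: a single selected column becomes a vector
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.data.frame(df[, "x"])                 # FALSE
is.data.frame(df[, "x", drop = FALSE])   # TRUE
```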
Peter Dalgaard
2010-Aug-23 13:27 UTC
On Aug 15, 2010, at 8:38 PM, Ben Bolker wrote:

>
> With the approach of R 2.12.0:
>
> with mild apologies for re-opening this perennial issue:
> is there any hope, if appropriate patches are submitted, of adding a
> drop.levels argument (with default equal to FALSE to preserve backward
> compatibility/efficiency) to the subset function ... ?
> If not, would a patch to the documentation and/or the R FAQ be accepted?

Ben, there is now a dropLevels() _function_ in R-devel, please try it on for size.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
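In released versions of R (2.12.0 and later) this functionality is available as `droplevels()`, with methods for factors and data frames. A minimal sketch on made-up data (`dat`, `tall` and their columns are hypothetical), including the data frame method's `except` argument for columns whose levels should be kept:

```r
# Hypothetical data frame with two factor columns
dat <- data.frame(sex = factor(c("M", "F", "M")),
                  grp = factor(c("a", "b", "a")),
                  height = c(190, 165, 182))

# Ordinary row subsetting keeps every level of every factor
tall <- dat[dat$height > 180, ]

# droplevels() on a data frame removes unused levels from all factor columns ...
tall1 <- droplevels(tall)
levels(tall1$sex)    # "M"
levels(tall1$grp)    # "a"

# ... unless a column is excluded via 'except' (here column 2, 'grp')
tall2 <- droplevels(tall, except = 2)
levels(tall2$grp)    # "a" "b"

# The factor method works on a single vector as well
levels(droplevels(dat$sex[dat$height > 180]))   # "M"
```

Making this a separate function rather than a `subset()` argument leaves the default behaviour Peter argues for untouched while giving a one-call way to drop levels after any kind of subsetting.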