a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
levels(subset(a, b=1, drop=T)$c)
# [1] "a" "b"

Is this a bug?

Thanks,
Hadley
hadley wickham wrote:
> a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
> levels(subset(a, b=1, drop=T)$c)
> # [1] "a" "b"
>
> Is this a bug?

No, drop = TRUE is not doing what you're thinking it's doing. From ?subset (R-1.9.1):

    The 'drop' argument is passed on to the indexing method for data frames.

I think you're hoping it is passed on to the indexing method for factors, which it isn't.

--sundar
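A minimal sketch of the distinction Sundar describes, assuming a current R session (the thread itself concerns R 1.9.x); the names `df_part` and `fac_part` are illustrative only. The same `drop = TRUE` reduces dimensions when it reaches the data-frame method, but removes unused levels when it reaches the factor method:

```r
# Data frame from the original question
a <- data.frame(b = rep(1:5, each = 2),
                c = factor(rep("a", 10), levels = c("a", "b")))

# drop = TRUE handled by the data-frame "[" method: two rows and two
# columns cannot be reduced further, so a data frame comes back and
# column c still carries the unused level "b"
df_part <- a[1:2, , drop = TRUE]
levels(df_part$c)   # "a" "b"

# drop = TRUE handled by the factor "[" method: unused levels are removed
fac_part <- a$c[1:2, drop = TRUE]
levels(fac_part)    # "a"
```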
hadley wickham wrote:
> a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
> levels(subset(a, b=1, drop=T)$c)
> # [1] "a" "b"
>
> Is this a bug?

Also, I think you meant:

levels(subset(a, b==1, drop=T)$c)

(Note the double-equals for logical equality.)

--sundar
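A quick check of Sundar's point, again assuming a current R session (`kept` and `s` are illustrative names): a named argument `b = 1` is not used as the row condition, so every row survives, while `b == 1` selects the two matching rows. Either way the unused level is still present:

```r
a <- data.frame(b = rep(1:5, each = 2),
                c = factor(rep("a", 10), levels = c("a", "b")))

# `b = 1` is a named argument that subset() does not treat as the row
# condition, so no rows are removed
kept <- subset(a, b = 1)
nrow(kept)     # 10

# `b == 1` is a logical comparison evaluated against column b
s <- subset(a, b == 1)
nrow(s)        # 2
levels(s$c)    # "a" "b"  (the unused level "b" is not dropped)
```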
hadley wickham <h.wickham at gmail.com> writes:

> a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
> levels(subset(a, b=1, drop=T)$c)
> # [1] "a" "b"
>
> Is this a bug?

In some older versions of R (at least those older than 1.9.0), there was a documentation bug: the help page said that drop = TRUE would drop unused factor levels. Nowadays, it says

    drop: passed on to '[' indexing operator.

which is correct, and the docs for [.data.frame say

    drop: logical. If 'TRUE' the result is coerced to the lowest
          possible dimension: however, see the Warning below.

(and that Warning seems to have a typo, but leave that for now...).

So no, it is not a bug; that's not what that argument is for.

-- 
   Peter Dalgaard
   Dept. of Biostatistics, University of Copenhagen
   Blegdamsvej 3, 2200 Cph. N, Denmark
   Ph: (+45) 35327918, FAX: (+45) 35327907
   (p.dalgaard at biostat.ku.dk)
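As an illustration of the "lowest possible dimension" wording, assuming a current R session (`one_col_df` and `one_col_fac` are illustrative names): selecting a single column through subset() with drop = TRUE returns the factor itself instead of a one-column data frame, yet the factor keeps both levels:

```r
a <- data.frame(b = rep(1:5, each = 2),
                c = factor(rep("a", 10), levels = c("a", "b")))

# Default drop = FALSE: a one-column data frame
one_col_df <- subset(a, b == 1, select = c)
class(one_col_df)     # "data.frame"

# drop = TRUE: the single column is returned as a plain factor
one_col_fac <- subset(a, b == 1, select = c, drop = TRUE)
class(one_col_fac)    # "factor"
length(one_col_fac)   # 2
levels(one_col_fac)   # "a" "b"
```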
> From: hadley wickham
>
> a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
> levels(subset(a, b=1, drop=T)$c)
> # [1] "a" "b"
>
> Is this a bug?

Don't think so:

    > args("[.data.frame")
    function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1) 
    NULL

So the `drop' argument is passed to the "[" method for data.frame (as documented in ?subset), and never reaches the "[" method for factor.

Andy
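A short sketch of the default Andy quotes, assuming a current R session (`v` and `d` are illustrative names): when a single column is selected, the default drop of [.data.frame evaluates to TRUE and a vector comes back, while drop = FALSE keeps the data-frame shape. In neither case is drop = TRUE handed to the factor method, so the levels are unchanged:

```r
a <- data.frame(b = rep(1:5, each = 2),
                c = factor(rep("a", 10), levels = c("a", "b")))

# One column selected: the default drop is TRUE, so the factor is returned
v <- a[1:2, "c"]
class(v)     # "factor"
levels(v)    # "a" "b"  (levels untouched by the data-frame drop)

# Explicit drop = FALSE keeps a one-column data frame
d <- a[1:2, "c", drop = FALSE]
class(d)     # "data.frame"
```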
hadley wickham wrote:
> a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
> levels(subset(a, b=1, drop=T)$c)
> # [1] "a" "b"
>
> Is this a bug?
>
> Thanks,
>
> Hadley

This is always controversial. I am apparently in a small minority in believing that the default behavior should be what you are wishing for. That's why the Hmisc package by default drops unused levels (but allows you to override that with options(drop.unused.levels=FALSE)). It is distasteful to have to override system behavior, but I felt I had to in this case. No one in R-core wanted to add a non-default option to R, e.g. options(drop.unused.levels=TRUE).

Frank
-- 
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
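For readers who want the level-dropping behaviour Frank describes without touching any global option, a minimal sketch using base R only. It assumes R 2.12.0 or later, where droplevels() is available (it did not exist in the R versions discussed in this thread); `s` and `s_dropped` are illustrative names. Unused levels are removed explicitly after subsetting:

```r
a <- data.frame(b = rep(1:5, each = 2),
                c = factor(rep("a", 10), levels = c("a", "b")))

s <- subset(a, b == 1)
levels(s$c)                  # "a" "b"

# droplevels() removes unused levels from every factor column
s_dropped <- droplevels(s)
levels(s_dropped$c)          # "a"

# Re-creating the factor drops unused levels for a single column
levels(factor(s$c))          # "a"
```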