J. Hosking
2006-Jul-11 20:52 UTC
[Rd] Dropping unused levels of a factor that has "NA" as a level
Is this a bug?

> f1 <- factor(c("a", NA), levels = c("a", "NA") )
> f2 <- f1[, drop = TRUE]
> f2
[1] a <NA>
Levels: a <NA>

I would have expected f2 to have only one level, "a". It seems
to me that the code in [.factor does not follow the advice in
help("factor") on how to set factor codes to be missing when
"NA" is a level of the factor.

J. R. M. Hosking
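Here is a minimal sketch of the result Hosking expected. It assumes that "unused" is decided from the integer codes alone. drop_unused_levels is a hypothetical helper written for illustration, not R's own [.factor method. Because it never converts codes to strings, a missing code cannot be mistaken for the level labelled "NA". A level "NA" that is actually in use is still kept.

```r
# Hypothetical helper (not R's [.factor): drop unused levels using only the
# integer codes, so a missing code (NA) is never confused with a level whose
# label happens to be the string "NA".
drop_unused_levels <- function(f) {
  codes <- as.integer(f)                        # NA where the code is missing
  used  <- sort(unique(codes[!is.na(codes)]))   # level indices actually in use
  structure(match(codes, used),                 # renumber codes; NA stays NA
            levels = levels(f)[used],
            class  = oldClass(f))
}

# Hosking's example: one real value and one missing code
f1 <- factor(c("a", NA), levels = c("a", "NA"))
f2 <- drop_unused_levels(f1)
print(levels(f2))   # only "a" remains
print(is.na(f2))    # FALSE TRUE

# A genuine "NA" level that is used survives; the unused "b" is dropped
f3 <- factor(c("a", "NA", NA), levels = c("a", "NA", "b"))
print(levels(drop_unused_levels(f3)))
```

Working on codes rather than labels is the design choice that matters here. Any route through a character conversion has to keep a missing code distinct from the string "NA".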
Peter Dalgaard
2006-Jul-11 21:58 UTC
[Rd] Dropping unused levels of a factor that has "NA" as a level
"J. Hosking" <jh910 at juno.com> writes:

> Is this a bug?
>
> > f1 <- factor(c("a", NA), levels = c("a", "NA") )
> > f2 <- f1[, drop = TRUE]
> > f2
> [1] a <NA>
> Levels: a <NA>
>
> I would have expected f2 to have only one level, "a". It seems
> to me that the code in [.factor does not follow the advice in
> help("factor") on how to set factor codes to be missing when
> "NA" is a level of the factor.

Something odd is going on, that's for sure... The problem is also
there with factor(f1). And the logic in as.character.factor seems to
be at the root of it:

> as.character.factor
function (x, ...)
{
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}

This looks like something from before we had character NA values. I
wonder if it is a mistake or there could actually be a reason to keep
it.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
Brahm, David
2006-Jul-11 22:19 UTC
[Rd] Dropping unused levels of a factor that has "NA" as a level
I mentioned this in R-help on April 28:
<stat.ethz.ch/pipermail/r-help/2006-April/104595.html>

| as.character.factor contains this line (where cx=levels(x)[x]):
|   if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"
|
| Is it possible that this is no longer the desired behavior? These
| two results don't seem very consistent:
|
| > as.character(as.factor(c("AB", "CD", NA)))
| [1] "AB" "CD" NA
| > is.na(.Last.value)[3]
| [1] TRUE
|
| > as.character(as.factor(c("NA", "CD", NA)))
| [1] "NA" "CD" "<NA>"
| > is.na(.Last.value)[3]
| [1] FALSE
|
| I'm using R-2.3.0 on Redhat Linux, but I don't think the behavior
| is new (maybe since character NA's were introduced?).
|
| -- David Brahm (brahm at alum.mit.edu)

______________________________________________
R-devel at r-project.org mailing list
stat.ethz.ch/mailman/listinfo/r-devel
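The as.character.factor logic quoted in this thread can be tried in isolation. This is a minimal sketch. It assumes the function body is exactly as Peter Dalgaard listed it, and it copies that body into a standalone function under the hypothetical name old_as_character_factor, so the result does not depend on which R version is installed. The sketch reproduces both symptoms discussed here. First, converting f1 turns the missing code into the string "<NA>", so factor() (and therefore dropping levels) picks up a spurious level. Second, for David Brahm's two inputs, is.na() on the result depends on whether "NA" happens to be a level.

```r
# Standalone copy of the as.character.factor body quoted in this thread.
# The name old_as_character_factor is hypothetical, chosen for illustration.
old_as_character_factor <- function(x, ...) {
  cx <- levels(x)[x]                                 # factor index -> integer codes
  if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"    # missing code -> a string
  cx
}

# Hosking's factor: a missing code plus an unused level labelled "NA"
f1 <- factor(c("a", NA), levels = c("a", "NA"))

cx_old <- old_as_character_factor(f1)
print(cx_old)                   # "a" "<NA>": the missing code became a string
print(is.na(cx_old))            # FALSE FALSE
print(levels(factor(cx_old)))   # "<NA>" is now a level alongside "a"

# Without the special case the missing code stays a character NA
cx_new <- levels(f1)[f1]
print(is.na(cx_new))            # FALSE TRUE
print(levels(factor(cx_new)))   # "a"

# Brahm's two inputs: same position, different answer from is.na()
a <- old_as_character_factor(as.factor(c("AB", "CD", NA)))
b <- old_as_character_factor(as.factor(c("NA", "CD", NA)))
print(is.na(a)[3])              # TRUE
print(is.na(b)[3])              # FALSE, because b[3] is the string "<NA>"
```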