Dr. Jens Oehlschlägel
2016-Sep-27 19:33 UTC
[Rd] problem in levels<- and other inconsistencies
# A couple of years ago
# I helped make R's character NA handling more consistent
# Today I report an issue with R's factor NA handling
# The core problem is that
# levels(g) <- levels(g)
# can change the levels of g
# more details below
# Kind regards
# Jens Oehlschlägel

# Say I have an NA element in a vector or list

x <- c("a","b",NA)

# then using split() it gets lost

split(x, x)

# as it is (somewhat) when converting to a default factor

table(as.factor(x))

# for table the workaround is

table(as.factor(x), exclude=NULL)

# but for split we need

f <- factor(x, exclude=NULL)

split(x, f)

# conclusion: we MUST use an NA level

# so far so good

g <- f
levels(g)

# but re-assigning the levels changes them

levels(g) <- levels(g)
levels(g)

# which I consider a severe problem.
# Yes, I read the help page of levels<-
# about removing levels by assigning NAs to them
# but that implies: we MUST NOT use an NA level

# If a language suggests
# that we MUST and we MUST NOT use an NA level
# the language has limited usefulness
# (and a user who depends on the language
# is put into a DOUBLE BIND)
# SUGGESTION: ensure the above assignment does not change levels

# trying to apply the levels of f to new data also fails

g <- factor(x, levels=levels(f))
g

# and giving both arguments even stops

h <- factor(x, levels=levels(f), labels=levels(f))

# I do understand that exclude= meaningfully has effect
# if levels= are to be determined automatically, but
# SUGGESTION: with explicit levels= exclude= should be ignored.

# SUGGESTION: give split(x, y, exclude=NA) an exclude= argument,
# which when set to NULL will prevent dropping NA levels
# when coercing y to factor
# (it still remains open what should have priority
# if y is a factor with an NA-level and exclude=NA)

table(f, exclude=NA)

# here existing levels win over exclude
# which is consistent with my suggestion for factor(, levels=, exclude=)
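The round trip and the exclude= behaviour described above can be checked with a short script. What follows is only a sketch of possible workarounds: `set_levels_keep_na` is a hypothetical helper, not part of base R, and it assumes that setting the "levels" attribute directly is acceptable for the caller, since that skips the merging and NA-removal logic of `levels<-`.

```r
x <- c("a", "b", NA)

# An explicit NA level keeps the NA group in split().
f <- factor(x, exclude = NULL)
levels(f)                 # "a" "b" NA
length(split(x, x))       # NA is dropped when x is coerced with defaults
length(split(x, f))       # one more group: the NA level is kept

# Hypothetical helper (not base R): relabel a factor without dropping an
# NA level. It sets the "levels" attribute directly, bypassing `levels<-`,
# so it neither merges duplicate labels nor removes NA labels.
set_levels_keep_na <- function(fac, value) {
  stopifnot(is.factor(fac),
            length(value) == nlevels(fac),
            !anyDuplicated(value))
  attr(fac, "levels") <- as.character(value)
  fac
}

g <- set_levels_keep_na(f, levels(f))
identical(g, f)           # compare the factor before and after the round trip

# Re-using the levels of f on new data: pass exclude = NULL explicitly,
# so that the NA entry in levels= is not filtered out.
h <- factor(x, levels = levels(f), labels = levels(f), exclude = NULL)
identical(h, f)
```

The helper deliberately refuses duplicated labels, because merging levels is exactly the job that `levels<-` is documented to do and that this sketch leaves out.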
Hi,

I totally agree that having foo(x) <- foo(x) behave like a no-op is a must. This is something I try to be careful about when I design my own objects and their getters and setters.

Just wanted to mention though that there is a notorious violation of this:

x <- list(3:-1, NULL)
x[[2]] <- x[[2]]
x
# [[1]]
# [1] 3 2 1 0 -1

Now of course, the existence of a precedent doesn't mean the factor API shouldn't be improved.

Cheers,
H.

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
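The list example in this reply can be turned into an explicit before/after comparison. This is only a small illustration of how base R behaves for this one case; the variable name `before` is introduced here for the comparison.

```r
x <- list(3:-1, NULL)
before <- x

# x[[2]] evaluates to NULL, and `[[<-` with a NULL value deletes the
# element instead of storing it.
x[[2]] <- x[[2]]

length(before)         # 2
length(x)              # 1
identical(x, before)   # FALSE: the round trip was not a no-op
```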
jens.oehlschlaegel at truecluster.com
2016-Sep-28 08:50 UTC
[Rd] problem in levels<- and other inconsistencies
Hervé,

Good point, but easy to solve: since

list[i]  # always is.list

deleting a list element with

list[i] <- NULL  # !is.list(NULL)

does not lead to a contradiction, whereas

list[[i]] <- NULL

should do the same as

list[i] <- list(NULL)

YES, I know that this would be a major change, but NO, this is no justification for not fixing a mistake in a language. Unless one has given up on fixing the language, in which case we should all switch to another one (Julia, ...)

Jens

Sent: Tuesday, 27 September 2016 at 23:20
From: "Hervé Pagès" <hpages at fredhutch.org>
To: "Dr. Jens Oehlschlägel" <Jens.Oehlschlaegel at truecluster.com>, r-devel at r-project.org
Subject: Re: [Rd] problem in levels<- and other inconsistencies
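For reference, the three assignment forms discussed in this reply behave as follows in base R today. The sketch only records current behaviour; the names `l`, `d1`, `d2` and `k` are chosen here for illustration.

```r
l <- list("a", "b", "c")

# `[<-` with NULL deletes the selected element.
d1 <- l; d1[2] <- NULL
length(d1)                     # 2

# `[[<-` with NULL currently deletes as well.
d2 <- l; d2[[2]] <- NULL
length(d2)                     # 2

# `[<-` with list(NULL) stores NULL and keeps the length. Under the
# proposal in this message, `[[<-` with NULL would behave like this instead.
k <- l; k[2] <- list(NULL)
length(k)                      # 3
is.null(k[[2]])                # TRUE
```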