Suharto Anggono
2016-May-06 08:05 UTC
[Rd] Suggestion on default 'levels' in 'factor'
At first read, the logic of the following fragment in the code of function 'factor' was not clear to me.

    if (missing(levels)) {
        y <- unique(x, nmax = nmax)
        ind <- sort.list(y) # or possibly order(x) which is more (too ?) tolerant
        y <- as.character(y)
        levels <- unique(y[ind])
    }

Code similar to that originally proposed in https://stat.ethz.ch/pipermail/r-devel/2009-May/053316.html is more readable to me. I suggest using this.

    if (missing(levels))
        levels <- unique(as.character(
            sort.int(unique(x, nmax = nmax), na.last = TRUE)# or possibly sort(x) which is more (too ?) tolerant
        ))

I assume that as.character(y)[sort.list(y)] is equivalent to as.character(sort.int(y, na.last = TRUE)). So, what I suggest above has the same effect as the code in the current 'factor'. I use function 'sort.int' instead of 'sort' so that, like 'sort.list', it fails for non-atomic input.

What I suggest is similar in form to the default 'levels' in 'factor' in R before version 2.10.0, which is

    sort(unique.default(x), na.last = TRUE)

If this suggestion is used, the help page for 'factor' can be changed to say "(by 'sort.int')" instead of "(by 'sort.list')".
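
The assumed equivalence can be spot-checked with a small sketch. The helper names 'levels_current' and 'levels_suggested' are hypothetical and exist only for this comparison; each wraps one of the two fragments discussed in this message, with 'nmax' defaulting to NA as in 'factor'. The check covers only a few plain atomic inputs (numeric, character, logical), so it illustrates the claim rather than proving it.

```r
# Hypothetical helper: the default-'levels' computation as in current 'factor'.
levels_current <- function(x, nmax = NA) {
    y <- unique(x, nmax = nmax)
    ind <- sort.list(y)          # ordering permutation, NA placed last by default
    y <- as.character(y)
    unique(y[ind])
}

# Hypothetical helper: the suggested one-expression form.
levels_suggested <- function(x, nmax = NA) {
    unique(as.character(
        sort.int(unique(x, nmax = nmax), na.last = TRUE)
    ))
}

# A few plain atomic inputs, each containing duplicates and an NA.
inputs <- list(
    numeric   = c(3, 1, NA, 2, 1),
    character = c("pear", "apple", NA, "fig", "apple"),
    logical   = c(TRUE, NA, FALSE, TRUE)
)

# Both forms should give identical character vectors for every input.
for (x in inputs) {
    stopifnot(identical(levels_current(x), levels_suggested(x)))
}
```

Inputs with a class attribute (dates, factors) are left out of this sketch on purpose; whether the two forms agree there would need a separate check.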