thr3ads.net - R devel - [Rd] Suggestion on default 'levels' in 'factor' [May 2016]


Suharto Anggono

2016-May-06 08:05 UTC

[Rd] Suggestion on default 'levels' in 'factor'

On first reading, the logic of the following fragment of the code of function
'factor' was not clear to me.
    if (missing(levels)) {
	y <- unique(x, nmax = nmax)
	ind <- sort.list(y) # or possibly order(x) which is more (too ?) tolerant
	y <- as.character(y)
	levels <- unique(y[ind])
    }
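To make the fragment easier to follow, here is a minimal sketch that wraps it in a standalone function. The name `default_levels_current` is hypothetical (it is not part of base R), and the sample inputs are only illustrative. The values are sorted while still in their original type, so numbers come out in numeric rather than alphabetical order. The final `unique()` is needed because distinct values can become the same string after `as.character()`.

```r
# Hypothetical helper: the default-levels fragment of factor() as a function.
default_levels_current <- function(x, nmax = NA) {
  y <- unique(x, nmax = nmax)  # distinct values, in order of first appearance
  ind <- sort.list(y)          # sorting permutation; NAs go last (na.last = TRUE)
  y <- as.character(y)         # convert only after the order has been computed
  unique(y[ind])               # drop values that coincide once turned into strings
}

default_levels_current(c(10, 2, NA, 2, 1))  # c("1", "2", "10", NA)
default_levels_current(c(0.1 + 0.2, 0.3))   # "0.3": both print as "0.3"
```

Note that factor() itself later drops NA from the levels by default (exclude = NA); the sketch stops before that step.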

Code similar to that originally proposed in
https://stat.ethz.ch/pipermail/r-devel/2009-May/053316.html is more readable to
me.

I suggest using this.
    if (missing(levels))
	levels <- unique(as.character(
            sort.int(unique(x, nmax = nmax), na.last = TRUE) # or possibly sort(x), which is more (too ?) tolerant
            ))
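The suggested form can be wrapped in the same way for experimentation. The name `default_levels_proposed` is hypothetical, and the inputs are illustrative assumptions. Sorting happens on the distinct values in their original type with NA placed last, followed by conversion to character and removal of string-level duplicates.

```r
# Hypothetical helper: the suggested default 'levels' as a function.
default_levels_proposed <- function(x, nmax = NA) {
  unique(as.character(
    sort.int(unique(x, nmax = nmax), na.last = TRUE)  # sort distinct values, NA last
  ))
}

default_levels_proposed(c(10, 2, NA, 2, 1))     # c("1", "2", "10", NA)
default_levels_proposed(c("b", "a", NA, "a"))   # c("a", "b", NA)
```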

I assume that as.character(y)[sort.list(y)] is equivalent to
as.character(sort.int(y, na.last = TRUE)). If so, what I suggest above has the same
effect as the code in the current 'factor'. I use 'sort.int' rather than
'sort' so that, like 'sort.list', it fails for non-atomic input.
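The assumed equivalence can be spot-checked on a few vectors of distinct values. This is only a sketch over hand-picked inputs (numeric, character and logical, each with an NA), not a proof. The helpers `lhs`, `rhs` and `fails` are hypothetical. The last two lines check that both `sort.list()` and `sort.int()` signal an error for a plain list.

```r
# Hypothetical spot check: reorder-after-conversion vs convert-after-sorting.
lhs <- function(y) as.character(y)[sort.list(y)]
rhs <- function(y) as.character(sort.int(y, na.last = TRUE))

inputs <- list(
  num = unique(c(10, 2, NA, 2, 1)),
  chr = unique(c("b", "a", NA, "a")),
  lgl = unique(c(TRUE, NA, FALSE))
)
sapply(inputs, function(y) identical(lhs(y), rhs(y)))  # TRUE for each input

# TRUE if evaluating 'expr' signals an error (the argument is evaluated lazily).
fails <- function(expr) tryCatch({ expr; FALSE }, error = function(e) TRUE)
fails(sort.list(list(2, 1)))                # TRUE: not atomic
fails(sort.int(list(2, 1), na.last = TRUE)) # TRUE: not atomic
```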

What I suggest is similar in form to the default 'levels' in
'factor' in R before version 2.10.0, which was
    sort(unique.default(x), na.last = TRUE)
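To see how the two expressions relate, the sketch below evaluates both on one illustrative input (an assumption, chosen so that two distinct numbers print identically). It compares only the expressions themselves, not the full behaviour of old versions of factor(). The ordering is the same. The suggested form additionally applies as.character() and then unique(), so values that convert to the same string collapse into one level.

```r
x <- c(0.1 + 0.2, 0.3, NA)  # 0.1 + 0.2 and 0.3 differ but both print as "0.3"

old_form <- sort(unique.default(x), na.last = TRUE)
as.character(old_form)      # c("0.3", "0.3", NA): a repeated string

proposed <- unique(as.character(sort.int(unique(x), na.last = TRUE)))
proposed                    # c("0.3", NA)
```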

If this suggestion is adopted, the help page for 'factor' could be changed to
say "(by 'sort.int')" instead of "(by
'sort.list')".
