Suharto Anggono Suharto Anggono
2012-Dec-13  07:24 UTC
[Rd] Suggestion of change to reduce overhead of 'table'
In R 2.7.2, if argument 'exclude' is not specified and the input is already a factor, function 'table' uses the input as is. In R 2.15.2, in the same case, function 'table' always applies function 'factor' to the input. The time spent by 'factor' is not long, but it is not negligible.

I suggest changing 'table' so that 'factor' is not called for input that is already a factor when it is known that the resulting levels are the same as in the input. This is a diff against
https://svn.r-project.org/R/trunk/src/library/base/R/table.R

85c85,88
<                         a <- factor(a, levels = ll[!(ll %in% exclude)],
---
>                         llexcl <- ll %in% exclude
>                         if (any(llexcl) ||
>                         (useNA == "no" && any(is.na(ll))))
>                             factor(a, levels = ll[!llexcl],
86a90,91
>                         else
>                             a

Function 'table' calls function 'addNA' in some cases. I suggest changing 'addNA', too. This is a diff against
https://svn.r-project.org/R/trunk/src/library/base/R/factor.R

336d335
<     if (ifany & !any(is.na(x))) return(x)
338c337,339
<     if (!any(is.na(ll))) ll <- c(ll, NA)
---
>     hasNAlev <- any(is.na(ll))
>     if ((ifany || hasNAlev) && !any(is.na(x))) return(x)
>     if (!hasNAlev) ll <- c(ll, NA)

Instead of calling 'factor', 'addNA' could also modify the "levels" attribute directly and set the internal codes of the missing values to the position of the NA level.
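The idea of editing the "levels" attribute and the internal codes directly, rather than calling 'factor', could be sketched roughly as follows. This is only an illustration: the name 'addNA_direct' is hypothetical and not part of base R, and the early return follows the 'hasNAlev' logic from the 'addNA' change suggested above.

```r
## Hypothetical helper (not in base R): a sketch of an 'addNA' variant
## that edits the factor directly instead of calling 'factor'.
addNA_direct <- function(x, ifany = FALSE)
{
    if (!is.factor(x)) x <- factor(x)
    ll <- levels(x)
    hasNAlev <- any(is.na(ll))
    ## No missing codes, and either an NA level already exists or one
    ## is wanted only "if any": return the input unchanged.
    if ((ifany || hasNAlev) && !any(is.na(x))) return(x)
    if (!hasNAlev) ll <- c(ll, NA)
    cl <- oldClass(x)                  # "factor" or c("ordered", "factor")
    codes <- unclass(x)                # integer codes, attributes kept
    codes[is.na(codes)] <- match(NA, ll)  # point missing codes at the NA level
    attr(codes, "levels") <- ll
    class(codes) <- cl
    codes
}

## Usage: compare with the existing 'addNA' on a small factor.
f <- factor(c("a", NA, "b", "a"))
g <- addNA_direct(f)
stopifnot(identical(levels(g), levels(addNA(f))),
          identical(as.integer(g), as.integer(addNA(f))))
```

The sketch relies on 'is.na' applied to a factor reporting only missing codes, not elements whose level is NA, which is the same assumption the diff above makes.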
