Dear R-people,
recently at s-news we had a discussion about factor().
I thought you might be interested in some of my thoughts about factors.
Any comments welcome
Best regards
Jens Oehlschlaegel-Akiyoshi
-------------------------------------------------------------------
I think the problem goes deeper than factors merely being handled
inappropriately by some S+ functions: the inconsistency is built into
the constructor factor(), and thus into the concept itself!
Let me cite S+ online help of factor() on the meaning of levels:
VALUE
object of class "factor", representing values taken from the finite set
given by levels. It is important that this object
is not numeric; in particular, comparisons and other operations behave
AS IF THEY OPERATED ON VALUES FROM THE LEVELS SET, WHICH IS ALWAYS OF
MODE CHARACTER.
Let's try comparisons on values from the level set:
> my.animals <- c(4,5,6,4,5,6)
> my.levels <- 4:6
> my.labels <- c("dog","cat","rat")
> animals <- factor(my.animals,levels=my.levels,labels=my.labels)
> unclass(animals)
[1] 1 2 3 1 2 3
attr(, "levels"):
[1] "dog" "cat" "rat"
Obviously labels become levels, and comparisons with the original levels
will never work, whether the levels are given as numerics or as characters, as in
> animals==4
[1] F F F F F F
> animals=="4"
[1] F F F F F F
Obviously all comparisons currently work on the LABELS SET, which is the
only one stored with the factor object, but which, to add to the
confusion, is stored as an attribute named "levels", whereas
> labels(animals)
[1] "1" "2" "3" "4" "5" "6"
is a totally different story.
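For readers following along in R rather than S+, the behaviour described above can be checked with a short sketch like the one below. It assumes that R's factor() behaves like the S+ version in this respect; the variable names cmp.numeric, cmp.character, cmp.label and original.values are mine and are used only for illustration.

```r
my.animals <- c(4, 5, 6, 4, 5, 6)
my.levels  <- 4:6
my.labels  <- c("dog", "cat", "rat")
animals <- factor(my.animals, levels = my.levels, labels = my.labels)

# Only the labels survive, stored under the attribute name "levels".
levels(animals)
as.integer(animals)            # internal codes 1:3, not the original 4:6

# Comparisons are made against the labels, never the original levels.
cmp.numeric   <- animals == 4      # all FALSE
cmp.character <- animals == "4"    # all FALSE
cmp.label     <- animals == "dog"  # TRUE where the label is "dog"

# labels() just returns element positions as character strings.
labels(animals)

# The original values can be recovered only while my.levels is still around.
original.values <- my.levels[as.integer(animals)]
```

The last line is the point of the complaint: the mapping back to 4, 5, 6 lives outside the factor object.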
All a user like me wishes is something like
> unclass(animals)
[1] 4 5 6 4 5 6
attr(, "labels"):
[1] "dog" "cat" "rat"
and of course it would be user-friendly if S+ recognized which of
the two representations is meant in comparisons and assignments like
animals==4
animals=="dog"
animals[1] <- 4
animals[1] <- "dog"
not coercing 4s to "4"s but interpreting animals the right way.
Isn't S+ an interpreter?
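To make the wish above concrete, here is a minimal sketch in R of an object that chooses the representation from the type of the other operand. Everything in it is hypothetical: the class name "coded", the constructor coded(), and its fields are my own inventions for illustration, not part of S+ or R, and the sketch ignores printing, subsetting, and error handling.

```r
# Hypothetical constructor: keeps codes, original values and labels together.
coded <- function(x, levels, labels) {
  structure(list(codes  = match(x, levels),
                 values = levels,
                 labels = labels),
            class = "coded")
}

# Comparison: numeric operands are matched against the original values,
# anything else against the labels.
"==.coded" <- function(e1, e2) {
  if (is.numeric(e2)) e1$values[e1$codes] == e2
  else e1$labels[e1$codes] == e2
}

# Assignment: the same rule decides how the new value is interpreted.
"[<-.coded" <- function(x, i, value) {
  pool <- if (is.numeric(value)) x$values else x$labels
  x$codes[i] <- match(value, pool)
  x
}

animals <- coded(c(4, 5, 6, 4, 5, 6), 4:6, c("dog", "cat", "rat"))
by.value <- animals == 4       # compares against 4 5 6 4 5 6
by.label <- animals == "dog"   # compares against the labels

animals[1] <- 5                # interpreted as a value, i.e. "cat"
animals[2] <- "rat"            # interpreted as a label
```

Dispatching on the operand type keeps 4 from being coerced to "4", at the cost of being ambiguous if the labels themselves look like numbers.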
If there is no way around an internal numeric representation as
integers 1:x, then a factor object could look like
> unclass(animals)
[1] 1 2 3 1 2 3
attr(, "levels"):
[1] "dog" "cat" "rat"
attr(, "nlevels"):
[1] 4 5 6
and consequently
[old function, sorry for the s-at-the-end-confusion]
codes(animals) could return 1 2 3 1 2 3
[old function]
levels(animals) could return "dog" "cat" "rat"
[new function]
level(animals) could return "dog" "cat" "rat" "dog" "cat" "rat"
[new function]
nlevels(animals) could return 4 5 6
[new function]
nlevel(animals) could return 4 5 6 4 5 6
for reasons of consistency, so everyone would know he has to write either
nlevel(animals)==4
nlevel(animals)[1] <- 4
or
level(animals)=="dog"
level(animals)[1] <- "dog"
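The accessors proposed above can be sketched in R on top of an ordinary factor. This is only an illustration under assumptions of my own: my.factor() is a hypothetical wrapper, the extra attribute is called "nlevels" as in the proposal, and level(), nlevel() and their replacement forms are the proposed new functions, not existing ones. I deliberately do not redefine nlevels(), since R already has a function of that name that returns the number of levels.

```r
# Hypothetical wrapper: an ordinary factor plus the original numeric levels.
my.factor <- function(x, levels, labels) {
  f <- factor(x, levels = levels, labels = labels)
  attr(f, "nlevels") <- levels
  f
}

# Proposed accessors: one value per element.
level  <- function(f) as.character(f)
nlevel <- function(f) attr(f, "nlevels")[as.integer(f)]

# Replacement forms, so that nlevel(f)[i] <- 4 and level(f)[i] <- "dog" work.
"nlevel<-" <- function(f, value) {
  nl  <- attr(f, "nlevels")
  out <- factor(levels(f)[match(value, nl)], levels = levels(f))
  attr(out, "nlevels") <- nl
  out
}
"level<-" <- function(f, value) {
  nl  <- attr(f, "nlevels")
  out <- factor(value, levels = levels(f))
  attr(out, "nlevels") <- nl
  out
}

animals <- my.factor(c(4, 5, 6, 4, 5, 6), 4:6, c("dog", "cat", "rat"))
nlevel(animals) == 4           # comparison on the numeric side
level(animals) == "dog"        # comparison on the label side
nlevel(animals)[1] <- 5        # first animal becomes a "cat"
level(animals)[2] <- "rat"     # second animal becomes a "rat"
```

One limitation of this sketch: subsetting with animals[i] goes through the ordinary factor method and drops the extra attribute, so a real implementation would need its own subsetting method.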
Concerning defaults one probably would keep returning
animals
[1] "dog" "cat" "rat" "dog" "cat" "rat"
for reasons of compatibility
but make the evaluator coerce animals to nlevel(animals) in
animals[1] <- 4
instead of coercing 4 to "4".
I have no idea about how to change the constructor factor() to keep
compatibility, perhaps just keep arguments old.levels=new.nlevels and
old.labels=new.levels, together with a clear help function?
DOES THAT SOUND ANY BETTER?
yours sincerely
--
Jens Oehlschlaegel-Akiyoshi
Psychologist/Statistician
Project TR-EAT + COST Action B6
F.rankfurt
oehl@psyres-stuttgart.de A.ttention
+49 711 6781-408 (phone) I.nventory
+49 711 6876902 (fax) R .-----.
/ ----- \
Center for Psychotherapy Research | | 0 0 | |
Christian-Belser-Strasse 79a | | ? | |
D-70597 Stuttgart Germany \ ----- /
-------------------------------------------------- '-----' -
it's better
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To:
r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-