There is a warning in the documentation for ?factor (R version 2.3.0)
as follows:

" The interpretation of a factor depends on both the codes and the
  '"levels"' attribute.  Be careful only to compare factors with the
  same set of levels (in the same order).  In particular,
  'as.numeric' applied to a factor is meaningless, and may happen by
  implicit coercion.  To "revert" a factor 'f' to its original
  numeric values, 'as.numeric(levels(f))[f]' is recommended and
  slightly more efficient than 'as.numeric(as.character(f))'. "

But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
always do anything useful.

For example:

> f<-factor(1:3,labels=c("A","B","C"))
> f
[1] A B C
Levels: A B C
> as.numeric(f)
[1] 1 2 3
> as.numeric(levels(f))[f]
[1] NA NA NA
Warning message:
NAs introduced by coercion

And also,

> f<-factor(1:3,labels=c(1,5,6))
> f
[1] 1 5 6
Levels: 1 5 6
> as.numeric(f)
[1] 1 2 3
> as.numeric(levels(f))[f]
[1] 1 5 6

Is the documentation wrong, or is the code wrong, or have I missed
something?

Cheers,
Geoff Russell
Geoff Russell wrote:
> There is a warning in the documentation for ?factor (R version 2.3.0)
> [...]
> Is the documentation wrong, or is the code wrong, or have I missed
> something?

The documentation is somewhat unclear: The last sentence presupposes
that the factor was generated from numeric data, i.e. the
factor(c(7,9,13)) syndrome:

> f <- factor (c(7,9,13))
> f
[1] 7 9 13
Levels: 7 9 13
> as.numeric(f)
[1] 1 2 3

Also, the statement that as.numeric(f) is meaningless is a bit strong.
Probably should say "meaningless without knowledge of the levels and
their order". And you can actually compare factors with their levels in
different order:

> g <- factor (c("7",9,13))
> g
[1] 7 9 13
Levels: 13 7 9
> f==g
[1] TRUE TRUE TRUE
> as.numeric(f)==as.numeric(g)
[1] FALSE FALSE FALSE

Where you need to be careful is that if you do things like

sexsymbols <- c(16, 19)
plot(x, y, pch=sexsymbols[sex])

then you should also do

legend(x0, y0, legend=levels(sex), pch=sexsymbols)

in order to be sure the symbols match the legend.
(Notice that indexing with [sex] implicitly coerces sex to numeric).
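As a rough sketch of the points above (not part of the original reply): the first half contrasts the integer codes with the two "revert" idioms from ?factor on a factor built from numeric data, and the second half shows why a lookup vector indexed by a factor has to be ordered like levels(). The variables sex, sexsymbols, pch_used and legend_map are made-up illustration data, and no plot is drawn.

```r
# Factor built from numeric data: the levels hold the original values
# (as strings), while the codes are only the positions 1, 2, 3.
f <- factor(c(7, 9, 13))

codes    <- as.numeric(f)               # integer codes, not the data
via_lvls <- as.numeric(levels(f))[f]    # idiom recommended in ?factor
via_chr  <- as.numeric(as.character(f)) # equivalent alternative

print(codes)
print(via_lvls)
print(via_chr)

# Indexing by a factor uses its codes, so a lookup vector must be in
# the same order as levels(). 'sex' and 'sexsymbols' are made-up data.
sex <- factor(c("male", "female", "female", "male"))
sexsymbols <- c(16, 19)                 # one plotting symbol per level
pch_used <- sexsymbols[sex]

# Pairing the symbols with levels(sex) gives the mapping that a
# matching legend would have to show.
legend_map <- setNames(sexsymbols, levels(sex))
print(pch_used)
print(legend_map)
```

Since levels(sex) is sorted alphabetically, "female" comes first here regardless of the order in which the values appear in the data.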
At 09:41 28.02.2007 +1030, Geoff Russell wrote:
>There is a warning in the documentation for ?factor (R version 2.3.0)
>[...]
>Is the documentation wrong, or is the code wrong, or have I missed
>something?

From "R Language Definition":

"2.3.1 Factors
Factors are used to describe items that can have a finite number of
values (gender, social class, etc.). ...
Factors are currently implemented using an integer array to specify the
actual levels and a second array of names that are mapped to the
integers. Rather unfortunately users often make use of the
implementation in order to make some calculations easier. This,
however, is an implementation issue and is not guaranteed to hold in
all implementations of R."
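The two-part implementation described in the quoted passage can be inspected directly. The following is only an illustrative sketch with made-up data (the factor f and the name rebuilt are not from the thread), and, as the quote warns, code that relies on these internals is not guaranteed to carry over to other implementations.

```r
# Made-up example data; levels are sorted alphabetically by default.
f <- factor(c("low", "high", "low", "mid"))

print(typeof(f))    # storage type of the underlying codes
print(unclass(f))   # the integer codes plus the "levels" attribute
print(levels(f))    # the array of names mapped to the integers

# Combining the two pieces reproduces the original values.
rebuilt <- levels(f)[as.integer(f)]
print(rebuilt)
```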
In my view factors are (mis)used in different, not necessarily connected
ways. A factor may represent a statistical concept, i.e. a categorical
variable. Further, it may be an (internal) means of data reduction or a
method for labelling values. In my view these concepts should not be
mixed up, and I would recommend avoiding factors for data reduction and
labelling.

Heinz
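One possible way to label values without a factor, offered only as a sketch and not as what the poster had in mind, is a named character vector used as a lookup table. The names dose, dose_labels and labelled are hypothetical.

```r
# Raw values kept as plain character data (made-up example).
dose <- c("low", "high", "low", "mid")

# Hypothetical lookup table: names are the raw values, elements are
# the display labels. The lookup is by name, so order does not matter.
dose_labels <- c(low = "Low dose", mid = "Medium dose", high = "High dose")

labelled <- unname(dose_labels[dose])
print(labelled)
```

Because the lookup goes by name rather than by position, reordering dose_labels does not change the result, which avoids the level-order pitfalls discussed earlier in the thread.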