Matthew Lundberg
2013-Apr-01 20:08 UTC
[R] Factor to numeric conversion - as.numeric(levels(f))[f] - Language definition seems to say to not use this.
Note the edited subject line! I don't know why I typed it as it was before.

This says that as.numeric(as.character(f)) will work regardless of the implementation, and I agree.

It's the recommendation to use as.numeric(levels(f))[f] that has me wondering about section 2.3.1 of the language definition. I expect that this idiom is in widespread use, and perhaps the language definition should be changed.

On Mon, Apr 1, 2013 at 2:58 PM, Bert Gunter <gunter.berton@gene.com> wrote:
> Yup. Note also:
>
> > as.character.factor
> function (x, ...)
> levels(x)[x]
>
> But of course this is OK, since this can change if the implementation
> does. Which is the whole point, of course.
>
> -- Bert
>
> On Mon, Apr 1, 2013 at 12:16 PM, Matthew Lundberg
> <matthew.k.lundberg@gmail.com> wrote:
> >
> > When used as an index, the factor is implicitly converted to integer. In
> > the expression as.numeric(levels(f))[f], the vector as.numeric(levels(f))
> > is indexed by as.integer(f).
> >
> > This appears to rely on the current implementation, as mentioned in
> > section 2.3.1 of the language definition.
> >
> > On Mon, Apr 1, 2013 at 1:49 PM, Peter Ehlers <ehlers@ucalgary.ca> wrote:
> >
> > > On 2013-04-01 10:48, Matthew Lundberg wrote:
> > >
> > >> These two seem to be at odds. Is this the case?
> > >>
> > >> From help(factor) - section Warning:
> > >>
> > >> To transform a factor f to approximately its original numeric values,
> > >> as.numeric(levels(f))[f] is recommended and slightly more efficient
> > >> than as.numeric(as.character(f)).
> > >>
> > >> From the language definition - section 2.3.1:
> > >>
> > >> Factors are currently implemented using an integer array to specify
> > >> the actual levels and a second array of names that are mapped to the
> > >> integers. Rather unfortunately users often make use of the
> > >> implementation in order to make some calculations easier. This,
> > >> however, is an implementation issue and is not guaranteed to hold in
> > >> all implementations of R.
> > >
> > > Hint:
> > >
> > > f <- factor(sample(5, 10, TRUE))
> > > as.numeric(levels(f))[f]
> > >
> > > g <- factor(sample(letters[1:5], 10, TRUE))
> > > as.numeric(levels(g))[g]
> > >
> > > Peter Ehlers
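A minimal sketch of the two idioms being compared may help here. It assumes a factor built from made-up numeric data; the vector x and the names via_levels and via_character are illustrative and do not come from the thread. It shows that both idioms recover the same values, whereas as.integer(f) returns only the internal codes.

```r
# Hypothetical data: a factor whose labels happen to look like numbers.
x <- c(10, 20, 20, 5, 10)
f <- factor(x)

levels(f)       # the labels, stored as character strings in sorted order
as.integer(f)   # the internal integer codes (positions in levels(f))

# Idiom recommended in help(factor): convert each level once, then index by f.
via_levels <- as.numeric(levels(f))[f]

# Alternative: convert every element to character, then to numeric.
via_character <- as.numeric(as.character(f))

identical(via_levels, via_character)   # TRUE
identical(via_levels, x)               # TRUE
```

Both idioms go through the level names rather than treating the codes as data; the first simply converts each distinct level once instead of once per element.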
Peter Ehlers
2013-Apr-01 21:29 UTC
[R] Factor to numeric conversion - as.numeric(levels(f))[f] - Language definition seems to say to not use this.
On 2013-04-01 13:08, Matthew Lundberg wrote:
> Note the edited subject line! I don't know why I typed it as it was before.
>
> This says that as.numeric(as.character(f)) will work regardless of the
> implementation, and I agree.
>
> It's the recommendation to use as.numeric(levels(f))[f] that has me
> wondering about section 2.3.1 of the language definition. I expect that
> this idiom is in widespread use, and perhaps the language definition
> should be changed.

I think that I may be getting an inkling of what your complaint is: section 2.3.1 talks about "an integer array to specify the _actual_ levels" [emphasis added] and "a second array of _names_ that are mapped to the integers". [ditto]

When you object to the use of "as.numeric(levels(f))[f]", are you assuming that "levels(f)" is the set of _integers_ or the set of _names_? Anyway, it's indeed the set of names, as returned by the levels() function.

Peter Ehlers
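The point about levels() returning the names can be checked with a short sketch. The data below are fixed, made-up values rather than the random samples in the earlier hint, and the names same_as_character, f_num and g_num are illustrative only.

```r
# Illustrative data (not from the thread): one factor with numeric-looking
# labels and one with purely alphabetic labels.
f <- factor(c(3, 1, 3, 2))
g <- factor(c("b", "a", "c", "a"))

# levels() returns the names as a character vector, not the integer codes.
is.character(levels(f))   # TRUE
is.character(levels(g))   # TRUE

# Indexing the levels by the factor gives the same result as as.character(),
# matching the levels(x)[x] definition quoted earlier in the thread.
same_as_character <- identical(levels(g)[g], as.character(g))

# The numeric conversion is only meaningful when the names look like numbers.
f_num <- as.numeric(levels(f))[f]                     # 3 1 3 2
g_num <- suppressWarnings(as.numeric(levels(g))[g])   # NA NA NA NA
```

Without suppressWarnings(), the last line would also emit R's coercion warning, since none of the level names in g can be read as a number.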