Duncan Murdoch
2005-Oct-13 14:02 UTC
[R] Removing and restoring factor levels (TYPO CORRECTED)
Sorry, a typo in my previous message (parens in the wrong place in the conversion). Here it is corrected:

I'm doing a big slow computation, and profiling shows that it is spending a lot of time in match(), apparently because I have code like

    x %in% listofxvals

Both x and listofxvals are factors with the same levels, so I could probably speed this up by stripping off the levels and just treating them as integer vectors, then restoring the levels at the end.

What is the safest way to do this? I am worried that at some point x and listofxvals will *not* have the same levels, and the optimization will give the wrong answer. So I need code that guarantees they have the same coding.

I think this works, where "master" is a factor with the master list of levels (guaranteed to be a superset of the levels of x and listofxvals), but can anyone spot anything that might go wrong?

    # Strip the levels
    x <- as.integer( factor(x, levels = levels(master) ) )

    # Restore the levels
    x <- structure( x, levels = levels(master), class = "factor" )

Thanks for any advice...

Duncan Murdoch
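A minimal sketch of the strip / match / restore round trip described above, assuming small hypothetical data for `master`, `x` and `listofxvals`. The helper `to_codes()` is hypothetical (not from the thread); it adds an explicit check for the failure Duncan worries about, since `factor(f, levels = levels(master))` silently turns any value missing from `levels(master)` into NA. match() still runs, but on integer codes rather than on the character vectors it would otherwise build from the factors.

```r
# Hypothetical example data; "master" holds the superset of levels
master <- factor(c("a", "b", "c", "d"))
x <- factor(c("b", "d", "a", "b"))
listofxvals <- factor(c("b", "c"))

# Hypothetical helper: map a factor onto the master coding.
# Values not in levels(master) would become NA, so stop instead.
to_codes <- function(f, master) {
  codes <- as.integer(factor(f, levels = levels(master)))
  if (anyNA(codes[!is.na(f)]))
    stop("value not present in levels(master)")
  codes
}

xi <- to_codes(x, master)
vi <- to_codes(listofxvals, master)

# Integer matching on a shared coding
hit <- xi %in% vi

# Restore the factor class and levels at the end
x2 <- structure(xi, levels = levels(master), class = "factor")
```

Because both vectors go through the same `levels(master)` coding, `hit` should agree with `x %in% listofxvals` on the original factors.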
Marc Schwartz (via MN)
2005-Oct-13 17:07 UTC
[R] Removing and restoring factor levels (TYPO CORRECTED)
On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> I think this works, where "master" is a factor with the master list of
> levels (guaranteed to be a superset of the levels of x and listofxvals),
> but can anyone spot anything that might go wrong?
>
> # Strip the levels
> x <- as.integer( factor(x, levels = levels(master) ) )
>
> # Restore the levels
> x <- structure( x, levels = levels(master), class = "factor" )

Duncan,

Given the premise that 'master' has the full superset of all possible factor levels defined, it would seem that this would be a reasonable way to go.

This approach would also seem to eliminate whatever overhead is incurred by the coercion of the factor 'x' to a character vector, which is done by match().

One question I have is, what is the advantage of using structure() versus:

    x <- factor(x, levels = levels(master))

?

Thanks,

Marc
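One concrete difference, sketched here with hypothetical data: once `x` has been stripped to integer codes, `factor()` coerces those codes to character ("2", "4", ...) and matches the strings against `levels(master)`, so unless the level names happen to be the code numbers, every element becomes NA. Rebuilding with `factor()` needs the codes mapped explicitly via `levels =` and `labels =`, while `structure()` simply attaches the attributes without any matching.

```r
# Hypothetical data: integer codes produced by the stripping step
master <- factor(c("a", "b", "c", "d"))
xi <- c(2L, 4L, 1L, 2L)

# structure() only attaches attributes; no matching is done
via_structure <- structure(xi, levels = levels(master), class = "factor")

# factor() converts xi to character and matches "2", "4", ...
# against "a".."d", so every element becomes NA here
via_factor_levels <- factor(xi, levels = levels(master))

# Rebuilding from codes with factor() requires an explicit mapping
via_factor_labels <- factor(xi, levels = seq_along(levels(master)),
                            labels = levels(master))

identical(via_structure, via_factor_labels)
```

The trade-off is that `structure()` performs no validation: a code outside `seq_along(levels(master))` would produce a malformed factor rather than an NA, so it relies on the stripping step having used the same `levels(master)`.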