Hi all,

Is there a reason that there is no c.factor method? Analogous to c.Date, I'd expect something like the following to be useful:

c.factor <- function(...) {
  factors <- list(...)
  levels <- unique(unlist(lapply(factors, levels)))
  char <- unlist(lapply(factors, as.character))

  factor(char, levels = levels)
}

c(factor("a"), factor("b"), factor(c("c", "b", "a")), factor("d"))
# [1] a b c b a d
# Levels: a b c d

Hadley

-- 
http://had.co.nz/
The argument I have in 'The R Inferno' is that the way you want to combine factors may differ from the way someone else wants to. There are lots of tricky questions: What about ordered factors? What if the ordered levels are different in different objects? ...

Pat

On 04/02/2010 15:53, Hadley Wickham wrote:
> Is there a reason that there is no c.factor method?

-- 
Patrick Burns
pburns at pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')
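To make those questions concrete, here is one possible policy, sketched in R under my own assumptions rather than anything proposed in the thread: keep the result ordered only when every input is an ordered factor with identical levels; refuse to combine ordered factors whose levels differ (or ordered factors mixed with unordered ones); otherwise take the union of levels in order of first appearance. `combine_factors_ordered` is a hypothetical name, deliberately not a `c()` method, and other policies are equally defensible.

```r
# Hypothetical helper (not part of base R). Policy assumed here:
#  - all inputs ordered with identical levels -> ordered result
#  - any ordered input otherwise              -> error, no guessing
#  - plain factors                            -> union of levels, first-appearance order
combine_factors_ordered <- function(...) {
  args <- list(...)
  stopifnot(length(args) > 0, all(vapply(args, is.factor, logical(1))))
  is_ord <- vapply(args, is.ordered, logical(1))
  first_levels <- levels(args[[1]])
  same_levels <- all(vapply(args, function(x) identical(levels(x), first_levels),
                            logical(1)))
  chars <- unlist(lapply(args, as.character), use.names = FALSE)
  if (all(is_ord) && same_levels) {
    # Every input agrees on the ordering, so it is safe to keep it.
    return(factor(chars, levels = first_levels, ordered = TRUE))
  }
  if (any(is_ord)) {
    # Conflicting orderings, or ordered mixed with unordered: refuse.
    stop("cannot combine ordered factors with differing levels or with unordered factors")
  }
  factor(chars, levels = unique(unlist(lapply(args, levels), use.names = FALSE)))
}

lo <- factor(c("low", "high"), levels = c("low", "mid", "high"), ordered = TRUE)
combine_factors_ordered(lo, lo[2:1])
```

Combining an ordered factor with a reordered copy of itself keeps the ordering `low < mid < high`; combining it with an ordered factor whose levels are `high, low` raises an error instead of silently picking one ordering.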
On Thu, 4 Feb 2010, Hadley Wickham wrote:
> Is there a reason that there is no c.factor method?

It's well established that different people have different views on what factors should do, but your proposal doesn't match mine. I think of factors as enumerated data types where the factor levels already specify all the valid values for the factor, so I wouldn't want to be able to combine two factors with different sets of levels. For example:

A <- factor("orange", levels = c("orange", "yellow", "red", "purple"))
B <- factor("orange", levels = c("orange", "apple", "mango", "banananana"))

On the other hand, I think the current behaviour, which reduces them to their integer codes, is just wrong.

-thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
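One way to encode the "enumerated type" view is a combiner that insists every input has the same set of levels and stops otherwise. This is a hypothetical sketch (`combine_enum` is not an existing function); it compares level sets with `setequal()`, so the same levels in a different order are accepted, and the first input's level order is kept.

```r
# Hypothetical helper: factors as enumerated types. Inputs may only be
# combined when they share exactly the same set of valid values (levels).
combine_enum <- function(...) {
  args <- list(...)
  stopifnot(length(args) > 0, all(vapply(args, is.factor, logical(1))))
  lev <- levels(args[[1]])
  for (x in args) {
    if (!setequal(levels(x), lev))
      stop("factors have different level sets; refusing to combine")
  }
  # All inputs share one set of valid values, so the first input's level
  # order is kept and no new levels can appear in the result.
  factor(unlist(lapply(args, as.character), use.names = FALSE), levels = lev)
}

A <- factor("orange", levels = c("orange", "yellow", "red", "purple"))
combine_enum(A, factor("red", levels = rev(levels(A))))
```

With the `A` and `B` from the message, `combine_enum(A, B)` stops with an error, because the two level sets differ even though both values are "orange".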
A search for "c.factor" returns tons of hits on this topic. Here's just one of the hits, from 2006, when I asked the same question:

http://tolstoy.newcastle.edu.au/R/e2/devel/06/11/1137.html

So it appears to be complicated, and there are good reasons there isn't one. Since I needed it, I created c.factor in the data.table package, below. It is more efficient because it doesn't convert each factor to character (which would lose some of the benefit of using factors in the first place). I've been told I'm not unique in this approach and that other packages also have their own c.factor. It deliberately isn't exported. It's worked well for me over the years anyway.

c.factor = function(...) {
  args <- list(...)
  for (i in seq(along=args)) if (!is.factor(args[[i]])) args[[i]] = as.factor(args[[i]])
  # The first must be factor otherwise we wouldn't be inside c.factor; it's checked anyway in the line above.
  newlevels = sort(unique(unlist(lapply(args,levels))))
  ans = unlist(lapply(args, function(x) {
    m = match(levels(x), newlevels)
    m[as.integer(x)]
  }))
  levels(ans) = newlevels
  class(ans) = "factor"
  ans
}

"Hadley Wickham" <hadley at rice.edu> wrote in message news:f8e6ff051002040753x33282f33l78fce9f98dc29ae8 at mail.gmail.com...
> Is there a reason that there is no c.factor method?
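The efficiency point is that each input only needs its (usually short) vector of levels matched against the combined level set; every element's code is then looked up in that small table, instead of building a character vector of every element. A minimal sketch checking that both routes give the same factor, using hypothetical names `combine_by_codes` and `combine_by_chars` (neither is data.table's code or API) and the same sorted levels as the version quoted above:

```r
# Hypothetical helpers for comparison only; neither is registered as a c() method.

# Route 1: remap integer codes without materialising a character vector.
combine_by_codes <- function(...) {
  args <- lapply(list(...), as.factor)
  newlevels <- sort(unique(unlist(lapply(args, levels), use.names = FALSE)))
  codes <- unlist(lapply(args, function(x) {
    # Position of each of x's levels in the combined level set,
    # indexed by x's own codes (NA codes stay NA).
    match(levels(x), newlevels)[as.integer(x)]
  }), use.names = FALSE)
  structure(codes, levels = newlevels, class = "factor")
}

# Route 2: go through character, as in the first proposal in this thread,
# but with sorted levels so the two results are comparable.
combine_by_chars <- function(...) {
  args <- lapply(list(...), as.factor)
  chars <- unlist(lapply(args, as.character), use.names = FALSE)
  newlevels <- sort(unique(unlist(lapply(args, levels), use.names = FALSE)))
  factor(chars, levels = newlevels)
}

x <- factor(c("b", "a"))
y <- factor(c("c", "a", NA))
identical(combine_by_codes(x, y), combine_by_chars(x, y))
# [1] TRUE
```

Note that sorting the combined levels, as the data.table version does, is itself a policy choice: the first proposal in the thread keeps levels in order of first appearance instead.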
> c() should have been put on the deprecated list a couple
> of decades ago

Don't you dare!

> Back to reality

Phew! Had me worried there. c() is no problem at all for lists, Dates and most simple vector types; why deprecate something solely because it doesn't behave for something it doesn't claim to work on?

Steve E