Dear list,

subset has a 'drop' argument that I had often mistaken for the one in [.factor which removes unused levels. Clearly it doesn't work that way, as shown below,

d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))
s <- subset(d, y=="A", drop=TRUE)
str(s)
'data.frame':   5 obs. of  2 variables:
 $ x: Factor w/ 15 levels "a","b","c","d",..: 1 4 7 10 13
 $ y: Factor w/ 3 levels "A","B","C": 1 1 1 1 1

The subset still retains all the unused factor levels. I wonder how people usually get rid of all unused levels in a data.frame after subsetting? I came up with this but I may have missed a better built-in solution,

dropit <- function (d, columns = names(d), ...)
{
    d[columns] = lapply(d[columns], "[", drop=TRUE, ...)
    d
}

str(dropit(s))
'data.frame':   5 obs. of  2 variables:
 $ x: Factor w/ 5 levels "a","d","g","j",..: 1 2 3 4 5
 $ y: Factor w/ 1 level "A": 1 1 1 1 1

Best regards,

baptiste
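A note on why the two 'drop' arguments behave differently, as a small sketch rather than a definitive account: subset() on a data frame hands 'drop' on to the data frame's "[" method, where it only controls whether a single remaining column is returned as a bare vector. The 'drop' in [.factor is the one that discards unused levels. The 'select = x' call and the names v and w are only there for illustration and are not part of the original post.

```r
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))

# drop in subset() is passed on to `[.data.frame`: with a single column
# selected it drops the data frame dimension, not the factor levels.
v <- subset(d, y == "A", select = x, drop = TRUE)
is.factor(v)   # a bare factor instead of a one-column data frame
nlevels(v)     # every original level is still recorded

# drop in `[.factor` is the one that discards levels no longer in use.
w <- d$x[d$y == "A", drop = TRUE]
nlevels(w)
```

Comparing nlevels(v) with nlevels(w) shows the difference directly.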
On Nov 10, 2009, at 10:49 AM, baptiste auguie wrote:

> The subset still retains all the unused factor levels. I wonder how
> people usually get rid of all unused levels in a data.frame after
> subsetting? I came up with this but I may have missed a better
> built-in solution,
>
> dropit <- function (d, columns = names(d), ...)
> {
>     d[columns] = lapply(d[columns], "[", drop=TRUE, ...)
>     d
> }

If you are looking for a one-liner, then consider:

data.frame(lapply(s, function(x) if (is.factor(x)){ factor(x)} else {x}))

I added a numeric column to make sure I had not clobbered a non-factor variable.

> d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N=1:15)
> s <- subset(d, y=="A", drop=TRUE)
> str( data.frame(lapply(s, function(x) if (is.factor(x)){ factor(x)} else {x})) )
'data.frame':   5 obs. of  3 variables:
 $ x: Factor w/ 5 levels "a","d","g","j",..: 1 2 3 4 5
 $ y: Factor w/ 1 level "A": 1 1 1 1 1
 $ N: int  1 4 7 10 13

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
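One thing to be aware of with the data.frame(lapply(...)) form: it builds a fresh data frame, so the row names of the subset (1, 4, 7, 10, 13) are renumbered 1 to 5. If the original row names matter, a variant that assigns back into the existing frame may be preferable. The sketch below is only one way to do it; the names s1 and s2 are made up for the comparison.

```r
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)
s <- subset(d, y == "A")

# Rebuilding with data.frame() creates new, automatic row names.
s1 <- data.frame(lapply(s, function(x) if (is.factor(x)) factor(x) else x))

# Assigning into s2[] replaces the columns but keeps the frame's own
# attributes, including its row names.
s2 <- s
s2[] <- lapply(s2, function(x) if (is.factor(x)) factor(x) else x)

rownames(s1)
rownames(s2)
str(s2)
```

Both versions drop the unused levels and leave the integer column N alone; they differ only in what happens to the row names.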
On Nov 10, 2009, at 9:49 AM, baptiste auguie wrote:

> The subset still retains all the unused factor levels. I wonder how
> people usually get rid of all unused levels in a data.frame after
> subsetting? I came up with this but I may have missed a better
> built-in solution,

There is a page in the R wiki here:

http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels

that has some approaches.

HTH,

Marc Schwartz
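For readers on a recent version of R, there is also a built-in route: base R provides droplevels(), which has a data frame method that drops unused levels from every factor column and leaves other columns untouched. As far as I know it was added in R 2.12.0, after this thread was written, so its availability is an assumption about your installation. I have not checked whether the wiki page above describes it.

```r
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)

# droplevels() on a data frame re-levels each factor column;
# the integer column N passes through unchanged.
s <- droplevels(subset(d, y == "A"))
str(s)
```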
If you don't want to preserve factor levels when subsetting, use characters. There are very few other differences in behavior.

Hadley

On Tuesday, November 10, 2009, baptiste auguie <baptiste.auguie at googlemail.com> wrote:

> The subset still retains all the unused factor levels. I wonder how
> people usually get rid of all unused levels in a data.frame after
> subsetting? I came up with this but I may have missed a better
> built-in solution,

-- 
http://had.co.nz/
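A minimal sketch of the character-column suggestion, assuming the data can be created with stringsAsFactors = FALSE in the first place. Character vectors carry no level set, so there is nothing left over after subsetting, and a factor can still be made at the point where one is needed. The name f is just for illustration.

```r
# Keep the columns as character vectors instead of factors.
d <- data.frame(x = letters[1:15], y = LETTERS[1:3],
                stringsAsFactors = FALSE)
s <- subset(d, y == "A")
str(s)

# Convert only when a factor is required; the levels then reflect
# exactly the values present in the subset.
f <- factor(s$x)
nlevels(f)
```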