Tal Galili
2010-Mar-07 10:07 UTC
[R] Why can't "apply" be used with "as.factor" on a data.frame ?
Hi all,

Let's say I have a data.frame and want to turn each of its columns into a factor. My instinct would be to use as.factor with apply. But this won't work: the result holds characters rather than factors. I found another way to achieve this, but I would also like to understand *WHY* it works this way.

Here is an example script:

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a","b"), 100, replace = TRUE),
                x3 = factor(c(rep("a",50), rep("b",50))))
apply(a, 2, class)            # why is column 3 not a factor?
a[,3]                         # since it IS a factor.
a2 <- apply(a, 2, as.factor)  # won't work - why not?
a2[,3]                        # Why was this just turned into a character???
# A solution
a2 <- lapply(a, as.factor)
a3 <- as.data.frame(a2)
str(a3)

Thanks,
Tal
hadley wickham
2010-Mar-07 12:05 UTC
[R] Why can't "apply" be used with "as.factor" on a data.frame ?
The basic reason is that apply works with matrices: it first turns the input into a matrix, processes each column, and then returns a matrix.

See colwise in the plyr package for a function that works column-wise on a data frame, returning a data frame.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
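The matrix conversion described above can be seen directly in a short session. This is only a sketch on a small made-up data frame shaped like the one in the question; the variable names m and a2 and the use of set.seed are choices made for the illustration, not anything from the original posts.

```r
# Small illustrative data frame, shaped like the one in the question.
set.seed(1)
a <- data.frame(
  x1 = rnorm(6),
  x2 = sample(c("a", "b"), 6, replace = TRUE),
  x3 = factor(rep(c("a", "b"), each = 3))
)

# apply() first coerces the data frame with as.matrix(). A matrix holds a
# single atomic type, so mixed columns collapse to a character matrix.
m <- as.matrix(a)
is.character(m)

# Each "column" that apply() hands to the function is therefore a character
# vector, whatever the class of the original column was.
apply(a, 2, class)

# With as.factor, every column does become a factor inside the call, but
# apply() then packs the results back into a matrix, which cannot hold factors.
a2 <- apply(a, 2, as.factor)
is.matrix(a2)
is.character(a2)
```

Comparing class(a[, 3]) with class(a2[, 3]) shows the difference the question asks about: the first is a factor, the second is a plain character vector.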
Don MacQueen
2010-Mar-07 20:20 UTC
[R] Why can't "apply" be used with "as.factor" on a data.frame ?
And just a small followup. To find out what class each column is, you want lapply(a, class):

> lapply(a, class)
$x1
[1] "numeric"

$x2
[1] "factor"

$x3
[1] "factor"

With regard to your solution, and why it works: it is my understanding that data frames are in some sense actually lists, with each column corresponding to one element of the list. Hence, lapply() works column-wise on data frames.

For the same reason, it's pretty easy to convert back and forth between data frames and lists, provided, of course, that each element of the list has an appropriate structure. See this example:

> data.frame( list(a=1:2, b=3:4) )
  a b
1 1 3
2 2 4
> data.frame( list(a=1:2, b=3:7) )
Error in data.frame(a = 1:2, b = 3:7, check.names = FALSE, stringsAsFactors = TRUE) :
  arguments imply differing number of rows: 2, 5

No doubt there are subtle details, but don't ask me to provide details on what exactly the "some sense" is!

-Don

--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov
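The list view of a data frame can be checked in a few lines. The sketch below uses a small made-up data frame; the names a3 and b are chosen for the illustration. The b[] <- lapply(...) form is an alternative idiom that was not mentioned in the thread. It replaces the columns in place, so the separate as.data.frame() step is not needed.

```r
# Small illustrative data frame with one numeric, one text, one factor column.
a <- data.frame(
  x1 = c(1.5, 2.5, 3.5, 4.5),
  x2 = c("a", "b", "a", "b"),
  x3 = factor(c("a", "a", "b", "b"))
)

is.list(a)    # a data frame is a list of equal-length columns
length(a)     # so its length is the number of columns

# lapply() visits each column as its own vector, with no matrix coercion,
# so as.factor() sees the real column and the result is a list of factors.
a3 <- as.data.frame(lapply(a, as.factor))
str(a3)

# Alternative idiom: assigning into b[] keeps the data frame structure
# (names, row names) and only swaps the column contents.
b <- a
b[] <- lapply(b, as.factor)
str(b)
```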