Marius Hofert
2011-Jun-28 06:47 UTC
[R] data.frame: How to get the classes of all components and how to remove their factor structure?
Dear expeRts,

I have two questions concerning data frames:
(1) How can I apply the class function to each component in a data.frame? As you can see below, applying class to each column with apply() is not the right approach; calling it on each component separately seems bulky.
(2) After transforming the data frame a bit, the classes of certain components change to factor. How can I remove the factor structure?

Cheers,

Marius

x <- c(2004:2010, 2002:2011, 2000:2011)
df <- data.frame(x=x, group=c(rep("low",7), rep("middle",10), rep("high",12)),
                 y=x+100*runif(length(x)))

## Question (1): why do the following lines not give the same "class"?
apply(df, 2, class)
class(df$x)
class(df$group)
class(df$y)

df. <- as.data.frame(xtabs(y ~ x + group, data=df))

class(df.$x)
class(df.$group)
class(df.$Freq)

## Question (2): how can I remove the factor structure from x?
df.$x <- as.numeric(as.character(df.$x)) # seems bulky; note that as.numeric(df.$x) is not correct
class(df.$x)
Petr PIKAL
2011-Jun-28 07:41 UTC
[R] Odp: data.frame: How to get the classes of all components and how to remove their factor structure?
Hi

> ## Question (1): why do the following lines not give the same "class"?
> apply(df, 2, class)
> class(df$x)
> class(df$group)
> class(df$y)

From the help page ?apply:

  Arguments
  X    an array, including a matrix.

A data frame is not an array, so apply() first coerces it with as.matrix(), and the per-column classes are lost. Use sapply() instead:

sapply(df, class)
        x     group         y
"integer"  "factor" "numeric"

> ## Question (2): how can I remove the factor structure from x?
> df.$x <- as.numeric(as.character(df.$x)) # seems bulky; note that as.numeric(df.$x) is not correct

Actually, it is correct in the sense that it behaves as documented. From ?factor:

  Warning
  The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

Regards
Petr

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
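The two points in the reply above can be checked with a short sketch. It rebuilds the data from the original post, with one assumption that is not in the thread: group is wrapped in factor() explicitly, so the result does not depend on the stringsAsFactors default of the R version in use. The names cls_apply, cls_sapply, f, codes, values and same are illustrative only.

```r
# Rebuild the example data. `group` is made a factor explicitly (an assumption
# of this sketch) so the outcome is the same on old and new R versions.
x <- c(2004:2010, 2002:2011, 2000:2011)
df <- data.frame(x = x,
                 group = factor(c(rep("low", 7), rep("middle", 10), rep("high", 12))),
                 y = x + 100 * runif(length(x)))

# apply() coerces the data frame with as.matrix(); the factor column forces a
# character matrix, so every column reports "character".
cls_apply <- apply(df, 2, class)

# sapply() iterates over the list of columns, so each column keeps its class.
cls_sapply <- sapply(df, class)

print(cls_apply)
print(cls_sapply)

# Factor pitfall: as.numeric() on a factor returns the integer codes,
# not the numbers stored in the level labels.
f <- factor(c("2004", "2002", "2010"))
codes  <- as.numeric(f)               # positions within levels(f)
values <- as.numeric(levels(f))[f]    # idiom recommended in ?factor
same   <- as.numeric(as.character(f)) # the "bulky" variant, same result

print(codes)
print(values)
```

Comparing codes with values makes the warning in ?factor concrete: the codes only index into the sorted levels, while the other two expressions recover the years.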
Petr PIKAL
2011-Jun-28 07:49 UTC
[R] Odp: data.frame: How to get the classes of all components and how to remove their factor structure?
> ## Question (2): how can I remove the factor structure from x?
> df.$x <- as.numeric(as.character(df.$x)) # seems bulky; note that as.numeric(df.$x) is not correct

If you do it often, you can define a helper:

unfactor <- function(x) as.numeric(as.character(x))
df.$x <- unfactor(df.$x)

or you can use

df. <- as.data.frame(xtabs(y ~ x + group, data=df), stringsAsFactors=FALSE)
df.$x <- as.numeric(df.$x)

But it seems to me that this is not much less bulky.

Regards
Petr
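Both suggestions from this reply can be run side by side. This is a sketch under the same assumption as before (group is made a factor explicitly); tab, df_factor, df_char and years are names introduced here for illustration and do not appear in the thread.

```r
# Example data from the original post, with `group` as an explicit factor.
x <- c(2004:2010, 2002:2011, 2000:2011)
df <- data.frame(x = x,
                 group = factor(c(rep("low", 7), rep("middle", 10), rep("high", 12))),
                 y = x + 100 * runif(length(x)))

tab <- xtabs(y ~ x + group, data = df)

# Default conversion: the classifying variables x and group become factors.
df_factor <- as.data.frame(tab)

# With stringsAsFactors = FALSE they become character vectors instead,
# so a plain as.numeric() recovers the years.
df_char <- as.data.frame(tab, stringsAsFactors = FALSE)
df_char$x <- as.numeric(df_char$x)

# The helper from the reply, applied to the factor version.
unfactor <- function(f) as.numeric(as.character(f))
years <- unfactor(df_factor$x)

print(class(df_factor$x))
print(class(df_char$x))
print(identical(years, df_char$x))
```

Either route ends with a numeric x column; the helper is convenient when the factor already exists, while stringsAsFactors = FALSE avoids creating it in the first place.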
Marius Hofert
2011-Jun-28 08:18 UTC
[R] Odp: data.frame: How to get the classes of all components and how to remove their factor structure?
Dear Petr,

thanks for your posts, they perfectly answered my questions.

Cheers,

Marius