Luigi Marongiu
2018-Dec-19 10:58 UTC
[R] convert columns of dataframe to same factor levels
Dear all,

I have a data frame with character values where each character is a level; however, not all columns of the data frame have the same characters; thus, when generating the data frame with stringsAsFactors = TRUE, the levels are different for each column.
Is there a way to provide a single vector of levels and assign the characters so that they match such vector?
Is there a way to do that not only when setting the data frame but also when reading data from a file with read.table()?

For instance, I have:

    column_1 = c("A", "B", "C", "D", "E")
    column_2 = c("B", "B", "C", "E", "E")
    column_3 = c("C", "C", "D", "D", "C")
    my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
    > str(my.data)
    'data.frame': 5 obs. of 3 variables:
     $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
     $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
     $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1

Thank you
-- 
Best regards,
Luigi
Duncan Murdoch
2018-Dec-19 11:19 UTC
[R] convert columns of dataframe to same factor levels
I don't think read.table() can do it for you automatically. To do it yourself, you need to get a vector of the levels. If you know this, just assign it to a variable; if you don't know it, compute it as

    thelevels <- unique(unlist(lapply(my.data, levels)))

Then set the levels of each column to thelevels:

    my.data.new <- as.data.frame(lapply(my.data, function(x) factor(x, levels = thelevels)))

Duncan Murdoch
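As a minimal sketch of the step described above, using the example data from the question: each column is rebuilt with factor(x, levels = ...), which matches values by label. One caveat is worth illustrating, because it is easy to miss: assigning with levels(x) <- value renames the existing levels by position instead of matching them, so it can silently change the data when a column lacks some of the shared levels. The name `relabelled` is only an illustration and is not from the thread.

```r
# Example data from the question.
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of the levels found in any column, in order of first appearance.
thelevels <- unique(unlist(lapply(my.data, levels)))

# Rebuild every column against the shared levels; values are matched by label.
my.data.new <- as.data.frame(lapply(my.data, factor, levels = thelevels))
str(my.data.new)

# Pitfall: levels<- renames by position, so the levels "B","C","E" of
# column_2 are renamed to the first three entries of thelevels.
relabelled <- my.data$column_2
levels(relabelled) <- thelevels

as.character(my.data.new$column_2)  # original values are preserved
as.character(relabelled)            # original values are changed
```

If the set of levels is known in advance, `thelevels` can simply be written out by hand, which also fixes the level order.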
Luigi Marongiu
2018-Dec-19 11:48 UTC
[R] convert columns of dataframe to same factor levels
Thank you, that worked fine for me.
Best wishes for a merry Christmas and a happy new year,
Luigi
William Dunlap
2018-Dec-19 17:50 UTC
[R] convert columns of dataframe to same factor levels
You can abuse the S4 class system to do this.

    setClass("Size") # no representation, no prototype
    setAs(from="character", to="Size", # nothing but a coercion method
      function(from){
        ret <- factor(from, levels=c("Small","Medium","Large"), ordered=TRUE)
        class(ret) <- c("Size", class(ret))
        ret
      })
    z <- read.table(colClasses=c("integer", "Size"), text="7 Medium\n5 Large\n3 Large")
    dput(z)
    #structure(list(V1 = c(7L, 5L, 3L), V2 = structure(c(2L, 3L, 3L
    #), .Label = c("Small", "Medium", "Large"), class = c("Size",
    #"ordered", "factor"))), class = "data.frame", row.names = c(NA,
    #-3L))

I wonder if this behavior is intended or if there is a more sanctioned way to get read.table(colClasses=...) to make a factor with a specified set of levels.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
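For comparison, here is a sketch of a plainer two-step route that avoids defining a class: read the column as character, then convert it with an explicit level set. This is not offered as the sanctioned answer to the question above, only as an alternative under the assumption that a post-processing step after read.table() is acceptable. The name `size_levels` is hypothetical; the input text is the same as in the example above.

```r
# Hypothetical shared level set, reusing the values from the example above.
size_levels <- c("Small", "Medium", "Large")

# Read the second column as plain character so that no levels are inferred
# from the values that happen to occur in the file.
z <- read.table(text = "7 Medium\n5 Large\n3 Large",
                colClasses = c("integer", "character"))

# Convert afterwards with the explicit, ordered level set.
z$V2 <- factor(z$V2, levels = size_levels, ordered = TRUE)

str(z)
```

Any value in the file that is not in `size_levels` becomes NA after the conversion, so a check with anyNA(z$V2) may be useful when the input is not fully trusted.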