Hi all,

Assume I have a data frame with numerical and factor variables that I got through merging various other data frames and subsetting the resulting data frame afterwards. The number of levels of the factors seems to be the same as in the original data frames, probably because subset() calls [.factor without drop = TRUE (that's what I gather from scanning the mailing lists).

I wonder if there is an easy way to refactor all factors in the data frame at once. I noticed that fix(data_frame) does the trick; however, this needs user interaction, which I'd like to avoid. A subsequent write.table / read.table would be another option, but I'm not sure whether R can guess the factor/char/numeric type correctly when reading the table.

So, is there any way to drop the unused factor levels from *all* factors of a data frame without import/export?

Thanks in advance,
Hilmar

--
Hilmar Berger
Studienkoordinator
Institut für medizinische Informatik, Statistik und Epidemiologie
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig

Tel. +49 341 97 16 101
Fax. +49 341 97 16 109
email: hilmar.berger at imise.uni-leipzig.de
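To make the question concrete, here is a minimal sketch that reproduces the behaviour described above. The data frame and its column names (df, g, x) are made up for illustration and are not from the original post.

```r
# Toy data; the names df, g and x are hypothetical.
df <- data.frame(g = factor(c("a", "b", "c", "a")), x = 1:4)

# Keep only the rows where g is "a"; levels "b" and "c" are no longer used.
sub <- subset(df, g == "a")

levels(sub$g)          # the original levels are all still listed
table(sub$g)           # unused levels appear with a count of zero
levels(factor(sub$g))  # calling factor() again keeps only the levels in use
```

Calling factor() on a single column fixes that column; the replies below are about doing this for every factor column in one step.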

Hi,

the best solution I found so far is (assuming <data> is your data.frame):

    # identify all factor variables (is.factor() also catches ordered factors)
    factor.list <- colnames(data)[sapply(data, is.factor)]
    # use transform to apply factor() to all factor variables
    trans.vars <- paste(factor.list, "=factor(", factor.list, ")", sep = "", collapse = ",")
    data <- eval(parse(text = paste("transform(data,", trans.vars, ")")))

Regards,
Hilmar

Hi Hilmar,

Try this:

    cat <- sapply(df, is.factor)
    df[cat] <- lapply(df[cat], factor)

Hadley

On 6/5/07, Hilmar Berger <hilmar.berger at imise.uni-leipzig.de> wrote:
> So, is there any way to drop the unused factor levels from *all* factors
> of a data frame without import/export ?

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
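A small worked example of Hadley's two-line approach, as a sketch on made-up data. The column names (site, grade, score) and the index name is_fac are assumptions chosen for illustration; is_fac is used instead of cat only to avoid reusing the name of the base function cat().

```r
# Toy data frame with two factors and one numeric column (hypothetical names).
df <- data.frame(
  site  = factor(c("north", "south", "east", "north")),
  grade = factor(c("lo", "hi", "hi", "lo")),
  score = c(1.5, 2.0, 3.5, 4.0)
)
df <- df[df$site != "east", ]  # after this, "east" is an unused level of site

is_fac <- sapply(df, is.factor)           # TRUE for each factor column
df[is_fac] <- lapply(df[is_fac], factor)  # rebuild every factor from its values

sapply(df[is_fac], nlevels)  # number of levels left in each factor column
```

Because the assignment goes through df[is_fac] <- ..., the data frame is modified in place: non-factor columns and the row names are left untouched.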

Dear Hilmar,

You could use something like

    DF <- as.data.frame(lapply(DF, function (x) if (is.factor(x)) factor(x) else x))

where DF is the data frame.

I hope this helps,
John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Hilmar Berger
> Sent: Tuesday, June 05, 2007 8:20 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Refactor all factors in a data frame
>
> So, is there any way to drop the unused factor levels from
> *all* factors of a data frame without import/export ?
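One thing to keep in mind with the as.data.frame(lapply(...)) form is that the data frame is rebuilt from a plain list, so the row names of a subset are renumbered. The sketch below shows this on made-up data (the names DF2, g and s are hypothetical). Passing stringsAsFactors = FALSE is my addition, not part of John's suggestion; it is there so that character columns stay character in R versions before 4.0.0.

```r
DF <- data.frame(g = factor(c("a", "b", "c")), s = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
DF <- DF[c(1, 3), ]  # row names are now "1" and "3"; level "b" is unused

DF2 <- as.data.frame(
  lapply(DF, function(x) if (is.factor(x)) factor(x) else x),
  stringsAsFactors = FALSE  # keep character columns as character in R < 4.0.0
)

levels(DF2$g)   # unused level "b" is gone
rownames(DF2)   # renumbered from 1, unlike rownames(DF)
class(DF2$s)    # still "character"
```

If the original row names matter, the in-place assignment shown earlier in the thread avoids the rebuild.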

Hilmar Berger <hilmar.berger <at> imise.uni-leipzig.de> writes:

...
> So, is there any way to drop the unused factor levels from *all* factors
> of a data frame without import/export ?

There is a generic drop.levels in gdata. Here is part of its help page:

"\code{drop.levels} is a generic function, where default method does nothing, while method for factor \code{s} drops all unused levels. There are also convenient methods for \code{list} and \code{data.frame}, where all unused levels are dropped in all factors (one by one) in a \code{list} or a \code{data.frame}."

Gregor
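For readers on a more recent R: base R later gained droplevels() (in R 2.12.0, to my knowledge), which has a data.frame method and needs no extra package. It was not available when this thread was written, so the sketch below is an addition rather than something suggested by the posters. The toy data and the names sub and clean are hypothetical.

```r
df  <- data.frame(g = factor(c("a", "b", "c", "a")), x = 1:4)
sub <- df[df$g != "c", ]   # "c" is now an unused level

clean <- droplevels(sub)   # data.frame method: handles every factor column

levels(sub$g)    # unused level still present
levels(clean$g)  # only the levels that occur in the data

# With the gdata package installed, the call described in the help text
# quoted above would be (not run here):
# clean <- gdata::drop.levels(sub)
```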