I am sure this is a very basic question:

I have 600,000 categorical variables in a data.frame, each of which is coded as "0", "1", or "2".

I would like to collapse "1" and "2" and leave "0" by itself, so that after re-categorizing "0" = "0", "1" = "1", and "2" = "1". In the end I only want "0" and "1" as categories for each of the variables.

Also, if possible, I would rather not create 600,000 new variables. If I can replace the existing variables with the new values, that would be great!

What would be the best way to do this?

Thank you!

--
Thanks,
CC
Do you want to replace specific values in a data set?

    # df is a plain vector here, not a data.frame
    df <- sample(c(0, 1, 2), 600, replace = TRUE)
    table(df)
    df[df == 2] <- 1
    table(df)

-----
A R learner.
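The same indexing idea should carry over from a single vector to a whole data.frame, provided the columns are numeric rather than factors. A minimal sketch under that assumption follows; the name `dat` and its toy dimensions are made up for illustration and are not from the original post.

```r
# Toy stand-in for the real data: 5 numeric columns coded 0/1/2.
# (`dat` and its size are hypothetical.)
set.seed(1)
dat <- as.data.frame(matrix(sample(0:2, 50, replace = TRUE), ncol = 5))

# `dat == 2` is a logical matrix with the same shape as `dat`;
# assigning through it overwrites the matching cells in place,
# so no new variables are created.
dat[dat == 2] <- 1

table(unlist(dat))
```

If the columns are factors instead, this leaves the unused level "2" behind, so recoding the levels themselves is probably the cleaner route.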
Dennis Murphy
2010-Jul-16 19:14 UTC
[R] how to collapse categories or re-categorize variables?
Hi:

See ?levels. Here's a toy example:

> x <- factor(sample(0:2, 10, replace = TRUE))
> x
 [1] 1 2 1 0 2 2 2 2 2 1
Levels: 0 1 2
> levels(x) <- c(0, 1, 1)  # Change level 2 to 1
> x
 [1] 1 1 1 0 1 1 1 1 1 1
Levels: 0 1

HTH,
Dennis
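To apply the `levels<-` approach to every column at once, one option is `lapply()` with the result assigned back into `dat[]`, which keeps the existing data.frame and its column names. The sketch below uses the named-list form of `levels<-` described in ?levels, so a column that happens to lack one of the three codes is still handled; `dat`, `v1` to `v3`, and `collapse_12` are hypothetical names chosen for the example.

```r
# Toy data: three factor columns; v3 never takes the value 1.
dat <- data.frame(
  v1 = factor(c(0, 1, 2, 2, 0)),
  v2 = factor(c(2, 2, 1, 0, 1)),
  v3 = factor(c(0, 0, 2, 0, 2))
)

# Hypothetical helper: map old levels "1" and "2" onto new level "1".
collapse_12 <- function(f) {
  levels(f) <- list("0" = "0", "1" = c("1", "2"))
  f
}

# Assigning into dat[] replaces the columns in place
# instead of creating new variables.
dat[] <- lapply(dat, collapse_12)

str(dat)
```

The positional form `levels(x) <- c(0, 1, 1)` assumes each column has exactly the levels 0, 1, 2 in that order, which is why the named list is used here.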
Ista Zahn
2010-Jul-16 19:19 UTC
[R] how to collapse categories or re-categorize variables?
Hi,

On Fri, Jul 16, 2010 at 5:18 PM, CC wrote:
> I have 600,000 categorical variables in a data.frame - each of which is
> classified as "0", "1", or "2"
>
> What I would like to do is collapse "1" and "2" and leave "0" by itself,
> such that after re-categorizing "0" = "0"; "1" = "1" and "2" = "1" --- in
> the end I only want "0" and "1" as categories for each of the variables.

Something like this should work:

    for (i in names(dat)) {
      # repeating the label "1" maps both "1" and "2" to the same level
      dat[, i] <- factor(dat[, i], levels = c("0", "1", "2"),
                         labels = c("0", "1", "1"))
    }

-Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
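A runnable toy version of such a loop, with a quick sanity check at the end, might look like the sketch below. Two assumptions are worth flagging: the data (`dat`, columns `a` and `b`) are invented for the example, and merging levels by repeating a label inside `factor()` relies on behaviour that, as far as I know, was introduced in R 3.4.0. On older versions the `levels<-` route is likely the safer choice.

```r
# Hypothetical toy data stored as character codes.
dat <- data.frame(
  a = c("0", "1", "2", "2"),
  b = c("2", "0", "0", "1"),
  stringsAsFactors = FALSE
)

for (i in names(dat)) {
  # Duplicated labels merge "1" and "2" into one level
  # (assumes R >= 3.4.0).
  dat[[i]] <- factor(dat[[i]],
                     levels = c("0", "1", "2"),
                     labels = c("0", "1", "1"))
}

# Sanity check: every column should now have exactly two levels.
all(vapply(dat, nlevels, integer(1)) == 2L)
```

Using `dat[[i]]` rather than `dat[, i]` is a small design choice: it always extracts a single column, whatever class `dat` has.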