Jennifer Walsh
2009-Dec-11 04:53 UTC
[R] Recoding factor labels that are lists into first element of list
Hi all,

I've Googled far and wide but don't think I know the correct terms to search for to find an answer.

I have a massive dataset where one of the factors is made up of both individual items and lists of items (for example, "cat" and "cat, dog, bird"). I would like to recode this factor somehow into only the first element of the list (so every list starting with "cat," plus the observations that were already just "cat" would all be set equal to "cat"). I would ideally like to do this in some simple way that does not require me to write hundreds of different sets of code (since the lists probably start with 300+ different items). Is this possible? Extremely complicated?

Also, I am sure this is much simpler, but I cannot seem to get rid of levels of a factor that have no observations. I have tried setting the levels of the factor to only the ones with observations that I am interested in, but every time I summarize the variable there are still 100+ labels all with "0" as their count. This hasn't happened to me before; is there an explanation for it?

Thanks very much,
Jen

---
Jennifer Walsh
Graduate Student, Developmental Psychology
University of Michigan
2020 East Hall, 530 Church St.
Ann Arbor, MI 48109-1043
jim holtman
2009-Dec-11 10:21 UTC
[R] Recoding factor labels that are lists into first element of list
try this:

> x <- data.frame(a=c('cat', 'cat,dog', 'dog', 'dog,cat'))
> x
        a
1     cat
2 cat,dog
3     dog
4 dog,cat
> levels(x$a)
[1] "cat"     "cat,dog" "dog"     "dog,cat"
> # change the factors
> x$a <- factor(sapply(strsplit(as.character(x$a), ','), '[[', 1))
> x
    a
1 cat
2 cat
3 dog
4 dog
> levels(x$a)
[1] "cat" "dog"
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
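The reply above does not address the second question (empty levels) directly, so here is a minimal sketch covering both parts. The data and the names `pets`, `first_item`, `pets_first`, and `cats_only` are hypothetical, made up for illustration. It uses `sub()` rather than `strsplit()`, which is a different route to the same result, and it builds the factor explicitly with `factor()` because `data.frame()` no longer converts strings to factors by default as of R 4.0.0.

```r
# Hypothetical sample data; some entries have a space after the comma.
pets <- factor(c("cat", "cat, dog, bird", "dog", "dog, cat", "bird,cat"))

# Remove everything from the first comma onward, then trim stray whitespace.
first_item <- trimws(sub(",.*$", "", as.character(pets)))

# Rebuilding the factor keeps only the levels that actually occur.
pets_first <- factor(first_item)
levels(pets_first)

# Subsetting a factor keeps all of its original levels, which is why
# summary() or table() can still report levels with a count of 0.
cats_only <- pets_first[pets_first == "cat"]
table(cats_only)

# droplevels() (or wrapping in factor() again) discards the unused levels.
table(droplevels(cats_only))
```

A likely explanation for the lingering "0" counts is that a subset of a factor carries the full level set of the original. Calling `droplevels()` on the factor, or on the whole data frame, should remove them.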