Hi all,

I am having a bit of trouble using the levels() function. I have a factor with many elements, and when I use levels() to extract the list of unique elements, some of the elements returned are not actually in the factor. For example, I have this:

> vector <- dataset$Benchmark
> class(vector)
[1] "factor"
> length(vector)
[1] 35615
> vector2 <- levels(vector)
> length(which(!(vector2 %in% vector)))
[1] 235

Does anyone know how this is possible?

Many thanks!
Borja
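A minimal, self-contained sketch that reproduces the symptom. It assumes, as one of the replies below suggests, that Benchmark came from taking a row subset of a larger data frame. The data frame `full` and its values are hypothetical stand-ins, not Borja's data.

```r
# Hypothetical full data set: the Benchmark factor has levels "a" to "e".
full <- data.frame(
  Benchmark = factor(c("a", "b", "c", "d", "e", "a", "b")),
  value     = 1:7
)

# Keep only some rows; the remaining values are "a", "b", "a", "b".
dataset <- full[full$value <= 2 | full$value >= 6, ]
vector  <- dataset$Benchmark
vector2 <- levels(vector)

vector2                                # "a" "b" "c" "d" "e": all levels survive the subset
length(which(!(vector2 %in% vector)))  # 3: levels with no matching element
```

The same check as in the question now reports levels ("c", "d", "e") that no longer occur in the data, because row subsetting copies the level set unchanged.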
Hi,

vec1 <- factor(1:5, levels = 1:10)
vec1
#[1] 1 2 3 4 5
#Levels: 1 2 3 4 5 6 7 8 9 10
vec2 <- droplevels(vec1)
levels(vec2)
#[1] "1" "2" "3" "4" "5"
vec2
#[1] 1 2 3 4 5
#Levels: 1 2 3 4 5

A.K.
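droplevels() also has a data frame method, which may be more convenient when the factor lives inside a subsetted data frame: it drops unused levels from every factor column at once, and its `except` argument takes the indices of columns to leave alone. The data frame `df` below is hypothetical, just a sketch of the idea.

```r
# Hypothetical data frame with two factor columns and one numeric column.
df <- data.frame(
  g = factor(c("x", "y", "z")),
  h = factor(c("p", "q", "r")),
  n = 1:3
)
sub <- df[df$n < 3, ]   # keeps rows 1 and 2 only

levels(sub$g)           # still "x" "y" "z"; "z" is unused

# Drop unused levels from all factor columns.
sub2 <- droplevels(sub)
levels(sub2$g)          # "x" "y"
levels(sub2$h)          # "p" "q"

# Leave column 2 (h) untouched via 'except'.
sub3 <- droplevels(sub, except = 2)
levels(sub3$h)          # "p" "q" "r"
```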
On Jul 24, 2013, at 6:25 AM, Borja Rivier wrote:

> Does anyone know how this is possible?

When you take a subset of a factor vector, the levels are not reduced to the unique values in the new vector. There is a droplevels() function that you can apply if you already have such a vector, and there is a drop argument that you can set to TRUE in the `[.factor` call if you want to "attack the problem at the source". See:

?`[.factor`
?droplevels

--
David Winsemius
Alameda, CA, USA
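A short sketch contrasting the two approaches described above, on a hypothetical factor `f`: subsetting with `drop = TRUE` removes unused levels during the subset, while droplevels() cleans them up afterwards. One caveat: this drop argument belongs to the factor method; the drop argument of `[.data.frame` controls whether a one-column result is simplified, not factor levels.

```r
# Hypothetical factor with three declared levels.
f <- factor(c("lo", "mid", "hi", "mid"), levels = c("lo", "mid", "hi"))

kept    <- f[f != "hi"]               # default drop = FALSE: "hi" stays a level
dropped <- f[f != "hi", drop = TRUE]  # unused level removed at the source
fixed   <- droplevels(kept)           # same cleanup after the fact

levels(kept)              # "lo" "mid" "hi"
levels(dropped)           # "lo" "mid"
identical(dropped, fixed) # TRUE
```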
Benchmark is probably a subset from a larger data frame. R does not automatically remove empty levels, but you can do it:

set.seed(42)
dataset <- data.frame(Benchmark = factor(sample(LETTERS[1:26], 50, replace = TRUE),
                                         levels = LETTERS[1:26]))
levels(dataset$Benchmark)
# [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
# [20] "T" "U" "V" "W" "X" "Y" "Z"
dataset$Benchmark <- factor(dataset$Benchmark)
levels(dataset$Benchmark)
# [1] "A" "C" "D" "F" "G" "H" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "V" "X"
# [20] "Y" "Z"

There are times when you want to know whether certain factor levels do not appear in a subset of the original data.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Borja Rivier
Sent: Wednesday, July 24, 2013 8:25 AM
To: r-help at r-project.org
Subject: [R] Levels of a factor
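On the last point in David Carlson's reply, knowing which levels do not appear: one possible sketch, on a hypothetical factor `f`, lists the empty levels directly instead of only counting them as in the original question. table() reports a count for every level, zeros included; setdiff() gives the same answer without building a table.

```r
# Hypothetical factor: five declared levels, only three of them used.
f <- factor(c("B", "D", "B", "A"), levels = LETTERS[1:5])

counts <- table(f)                    # A=1, B=2, C=0, D=1, E=0
unused <- names(counts)[counts == 0]
unused                                # "C" "E"

# Equivalent: declared levels that never occur among the values.
setdiff(levels(f), as.character(unique(f)))  # "C" "E"
```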