Hi,

Just something I don't understand:

    data <- data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
    data_ac <- data[which(data$F1 !="b"), ]
    levels(data_ac$F1)

Why is the level "b" still present?

Thanks,

Tristan, R 2.0.1 for Linux Fedora 3

-- 
Tristan LEFEBURE
Laboratoire d'écologie des hydrosystèmes fluviaux (UMR 5023)
Université Lyon I - Campus de la Doua
Bat. Darwin C 69622 Villeurbanne - France
Phone: (33) (0)4 26 23 44 02
Fax: (33) (0)4 72 43 15 23
Lefebure Tristan <Tristan.Lefebure at univ-lyon1.fr> writes:

> Hi,
> Just something I don't understand:
>
> data <- data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]
> levels(data_ac$F1)
>
> Why is the level "b" still present?

Because the set of levels is a property of the factor's definition, not of the data. E.g. if you tabulate it, you generally want to get a zero entry if there are no "b"s in the data. If, for some reason, you want to reduce the factor to only those levels that are present, factor() gets you there soon enough:

    > levels(factor(data_ac$F1))
    [1] "a" "c"

-- 
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918
FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
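The tabulation point can be illustrated with a minimal sketch. It assumes stringsAsFactors = TRUE is passed explicitly, because recent versions of R no longer convert character columns to factors by default; the names tab_all and tab_used are made up for this example.

```r
# Assumption: stringsAsFactors = TRUE is set explicitly so that F1 is a
# factor regardless of the R version's default.
data <- data.frame(V1 = 1:12,
                   F1 = rep(c("a", "b", "c"), each = 4),
                   stringsAsFactors = TRUE)
data_ac <- data[data$F1 != "b", ]

# Levels are kept, so "b" is tabulated with a count of zero.
tab_all <- table(data_ac$F1)
print(tab_all)    # a = 4, b = 0, c = 4

# Rebuilding the factor keeps only the levels that actually occur.
tab_used <- table(factor(data_ac$F1))
print(tab_used)   # a = 4, c = 4
```

Whether the zero entry is wanted depends on the analysis: it keeps tables from different subsets aligned, at the cost of showing categories that are empty.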
Dimitris Rizopoulos
2005-Feb-28 13:27 UTC
[R] persistance of factor levels in a data frame
Look at ?"[.data.frame" and also check this:

    dat <- data.frame(V1=c(1:12), F1=rep(letters[1:3], each=4))
    dat.ac <- dat[dat$F1 !="b", ]
    ###############
    dat.ac$F1
    dat.ac$F1[, drop=TRUE]
    ###############
    dat.ac$F1 <- dat.ac$F1[, drop=TRUE]
    levels(dat.ac$F1)

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: med.kuleuven.ac.be/biostat
     student.kuleuven.ac.be/~m0390867/dimitris.htm
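When a data frame has several factor columns, the same [, drop=TRUE] idiom can be applied to each of them in one pass. The following is only a sketch: the second factor F2 and the helper name is_fac are invented for illustration, and the factors are created with factor() explicitly so the result does not depend on the R version's stringsAsFactors default.

```r
# Hypothetical data: F2 is an extra factor column added for illustration.
dat <- data.frame(V1 = 1:12,
                  F1 = factor(rep(letters[1:3], each = 4)),
                  F2 = factor(rep(c("x", "y", "z"), times = 4)))

# Subsetting leaves "b" and "z" behind as unused levels.
dat.ac <- dat[dat$F1 != "b" & dat$F2 != "z", ]

# Find the factor columns and drop unused levels in each one.
is_fac <- vapply(dat.ac, is.factor, logical(1))
dat.ac[is_fac] <- lapply(dat.ac[is_fac], function(f) f[, drop = TRUE])

levels(dat.ac$F1)   # "a" "c"
levels(dat.ac$F2)   # "x" "y"
```

More recent versions of R also provide droplevels(), which can be called on a whole data frame for the same purpose.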
Lefebure Tristan wrote:

> data <- data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]
> levels(data_ac$F1)
>
> Why is the level "b" still present?

You must explicitly drop unused levels of a factor created by subsetting.

    > levels(data_ac$F1[drop = TRUE])
    [1] "a" "c"
On 28 Feb 2005 at 14:07, Lefebure Tristan wrote:

> data <- data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]
> levels(data_ac$F1)
>
> Why is the level "b" still present?

Hi Tristan,

from ?"[.factor":

    Extract or Replace Parts of a Factor

    Description:
         Extract or replace subsets of factors.

    Usage:
         x[i, drop = FALSE]
         x[i] <- value

    Arguments:
           x: a factor
           i: a specification of indices - see 'Extract'.
        drop: logical.  If true, unused levels are dropped.
              ***************************************

The default is FALSE, so unused levels are retained.

factor(data_ac$F1) gives you the same factor with only the existing levels.

Cheers
Petr

Petr Pikal
petr.pikal at precheza.cz
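The effect of the drop argument described in the help page can be checked on a small factor. This is a minimal sketch; the variable names f, kept, dropped and rebuilt are chosen only for this example.

```r
f <- factor(c("a", "a", "b", "c"))

kept    <- f[f != "b"]                # default drop = FALSE: levels retained
dropped <- f[f != "b", drop = TRUE]   # unused level "b" removed
rebuilt <- factor(kept)               # rebuilding the factor drops it as well

levels(kept)      # "a" "b" "c"
levels(dropped)   # "a" "c"
levels(rebuilt)   # "a" "c"
```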
On Mon, 2005-02-28 at 14:07 +0100, Lefebure Tristan wrote:

> data <- data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]
> levels(data_ac$F1)
>
> Why is the level "b" still present?

See ?"[.factor" for details. You will note that the argument 'drop' is FALSE by default, which means that unused levels of a factor are not dropped when subsetting.

This can be important if you might want to join or compare factors from more than one source, where you want to ensure that the factor levels are the same. If you were to drop the unused levels in one factor while they are still present in the other, the comparison would be problematic, since the levels for the same values in the two factors would be different.

If you want to force the unused levels to be dropped before using a factor, just use:

    > data_ac$F1 <- factor(data_ac$F1)
    > data_ac$F1
    [1] a a a a c c c c
    Levels: a c

See ?factor for more information.

HTH,

Marc Schwartz
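The point about comparing factors from different sources can be made concrete by looking at the underlying integer codes. This is an illustrative sketch only; the factors full, reduced and other are invented for the example.

```r
# Two factors defined over the same level set a, b, c.
full  <- factor(c("a", "c", "c"), levels = c("a", "b", "c"))
other <- factor(c("b", "c"),      levels = c("a", "b", "c"))

# Dropping the unused level "b" from the first factor renumbers its codes.
reduced <- factor(full)

as.integer(full)      # 1 3 3  ("c" is code 3)
as.integer(other)     # 2 3    ("c" is code 3 here too)
as.integer(reduced)   # 1 2 2  ("c" is now code 2)
```

With the shared level set, the value "c" has the same code in both factors; after dropping, the same value maps to a different code, which is the mismatch described above.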