R-help,

I have a data frame which I subset like this:

    a <- subset(df,df$"column2" %in% c("factor1","factor2") & df$"column2"==1)

But when I type levels(a$"column2") I still get the same levels as in df (my original data frame).

Why is that? Is it right?

Luis

Luis Ridao Cruz
Fiskirannsóknarstovan
Nóatún 1
P.O. Box 3051
FR-110 Tórshavn
Faroe Islands
Phone: +298 353900
Phone(direct): +298 353912
Mobile: +298 580800
Fax: +298 353901
E-mail: luisr at frs.fo
Web: www.frs.fo
On Tue, 2004-08-17 at 09:30, Luis Rideau Cruz wrote:
> R-help,
>
> I have a data frame which I subset like this:
>
> a <- subset(df,df$"column2" %in% c("factor1","factor2") & df$"column2"==1)
>
> But when I type levels(a$"column2") I still get the same levels as in df (my original data frame)
>
> Why is that?

The default for [.factor is:

    x[i, drop = FALSE]

Hence, unused factor levels are retained.

> Is it right?

Yes. If you want to explicitly recode the factor based upon only those levels that are actually in use, you can do something like the following:

    a$column2 <- factor(a$column2)

However, I am a bit unclear as to the logic of the subset statement that you are using, perhaps because I don't know what your data is. You seem to be subsetting 'column2' on both the factor levels and a presumed numeric code. Is that really what you want to do? You might want to review the "Warning" section in ?factor.

BTW, when using subset(), the evaluation takes place within the data frame, so you do not need to use df$"column2" in the function call. You can just use column2, for example:

    subset(df, column2 %in% c("factor1", "factor2"))

See ?factor and ?"[.factor" for more information.

HTH,

Marc Schwartz
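The behaviour described in this reply can be reproduced with a small example. This is only a minimal sketch: the data frame df below is made up (the poster's real data was never shown), and the third level "factor3" is an assumption added so that there is an unused level to drop.

```r
# Hypothetical data standing in for the poster's data frame.
df <- data.frame(
  column1 = 1:6,
  column2 = factor(c("factor1", "factor2", "factor3",
                     "factor1", "factor2", "factor3"))
)

# Inside subset() the condition is evaluated within df,
# so the bare column name is enough.
a <- subset(df, column2 %in% c("factor1", "factor2"))

# The unused level "factor3" is still listed after subsetting.
levels_before <- levels(a$column2)

# Re-create the factor from the values actually present.
a$column2 <- factor(a$column2)
levels_after <- levels(a$column2)

# For a single factor, subsetting with drop = TRUE has the same effect.
b <- df$column2[df$column2 %in% c("factor1", "factor2"), drop = TRUE]
```

Comparing levels_before with levels_after shows the difference; table(a$column2) run before the recoding step would also list "factor3" with a count of zero.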
Believe it or not, that's a feature, not a bug. The idea is that the factor COULD take on those levels, even if it doesn't in your particular subset. To drop them, you would have to re-initialize the factor as such:

    a$column2 <- factor(a$column2)

Or, you could just install the Hmisc package, which redefines the subset operator "[" to behave as you'd like. Personally, I think the default behavior is clearer, however.

By the way, there are some problems with your code. First of all, you should drop the quotes around column2; they're unnecessary. Secondly, your two conditions appear to conflict: comparing a factor with == tests its level labels, not its internal integer codes, so column2 == 1 is TRUE only for a level labelled "1". Combined with the %in% test on "factor1" and "factor2", that would select no rows at all. Was this your intention?

Kevin

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Luis Rideau Cruz
Sent: Tuesday, August 17, 2004 7:30 AM
To: r-help at stat.math.ethz.ch
Subject: [R] levels of factor
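On the question of what the second condition in the original subset() call tests, it may help to see how R treats a comparison between a factor and a number. The following is a minimal sketch using a made-up factor f (the name and the values are assumptions, not the poster's data). In current versions of R, == on a factor compares against the level labels, and the integer codes are only visible through as.integer().

```r
# Hypothetical factor with the level names used in the question.
f <- factor(c("factor1", "factor2", "factor1", "factor3"))

# Internal integer codes, one per element.
codes <- as.integer(f)

# Comparison against a label works element by element.
by_label <- f == "factor1"

# Comparison against a number is also done on the labels ("1" here),
# so nothing matches unless a level is literally labelled "1".
by_number <- f == 1

# Both conditions from the question combined select no elements.
both <- f %in% c("factor1", "factor2") & f == 1

# Selecting by integer code has to be requested explicitly.
first_level_only <- f[as.integer(f) == 1]
```

If the intent was to keep only the first level, comparing against its label (or against as.integer(f)) states that more directly than mixing both kinds of test.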