R-help,
I have a data frame which I subset like this:
a <- subset(df,df$"column2" %in% c("factor1","factor2") & df$"column2"==1)
But when I type levels(a$"column2") I still get the same levels as in df (my original data frame).
Why is that?
Is it right?
Luis
Luis Ridao Cruz
Fiskirannsóknarstovan
Nóatún 1
P.O. Box 3051
FR-110 Tórshavn
Faroe Islands
Phone: +298 353900
Phone(direct): +298 353912
Mobile: +298 580800
Fax: +298 353901
E-mail: luisr at frs.fo
Web: www.frs.fo
On Tue, 2004-08-17 at 09:30, Luis Rideau Cruz wrote:
> R-help,
>
> I have a data frame which I subset like this:
>
> a <- subset(df,df$"column2" %in% c("factor1","factor2") & df$"column2"==1)
>
> But when I type levels(a$"column2") I still get the same levels as in df (my original data frame).
>
> Why is that?

The default for [.factor is:

x[i, drop = FALSE]

Hence, unused factor levels are retained.

> Is it right?

Yes. If you want to explicitly recode the factor based only on the levels that are actually in use, you can do something like the following:

a$column2 <- factor(a$column2)

However, I am a bit unclear about the logic of the subset statement you are using, perhaps because I don't know what your data look like. You seem to be subsetting 'column2' on both the factor levels and a presumed numeric code. Is that really what you want to do? You might want to review the "Warning" section in ?factor.

BTW, when using subset(), the expression is evaluated within the data frame, so you do not need to use df$"column2" in the function call. You can just use column2, for example:

subset(df, column2 %in% c("factor1", "factor2"))

See ?factor and ?"[.factor" for more information.

HTH,

Marc Schwartz
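A minimal sketch of the behaviour Marc describes, using a small hypothetical data frame `df` (the column names follow Luis's example, the values are made up): subsetting rows keeps every level of the factor column, and calling factor() on the column rebuilds it from the values actually present.

```r
# Hypothetical data frame: 'column2' is a factor with three levels.
df <- data.frame(column1 = 1:4,
                 column2 = factor(c("factor1", "factor2", "factor3", "factor1")))

# subset() evaluates 'column2' inside df, so df$ is not needed.
a <- subset(df, column2 %in% c("factor1", "factor2"))

# Row subsetting keeps every level of the original factor:
levels(a$column2)   # "factor1" "factor2" "factor3"

# Re-encoding the column keeps only the levels that actually occur.
a$column2 <- factor(a$column2)
levels(a$column2)   # "factor1" "factor2"
```

Note that "factor3" disappears only because it no longer occurs in the subset; the row count (3) is unchanged by the re-encoding.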
Believe it or not, that's a feature, not a bug. The idea is that the factor
COULD take on those levels, even if it doesn't in your particular subset. To
drop them, you have to re-initialize the factor, like this:
a$column2 <- factor(a$column2)
Or, you could just download the Hmisc package, which redefines the subset
operator "[" to behave as you'd like. Personally, however, I think the
default behavior is clearer.
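Base R also lets you drop unused levels without replacing "[". A minimal sketch with a made-up factor `f`: the `drop` argument of "[.factor" drops levels for a single indexing operation, and droplevels() (added in R 2.12.0, well after this thread, so it assumes a newer R) does the same for a factor or for every factor column of a data frame.

```r
f <- factor(c("factor1", "factor2", "factor3", "factor1"))

# Default indexing uses drop = FALSE, so "factor3" stays a level.
levels(f[f != "factor3"])                # "factor1" "factor2" "factor3"

# Ask "[.factor" to drop unused levels for this one operation.
levels(f[f != "factor3", drop = TRUE])   # "factor1" "factor2"

# R >= 2.12.0 only: droplevels() on a whole (hypothetical) data frame.
df <- data.frame(id = 1:4, column2 = f)
a <- droplevels(subset(df, column2 != "factor3"))
levels(a$column2)                        # "factor1" "factor2"
```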
By the way, there are some problems with your code. First of all, you should
drop the quotes around column2; they're unnecessary. Secondly, your subset
condition probably does not do what you expect: for a factor, == 1 compares
the level labels (as character strings) with "1", not the internal integer
codes. Unless one of your labels is literally "1", no row can satisfy both
conditions, and the result will be empty. Was this your intention?
Kevin
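A quick check of what == does on a factor, assuming (hypothetically) labels like the "factor1" and "factor2" in Luis's example: the comparison is made against the labels, and the integer codes are only reached through as.integer().

```r
f <- factor(c("factor1", "factor2", "factor1"))

# Comparison on a factor uses the labels: each one is compared with "1".
f == 1                  # FALSE FALSE FALSE

# The underlying integer codes are available via as.integer().
as.integer(f)           # 1 2 1
as.integer(f) == 1      # TRUE FALSE TRUE

# With these labels, the original condition selects no rows.
df <- data.frame(id = 1:3, column2 = f)
nrow(subset(df, column2 %in% c("factor1", "factor2") & column2 == 1))   # 0
```

Relying on the integer codes is fragile, since they depend on the order of the levels; selecting by label with %in% is usually the safer choice.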
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Luis Rideau Cruz
Sent: Tuesday, August 17, 2004 7:30 AM
To: r-help at stat.math.ethz.ch
Subject: [R] levels of factor
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html