thr3ads.net - R help - [R] persistance of factor levels in a data frame [Feb 2005]


Lefebure Tristan

2005-Feb-28 13:07 UTC

[R] persistance of factor levels in a data frame

Hi,
Just something I don't understand:

data <-
data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
data_ac <- data[which(data$F1 !="b"), ]  
levels(data_ac$F1)    

Why the level "b" is always present ?

thanks

Tristan, R 2.0.1 for Linux Fedora 3

-- 
------------------------------------------------------------
Tristan LEFEBURE
Laboratoire d'écologie des hydrosystèmes fluviaux (UMR 5023)
Université Lyon I - Campus de la Doua
Bat. Darwin C 69622 Villeurbanne - France

Phone: (33) (0)4 26 23 44 02
Fax: (33) (0)4 72 43 15 23

Peter Dalgaard

2005-Feb-28 13:21 UTC

Lefebure Tristan <Tristan.Lefebure at univ-lyon1.fr> writes:
> Hi,
> Just something I don't understand:
> 
> data <-
> data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]  
> levels(data_ac$F1)    
> 
> Why the level "b" is always present ?
Because it is a property of the definition, not of the data. E.g. if
you tabulate it, you generally want to get a zero entry if there are
no "b"s in the data. If, for some reason, you want to reduce the
factor to only those levels that are present, factor() gets you there
soon enough:
> levels(factor(data_ac$F1))
[1] "a" "c"
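
The tabulation point above can be illustrated with a minimal sketch. It assumes a current R installation, so `stringsAsFactors = TRUE` is spelled out: since R 4.0.0, `data.frame()` no longer converts strings to factors by default, and without that argument `F1` would be a plain character column with no levels at all. The names `dat`, `dat_ac`, `counts_all` and `counts_used` are illustrative and do not come from the thread.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption needed
# on R >= 4.0.0 so that F1 is a factor, as it was in R 2.0.1.
dat <- data.frame(V1 = 1:12,
                  F1 = rep(c("a", "b", "c"), each = 4),
                  stringsAsFactors = TRUE)
dat_ac <- dat[dat$F1 != "b", ]

# The levels belong to the factor's definition, so "b" is still tabulated,
# with a count of zero.
counts_all <- table(dat_ac$F1)
print(counts_all)

# Rebuilding the factor with factor() keeps only the levels that occur.
counts_used <- table(factor(dat_ac$F1))
print(counts_used)
```

Whether the zero entry is wanted depends on the analysis: it is useful when results from several subsets have to line up, and noise when only the observed groups matter.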


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Dimitris Rizopoulos

2005-Feb-28 13:27 UTC

look at ?"[.data.frame" and also check this:

dat <- data.frame(V1=c(1:12), F1=rep(letters[1:3], each=4))
dat.ac <- dat[dat$F1 !="b", ]
###############
dat.ac$F1
dat.ac$F1[, drop=TRUE]
###############
dat.ac$F1 <- dat.ac$F1[, drop=TRUE]
levels(dat.ac$F1)

I hope it helps.

best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm



Douglas Bates

2005-Feb-28 13:31 UTC

Lefebure Tristan wrote:
> Hi,
> Just something I don't understand:
> 
> data <-
> data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]  
> levels(data_ac$F1)    
> 
> Why the level "b" is always present ?
> 
> thanks
> 
> Tristan, R 2.0.1 for Linux Fedora 3
> 
You must explicitly drop unused levels of a factor created by subsetting.

 > levels(data_ac$F1[drop = TRUE])
[1] "a" "c"
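
For readers on a recent R version, here is a hedged sketch of the same idea. It compares the `drop = TRUE` subsetting shown above with `droplevels()`, a base R function that is not mentioned in this thread and was added after the R 2.0.1 release used here. `stringsAsFactors = TRUE` is an assumption needed on R >= 4.0.0, and the object names are illustrative only.

```r
# Example data as in the original post; F1 must be a factor for this to apply.
dat <- data.frame(V1 = 1:12,
                  F1 = rep(c("a", "b", "c"), each = 4),
                  stringsAsFactors = TRUE)
dat_ac <- dat[dat$F1 != "b", ]

# Subsetting the factor with drop = TRUE discards unused levels.
lev_drop_arg <- levels(dat_ac$F1[drop = TRUE])

# droplevels() does the same for a single factor ...
lev_droplevels <- levels(droplevels(dat_ac$F1))

# ... and, applied to a data frame, for every factor column at once.
dat_clean <- droplevels(dat_ac)

print(lev_drop_arg)
print(levels(dat_clean$F1))
```

The data frame method is convenient when several factor columns need cleaning after one subsetting step. Note that it returns a new data frame rather than modifying `dat_ac` in place.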

Petr Pikal

2005-Feb-28 13:35 UTC

On 28 Feb 2005 at 14:07, Lefebure Tristan wrote:
> Hi,
> Just something I don't understand:
> 
> data <-
> data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]
> levels(data_ac$F1)
> 
> Why the level "b" is always present ?

Hi Tristan

from ?"[.factor"

Extract or Replace Parts of a Factor

Description:

     Extract or replace subsets of factors.

Usage:

     x[i, drop = FALSE]

     x[i] <- value

Arguments:

       x: a factor

       i: a specification of indices - see 'Extract'.

    drop: logical.  If true, unused levels are dropped.
***************************************
The default is FALSE, so unused levels are retained.

factor(data_ac$F1)

gives you the same factor with only existing levels.

Cheers
Petr

Petr Pikal
petr.pikal at precheza.cz

Marc Schwartz

2005-Feb-28 13:40 UTC

On Mon, 2005-02-28 at 14:07 +0100, Lefebure Tristan wrote:
> Hi,
> Just something I don't understand:
> 
> data <-
> data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]  
> levels(data_ac$F1)    
> 
> Why the level "b" is always present ?
> 
> thanks
> 
> Tristan, R 2.0.1 for Linux Fedora 3
See ?"[.factor" for details. You will note that the argument 'drop' is
FALSE by default, which means that unused levels of a factor are not
dropped when subsetting.

This can be important if you might want to join or compare factors from
more than one source, where you want to ensure that the factor levels
are the same. If you were to drop the unused levels in one factor while
they are still present in the other, the comparison would be problematic,
since the levels for the same values in the two factors would be different.
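
As a rough illustration of why shared levels matter, the sketch below takes two subsets of the same data frame and checks whether their level sets still agree. The objects `part_ac` and `part_ab` are hypothetical and do not appear in the thread, and `stringsAsFactors = TRUE` is an assumption needed on R >= 4.0.0.

```r
dat <- data.frame(V1 = 1:12,
                  F1 = rep(c("a", "b", "c"), each = 4),
                  stringsAsFactors = TRUE)

# Two subsets taken from the same source, each missing one group.
part_ac <- dat[dat$F1 != "b", ]
part_ab <- dat[dat$F1 != "c", ]

# With the default drop = FALSE both subsets keep all three levels,
# so their tables have the same shape and can be added cell by cell.
same_before <- identical(levels(part_ac$F1), levels(part_ab$F1))
total <- table(part_ac$F1) + table(part_ab$F1)
print(total)

# After dropping unused levels the two level sets no longer match.
same_after <- identical(levels(factor(part_ac$F1)),
                        levels(factor(part_ab$F1)))
print(c(before = same_before, after = same_after))
```

Once the level sets differ, the two tables no longer line up by position, so it is usually safer to combine or compare the factors first and drop unused levels afterwards.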

If you want to force the unused levels to be dropped before using a
factor, just use:
> data_ac$F1 <- factor(data_ac$F1)
> data_ac$F1
[1] a a a a c c c c
Levels: a c

See ?factor for more information.

HTH,

Marc Schwartz
