thr3ads.net - R help - [R] Is this an artifact of using "which"? [Apr 2008]

Tania Oh

2008-Apr-14 11:32 UTC

[R] Is this an artifact of using "which"?

Dear all,

I used "which" to obtain a subset of values from my data.frame.
However, I find that there is a "trace" of the values I have removed.
Any suggestions would be greatly appreciated.

Below is my data:

d <- data.frame( val   = 1:10,
                 group = sample(LETTERS[1:5], 10, repl=TRUE) )

 >d
    val group
1    1     B
2    2     E
3    3     B
4    4     C
5    5     A
6    6     B
7    7     A
8    8     E
9    9     E
10  10     A

## selecting everything that is not group "A"
  d<-d[which(d$group !="A"),]

 > d
   val group
1   1     B
2   2     E
3   3     B
4   4     C
6   6     B
8   8     E
9   9     E

 > levels(d$group)
[1] "A" "B" "C" "E"

## why is group A still reflected here?

Many thanks in advance,
tania

D.phil student
Department of Physiology, Anatomy and Genetics
Oxford University

Uwe Ligges

2008-Apr-14 11:39 UTC

Tania Oh wrote:
> Dear all,
> 
> I used "which" to obtain a subset of values from my data.frame.
> however, I find that there is a "trace" of the values I have removed.
> Any suggestions would be greatly appreciate.
> 
> Below is my data:
> 
> d <- data.frame( val   = 1:10,
>                  group = sample(LETTERS[1:5], 10, repl=TRUE) )
> 
>  >d
>     val group
> 1    1     B
> 2    2     E
> 3    3     B
> 4    4     C
> 5    5     A
> 6    6     B
> 7    7     A
> 8    8     E
> 9    9     E
> 10  10     A
> 
> ## selecting everything that is not group "A"
>   d<-d[which(d$group !="A"),]
> 
>  > d
>    val group
> 1   1     B
> 2   2     E
> 3   3     B
> 4   4     C
> 6   6     B
> 8   8     E
> 9   9     E
> 
>  > levels(d$group)
> [1] "A" "B" "C" "E"
> 
> ## why is group A still reflected here?

Because you have removed elements from a factor object that has
particular levels. You remove elements (= observations), but the factor
still knows that all levels are possible (they are stored in the
attributes of the object).

If you want to remove all levels without corresponding observations, use 
explicit drop=TRUE as the help page suggests, e.g.:


d <- d[d$group != "A", ]
d$group <- d$group[ , drop = TRUE]
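As a minimal sketch of the two steps above, the data printed in the question can be typed in directly so that the result does not depend on sample(). The explicit factor() call is an assumption added here: since R 4.0.0, data.frame() no longer converts character columns to factors by default. The names "before" and "after" are only illustrative.

```r
# Fixed copy of the data printed in the question (no sampling, so reproducible).
# factor() is called explicitly; this is an assumption for current R versions.
d <- data.frame(
  val   = 1:10,
  group = factor(c("B", "E", "B", "C", "A", "B", "A", "E", "E", "A"))
)

# Row subsetting removes the observations but keeps the factor's level set.
d <- d[d$group != "A", ]
before <- levels(d$group)
print(before)

# drop = TRUE on the factor's own subsetting method discards unused levels.
d$group <- d$group[, drop = TRUE]
after <- levels(d$group)
print(after)
```

Comparing the two printed level sets shows whether "A" is still registered as a possible value after each step.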

Uwe Ligges


> Many thanks in advance,
> tania
> 
> D.phil student
> Department of Physiology, Anatomy and Genetics
> Oxford University
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Richard.Cotton at hsl.gov.uk

2008-Apr-14 11:50 UTC

> I used "which" to obtain a subset of values from my data.frame. 
> however, I find that there is a "trace" of the values I have removed.
> Any suggestions would be greatly appreciate.
> 
> Below is my data:
> 
> d <- data.frame( val   = 1:10,
>                  group = sample(LETTERS[1:5], 10, repl=TRUE) )
> 
>  >d
>     val group
> 1    1     B
> 2    2     E
> 3    3     B
> 4    4     C
> 5    5     A
> 6    6     B
> 7    7     A
> 8    8     E
> 9    9     E
> 10  10     A
> 
> ## selecting everything that is not group "A"
>   d<-d[which(d$group !="A"),]
> 
>  > d
>    val group
> 1   1     B
> 2   2     E
> 3   3     B
> 4   4     C
> 6   6     B
> 8   8     E
> 9   9     E
> 
>  > levels(d$group)
> [1] "A" "B" "C" "E"

The (imho) unintuitive behaviour is to do with the subsetting function
"[.factor", not "which".  There are a couple of workarounds:

1. Call factor to recreate the levels, and get rid of "A":
d$group <- factor(d$group)

2. Redefine [.factor; see dropUnusedLevels in the Hmisc package.
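The point about "[.factor" and the first workaround can be sketched as follows. This is only an illustration: the vector g repeats the group values printed in the question, and the variable names are made up for the example. droplevels() is not mentioned in this thread; it was added to base R in version 2.12.0, after these messages were written.

```r
# Group values as printed in the question, stored as a factor.
g <- factor(c("B", "E", "B", "C", "A", "B", "A", "E", "E", "A"))

# With or without which(), `[` on a factor keeps the full level set.
via_which   <- g[which(g != "A")]
via_logical <- g[g != "A"]
print(identical(levels(via_which), levels(via_logical)))

# Workaround 1: rebuild the factor so only the observed values remain as levels.
rebuilt <- factor(via_which)
print(levels(rebuilt))

# droplevels() (base R 2.12.0 and later) is an alternative for the same task.
dropped <- droplevels(via_which)
print(levels(dropped))
```

Both approaches return a new factor, so the result has to be assigned back to the column for the change to persist.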

Regards,
Richie.

Mathematical Sciences Unit
HSL


------------------------------------------------------------------------
ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}

Richard.Cotton at hsl.gov.uk

2008-Apr-14 14:37 UTC

> > The (imho) unintuitive behaviour is to do with the subsetting function
> > [.factor, not which.  There are a couple of workarounds:
> > 
> In that case, your intuition needs readjustment....
> 
> There are other systems which (de facto) drop unused levels by default,
> and it is a real pain to work around, especially for subgroup analyses.
> E.g. there is no way to get PROC FREQ in SAS to include a count of zero,
> and barplots of ratings from 0 to 10 lose columns "randomly" in SPSS
> (this _can_ be worked around, though).
> 
> Anyways, it is illogical: There's no reason that a tabulation of gender
> distribution for (say) tenured CS professors should suddenly pretend
> that the female gender does not exist!

I didn't mean to be a troll, and I can certainly see the virtue in
preserving levels for the cases you described, but it was something
that caught me out when I first learned R.  Having the levels of a
factor as "the values that my categorical data takes", rather than "the
_possible_ values that my categorical data takes", was more natural to me.
The important thing is that it is possible to include or drop the unused
levels easily as required.
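A small sketch of the two behaviours under discussion, using the group values from the original question (the variable names are illustrative only): with the unused level kept, a tabulation reports it with a count of zero; with the level dropped, it vanishes from the table.

```r
# Group values as printed in the original question.
g <- factor(c("B", "E", "B", "C", "A", "B", "A", "E", "E", "A"))
kept <- g[g != "A"]

# Unused level retained: the tabulation still has an entry for A, with count 0.
counts_kept <- table(kept)
print(counts_kept)

# Unused level dropped: A no longer appears in the tabulation at all.
counts_dropped <- table(factor(kept))
print(counts_dropped)
```

Which of the two is wanted depends on whether the zero count is itself informative for the analysis.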

Btw, has the behaviour of the drop argument to '[' changed recently?  I 
seem to remember that drop=TRUE didn't remove unused factor levels in 
older versions, though my memory may be mistaken.

Regards,
Richie.

Mathematical Sciences Unit
HSL


