Fernando Henrique Ferraz Pereira da Rosa
2003-Jun-19 20:37 UTC
[R] Subsetting by more than one factor...
Is it possible in R to subset a dataframe by more than one factor, all at
once?
For instance, I have the dataframe:
> data
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
1   0  1  0  0  0  0  0  0  0   0 0.5862069
4   0  0  0  0  0  0  0  0  0   1 0.5862069
5   0  0  0  0  0  0  1  0  0   0 0.5862069
6   0  0  0  0  0  0  0  1  0   0 0.5862069
7   0  0  1  0  0  0  0  0  0   0 0.5862069
9   0  0  0  0  1  0  0  0  0   0 0.5862069
20  0  1  1  0  0  0  0  0  0   0 0.5862069
22  0  1  0  0  1  0  0  0  0   0 0.5862069
24  0  1  0  0  0  0  1  0  0   0 0.5862069
25  0  1  0  0  0  0  0  1  0   0 0.5862069
27  0  1  0  0  0  0  0  0  0   1 0.5862069
If I want to subset only those points that have p4 = 1, I do:
> subset(data,p4 == 1)
And that's fine. Now suppose I want to subset those rows that have not only
p4 = 1 but also p6 = 1.
I tried subset(data, p4 == 1 && p6 == 1) or subset(data, p4 == 1 & p6 == 1),
but it didn't work.
Then I found a clumsy way to do it:
subset(subset(data, p4 == 1), p6 == 1)
This works, but it quickly gets clumsy as the number of conditions increases
(I end up with a large number of nested subset() calls). Is there a simpler
way to do this?
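For reference, the nested call can be collapsed into a single subset() call: the second argument of subset() is a logical expression evaluated inside the data frame, and & combines two such logical vectors row by row. A minimal sketch, assuming the data frame is retyped by hand from the rows posted above; it tests p2 and p3 rather than p4 and p6, because in the rows as shown no row has p4 = 1 or p6 = 1.

```r
# Data frame retyped by hand from the rows posted above.
# In these rows p1, p4, p6 and p9 are all zero.
data <- data.frame(
  p1   = rep(0, 11),
  p2   = c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  p3   = c(0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  p4   = rep(0, 11),
  p5   = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0),
  p6   = rep(0, 11),
  p7   = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0),
  p8   = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  p9   = rep(0, 11),
  p10  = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  pred = rep(0.5862069, 11),
  row.names = c("1", "4", "5", "6", "7", "9", "20", "22", "24", "25", "27")
)

# Nested version: each subset() call applies one condition.
nested <- subset(subset(data, p2 == 1), p3 == 1)

# Single call: & combines the two logical vectors element by element,
# so only rows meeting both conditions are kept.
combined <- subset(data, p2 == 1 & p3 == 1)

print(combined)
print(all.equal(nested, combined))
```

Both forms keep the same rows; the single call simply grows by one `& condition` per extra criterion instead of by one more level of nesting.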
--
Fernando Henrique Ferraz Pereira da Rosa wrote:
> Is it possible in R to subset a dataframe by more than one factor, all at
> once?
[snip]
> I tried subset(data,p4 == 1 && p6 == 1) or subset(data,p4==1 & p6==1).
> But it didn't work.

It didn't? It does for me:

R> subset(z, p4 == 1 & p6 == 1)
[1] p1   p2   p3   p4   p5   p6   p7   p8   p9   p10  pred
<0 rows> (or 0-length row.names)
R> subset(z, p2 == 1 & p8 == 1)
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
10  0  1  0  0  0  0  0  1  0   0 0.5862069
R> subset(z, (p2 == 1 & p3 == 0) | p5 == 1)
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
1   0  1  0  0  0  0  0  0  0   0 0.5862069
6   0  0  0  0  1  0  0  0  0   0 0.5862069
8   0  1  0  0  1  0  0  0  0   0 0.5862069
9   0  1  0  0  0  0  1  0  0   0 0.5862069
10  0  1  0  0  0  0  0  1  0   0 0.5862069
11  0  1  0  0  0  0  0  0  0   1 0.5862069
R> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    7.0
year     2003
month    04
day      16
language R
R>

[snip]

Regards,
Sundar
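Sundar's three calls can be reproduced as a sketch. The data frame below is retyped by hand from the rows in the question and, as an assumption, is built without the original row names, so its rows are numbered 1 to 11 like those in his output; the object name z follows his session.

```r
# Same rows as in the question, numbered 1 to 11 (no original row names).
z <- data.frame(
  p1   = rep(0, 11),
  p2   = c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  p3   = c(0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  p4   = rep(0, 11),
  p5   = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0),
  p6   = rep(0, 11),
  p7   = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0),
  p8   = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  p9   = rep(0, 11),
  p10  = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  pred = rep(0.5862069, 11)
)

# & keeps rows where both conditions hold.
none <- subset(z, p4 == 1 & p6 == 1)   # no row has p4 == 1 and p6 == 1
both <- subset(z, p2 == 1 & p8 == 1)   # only row 10

# | keeps rows where either side holds; parentheses control the grouping.
either <- subset(z, (p2 == 1 & p3 == 0) | p5 == 1)

print(nrow(none))
print(both)
print(either)
```

The empty first result is the point of the exchange: the call itself is correct, and the condition simply matches no rows in this data.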
Fernando Henrique Ferraz Pereira da Rosa <mentus at gmx.de> writes:

> Is it possible in R to subset a dataframe by more than one factor, all at
> once?
> For instance, I have the dataframe:
>
> > data
>    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
> 1   0  1  0  0  0  0  0  0  0   0 0.5862069
> 4   0  0  0  0  0  0  0  0  0   1 0.5862069
> 5   0  0  0  0  0  0  1  0  0   0 0.5862069
> 6   0  0  0  0  0  0  0  1  0   0 0.5862069
> 7   0  0  1  0  0  0  0  0  0   0 0.5862069
> 9   0  0  0  0  1  0  0  0  0   0 0.5862069
> 20  0  1  1  0  0  0  0  0  0   0 0.5862069
> 22  0  1  0  0  1  0  0  0  0   0 0.5862069
> 24  0  1  0  0  0  0  1  0  0   0 0.5862069
> 25  0  1  0  0  0  0  0  1  0   0 0.5862069
> 27  0  1  0  0  0  0  0  0  0   1 0.5862069
>
> If I want to subset only those points that have p4 = 1, I do:
>
> > subset(data,p4 == 1)
> And that's fine. Now suppose I want to subset those that not only have
> p4 = 1, but also p6 = 1.
> I tried subset(data,p4 == 1 && p6 == 1) or subset(data,p4==1 & p6==1).

As Sundar pointed out, it is the second form that you want. When
intersecting conditions in subset(), use &, not &&.

The way you pasted the output into your message, the column names did not
line up with the columns. I changed this in the part that I quoted above.
This shows, I think, that you chose the wrong example, because that
intersection is empty. Try subset(data, p2 == 1 & p3 == 1) instead.
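To make the & versus && point concrete: & compares two logical vectors element by element and returns one value per row, which is what subset() needs, while && is intended for single TRUE/FALSE values in control flow and only ever yields one value (the R of this thread silently used the first elements; current R versions signal an error for longer inputs instead). The sketch below, again using a data frame retyped by hand from the posted rows, also shows two ways to handle a growing number of conditions without nesting. The Reduce() approach is an addition of this sketch, not something suggested in the thread, and the variable names conds and keep are illustrative only.

```r
# Data frame retyped by hand from the rows posted in the question.
data <- data.frame(
  p1   = rep(0, 11),
  p2   = c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  p3   = c(0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  p4   = rep(0, 11),
  p5   = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0),
  p6   = rep(0, 11),
  p7   = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0),
  p8   = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  p9   = rep(0, 11),
  p10  = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  pred = rep(0.5862069, 11),
  row.names = c("1", "4", "5", "6", "7", "9", "20", "22", "24", "25", "27")
)

# & works element by element: one TRUE/FALSE per row.
mask <- data$p2 == 1 & data$p3 == 1
print(mask)

# && yields a single value at most; in current R, vectors longer than one
# raise an error, which is caught here and turned into NA.
single <- tryCatch(data$p2 == 1 && data$p3 == 1, error = function(e) NA)
print(single)

# Any number of conditions can be chained with & inside one subset() call ...
many <- subset(data, p2 == 1 & p3 == 1 & p5 == 0 & pred > 0.5)

# ... or, when conditions are built programmatically, kept in a list and
# folded together with Reduce() into one logical vector.
conds <- list(data$p2 == 1, data$p3 == 1, data$p5 == 0, data$pred > 0.5)
keep <- Reduce(`&`, conds)
print(data[keep, ])
```

Chaining & inside subset() is usually enough; the list-and-Reduce() form is only worth it when the set of conditions is not known until run time.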