Fernando Henrique Ferraz Pereira da Rosa
2003-Jun-19 20:37 UTC
[R] Subsetting by more than one factor...
Is it possible in R to subset a dataframe by more than one factor, all at once?
For instance, I have the dataframe:

> data
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
1   0  1  0  0  0  0  0  0  0   0 0.5862069
4   0  0  0  0  0  0  0  0  0   1 0.5862069
5   0  0  0  0  0  0  1  0  0   0 0.5862069
6   0  0  0  0  0  0  0  1  0   0 0.5862069
7   0  0  1  0  0  0  0  0  0   0 0.5862069
9   0  0  0  0  1  0  0  0  0   0 0.5862069
20  0  1  1  0  0  0  0  0  0   0 0.5862069
22  0  1  0  0  1  0  0  0  0   0 0.5862069
24  0  1  0  0  0  0  1  0  0   0 0.5862069
25  0  1  0  0  0  0  0  1  0   0 0.5862069
27  0  1  0  0  0  0  0  0  0   1 0.5862069

If I want to subset only those points that have p4 = 1, I do:

> subset(data,p4 == 1)

And that's fine. Now suppose I want to subset those that not only have p4 = 1, but also p6 = 1.
I tried subset(data,p4 == 1 && p6 == 1) or subset(data,p4==1 & p6==1). But it didn't work.

Then I found a clumsy way to do it:

subset(subset(data,p4==1),p6==1)

Which works. But it soon gets very clumsy as the number of conditions increases (I end up with a really large number of nested subsets).
Is there a simpler way to do that?
Fernando Henrique Ferraz Pereira da Rosa wrote:
> Is it possible in R to subset a dataframe by more than one factor, all at
> once?
> For instance, I have the dataframe:
>
> >data
>    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
> 1   0  1  0  0  0  0  0  0  0   0 0.5862069
> 4   0  0  0  0  0  0  0  0  0   1 0.5862069
> 5   0  0  0  0  0  0  1  0  0   0 0.5862069
> 6   0  0  0  0  0  0  0  1  0   0 0.5862069
> 7   0  0  1  0  0  0  0  0  0   0 0.5862069
> 9   0  0  0  0  1  0  0  0  0   0 0.5862069
> 20  0  1  1  0  0  0  0  0  0   0 0.5862069
> 22  0  1  0  0  1  0  0  0  0   0 0.5862069
> 24  0  1  0  0  0  0  1  0  0   0 0.5862069
> 25  0  1  0  0  0  0  0  1  0   0 0.5862069
> 27  0  1  0  0  0  0  0  0  0   1 0.5862069
>
> If I want to subset only those points that have p4 = 1, I do:
> > subset(data,p4 == 1)
> And that's fine. Now suppose I want to subset those that not only have p4
> = 1, but also p6 = 1.
> I tried subset(data,p4 == 1 && p6 == 1) or subset(data,p4==1 & p6==1).
> But it didn't work.

It didn't? It does for me:

R> subset(z, p4 == 1 & p6 == 1)
 [1] p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 pred
<0 rows> (or 0-length row.names)
R> subset(z, p2 == 1 & p8 == 1)
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
10  0  1  0  0  0  0  0  1  0   0 0.5862069
R> subset(z, (p2 == 1 & p3 == 0) | p5 == 1)
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
1   0  1  0  0  0  0  0  0  0   0 0.5862069
6   0  0  0  0  1  0  0  0  0   0 0.5862069
8   0  1  0  0  1  0  0  0  0   0 0.5862069
9   0  1  0  0  0  0  1  0  0   0 0.5862069
10  0  1  0  0  0  0  0  1  0   0 0.5862069
11  0  1  0  0  0  0  0  0  0   1 0.5862069
R> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    7.0
year     2003
month    04
day      16
language R
R>

[snip]

Regards,
Sundar
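For readers who want to reproduce the calls above, here is a minimal sketch. It assumes that z is simply the posted data with rows renumbered 1 to 11; the construction via the helper objects m and ones is illustrative, not how z was built in the reply. It also shows that the parentheses in the last call are optional, because & binds more tightly than | in R, while moving them changes the result.

```r
# Assumed reconstruction of z: the posted table with rows renumbered 1..11.
m <- matrix(0, nrow = 11, ncol = 10,
            dimnames = list(NULL, paste0("p", 1:10)))

# (row, column) positions of the 1s in the posted table.
ones <- cbind(row = c(1, 2, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11),
              col = c(2, 10, 7, 8, 3, 5, 2, 3, 2, 5, 2, 7, 2, 8, 2, 10))
m[ones] <- 1

z <- data.frame(m, pred = 0.5862069)

# Grouping as written in the reply.
a <- subset(z, (p2 == 1 & p3 == 0) | p5 == 1)

# Same condition without parentheses: & is evaluated before |.
b <- subset(z, p2 == 1 & p3 == 0 | p5 == 1)

# Different grouping, different rows: row 6 (p5 == 1 but p2 == 0) drops out.
d <- subset(z, p2 == 1 & (p3 == 0 | p5 == 1))

rownames(a)  # "1" "6" "8" "9" "10" "11"
rownames(d)  # "1" "8" "9" "10" "11"
```

Writing the parentheses explicitly, as in the reply, is still the safer habit when & and | are mixed.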
Fernando Henrique Ferraz Pereira da Rosa <mentus at gmx.de> writes:
> Is it possible in R to subset a dataframe by more than one factor, all at
> once?
> For instance, I have the dataframe:
>
> >data
>    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
> 1   0  1  0  0  0  0  0  0  0   0 0.5862069
> 4   0  0  0  0  0  0  0  0  0   1 0.5862069
> 5   0  0  0  0  0  0  1  0  0   0 0.5862069
> 6   0  0  0  0  0  0  0  1  0   0 0.5862069
> 7   0  0  1  0  0  0  0  0  0   0 0.5862069
> 9   0  0  0  0  1  0  0  0  0   0 0.5862069
> 20  0  1  1  0  0  0  0  0  0   0 0.5862069
> 22  0  1  0  0  1  0  0  0  0   0 0.5862069
> 24  0  1  0  0  0  0  1  0  0   0 0.5862069
> 25  0  1  0  0  0  0  0  1  0   0 0.5862069
> 27  0  1  0  0  0  0  0  0  0   1 0.5862069
>
> If I want to subset only those points that have p4 = 1, I do:
> > subset(data,p4 == 1)
> And that's fine. Now suppose I want to subset those that not only have p4
> = 1, but also p6 = 1.
> I tried subset(data,p4 == 1 && p6 == 1) or subset(data,p4==1 & p6==1).

As Sundar pointed out, it is the second form that you want. When
intersecting conditions in subset() use &, not &&.

The way that you pasted the output in your message, the column names
did not align with the columns. I changed this in the part that I
quoted above. This shows that you chose the wrong example, I think,
because that intersection is empty. Try

subset(data, p2 == 1 & p3 == 1)

instead.
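The difference between & and && can be sketched as follows. The data frame is rebuilt by hand from the posted table, and the names keep, both, nested, first_row_only, conds and many are illustrative only. The & operator works element by element and returns one TRUE/FALSE per row, which is what subset() needs. The && operator is meant for a single TRUE/FALSE; as far as I know, older versions of R silently used only the first element of each side, and recent versions raise an error when given longer vectors. The last lines show one possible way to combine many conditions without nesting, using Reduce().

```r
# Rebuild the posted data frame, keeping the original row names.
zeros <- rep(0, 11)
data <- data.frame(
  p1  = zeros,
  p2  = c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  p3  = c(0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  p4  = zeros,
  p5  = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0),
  p6  = zeros,
  p7  = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0),
  p8  = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  p9  = zeros,
  p10 = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  pred = 0.5862069,
  row.names = c("1", "4", "5", "6", "7", "9", "20", "22", "24", "25", "27")
)

# & is vectorised: one logical value per row.
keep <- data$p2 == 1 & data$p3 == 1
length(keep)  # 11

# The suggested call and the nested workaround select the same row.
both   <- subset(data, p2 == 1 & p3 == 1)
nested <- subset(subset(data, p2 == 1), p3 == 1)
rownames(both)  # "20"

# && is for scalars, e.g. a test on the first row only.
first_row_only <- data$p2[1] == 1 && data$p3[1] == 1  # FALSE, since p3[1] is 0

# Many conditions: collect them in a list and fold with &.
conds <- list(data$p2 == 1, data$p3 == 0, data$p5 == 0)
many  <- data[Reduce(`&`, conds), ]
rownames(many)  # "1" "24" "25" "27"
```

For a handful of conditions, writing them out with & inside a single subset() call is simpler; the Reduce() form is only worth it when the conditions are generated programmatically.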