On 17/02/2015 11:19 AM, John Posner wrote:
> In the course of slicing-and-dicing some data, I had occasion to create a
> list like this:
>
> list(
>   subset(my_dataframe, GR1=="XX1"),
>   subset(my_dataframe, GR1=="XX2"),
>   subset(my_dataframe, GR1=="YY"),
>   subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
>   subset(my_dataframe, GR2=="Remission"),
>   subset(my_dataframe, GR2=="Relapse"))
>
> I used %in% only once, because there was only one "compound
> value" (XX1 or XX2) for subsetting. But then it occurred to me to use %in%
> everywhere, taking advantage of the fact that a scalar value is the same as a
> length-1 vector:
>
> list(
>   subset(my_dataframe, GR1 %in% "XX1"),
>   subset(my_dataframe, GR1 %in% "XX2"),
>   subset(my_dataframe, GR1 %in% "YY"),
>   subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
>   subset(my_dataframe, GR2 %in% "Remission"),
>   subset(my_dataframe, GR2 %in% "Relapse"))
>
> It works just fine. Are there any problems with this style, from the
> standpoints of correctness, aesthetics, etc.?
If GR1 or GR2 has a missing value, you get NA from the equality tests,
but FALSE from the %in% tests. That won't affect subset() (where NA and
FALSE both cause the observation to be omitted), but it might affect
other code written in this style. For example, if you had selected rows
using a logical index instead of using subset(), each NA entry in the
index would produce a row of NAs in the result.
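
A minimal sketch of that difference, for illustration only. The column
name GR1 is taken from the question; the data frame contents, the id
column, and the position of the missing value are made up.

```r
# Hypothetical data: four rows, one with a missing GR1 value.
my_dataframe <- data.frame(
  id  = 1:4,
  GR1 = c("XX1", NA, "YY", "XX1"),
  stringsAsFactors = FALSE
)

# == propagates the missing value; %in% never returns NA.
eq_index <- my_dataframe$GR1 == "XX1"    # TRUE  NA    FALSE TRUE
in_index <- my_dataframe$GR1 %in% "XX1"  # TRUE  FALSE FALSE TRUE

# subset() drops rows where the condition is NA, so both forms agree.
by_eq_subset <- subset(my_dataframe, GR1 == "XX1")
by_in_subset <- subset(my_dataframe, GR1 %in% "XX1")

# Plain logical indexing keeps the NA entry as a row of NAs.
by_eq_index <- my_dataframe[eq_index, ]  # 3 rows, the second all NA
by_in_index <- my_dataframe[in_index, ]  # 2 rows
```

If the == form is preferred with logical indexing, wrapping the index in
which(), as in my_dataframe[which(eq_index), ], drops the NA entries.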
Duncan Murdoch