thr3ads.net - R help - [R] Coding style question [Feb 2015]

John Posner

2015-Feb-17 16:19 UTC

[R] Coding style question

In the course of slicing-and-dicing some data, I had occasion to create a list
like this:

list(
    subset(my_dataframe, GR1=="XX1"),
    subset(my_dataframe, GR1=="XX2"),
    subset(my_dataframe, GR1=="YY"),
    subset(my_dataframe, GR1 %in% c("XX1", "XX2")), 
    subset(my_dataframe, GR2=="Remission"),
    subset(my_dataframe, GR2=="Relapse"))

I used %in% only once, because there was only one "compound value"
(XX1 or XX2) for subsetting. But then it occurred to me to use %in% everywhere,
taking advantage of the fact that a scalar value is the same as a length-1
vector:

list(
    subset(my_dataframe, GR1 %in% "XX1"),
    subset(my_dataframe, GR1 %in% "XX2"),
    subset(my_dataframe, GR1 %in% "YY"),
    subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
    subset(my_dataframe, GR2 %in% "Remission"),
    subset(my_dataframe, GR2 %in% "Relapse"))

It works just fine.  Are there any problems with this style, from the
standpoints of correctness, aesthetics, etc.?

-John
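
A minimal sketch of the equivalence described above, assuming a small stand-in for my_dataframe with no missing values (the data here are invented for illustration and are not from the original post):

```r
# Hypothetical stand-in for my_dataframe; the values are invented.
my_dataframe <- data.frame(
  GR1 = c("XX1", "XX2", "YY", "XX1"),
  GR2 = c("Remission", "Relapse", "Remission", "Relapse"),
  stringsAsFactors = FALSE
)

# R has no separate scalar type: "XX1" is a character vector of length 1.
scalar_length <- length("XX1")

# With no missing values, == and %in% give the same logical index ...
eq_index <- my_dataframe$GR1 == "XX1"
in_index <- my_dataframe$GR1 %in% "XX1"
same_index <- identical(eq_index, in_index)

# ... so both subset() calls select the same rows.
same_rows <- identical(subset(my_dataframe, GR1 == "XX1"),
                       subset(my_dataframe, GR1 %in% "XX1"))
```

Under these assumptions both same_index and same_rows should be TRUE; the two forms only diverge once missing values are involved.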

Duncan Murdoch

2015-Feb-17 18:42 UTC

On 17/02/2015 11:19 AM, John Posner wrote:
> In the course of slicing-and-dicing some data, I had occasion to create a
> list like this:
>
> list(
>      subset(my_dataframe, GR1=="XX1"),
>      subset(my_dataframe, GR1=="XX2"),
>      subset(my_dataframe, GR1=="YY"),
>      subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
>      subset(my_dataframe, GR2=="Remission"),
>      subset(my_dataframe, GR2=="Relapse"))
>
> I used %in% only once, because there was only one "compound value"
> (XX1 or XX2) for subsetting. But then it occurred to me to use %in%
> everywhere, taking advantage of the fact that a scalar value is the same as a
> length-1 vector:
>
> list(
>      subset(my_dataframe, GR1 %in% "XX1"),
>      subset(my_dataframe, GR1 %in% "XX2"),
>      subset(my_dataframe, GR1 %in% "YY"),
>      subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
>      subset(my_dataframe, GR2 %in% "Remission"),
>      subset(my_dataframe, GR2 %in% "Relapse"))
>
> It works just fine.  Are there any problems with this style, from the
> standpoints of correctness, aesthetics, etc.?

If GR1 or GR2 has a missing value, you get NA from the equality tests, 
but FALSE from the %in% tests.  That won't affect subset (where NA and 
FALSE both result in the omission of the observation), but it might 
affect other code like this.  For example, if you had selected rows 
using a logical index instead of using subset, the NA entries in the 
index would result in NA selections in the data.

Duncan Murdoch
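
The difference described in this reply can be sketched as follows, assuming a small invented data frame (df and its id column are hypothetical, not from the thread) with one missing GR1 value:

```r
# Hypothetical data with one missing GR1 value; the values are invented.
df <- data.frame(
  GR1 = c("XX1", NA, "YY"),
  id  = 1:3,
  stringsAsFactors = FALSE
)

# == propagates the missing value; %in% reports it as "not a match".
eq_index <- df$GR1 == "XX1"     # TRUE, NA, FALSE
in_index <- df$GR1 %in% "XX1"   # TRUE, FALSE, FALSE

# subset() treats NA as FALSE, so both forms keep only the matching row.
by_eq_subset <- subset(df, GR1 == "XX1")
by_in_subset <- subset(df, GR1 %in% "XX1")

# Plain logical indexing returns an all-NA row for each NA in the index.
by_eq_index <- df[eq_index, ]   # 2 rows: the match plus a row of NAs
by_in_index <- df[in_index, ]   # 1 row: the match only
```

If == is preferred for readability outside subset(), wrapping the index in which(), as in df[which(eq_index), ], is one common way to drop the NA entries.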
