thr3ads.net - R help - [R] any and all [Apr 2024]

Duncan Murdoch

2024-Apr-12 21:59 UTC

[R] any and all

On 12/04/2024 3:52 p.m., avi.e.gross at gmail.com wrote:
> Base R has generic functions called any() and all() that I am having
> trouble using.
>
> It works fine when I play with it in a base R context as in:
>
>> all(any(TRUE, TRUE), any(TRUE, FALSE))
> [1] TRUE
>> all(any(TRUE, TRUE), any(FALSE, FALSE))
> [1] FALSE
>
> But in a tidyverse/dplyr environment, it returns wrong answers.
>
> Consider this example. I have data I have joined together with pairs of
> columns representing a first generation and several other pairs
> representing additional generations. I want to consider any pair where at
> least one of the pair is not NA as a success. But in order to keep the
> entire row, I want all three pairs to have some valid data. This seems like
> a fairly common reasonable thing often needed when evaluating data.
>
> So to make it very general, I chose to do something a bit like this:

We can't really help you without a reproducible example. It's not
enough to show us something that doesn't run but is a bit like the real
code.

Duncan Murdoch

>   
> result <- filter(mydata,
>                   all(
>                     any(!is.na(first.a), !is.na(first.b)),
>                     any(!is.na(second.a), !is.na(second.b)),
>                     any(!is.na(third.a), !is.na(third.b))))
>   
> I apologize if the formatting is not seen properly. The above logically
> should work. And it should be extendable to scenarios where you want at
> least one of M columns to contain data as a group with N such groups of any
> size.
>   
> But since it did not work, I tried a plan that did work and feels silly. I
> used mutate() to make new columns such as:
>   
> result <-
>    mydata |>
>    mutate(
>      usable.1 = (!is.na(first.a) | !is.na(first.b)),
>      usable.2 = (!is.na(second.a) | !is.na(second.b)),
>      usable.3 = (!is.na(third.a) | !is.na(third.b)),
>      usable = (usable.1 & usable.2 & usable.3)
>    ) |>
>    filter(usable == TRUE)
>   
> The above wastes time and effort making new columns so I can check the
> calculations then uses the combined columns to make a Boolean that can be
> used to filter the result.
>   
> I know this is not the place to discuss dplyr. I want to check first if I
> am doing anything wrong in how I use any/all. One guess is that the
> generic is messed with by dplyr or other packages I libraried.
>
> And, of course, some aspects of delayed evaluation can interfere in subtle
> ways.
>
> I note I have had other problems with these base R functions before and
> generally solved them by not using them, as shown above. I would much
> rather use them, or something similar.
>   
>   
> Avi
>   
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
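
A minimal reproducible example of the kind Duncan asks for could look like the sketch below. The data frame is hypothetical (column names taken from Avi's sketch, values invented), and the condition is evaluated with base R's with() instead of filter(), so no packages are needed. It shows that the nested all(any(...)) expression pools whole columns into one logical value:

```
# Hypothetical toy data: column names follow Avi's sketch, values are invented.
mydata <- data.frame(
  first.a  = c(1,  NA, NA),
  first.b  = c(NA, NA, 2),
  second.a = c(3,  4,  NA),
  second.b = c(NA, NA, 5),
  third.a  = c(6,  7,  8),
  third.b  = c(NA, NA, NA)
)

# Avi's condition, evaluated on whole columns (base R only).
# Each any() pools all values of both columns; all() then pools the three results.
cond <- with(mydata, all(
  any(!is.na(first.a),  !is.na(first.b)),
  any(!is.na(second.a), !is.na(second.b)),
  any(!is.na(third.a),  !is.na(third.b))
))

print(cond)          # a single TRUE ...
print(length(cond))  # ... of length 1, not one value per row
```

Because the result has length 1, filter() would recycle it and keep all three rows, including row 2, whose first pair is entirely NA.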

Dénes Tóth

2024-Apr-12 22:42 UTC

[R] any and all

Hi Avi,

As Duncan already mentioned, a reproducible example would make it easier
to help you. Having said that, I think you misunderstand how
`dplyr::filter` works: it filters row-wise, so the filtering expression
must return either a logical vector with one element per row of the
data frame, or a single logical value meaning "keep all" (TRUE) or
"drop all" (FALSE). `any()` and `all()` always return a single logical
value, so you end up with an all-or-nothing filter, which is probably
not what you want.
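
If the goal is to keep using any() and all() themselves, one option is to apply them once per row, e.g. with apply(X, 1, any). The sketch below uses hypothetical toy data (invented values, column names from Avi's sketch); pair_ok() is an illustrative helper, not an existing function:

```
# Hypothetical toy data: column names follow Avi's sketch, values are invented.
mydata <- data.frame(
  first.a  = c(1,  NA, NA),
  first.b  = c(NA, NA, 2),
  second.a = c(3,  4,  NA),
  second.b = c(NA, NA, 5),
  third.a  = c(6,  7,  8),
  third.b  = c(NA, NA, NA)
)

# Hypothetical helper: TRUE for rows where at least one of `cols` is not NA.
# is.na() on a data frame gives a logical matrix; apply(..., 1, any)
# calls any() once per row.
pair_ok <- function(df, cols) apply(!is.na(df[cols]), 1, any)

ok1 <- pair_ok(mydata, c("first.a",  "first.b"))
ok2 <- pair_ok(mydata, c("second.a", "second.b"))
ok3 <- pair_ok(mydata, c("third.a",  "third.b"))

# cbind() builds a 3-column logical matrix; apply(..., 1, all) calls all() per row.
keep <- apply(cbind(ok1, ok2, ok3), 1, all)

print(keep)          # one TRUE/FALSE per row
print(mydata[keep, ])
```

apply() converts its input to a matrix and calls the function once per row in R, so on large data the vectorised `|` and `&` conditions are usually faster.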

Note also that you do not need `mutate()` before `filter()`: conditions
passed to `filter()` as separate arguments are combined with `&` (read
?dplyr::filter carefully):
```
filter(
   .data = mydata,
   !is.na(first.a) | !is.na(first.b),
   !is.na(second.a) | !is.na(second.b),
   !is.na(third.a) | !is.na(third.b)
)
```

Or you can use `base::subset()`:
```
subset(
   mydata,
   (!is.na(first.a) | !is.na(first.b))
   & (!is.na(second.a) | !is.na(second.b))
   & (!is.na(third.a) | !is.na(third.b))
)
```

Regards,
Denes
