thr3ads.net - R help - [R] combine filter() and select() [Aug 2020]

Ivan Calandra

2020-Aug-19 14:56 UTC

[R] combine filter() and select()

Dear useRs,

I'm new to the tidyverse world and I need some help on basic things.

I have the following tibble:
mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6),
                   row.names = c(NA, -6L),
                   class = c("tbl_df", "tbl", "data.frame"))

I want to subset the rows with "a" in the column "files", and keep only
that column.

So I did:
myfile <- mytbl %>%
  filter(grepl("a", files)) %>%
  select(files)

It works, but I believe there must be an easier way to combine filter()
and select(), right?

Thank you!
Ivan

-- 
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

Chris Evans

2020-Aug-19 17:21 UTC

[R] combine filter() and select()

Inline

----- Original Message -----
> From: "Ivan Calandra" <calandra at rgzm.de>
> To: "R-help" <r-help at r-project.org>
> Sent: Wednesday, 19 August, 2020 16:56:32
> Subject: [R] combine filter() and select()

> Dear useRs,
>
> I'm new to the tidyverse world and I need some help on basic things.
>
> I have the following tibble:
> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6),
>                    row.names = c(NA, -6L),
>                    class = c("tbl_df", "tbl", "data.frame"))
>
> I want to subset the rows with "a" in the column "files", and keep only
> that column.
>
> So I did:
> myfile <- mytbl %>%
>   filter(grepl("a", files)) %>%
>   select(files)
>
> It works, but I believe there must be an easier way to combine filter()
> and select(), right?

I would write

mytbl %>%
  filter(grepl("a", files)) %>%
  select(files) -> myfile

as I like to keep a sort of "top to bottom and left to right" flow
when writing in the tidyverse dialect of R, but that's really not important.

Apart from that, I think what you've done is "proper tidyverse". To
me, another difference between the dialects is that classical R often seems to
put value on, and make it easy, to do things with incredibly few characters.  I
think that for the people who are brilliant at that sort of coding, and there
are many on this list, that sort of code is also easy to read.  I know that
Chinese is easy to read if you grew up on it, but to a bear of little brain like
me, the much more verbose style of tidyverse repays typing time with readability
when I come back to my code and, though I have little experience of this yet,
when I read other people's code.

What did you think wasn't "easy" about what you wrote?

Very best (all),

Chris
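
A minimal sketch of the point about assignment direction, assuming dplyr is installed. The names left_assigned and right_assigned are made up for illustration, and tibble() is used here in place of the structure() call from the original post. Because `->` binds less tightly than `%>%`, the whole pipeline is evaluated first and then assigned, so both spellings should give the same object:

```r
library(dplyr)

# Same data as in the original post, built with tibble() for brevity.
mytbl <- tibble(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6)

# Assignment at the start of the pipeline, as in the original post.
left_assigned <- mytbl %>%
  filter(grepl("a", files)) %>%
  select(files)

# Assignment at the end of the pipeline, as suggested in this reply.
mytbl %>%
  filter(grepl("a", files)) %>%
  select(files) -> right_assigned

# Check that both forms produce the same one-row, one-column tibble.
identical(left_assigned, right_assigned)
```

Which form to prefer is a matter of style; the result does not depend on it.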
-- 
Small contribution in our coronavirus rigours: 
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/

Chris Evans <chris at psyctc.org>
Visiting Professor, University of Sheffield <chris.evans at sheffield.ac.uk>
I do some consultation work for the University of Roehampton
<chris.evans at roehampton.ac.uk> and other places
but <chris at psyctc.org> remains my main Email address.  I have a work
web site at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/
   https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/

If you want an Emeeting, I am trying to keep them to Thursdays and my diary is
at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.

Jeff Newmiller

2020-Aug-19 17:27 UTC

[R] combine filter() and select()

The whole point of dplyr primitives is to support data frames... that is, lists
of columns. When you pare your data frame down to one column you are almost
certainly using the wrong tool for the job.

So, sure, your code works... and it even does what you wanted in the dplyr
style, but what a pointless exercise.

grep( "a", mytbl$files, value=TRUE )

-- 
Sent from my phone. Please excuse my brevity.
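
A short sketch comparing the base R one-liner from this reply with a dplyr pipeline that also returns a plain vector, assuming dplyr is installed. pull() is not mentioned anywhere in the thread; it is added here only as one possible way to get a vector rather than a one-column tibble. The names base_result and dplyr_result are illustrative.

```r
library(dplyr)

# Same data as in the original post, built with tibble() for brevity.
mytbl <- tibble(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6)

# Base R: grep() with value = TRUE returns the matching strings themselves.
# Note the full column name: tibbles do not partial-match names with `$`.
base_result <- grep("a", mytbl$files, value = TRUE)

# dplyr: filter() keeps the matching rows, pull() extracts one column
# as a vector instead of keeping a one-column tibble.
dplyr_result <- mytbl %>%
  filter(grepl("a", files)) %>%
  pull(files)

# Check that both approaches give the same character vector.
identical(base_result, dplyr_result)
```

When only a single vector is needed, the base R call is shorter; the pipeline form is mainly of interest when the filtering step is part of a longer chain.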

Ivan Calandra

2020-Aug-20 06:40 UTC

[R] combine filter() and select()

Dear Chris,

I didn't think about having the assignment at the end as you showed; it
indeed fits the pipe workflow better.

By "easy", I actually meant shorter. As you said, in base R, I usually
do that in 1 line, so I was hoping to do the same in tidyverse. But I'm
glad to hear that I'm using tidyverse the proper way :)

Best regards,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

Ivan Calandra

2020-Aug-20 06:46 UTC

[R] combine filter() and select()

Hi Jeff,

The code you show is exactly what I usually do, in base R; but I wanted
to play with tidyverse to learn it (and also understand when it makes
sense and when it doesn't).

And yes, of course, in the example I gave, I end up with a 1-cell
tibble, which could be better extracted as a length-1 vector. But my
real goal is not to end up with a single value or even a single column.
I just thought that simplifying my example was the best approach to ask
for advice.

But thank you for letting me know that what I'm doing is pointless!

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

Hadley Wickham

2020-Aug-20 16:52 UTC

[R] combine filter() and select()

On Wed, Aug 19, 2020 at 10:03 AM Ivan Calandra <calandra at rgzm.de> wrote:
>
> Dear useRs,
>
> I'm new to the tidyverse world and I need some help on basic things.
>
> I have the following tibble:
> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6),
>                    row.names = c(NA, -6L),
>                    class = c("tbl_df", "tbl", "data.frame"))
>
> I want to subset the rows with "a" in the column "files", and keep only
> that column.
>
> So I did:
> myfile <- mytbl %>%
>   filter(grepl("a", files)) %>%
>   select(files)
>
> It works, but I believe there must be an easier way to combine filter()
> and select(), right?

Not in the tidyverse. As others have mentioned, both [ and subset() in
base R allow you to simultaneously subset rows and columns, but
there's no single verb in the tidyverse that does both. This is
somewhat informed by the observation that in data frames, unlike
matrices, rows and columns are not exchangeable, and you typically
want to express subsetting in rather different ways.

Hadley

-- 
http://hadley.nz
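
For reference, a minimal base R sketch of the two single-call forms mentioned in this reply, [ and subset(). It uses a plain data frame (mydf, a name made up here) rather than the tibble from the original post so that it runs without any packages; the drop = FALSE argument is an addition that keeps the result a one-column data frame, since a plain data frame would otherwise collapse to a vector.

```r
# Base R only; a plain data frame stands in for the tibble from the post.
mydf <- data.frame(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6,
                   stringsAsFactors = FALSE)

# `[`: row condition first, column selection second.
# drop = FALSE keeps a one-column data frame instead of a bare vector.
by_bracket <- mydf[grepl("a", mydf$files), "files", drop = FALSE]

# subset(): the row condition and `select` are both evaluated
# inside the data frame, so column names can be used unquoted.
by_subset <- subset(mydf, grepl("a", files), select = files)

print(by_bracket)
print(by_subset)
```

Both calls subset rows and columns in one step, which is the behaviour that filter() and select() split into two verbs.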
