Yes, but this exaggeration precisely misses the point.

Concerning your examples:

* I love fread but I think it makes a lot of subjective choices that are best
associated with a package. I think it changed a lot with time and can still
change, and we have great developers willing to maintain it and be reactive
regarding feature requests or bug reports.

* group_by() adds a class that works only (or mostly) with tidyverse verbs, so
it is very easy to dismiss as a candidate for inclusion in base R.

* summarize is an alternative to aggregate; it would be very confusing to have
both.

Now, to be fair to your argument, we could think of other functions such as
data.table::rleid(), which I believe base R sorely misses, and there is
nothing wrong with packaged functions making their way to base R.

Maybe there's an existing list of criteria for inclusion in base R, but if not
I can make one up for the sake of this discussion :) :
* 1) the functionality should not already exist
* 2) the function should be general enough
* 3) the function should have a large number of potential users
* 4) the function should be robust, and not require extensive maintenance
* 5) the function should be stable; we shouldn't expect new features every 2
months
* 6) the function should have an intuitive interface in the context of the
rest of base R

I guess 1 and 6 could be held against my proposal, because:
(1) everything can be done without pipes
(6) they are somewhat surprising (though with explicit dots not that much, and
not more surprising than, say, `bquote()`)

In my opinion the pluses offset the minuses.

I wouldn't advise taking magrittr's pipe (provided the license allows it), for
instance, because it makes a lot of design choices and has complex behavior;
what I propose is two lines of code very unlikely to evolve or require
maintenance.

Antoine

PS: I just receive the digest once a day, so if you don't "reply all" I can
only react later.

On Sat, Oct 5, 2019 at 19:54, Hugh Marera <hugh.marera at gmail.com> wrote:

> I exaggerated the comparison for effect. However, it is not very difficult
> to find functions in dplyr or data.table or indeed other packages that one
> may wish to be in base R. Examples, for me, could include
> data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc. Also,
> the "popularity" of magrittr::`%>%` is mostly attributable to the tidyverse
> (an advanced superset of R). Many R users don't even know that they are
> installing the magrittr package.
>
> On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar at fedoraproject.org> wrote:
>
>> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera at gmail.com> wrote:
>> >
>> > How is your argument different to, say, "Should dplyr or data.table be
>> > part of base R as they are the most popular data science packages and
>> > they are used by a large number of users?"
>>
>> Two packages with many features, dozens of functions and under heavy
>> development to fix bugs, add new features and improve performance, vs.
>> a single operator with a limited and well-defined functionality, and a
>> reference implementation that hasn't changed in years (but certainly
>> hackish in a way that probably could only be improved from R itself).
>>
>> Can't you really spot the difference?
>>
>> Iñaki
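For reference, a minimal sketch of the kind of short, explicit-dot pipe being
described above (an illustration under that assumption; the operator name and
body are hypothetical, not necessarily the exact code proposed):

    ## a pipe that only substitutes an explicit `.` in the right-hand side
    `%>.%` <- function(lhs, rhs) {
      eval(substitute(rhs), envir = list(. = lhs), enclos = parent.frame())
    }

    ## usage: the dot is written explicitly at each step
    mtcars %>.% subset(., cyl == 4) %>.% head(., 2)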
Hi all,

I think there's some nuance here that makes me agree partially with each
"side".

The pipe is inarguably extremely popular. Many probably think of it as a core
feature of R, along with the tidyverse that (as was pointed out) largely
surrounds it and drives its popularity. Whether it's a good or bad thing that
they think that doesn't change the fact that, by my estimation, Ant is correct
that they do. BUT, I don't agree with him that that, by itself, is a reason to
put it in base R in the form that it exists now. For the current form, there
aren't really any major downsides that I see to having people just use the
package version.

Sure, it may be a little weird, but it doesn't ever really stop people from
using it or present a significant barrier. Another major point is that many
(most?) base R functions are not necessarily tooled to be endomorphic, which
in my personal opinion is *largely* the only place that pipes are really
compelling.

That was for pipes as they exist in package space, though. There is another
way the pipe could go into base R that could not be done in package space and
has the potential to mitigate some pretty serious downsides to the pipes
relating to debugging, which would be to implement them in the parser.

If

  iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
    filter(mean_sl > 5)

were *parsed* as, for example, into

  local({
    . <- group_by(iris, Species)
    . <- summarize(., mean_sl = mean(Sepal.Length))
    filter(., mean_sl > 5)
  })

then debugging (once you knew that) would be much easier, but behavior would
be the same as it is now. There could even be some sort of step-through-pipe
debugger added at that point as well for additional convenience.

There is some minor precedent for that type of transformative parsing:

  > expr = parse(text = "5 -> x")
  > expr
  expression(5 -> x)
  > expr[[1]]
  x <- 5

Though that's a much more minor transformation.

All of that said, I believe Jim Hester (cc'ed) suggested something along these
lines at the RSummit a couple of years ago, and thus far R-core has not shown
much appetite for changing things in the parser.

Without that changing, I'd have to say that my vote, for whatever it's worth,
comes down on the side of pipes being fine in packages. A summary of my
reasoning being that it only makes sense for them to go into R itself if doing
so fixes an issue that can't be fixed with them in package space.

Best,
~G

On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri at gmail.com> wrote:

> Yes, but this exaggeration precisely misses the point.
> [...]
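The sequential form above is already valid R today, and once a pipeline is
written that way a breakpoint can be dropped between any two steps. A minimal
sketch, assuming dplyr is attached:

    library(dplyr)
    res <- local({
      . <- group_by(iris, Species)
      . <- summarize(., mean_sl = mean(Sepal.Length))
      # browser()   # a breakpoint can go between any two steps
      filter(., mean_sl > 5)
    })
    res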
I'm largely with Gabriel Becker on this one: if pipes enter base R, they
should be a well-thought-out and integrated part of the language.

I do see merit, though, in providing a pipe in base R. The reason is mainly
that right now there's not a single pipe: a pipe function exists in different
packages, and it's not impossible that at some point piping operators might
behave slightly differently depending on the package you load.

So I hope someone from RStudio is reading this thread and decides to do the
heavy lifting for R core. After all, it really is mainly their packages that
would benefit from it. I can't think of a non-tidyverse package that's easier
to use with pipes than without.

Best
Joris

On Sun, Oct 6, 2019 at 1:50 AM Gabriel Becker <gabembecker at gmail.com> wrote:

> Hi all,
> [...]
--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
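To make the point about pipes behaving differently across packages concrete,
here is a small sketch (assumes magrittr is installed; `%dot%` is a toy
operator defined only for this illustration): the same expression can mean
different things under different pipe implementations.

    library(magrittr)
    1:3 %>% paste("a")       # magrittr inserts the lhs as the first argument:
                             # "1 a" "2 a" "3 a"

    ## a toy pipe that only substitutes an explicit dot:
    `%dot%` <- function(lhs, rhs) {
      eval(substitute(rhs), list(. = lhs), parent.frame())
    }
    1:3 %dot% paste("a")     # no dot in the rhs, so the lhs is ignored: "a"
    1:3 %dot% paste(., "a")  # with an explicit dot: "1 a" "2 a" "3 a"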
Except for the isolation of local(), R pretty much already has the parsing
transformation you mention.

  as.list(parse(text = "
    iris                                        ->.;
    group_by(., Species)                        ->.;
    summarize(., mean_sl = mean(Sepal.Length))  ->.;
    filter(., mean_sl > 5)
  "))
  #> [[1]]
  #> . <- iris
  #>
  #> [[2]]
  #> . <- group_by(., Species)
  #>
  #> [[3]]
  #> . <- summarize(., mean_sl = mean(Sepal.Length))
  #>
  #> [[4]]
  #> filter(., mean_sl > 5)

Created on 2019-10-06 by the reprex package (https://reprex.tidyverse.org) (v0.3.0)

> On Oct 5, 2019, at 4:50 PM, Gabriel Becker <gabembecker at gmail.com> wrote:
>
> [...]
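As a sketch (assuming dplyr is attached), those parsed expressions can also be
evaluated one after another, reproducing the result of the piped version:

    library(dplyr)
    exprs <- parse(text = "
      iris                                        ->.;
      group_by(., Species)                        ->.;
      summarize(., mean_sl = mean(Sepal.Length))  ->.;
      filter(., mean_sl > 5)
    ")
    for (e in as.list(exprs)) out <- eval(e)
    out   # the same summary the piped version would produce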
On 05/10/2019 7:50 p.m., Gabriel Becker wrote:
> [...]
> That was for pipes as they exist in package space, though. There is another
> way the pipe could go into base R that could not be done in package space
> and has the potential to mitigate some pretty serious downsides to the
> pipes relating to debugging, which would be to implement them in the parser.

Actually, that could be done in package space too: just write a function to do
the transformation. That is, something like transformPipe( a %>% b %>% c )
could convert the original expression into one like yours below. This could be
done by a smart IDE like RStudio without the user typing anything.

A really strong argument for doing this in a package instead of Bison/C code
in the parser is the help page ?magrittr::"%>%". There are so many special
cases there that it's certainly hard and possibly impossible for the parser to
do the transformation: I think some parts of the transformation depend on
run-time values, not syntax.

Of course, a simpler operator like Antoine's would be easier, but that would
break code that uses magrittr pipes, and I think those are the most commonly
accepted ones.

So a workable plan would be for all the pipe authors to agree on syntax for
transformPipe(), and then for IDE authors to support it. R Core doesn't need
to be involved at all unless they want to update Rgui or R.app or command line
R.

Duncan Murdoch

> If
>
>   iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
>     filter(mean_sl > 5)
>
> were *parsed* as, for example, into
>
>   local({
>     . <- group_by(iris, Species)
>     . <- summarize(., mean_sl = mean(Sepal.Length))
>     filter(., mean_sl > 5)
>   })
>
> [...]
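For illustration, a rough sketch of the kind of transformPipe()-style
rewriting described above. The helper name and all details are hypothetical;
it handles only the simple case where every right-hand side is an explicit
call, and none of magrittr's special cases:

    ## rewrite `a %>% b(x) %>% c(y)` into a block of sequential `.` steps
    pipe_to_steps <- function(expr) {
      rhs <- list()
      while (is.call(expr) && identical(expr[[1]], quote(`%>%`))) {
        rhs  <- c(list(expr[[3]]), rhs)   # collect right-hand sides, innermost first
        expr <- expr[[2]]                 # walk down the left spine
      }
      steps   <- list(bquote(. <- .(expr)))                 # seed with the initial object
      add_dot <- function(s) as.call(c(s[[1]], quote(.), as.list(s)[-1]))
      for (i in seq_along(rhs)) {
        s <- add_dot(rhs[[i]])            # insert `.` as the first argument
        steps <- c(steps,
                   if (i < length(rhs)) list(bquote(. <- .(s))) else list(s))
      }
      as.call(c(quote(`{`), steps))       # a `{` block of sequential steps
    }

    pipe_to_steps(quote(
      iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length))
    ))
    ## should print something like:
    #> {
    #>     . <- iris
    #>     . <- group_by(., Species)
    #>     summarize(., mean_sl = mean(Sepal.Length))
    #> }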
Hi Gabe,

> There is another way the pipe could go into base R that could not be
> done in package space and has the potential to mitigate some pretty
> serious downsides to the pipes relating to debugging

I assume you're thinking about the large stack trace of the magrittr pipe? You
don't need a parser transformation to solve this problem, though: the pipe
could be implemented as a regular function with a very limited impact on the
stack. And if implemented as a SPECIALSXP, it would be completely invisible.
We've been planning to rewrite %>% to fix the performance and the stack print;
it's just low priority.

About the semantics of local evaluation that were proposed in this thread, I
think that wouldn't be right. A native pipe should be consistent with other
control flow constructs like `if` and `for` and evaluate in the current
environment. In that case, the `.` binding, if any, would be restored to its
original value in `on.exit()` (or through unwind-protection if implemented in
C).

Best,
Lionel

> On 6 Oct 2019, at 01:50, Gabriel Becker <gabembecker at gmail.com> wrote:
>
> [...]
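As an illustration of those semantics (a sketch under stated assumptions, not
an actual implementation plan; the operator name is hypothetical), a pipe that
evaluates its right-hand side in the caller's environment and restores any
pre-existing `.` binding on exit might look like:

    `%|>%` <- function(lhs, rhs) {
      env     <- parent.frame()
      had_dot <- exists(".", envir = env, inherits = FALSE)
      old_dot <- if (had_dot) get(".", envir = env, inherits = FALSE)
      ## restore (or remove) the caller's `.` binding when we leave
      on.exit(if (had_dot) assign(".", old_dot, envir = env)
              else if (exists(".", envir = env, inherits = FALSE))
                rm(".", envir = env))
      assign(".", lhs, envir = env)   # bind `.` in the current environment
      eval(substitute(rhs), env)      # evaluate the rhs there, like `if`/`for` bodies
    }

    ## usage:
    mtcars %|>% subset(., cyl == 4) %|>% nrow(.)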