Regarding special treatment for |>, isn't it getting special treatment
anyway, because it's implemented as a syntax transformation from
x |> f(y) to f(x, y), rather than as an operator?

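For what it's worth, this can be checked at the console. The following is
only a minimal sketch, assuming R 4.1.0 or later (where |> is available);
x, y and f are placeholder names and are never evaluated:

```r
# Quote a pipe expression to inspect what the parser produced.
# Nothing is evaluated here, so x, f and y do not need to exist.
piped  <- quote(x |> f(y))
direct <- quote(f(x, y))

print(piped)                     # shows the call the parser built
print(identical(piped, direct))  # TRUE: both parse to the same call

# By contrast, %>% stays in the parse tree as an ordinary function call.
# It only has to be defined when the code is evaluated, not when parsed.
magrittr_style <- quote(x %>% f(y))
print(as.character(magrittr_style[[1]]))  # "%>%"
```

So by the time the expression is evaluated, no pipe is left in it.
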

That said, the point about wanting a block of code submitted line-by-line
to work the same as a block of code submitted all at once is a fair one.

Maybe the better solution would be if there were a way to say "Submit the
selected code as a single expression, ignoring line-breaks".  Then I could
run any number of lines with pipes at the start and no special character
at the end, and have it treated as a single pipeline.  I suppose that'd
need to be a feature offered by the environment (RStudio's RNotebooks in
my case).  I could wrap my pipelines in parentheses (to make the "pipes at
start of line" syntax valid R code), and then could use the hypothetical
"submit selected code ignoring line-breaks" feature when running just the
first part of the pipeline -- i.e., selecting full lines, but starting
after the opening paren so as not to need to insert a closing paren.
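
To make the idea concrete, here is a rough sketch of what such a feature
might do behind the scenes.  run_as_single_expression is a hypothetical
helper, not something RStudio or base R provides; it relies only on the
fact that the parser ignores line breaks inside parentheses, and it
assumes R 4.1.0 or later for |>:

```r
# Hypothetical helper: evaluate the selected text as ONE expression by
# wrapping it in parentheses, inside which the parser ignores line breaks.
run_as_single_expression <- function(selected_code, envir = parent.frame()) {
  wrapped <- paste0("(\n", selected_code, "\n)")
  exprs <- parse(text = wrapped)
  # The wrapped text should parse to exactly one expression.
  stopifnot(length(exprs) == 1L)
  eval(exprs[[1]], envir = envir)
}

# A selection of whole lines, with the pipes at the start of each line.
selection <- "
c(5, 3, 8, 1)
  |> sort()
  |> head(2)
"
result <- run_as_single_expression(selection)
print(result)
```

A selection containing more than one statement would be a parse error
under this approach, so it only suits the single-pipeline case.
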

- Tim

On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:

> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute
> > the command in the Notebook environment I'm using) I certainly
> > *would* expect R to treat it as a complete statement.
> >
> > But what I'm talking about is a different case, where I highlight a
> > multi-line statement in my notebook:
> >
> >     my_data_frame_1
> >         |> filter(some_conditions_1)
> >
> > and then press Ctrl+Enter.
>
> I don't think I'd like it if parsing changed between passing one line
> at a time and passing a block of lines.  I'd like to be able to
> highlight a few lines and pass those, then type one, then highlight
> some more and pass those: and have it act as though I just passed the
> whole combined block, or typed everything one line at a time.
>
>
> > Or, I suppose the equivalent would be to run an R script containing
> > those two lines of code, or to run a multi-line statement like that
> > from the console (which in RStudio I can do by pressing Shift+Enter
> > between the lines.)
> >
> > In those cases, R could either (1) Give an error message [the
> > current behavior], or (2) understand that the first line is meant to
> > be piped to the second.  The second option would be significantly
> > more useful, and is almost certainly what the user intended.
> >
> > (For what it's worth, there are some languages, such as JavaScript,
> > that consider the first token of the next line when determining if
> > the previous line was complete.  JavaScript's rules around this are
> > overly complicated, but a rule like "a pipe following a line break is
> > treated as continuing the previous line" would be much simpler.  And
> > while it might be objectionable to treat the operator %>% differently
> > from other operators, the addition of |>, which isn't truly an
> > operator at all, seems like the right time to consider it.)
>
> I think this would be hard to implement with R's current parser, but
> possible.  I think it could be done by distinguishing between EOL
> markers within a block of text and "end of block" marks.  If it
> applied only to the |> operator it would be *really* ugly.
>
> My strongest objection to it is the one at the top, though.  If I have
> a block of lines sitting in my editor that I just finished executing,
> with the cursor pointing at the next line, I'd like to know that it
> didn't matter whether the lines were passed one at a time, as a block,
> or some combination of those.
>
> Duncan Murdoch
>
> >
> > -Tim
> >
> > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch
> > <murdoch.duncan at gmail.com> wrote:
> >
> > The requirement for operators at the end of the line comes from the
> > interactive nature of R.  If you type
> >
> > my_data_frame_1
> >
> > how could R know that you are not done, and are planning to type the
> > rest of the expression
> >
> > %>% filter(some_conditions_1)
> > ...
> >
> > before it should consider the expression complete?  The way
> > languages like C do this is by requiring a statement terminator at
> > the end.  You can also do it by wrapping the entire thing in
> > parentheses ().
> >
> > However, be careful: Don't use braces: they don't work.  And parens
> > have the side effect of removing invisibility from the result (which
> > is a design flaw or bonus, depending on your point of view).  So I
> > actually wouldn't advise this workaround.
> >
> > Duncan Murdoch
> >
> >
> > On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> > > Hi,
> > >
> > > I'm a data scientist who routinely uses R in my day-to-day work,
> > > for tasks such as cleaning and transforming data, exploratory data
> > > analysis, etc.  This includes frequent use of the pipe operator
> > > from the magrittr and dplyr libraries, %>%.  So, I was pleased to
> > > hear about the recent work on a native pipe operator, |>.
> > >
> > > This seems like a good time to bring up the main pain point I
> > > encounter when using pipes in R, and some suggestions on what could
> > > be done about it.  The issue is that the pipe operator can't be
> > > placed at the start of a line of code (except in parentheses).
> > > That's no different than any binary operator in R, but I find it's
> > > a source of difficulty for the pipe because of how pipes are often
> > > used.
> > >
> > > [I'm assuming here that my usage is fairly typical of a lot of
> > > users; at any rate, I don't think I'm *too* unusual.]
> > >
> > > === Why this is a problem ===
> > >
> > > It's very common (for me, and I suspect for many users of dplyr) to
> > > write multi-step pipelines and put each step on its own line for
> > > readability.  Something like this:
> > >
> > > ### Example 1 ###
> > > my_data_frame_1 %>%
> > > filter(some_conditions_1) %>%
> > > inner_join(my_data_frame_2, by = some_columns_1) %>%
> > > group_by(some_columns_2) %>%
> > > summarize(some_aggregate_functions_1) %>%
> > > filter(some_conditions_2) %>%
> > > left_join(my_data_frame_3, by = some_columns_3) %>%
> > > group_by(some_columns_4) %>%
> > > summarize(some_aggregate_functions_2) %>%
> > > arrange(some_columns_5)
> > >
> > > [I guess some might consider this an overly long pipeline; for me
> > > it's pretty typical.  I *could* split it up by assigning
> > > intermediate results to variables, but much of the value I get from
> > > the pipe is that it lets my code communicate which results are
> > > temporary, and which will be used again later.  Assigning variables
> > > for single-use results would remove that expressiveness.]
> > >
> > > I would prefer (for reasons I'll explain) to be able to write the
> > > above example like this, which isn't valid R:
> > >
> > > ### Example 2 (not valid R) ###
> > > my_data_frame_1
> > > %>% filter(some_conditions_1)
> > > %>% inner_join(my_data_frame_2, by = some_columns_1)
> > > %>% group_by(some_columns_2)
> > > %>% summarize(some_aggregate_functions_1)
> > > %>% filter(some_conditions_2)
> > > %>% left_join(my_data_frame_3, by = some_columns_3)
> > > %>% group_by(some_columns_4)
> > > %>% summarize(some_aggregate_functions_2)
> > > %>% arrange(some_columns_5)
> > >
> > > One (minor) advantage is obvious: It lets you easily line up the
> > > pipes, which means that you can see at a glance that the whole
> > > block is a single pipeline, and you'd immediately notice if you
> > > inadvertently omitted a pipe, which otherwise can lead to confusing
> > > output.  [It's also aesthetically pleasing, especially when %>% is
> > > replaced with |>, but that's subjective.]
> > >
> > > But the bigger issue happens when I want to re-run just *part* of
> > > the pipeline.  I do this often when debugging: if the output of the
> > > pipeline seems wrong, I re-run the first few steps and check the
> > > output, then include a little more and re-run again, etc., until I
> > > locate my mistake.  Working in an interactive notebook environment,
> > > this involves using the cursor to select just the part of the code
> > > I want to re-run.
> > >
> > > It's fast and easy to select *entire* lines of code, but
> > > unfortunately with the pipes placed at the end of the line I must
> > > instead select everything *except* the last three characters of the
> > > line (the last two characters for the new pipe).  Then when I want
> > > to re-run the same partial pipeline with the next line of code
> > > included, I can't just press SHIFT+Down to select it as I otherwise
> > > would, but instead must move the cursor horizontally to a position
> > > three characters before the end of *that* line (which is generally
> > > different due to varying line lengths).  And so forth each time I
> > > want to include an additional line.
> > >
> > > Moreover, with the staggered positions of the pipes at the end of
> > > each line, it's very easy to accidentally select the final pipe on
> > > a line, and then sit there for a moment wondering if the
> > > environment has stopped responding before realizing it's just
> > > waiting for further input (i.e., for the right-hand side).  These
> > > small delays and disruptions add up over the course of a day.
> > >
> > > This desire to select and re-run the first part of a pipeline is
> > > also the reason why it doesn't suffice to achieve syntax like my
> > > "Example 2" by wrapping the entire pipeline in parentheses.  That's
> > > of no use if I want to re-run a selection that doesn't include the
> > > final close-paren.
> > >
> > > === Possible Solutions ===
> > >
> > > I can think of two, but maybe there are others.  The first would
> > > make "Example 2" into valid code, and the second would allow you to
> > > run a selection that included a trailing pipe.
> > >
> > > Solution 1: Add a special case to how R is parsed, so if the first
> > > (non-whitespace) token after an end-line is a pipe, that pipe gets
> > > moved to before the end-line.
> > >   - Argument for: This lets you write code like example 2, which
> > >     addresses the pain point around re-running part of a pipeline,
> > >     and has advantages for readability.  Also, since starting a line
> > >     with a pipe operator is currently invalid, the change wouldn't
> > >     break any working code.
> > >   - Argument against: It would make the behavior of %>% inconsistent
> > >     with that of other binary operators in R.  (However, this
> > >     objection might not apply to the new pipe, |>, which I
> > >     understand is being implemented as a syntax transformation
> > >     rather than a binary operator.)
> > >
> > > Solution 2: Ignore the pipe operator if it occurs as the final
> > > token of the code being executed.
> > >   - Argument for: This would mean the user could select and re-run
> > >     the first few lines of a longer pipeline (selecting *entire*
> > >     lines), avoiding the difficulties described above.
> > >   - Argument against: This means that %>% would be valid even if it
> > >     occurred without a right-hand side, which is inconsistent with
> > >     other operators in R.  (But, as above, this objection might not
> > >     apply to |>.)  Also, this solution still doesn't enable the
> > >     syntax of "Example 2", with its readability benefit.
> > >
> > > Thanks for reading this and considering it.
> > >
> > > - Tim Goodman
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
>
>