On 09/12/2020 3:45 p.m., Timothy Goodman wrote:
> Regarding special treatment for |>, isn't it getting special treatment
> anyway, because it's implemented as a syntax transformation from
> x |> f(y) to f(x, y), rather than as an operator?

That's different. Currently |> is parsed just like any other binary
operator; it's the code emitted after parsing that is different from
most other cases. I think your suggestion would need changes in the
parsing itself.

It's a few years since I worked with Bison (the parser generator that R
uses), but I recall that handling inconsistencies was always tricky.
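Duncan's distinction can be seen from R itself: the parser accepts |> wherever a binary operator may appear, and the call it builds already has the pipe rewritten away, so a quoted expression shows no trace of |>. A minimal check, assuming R >= 4.1.0 (the release that shipped the native pipe); f, g, x, y and z are placeholder names that are never evaluated.

```r
# Requires R >= 4.1.0, where the native pipe |> is available.
# quote() only parses its argument, so f, g, x, y and z need not exist.
e <- quote(x |> f(y))
print(e)                         # f(x, y): the pipe is gone after parsing
identical(e, quote(f(x, y)))     # TRUE

# A chain nests the same way, innermost call first.
identical(quote(x |> f() |> g(z)), quote(g(f(x), z)))   # TRUE
```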

> That said, the point about wanting a block of code submitted
> line-by-line to work the same as a block of code submitted all at once
> is a fair one. Maybe the better solution would be if there were a way
> to say "Submit the selected code as a single expression, ignoring
> line-breaks".

The way to do that is to replace some of the line breaks with
semicolons, which act as statement separators. The tricky bit is to
figure out which ones to replace. So if your block is

    x +
    y
    z

you'd glue it together as "x + y; z". RStudio appears to know enough
about R parsing to do that, and presumably, if it were allowed to look
at the start of the next line, it could handle things like

    x
    |> f()
    z

and rewrite them as "x |> f(); z". It would mess up debugging a little
(z is now on line 1, not line 3), but maybe it could undo the
transformation if R told it there was a problem at line 1, column 11.
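The gluing described above can be checked with parse(): the multi-line block and its semicolon-joined form yield the same expressions, and only the recorded source positions differ. A small sketch, assuming R >= 4.1.0 for the |> lines; first_lines() is a hypothetical helper written for this illustration, and x, y, z and f are never evaluated.

```r
# The block and its glued form parse to the same expressions once source
# references are dropped.
block <- "x +\ny\nz"
glued <- "x + y; z"
identical(parse(text = block, keep.source = FALSE),
          parse(text = glued, keep.source = FALSE))    # TRUE

# The pipe layout is a syntax error as three lines, but valid once glued.
piped_block <- "x\n|> f()\nz"
piped_glued <- "x |> f(); z"
bad <- tryCatch(parse(text = piped_block), error = function(e) e)
inherits(bad, "error")                                 # TRUE
parse(text = piped_glued, keep.source = FALSE)         # expression(f(x), z)

# Hypothetical helper: the line on which each top-level expression starts,
# read from the source references that parse() records.
first_lines <- function(code) {
  p <- parse(text = code, keep.source = TRUE)
  vapply(attr(p, "srcref"), function(sr) as.integer(sr)[1], integer(1))
}
first_lines(block)   # 1 3: z starts on line 3
first_lines(glued)   # 1 1: after gluing, z is reported on line 1
```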

> Then I could run any number of lines with pipes at the start and no
> special character at the end, and have it treated as a single pipeline.
> I suppose that'd need to be a feature offered by the environment
> (RStudio's RNotebooks in my case). I could wrap my pipelines in
> parentheses (to make the "pipes at start of line" syntax valid R code),
> and then could use the hypothetical "submit selected code ignoring
> line-breaks" feature when running just the first part of the pipeline
> -- i.e., selecting full lines, but starting after the opening paren so
> as not to need to insert a closing paren.
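As a concrete instance of the parenthesized layout, the sketch below uses base R's subset(), transform() and head() on the built-in mtcars data as stand-ins for the dplyr verbs in the thread (assumption: R >= 4.1.0). Inside parentheses a newline does not end the expression, so each step may begin with |>. The second part imitates the hypothetical "ignore line-breaks" submission by joining whole selected lines with spaces; the column name hp_per_wt is made up for the example.

```r
# Requires R >= 4.1.0 for |>. Inside ( ... ) the parser treats newlines as
# whitespace, so each step of the pipeline can start with the pipe.
result <- (
  mtcars
  |> subset(cyl == 4)
  |> transform(hp_per_wt = hp / wt)
  |> head(3)
)
print(result)

# Re-running only the first two steps: take whole lines after the opening
# paren and join them with spaces instead of newlines, as the hypothetical
# "submit selected code ignoring line-breaks" feature would.
selected <- c("  mtcars", "  |> subset(cyl == 4)")
partial <- eval(parse(text = paste(selected, collapse = " ")))
nrow(partial)   # 11 four-cylinder cars in mtcars
```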

I don't think I understand your workflow well enough to comment on this.

Duncan
>
> - Tim
>
> On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute
> > the command in the Notebook environment I'm using) I certainly *would*
> > expect R to treat it as a complete statement.
> >
> > But what I'm talking about is a different case, where I highlight a
> > multi-line statement in my notebook:
> >
> >     my_data_frame_1
> >         |> filter(some_conditions_1)
> >
> > and then press Ctrl+Enter.
>
> I don't think I'd like it if parsing changed between passing one line at
> a time and passing a block of lines. I'd like to be able to highlight a
> few lines and pass those, then type one, then highlight some more and
> pass those: and have it act as though I just passed the whole combined
> block, or typed everything one line at a time.
>
> > Or, I suppose the equivalent would be to run an R script containing
> > those two lines of code, or to run a multi-line statement like that
> > from the console (which in RStudio I can do by pressing Shift+Enter
> > between the lines).
> >
> > In those cases, R could either (1) give an error message [the current
> > behavior], or (2) understand that the first line is meant to be piped
> > to the second. The second option would be significantly more useful,
> > and is almost certainly what the user intended.
> >
> > (For what it's worth, there are some languages, such as JavaScript,
> > that consider the first token of the next line when determining if the
> > previous line was complete. JavaScript's rules around this are overly
> > complicated, but a rule like "a pipe following a line break is treated
> > as continuing the previous line" would be much simpler. And while it
> > might be objectionable to treat the operator %>% differently from other
> > operators, the addition of |>, which isn't truly an operator at all,
> > seems like the right time to consider it.)
>
> I think this would be hard to implement with R's current parser, but
> possible. I think it could be done by distinguishing between EOL
> markers within a block of text and "end of block" marks. If it applied
> only to the |> operator it would be *really* ugly.
>
> My strongest objection to it is the one at the top, though. If I have
> a block of lines sitting in my editor that I just finished executing,
> with the cursor pointing at the next line, I'd like to know that it
> didn't matter whether the lines were passed one at a time, as a block,
> or some combination of those.
>
> Duncan Murdoch
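Duncan's requirement can be stated as a property: splitting a block into top-level expressions should give the same result whether R reads the lines one at a time or all at once. The sketch below imitates the console's line-by-line reading with two hypothetical helpers, is_incomplete() and repl_split(). It detects "not finished yet" by matching the English text of the parser's "unexpected end of input" error, a rough heuristic chosen for this illustration (translated messages would defeat it).

```r
# Hypothetical helper: TRUE if `code` is an unfinished expression, i.e.
# parse() fails only because the input ended too early. Relies on the
# English wording of R's parse error message.
is_incomplete <- function(code) {
  res <- tryCatch(parse(text = code, keep.source = FALSE),
                  error = function(e) e)
  inherits(res, "error") &&
    grepl("unexpected end of input", conditionMessage(res), fixed = TRUE)
}

# Hypothetical helper imitating the console: read one line at a time and
# close off a chunk as soon as the pending lines are no longer incomplete
# (either a complete expression or a genuine syntax error).
repl_split <- function(lines) {
  chunks <- character(0)
  pending <- character(0)
  for (ln in lines) {
    pending <- c(pending, ln)
    code <- paste(pending, collapse = "\n")
    if (!is_incomplete(code)) {
      chunks <- c(chunks, code)
      pending <- character(0)
    }
  }
  chunks
}

# A trailing operator keeps the console waiting for the next line ...
repl_split(c("x +", "y", "z"))   # "x +\ny" "z"
# ... but a complete line is not revisited, so a leading pipe arrives as
# the start of a new (invalid) expression.
repl_split(c("x", "|> f()"))     # "x" "|> f()"
```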
>
> >
> > -Tim
> >
> > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> >
> > > The requirement for operators at the end of the line comes from the
> > > interactive nature of R. If you type
> > >
> > >      my_data_frame_1
> > >
> > > how could R know that you are not done, and are planning to type the
> > > rest of the expression
> > >
> > >        %>% filter(some_conditions_1)
> > >        ...
> > >
> > > before it should consider the expression complete? The way languages
> > > like C do this is by requiring a statement terminator at the end. You
> > > can also do it by wrapping the entire thing in parentheses ().
> > >
> > > However, be careful: Don't use braces: they don't work. And parens
> > > have the side effect of removing invisibility from the result (which
> > > is a design flaw or bonus, depending on your point of view). So I
> > > actually wouldn't advise this workaround.
> > >
> > > Duncan Murdoch
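Both caveats in the message above can be reproduced with parse() and eval(). The sketch uses small numbers in place of data frames so that it runs on a plain R installation (assumption: R >= 4.1.0 for the |> line). Braces fail in two different ways depending on the operator: a leading + is silently read as unary plus, while a leading |> is a syntax error.

```r
# Parentheses: newlines inside ( ... ) do not end the expression, so a
# leading operator continues it.
eval(parse(text = "(1\n+ 2)"))     # 3

# Braces: every line is a separate statement. "+ 2" is parsed as unary
# plus, so the block silently returns 2 rather than 3.
eval(parse(text = "{1\n+ 2}"))     # 2

# |> has no unary form, so the same layout inside braces does not parse.
braced <- tryCatch(parse(text = "{1:3\n|> sum()}"), error = function(e) e)
inherits(braced, "error")          # TRUE

# The side effect of parentheses: an invisible result becomes visible.
withVisible(invisible(5))$visible      # FALSE
withVisible((invisible(5)))$visible    # TRUE
```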
> >
> >
> > > On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> > > > Hi,
> > > >
> > > > I'm a data scientist who routinely uses R in my day-to-day work,
> > > > for tasks such as cleaning and transforming data, exploratory data
> > > > analysis, etc. This includes frequent use of the pipe operator from
> > > > the magrittr and dplyr libraries, %>%. So, I was pleased to hear
> > > > about the recent work on a native pipe operator, |>.
> > > >
> > > > This seems like a good time to bring up the main pain point I
> > > > encounter when using pipes in R, and some suggestions on what could
> > > > be done about it. The issue is that the pipe operator can't be
> > > > placed at the start of a line of code (except in parentheses).
> > > > That's no different from any other binary operator in R, but I find
> > > > it's a source of difficulty for the pipe because of how pipes are
> > > > often used.
> > > >
> > > > [I'm assuming here that my usage is fairly typical of a lot of
> > > > users; at any rate, I don't think I'm *too* unusual.]
> > > >
> > > > === Why this is a problem ===
> > > >
> > > > It's very common (for me, and I suspect for many users of dplyr) to
> > > > write multi-step pipelines and put each step on its own line for
> > > > readability. Something like this:
> > > >
> > > >     ### Example 1 ###
> > > >     my_data_frame_1 %>%
> > > >       filter(some_conditions_1) %>%
> > > >       inner_join(my_data_frame_2, by = some_columns_1) %>%
> > > >       group_by(some_columns_2) %>%
> > > >       summarize(some_aggregate_functions_1) %>%
> > > >       filter(some_conditions_2) %>%
> > > >       left_join(my_data_frame_3, by = some_columns_3) %>%
> > > >       group_by(some_columns_4) %>%
> > > >       summarize(some_aggregate_functions_2) %>%
> > > >       arrange(some_columns_5)
> > > >
> > > > [I guess some might consider this an overly long pipeline; for me
> > > > it's pretty typical. I *could* split it up by assigning intermediate
> > > > results to variables, but much of the value I get from the pipe is
> > > > that it lets my code communicate which results are temporary, and
> > > > which will be used again later. Assigning variables for single-use
> > > > results would remove that expressiveness.]
> > > >
> > > > I would prefer (for reasons I'll explain) to be able to write the
> > > > above example like this, which isn't valid R:
> > > >
> > > >     ### Example 2 (not valid R) ###
> > > >     my_data_frame_1
> > > >       %>% filter(some_conditions_1)
> > > >       %>% inner_join(my_data_frame_2, by = some_columns_1)
> > > >       %>% group_by(some_columns_2)
> > > >       %>% summarize(some_aggregate_functions_1)
> > > >       %>% filter(some_conditions_2)
> > > >       %>% left_join(my_data_frame_3, by = some_columns_3)
> > > >       %>% group_by(some_columns_4)
> > > >       %>% summarize(some_aggregate_functions_2)
> > > >       %>% arrange(some_columns_5)
> > > >
> > > > One (minor) advantage is obvious: It lets you easily line up the
> > > > pipes, which means that you can see at a glance that the whole
> > > > block is a single pipeline, and you'd immediately notice if you
> > > > inadvertently omitted a pipe, which otherwise can lead to confusing
> > > > output. [It's also aesthetically pleasing, especially when %>% is
> > > > replaced with |>, but that's subjective.]
> > > >
> > > > But the bigger issue happens when I want to re-run just *part* of
> > > > the pipeline. I do this often when debugging: if the output of the
> > > > pipeline seems wrong, I re-run the first few steps and check the
> > > > output, then include a little more and re-run again, etc., until I
> > > > locate my mistake. Working in an interactive notebook environment,
> > > > this involves using the cursor to select just the part of the code
> > > > I want to re-run.
> > > >
> > > > It's fast and easy to select *entire* lines of code, but
> > > > unfortunately with the pipes placed at the end of the line I must
> > > > instead select everything *except* the last three characters of the
> > > > line (the last two characters for the new pipe). Then when I want to
> > > > re-run the same partial pipeline with the next line of code
> > > > included, I can't just press SHIFT+Down to select it as I otherwise
> > > > would, but instead must move the cursor horizontally to a position
> > > > three characters before the end of *that* line (which is generally
> > > > different due to varying line lengths). And so forth each time I
> > > > want to include an additional line.
> > > >
> > > > Moreover, with the staggered positions of the pipes at the end of
> > > > each line, it's very easy to accidentally select the final pipe on
> > > > a line, and then sit there for a moment wondering if the
> > > > environment has stopped responding before realizing it's just
> > > > waiting for further input (i.e., for the right-hand side). These
> > > > small delays and disruptions add up over the course of a day.
> > > >
> > > > This desire to select and re-run the first part of a pipeline is
> > > > also the reason why it doesn't suffice to achieve syntax like my
> > > > "Example 2" by wrapping the entire pipeline in parentheses. That's
> > > > of no use if I want to re-run a selection that doesn't include the
> > > > final close-paren.
> > > >
> > > > === Possible Solutions ===
> > > >
> > > > I can think of two, but maybe there are others. The first would
> > > > make "Example 2" into valid code, and the second would allow you to
> > > > run a selection that included a trailing pipe.
> > > >
> > > >   Solution 1: Add a special case to how R is parsed, so if the first
> > > >   (non-whitespace) token after an end-line is a pipe, that pipe gets
> > > >   moved to before the end-line.
> > > >     - Argument for: This lets you write code like Example 2, which
> > > >       addresses the pain point around re-running part of a pipeline,
> > > >       and has advantages for readability. Also, since starting a line
> > > >       with a pipe operator is currently invalid, the change wouldn't
> > > >       break any working code.
> > > >     - Argument against: It would make the behavior of %>%
> > > >       inconsistent with that of other binary operators in R.
> > > >       (However, this objection might not apply to the new pipe, |>,
> > > >       which I understand is being implemented as a syntax
> > > >       transformation rather than a binary operator.)
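Solution 1 can be approximated without touching the parser, as a text rewrite applied before the code reaches R, the kind of rewriting Duncan suggests an editor could do. A minimal sketch: join_leading_pipes() is a hypothetical helper that recognises only |> and %>%, and it does not understand strings or comments, so a previous line ending in a # comment would swallow the appended pipe. R >= 4.1.0 is assumed for |>.

```r
# Hypothetical helper: glue every line that starts with a pipe onto the end
# of the previous line, so "x\n  |> f()" becomes "x |> f()".
# Naive: it does not look inside strings or comments.
join_leading_pipes <- function(code) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  out <- character(0)
  for (ln in lines) {
    starts_with_pipe <- grepl("^[[:space:]]*([|]>|%>%)", ln)
    if (starts_with_pipe && length(out) > 0) {
      out[length(out)] <- paste(out[length(out)], trimws(ln))
    } else {
      out <- c(out, ln)
    }
  }
  paste(out, collapse = "\n")
}

# An "Example 2"-style layout followed by an unrelated statement.
example2 <- "mtcars\n  |> subset(cyl == 4)\n  |> head(3)\nnrow(mtcars)"
rewritten <- join_leading_pipes(example2)
cat(rewritten)
# mtcars |> subset(cyl == 4) |> head(3)
# nrow(mtcars)
length(parse(text = rewritten))   # 2 top-level expressions
```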
> > > >
> > > >   Solution 2: Ignore the pipe operator if it occurs as the final
> > > >   token of the code being executed.
> > > >     - Argument for: This would mean the user could select and re-run
> > > >       the first few lines of a longer pipeline (selecting *entire*
> > > >       lines), avoiding the difficulties described above.
> > > >     - Argument against: This means that %>% would be valid even if it
> > > >       occurred without a right-hand side, which is inconsistent with
> > > >       other operators in R. (But, as above, this objection might not
> > > >       apply to |>.) Also, this solution still doesn't enable the
> > > >       syntax of "Example 2", with its readability benefit.
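Solution 2 could likewise be prototyped as a pre-processing step on the selected text, before it is sent to R. A minimal sketch: drop_trailing_pipe() is a hypothetical helper that removes only a final |> or %>% followed by nothing but whitespace, and, being a plain regular expression, it cannot tell a pipe in code from one at the end of a comment. R >= 4.1.0 is assumed for |>.

```r
# Hypothetical helper: if the selection ends with |> or %>% (plus any
# trailing whitespace or newlines), remove that final pipe.
drop_trailing_pipe <- function(code) {
  sub("[[:space:]]*([|]>|%>%)[[:space:]]*$", "", code)
}

# Selecting whole lines of an "Example 1"-style pipeline leaves a
# dangling pipe, which would otherwise make R wait for more input.
selection <- "mtcars |>\n  subset(cyl == 4) |>\n"
fixed <- drop_trailing_pipe(selection)
cat(fixed)
# mtcars |>
#   subset(cyl == 4)
nrow(eval(parse(text = fixed)))   # 11

# Code without a trailing pipe is left unchanged.
drop_trailing_pipe("x |> f()")    # "x |> f()"
```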
> > > >
> > > > Thanks for reading this and considering it.
> > > >
> > > > - Tim Goodman
> > > >
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel