thr3ads.net - llvm dev - [llvm-dev] RFC: Coroutine Optimization Passes [Jul 2016]

Gor Nishanov via llvm-dev

2016-Jul-15 04:28 UTC

[llvm-dev] RFC: Coroutine Optimization Passes

Hi David:
>> How do you deal with basic blocks which appear to be used by multiple parts
>> of the coroutine? We handled this in WinEHPrepare by cloning any BBs which
>> were shared.

I experimented with several approaches, but cloning ended up being the simplest
and most reliable. A suspend point expresses three different control flows:
suspension (the default case), resumption (0), and destruction (1).

  %0 = call i8 @llvm.coro.suspend([..])
  switch i8 %0, label %suspend [i8 0, label %resume,
                                i8 1, label %destroy]

I slap in a switch that jumps to every suspend point in the function. Then I
clone the function twice (one clone becomes the resume function, the other the
destroy function). This switch becomes the entry block in the clones. Then I
RAUW (replace all uses with) the coro.suspend calls with -1, 0, or 1 (in the
original, the resume clone, and the destroy clone respectively) and let
SimplifyCFG do the rest. (This is a slightly simplified explanation, but it
should give the idea.)
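
To make the shape of the result concrete, here is a minimal hand-written sketch
in plain C++ of what a coroutine looks like after such a split. It is only a
model under stated assumptions, not the output of the actual pass: the names
Frame, ramp, resume, destroy and drain are hypothetical, and the "coroutine"
simply yields 1, 2, 3 with a suspend after each value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a coroutine frame; not LLVM's real frame layout.
struct Frame {
  int index = 0;          // which suspend point the coroutine is parked at
  int current = 0;        // last value made visible to the caller
  bool done = false;      // set once the body has run to completion
  bool destroyed = false; // set by the destroy clone
};

// The original function after the split: it runs up to the first suspend
// point and returns. This models replacing coro.suspend with -1, so only
// the %suspend edge survives.
inline Frame ramp() {
  Frame f;
  f.current = 1;
  f.index = 0; // parked at suspend point 0
  return f;
}

// Resume clone: the entry switch jumps to the code that follows the suspend
// point recorded in the frame (models coro.suspend replaced with 0).
inline void resume(Frame &f) {
  assert(!f.done && !f.destroyed);
  switch (f.index) {
  case 0: f.current = 2; f.index = 1; return;
  case 1: f.current = 3; f.index = 2; return;
  case 2: f.done = true; return;
  }
}

// Destroy clone: the same entry switch, but every case takes the cleanup
// edge (models coro.suspend replaced with 1).
inline void destroy(Frame &f) {
  switch (f.index) {
  case 0:
  case 1:
  case 2:
    f.destroyed = true;
    return;
  }
}

// Drives the model: collect every yielded value, then clean up.
inline std::vector<int> drain() {
  std::vector<int> out;
  Frame f = ramp();
  while (!f.done) {
    out.push_back(f.current);
    resume(f);
  }
  destroy(f);
  return out;
}
```

Calling drain() walks the coroutine through all three suspend points. In the
real transformation the three functions come from cloning one body and letting
SimplifyCFG prune the dead edges, rather than being written by hand.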
>> I would remove the attribute.  There are all sorts of tricks you can do to
>> avoid scanning the function for calls to the intrinsic.  For example, you
>> can see if a declaration of your intrinsic exists and, if so, if it has any
>> users in the function in question (under the assumption that there are few).

Aye-aye. Will remove the attribute.

With respect to lessening the impact of the coroutine passes, one approach I
tried was to check during doInitialize whether there are any uses of coroutine
intrinsics and set a flag if there are, or maybe build a set of functions that
use coroutine intrinsics in doInitialize, so that in runOnFunction I can just
check whether the function is in the set and skip it if it is not.

Then, I scared myself silly that some optimization passes can materialize
new functions or new function bodies and I will miss them. So I stopped doing
that.

I think your approach takes care of my "materialization" concern. (BTW, I don't
even know if that is a real concern.) But it may not be profitable if there are
a lot of small functions that are faster to scan for coroutine intrinsics than
it is to scan a potentially longer list of coroutine intrinsic users.
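
The trade-off between the two checks can be sketched with a toy model in plain
C++. This is an illustration only: ToyFunction, ToyModule and both helper
functions are hypothetical stand-ins and do not use or mirror LLVM's actual
classes. The `users` map plays the role of an intrinsic declaration's use list
and is updated whenever a function is added, which is why a function that
appears late is not missed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a function: just a name and the callees it references.
struct ToyFunction {
  std::string name;
  std::vector<std::string> callees;
};

// Toy stand-in for a module that keeps a per-callee list of calling functions.
struct ToyModule {
  std::vector<ToyFunction> functions;
  std::unordered_map<std::string, std::vector<std::string>> users;

  // Adding a function updates the use lists, so functions that are created
  // late are still visible to the use-list check below.
  void add(ToyFunction f) {
    for (const auto &c : f.callees)
      users[c].push_back(f.name);
    functions.push_back(std::move(f));
  }
};

// Naive check: scan every call in the function body.
inline bool usesIntrinsicByScan(const ToyFunction &f,
                                const std::string &intrinsic) {
  for (const auto &c : f.callees)
    if (c == intrinsic)
      return true;
  return false;
}

// Use-list check: if the intrinsic is not declared at all, answer at once;
// otherwise walk its (assumed short) list of users.
inline bool usesIntrinsicByUsers(const ToyModule &m, const ToyFunction &f,
                                 const std::string &intrinsic) {
  auto it = m.users.find(intrinsic);
  if (it == m.users.end())
    return false;
  for (const auto &u : it->second)
    if (u == f.name)
      return true;
  return false;
}
```

The scan costs time proportional to the size of the function, while the
use-list check costs time proportional to the number of users of the intrinsic
in the whole module. Which one wins depends on the shape of the module, which
is the point raised above.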

BTW, do you have a preference on how to restart the CGSCC pipeline? One of the
four options I listed (repeated in the P.S.), or even some better way I did not
think of?

Thank you,
Gor

P.S.

Option 1: https://reviews.llvm.org/D21569 (no longer relevant, since we are
                                           removing AttrKind::Coroutine)
Option 2: https://reviews.llvm.org/D21570 (bool& Devirt in runSCC)
Option 3: https://reviews.llvm.org/D21572 (virtual bool restatedRequested())
Option 4: Fake devirtualized call in a function pass, so that RefreshSCC will
          detect devirtualization and restart the pipeline by itself.

Vadim Chugunov via llvm-dev

2016-Jul-15 05:48 UTC

[llvm-dev] RFC: Coroutine Optimization Passes

Hi!
Sorry for jumping in late, but I have a general question (that
I perhaps should have asked during round 1):

This proposal jumps straight into the thick of implementation, but I don't
think I've seen a motivation for why coroutines need to be represented at
the LLVM IR level. Can't this transform be performed entirely in the
front-end?

Vadim


Gor Nishanov via llvm-dev

2016-Jul-15 14:17 UTC

[llvm-dev] RFC: Coroutine Optimization Passes

Hi Vadim,

> Can't this transform be performed entirely in the front-end?

Absolutely it can. But such a coroutine becomes unoptimizable. Once it is
split, we lose SSA form and no longer see the control flow clearly. The
wonderful property of the proposed approach is that a coroutine stays
intact, looking like a normal function, for as long as possible. We let
the optimizer clean it up, doing constant propagation, constant
folding, CSE, dead code elimination, heap allocation elision, etc.

The frontend does not have the tools to make coroutines efficient.

Cheers,
Gor


David Majnemer via llvm-dev

2016-Jul-15 17:59 UTC

[llvm-dev] RFC: Coroutine Optimization Passes

On Thu, Jul 14, 2016 at 9:28 PM, Gor Nishanov <gornishanov at gmail.com> wrote:
> Hi David:
>
> >> How do you deal with basic blocks which appear to be used by multiple
> >> parts of the coroutine? We handled this in WinEHPrepare by cloning any
> >> BBs which were shared.
>
> I experimented with several approaches, but, cloning ended up being the
> simplest and most reliable. Suspend points express three different control
> flows that can happen at the suspend point: a suspension (default),
> resumption (0) and destruction (1).
>
>   %0 = call i8 @llvm.coro.suspend([..])
>   switch i8 %0, label %suspend [i8 0, label %resume,
>                                 i8 1, label %destroy]
>
> I slap a switch that jumps to all suspend points in the function. Then I
> clone the function twice (one will become resume clone, and another destroy
> clone). This switch becomes the entry block in the clones. Then, I RAUW
> coro.suspends with -1, 0, or 1 (in original, resume and destroy clones
> respectively) and let SimplifyCFG do the rest. (This is slightly simplified
> explanation, but it should give the idea).
>
I like it, sounds nice and simple :)

>
> >> I would remove the attribute.  There are all sorts of tricks you can do
> >> to avoid scanning the function for calls to the intrinsic.  For example,
> >> you can see if a declaration of your intrinsic exists and, if so, if it
> >> has any users in the function in question (under the assumption that
> >> there are few).
>
> Aye-aye. Will remove the attribute.
>
> With respect to lessening the impact of coroutine passes, one approach I
> tried was to look during doInitialize whether there are any uses of
> coroutine intrinsics and set a flag if there are any, or maybe build a set
> of functions with coroutines intrinsics in doInitialize, so that in
> runOnFunction, I can just check whether the function is in the set and skip
> if it is not.
>
> Then, I scared myself silly that some optimization passes can materialize
> new functions or new function bodies and I will miss them. So I stopped
> doing that.
>
> I think your approach takes care of my "materialization" concern. (BTW, I
> don't even know if that is a real concern).

Functions can be created from nothing in LLVM.

> But may not be profitable if there are a lot of small functions that are
> faster to scan for coroutine intrinsics then to scan potentially longer
> list of coroutine intrinsics users.
>
I find that unlikely but we can always benchmark it if we get concerned.
I'd use a naive approach to start out with.
If it shows up on profiles, we can optimize it.

>
> BTW, Do you have a preference on how to restart CGSCC pipeline? One of the
> four options I listed (repeated in P.S), or even some better way I did not
> think about?
>
I'm not an expert in that area.  I think you will want someone like
Chandler or Hal to give advice here.

>
> Thank you,
> Gor
>
> P.S.
>
> Option 1: https://reviews.llvm.org/D21569 (no longer relevant, since we are
>           removing AttrKind::Coroutine)
> Option 2: https://reviews.llvm.org/D21570 (bool& Devirt in runSCC)
> Option 3: https://reviews.llvm.org/D21572 (virtual bool restatedRequested())
> Option 4: Fake devirtualized call in a function pass, so that RefreshSCC will
>           detect devirtualization and restart the pipeline by itself.

Philip Reames via llvm-dev

2016-Jul-18 16:44 UTC

[llvm-dev] RFC: Coroutine Optimization Passes

I also haven't seen this discussion.  Can you provide a pointer to the 
thread where this was discussed?

p.s. If this discussion *hasn't* happened, that would definitely be a 
blocker for any of the specific work discussed below.

Philip

On 07/14/2016 10:48 PM, Vadim Chugunov via llvm-dev wrote:
> Hi!
> Sorry for jumping in late, but I have a general question (that
> I perhaps should have asked during round 1):
>
> This proposal jumps straight into the thick of implementation, but I
> don't think I've seen a motivation of why coroutines need to be
> represented at the LLVM IR level.   Can't this transform be performed
> entirely in the front-end?
>
> Vadim
