thr3ads.net - llvm dev - [llvm-dev] Fwd: [RFC] LLVM Coroutines [Jun 2016]

If this information is useful, please help other people find it:
Share via:

Gor Nishanov via llvm-dev

2016-Jun-09 05:57 UTC

[llvm-dev] Fwd: [RFC] LLVM Coroutines

Hi all:

Below is a proposal to add experimental coroutine support to LLVM. Though this
proposal is motivated primarily by the desire to support C++ Coroutines [1],
the llvm representation is language neutral and can be used to support
coroutines in other languages as well.

Clang + llvm coroutines allows you to take this code:

  generator<int> range(int from, int to) {
     for(int i = from; i < to; ++n)
        co_yield i;
  }
  int main() {
     int sum = 0;
     for (auto v: range(1,100))
        sum += v;
     return sum;
  }

And translate it down to this:

  define i32 @main() #5 {
  entry:
    ret i32 4950
  }

I prototyped llvm changes to support this proposal and extended clang coroutine
implementation [2] to emit proposed intrinsics to smoke test the proposed
design and to get simple generator and async task style coroutines working.
See "More Attention Required" in the doc/Coroutines.rst for details on
what
addition work is required beyond cleanup and bug fixes.

I would like to start the discussion on the overall design, representation and
a roadmap to start upstreaming the changes with the intention to continue the
development in-tree incrementally.

Roadmap:
--------
1) Get agreement on coroutine representation and overall direction (this mail).
  .. repeat 1) until happy
2) Get agreement on how coroutine transformation passes integrate into the
   optimizer pipeline.
  .. repeat 2) until happy
3) update IR/Intrinsics.td + doc/Coroutines.rst + doc/index.rst
4) trickle in coroutine transformation passes + tests in digestible chunks.
5) get clang changes in (sometime after step 3).
6) fix bugs / remove limitations / add functionality to coroutine passes
 .. repeat 6) until happy

Background
=========Coroutine representation is the result of several offline discussions
with Richard Smith,
Reid Kleckner, and David Majnemer. All of the good came from the
collective effort.
All the yuckiness was sneaked in by me after those discussions.

I'll start with a quick overview.

LLVM coroutines are functions that have one or more `suspend points`.
When a suspend point is reached, the execution of a coroutine is suspended and
control is returned back to its caller. A suspended coroutine can be resumed
to continue execution from the last suspend point or be destroyed.

In the following example, function `f` (which may or may not be a
coroutine itself)
returns a handle to a suspended coroutine (**coroutine handle**) that is used
by `main` to resume the coroutine twice and then destroy it:

.. code-block:: llvm

  define i32 @main() {
  entry:
    %hdl = call i8* @f(i32 4)
    call void @llvm.experimental.coro.resume(i8* %hdl)
    call void @llvm.experimental.coro.resume(i8* %hdl)
    call void @llvm.experimental.coro.destroy(i8* %hdl)
    ret i32 0
  }

In addition to the function stack frame which exists when a coroutine
is executing,
there is an additional region of storage that contains values that keep the
coroutine state when a coroutine is suspended. This region of storage
is called **coroutine frame**. It is created when a coroutine is called. It is
destroyed when a coroutine runs to completion or destroyed by a call to
the `coro.destroy`_ intrinsic. Unlike stackful coroutines [3] which maintains a
stack per coroutine, an llvm coroutine frame only contains the values that need
to persist across suspend points (there is a path from a use to the definition
that crosses a suspend point).

Overall idea is that a coroutine is presented to an llvm as an ordinary function
with suspension points marked up with intrinsics. We let the optimizer party
on the coroutine for awhile. Shortly before the coroutine is eligible to be
inlined into its callers, we outline parts of the coroutine that correspond to
the code that needs to get executed after the coroutine is resumed or destroyed.
The coroutine resumption intrinsics get replaced with direct calls to those
outlined functions where possible. Then inliner can inline much leaner and nicer
coroutine or any of the outlined parts as desired.

If we discover that the lifetime of a coroutine is fully enclosed in
the lifetime
of the caller, we remove dynamic allocation for coroutine frame and replace it
with an `alloca` on the caller's frame.

Please see doc/Coroutines.rst for more details:

doc/Coroutines.rst: http://reviews.llvm.org/D21170
IR/Intrinsics.td:   http://reviews.llvm.org/D21169

Concerns:
========(This section assumes that you looked at doc/Coroutins.rst and
IR/Intrinsics.td)

Invoke like intrinsics via call + conditional branch:
-----------------------------------------------------

The `llvm.experimental.coro.fork` and `llvm.experimental.coro.suspend`
intrinsics model behavior somewhat similar to the invoke instruction. Namely,
they are used to show a normal continuation branch and an alternative branch.

They are used in the following pattern:

    %where.to = call i1 @llvm.experimental.coro.XXXX()
    br i1 %where.to, label %normal, label %alternative

My concern is that this is somewhat fragile. After optimizations some code can
sneak in between the intrinsic and the branch and cause trouble. I am looking
for suggestions on how to make it more robust. One thought I had was to put some
intrinsics at the beginning of the %normal and %alternative blocks so as to
prevent code motion above them. Something like:

  ...
    %where.to = call i1 @llvm.experimental.coro.XXXX()
    br i1 %where.to, label %normal, label %alternative
  normal:
   <phi>
   call i1 @llvm.experimental.coro.normal();
  ...
  alternative:
   <phi>
   call i1 @llvm.experimental.coro.alt();
  ...

After coro.xxx is lowered, coro.normal and coro.alt are removed.

Looking up outlined coroutine parts
-----------------------------------

Last parameter of the `llvm.experimental.coro.init" intrinsic is a pointer
to
a coroutine function. To get to the outlined parts, I get the name of that
function and glue ".resume", ".destroy" or
".cleanup" suffixes to it and then
look them up by name.

Probably a better alternative is for a coroutine f to create a constant
containing an array of pointers to f.resume, f.destroy and other parts of the
coroutine and use the last parameter to refer to that constant.

Other questions
---------------

I have more questions about the best way of plugging in coroutine optimization
passes into the PassManager, but, I will defer those for later discussion.

All the feedback and comments is greatly appreciated!!!

Thank you,
Gor

References:
==========
[1] http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0057r4.pdf
[2] http://llvmweekly.org/issue/95 (Coroutine support added to clang)
[3] http://www.boost.org/doc/libs/1_61_0/libs/coroutine/doc/html/index.html

Review Links:
============
doc/Coroutines.rst: http://reviews.llvm.org/D21170
IR/Intrinsics.td:   http://reviews.llvm.org/D21169

David Chisnall via llvm-dev

2016-Jun-09 08:25 UTC

head link

[llvm-dev] [RFC] LLVM Coroutines

On 9 Jun 2016, at 06:57, Gor Nishanov via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 
> Though this
> proposal is motivated primarily by the desire to support C++ Coroutines
[1],
> the llvm representation is language neutral and can be used to support
> coroutines in other languages as well.
I’m sorry if this is off-topic, but has anyone in WG21 yet come up with a sane
proposal about how coroutines interact with thread-local storage?  In
LLVM-compiled code, it is possible that the address of a thread-local variable
will be cached to avoid repeated lookups.  In an environment that contains both
coroutines and TLS, it is possible that the coroutine will be migrated to a
different thread between suspension and resumption.  Does this mean that, for
correctness, we must regard these suspension points as barriers that prevent
loads or stores of thread-local variables and prevent caching of loads of
addresses of thread-local variables?  Do we have to treat every function call as
a suspension point, as no function can know whether it is executing in a
coroutine or whether a function that it calls may trigger a thread migration? 
This sounds as if it would cause noticeable performance regressions in code that
makes heavy use of thread-local variables.

David

Eli Friedman via llvm-dev

2016-Jun-09 08:33 UTC

head link

[llvm-dev] Fwd: [RFC] LLVM Coroutines

On Wed, Jun 8, 2016 at 10:57 PM, Gor Nishanov via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all:
>
> Below is a proposal to add experimental coroutine support to LLVM. Though
> this
> proposal is motivated primarily by the desire to support C++ Coroutines
> [1],
> the llvm representation is language neutral and can be used to support
> coroutines in other languages as well.
>
I looked over the proposal... a few comments:

1. This adds a lot of intrinsics, and they have weird semantics.  This is
bad in general; the more weirdness you add to the IR, the more you'll
struggle to make the optimizer actually understand what you're doing.  And
magic pattern matching *will* break.  You've already pointed out the issue
with branches.  (It's actually worse than you think: there isn't any
guarantee there will even be a branch when you've thrown the optimizer at
it.)  The save intrinsic says "Its return value should be consumed by
exactly one `coro.suspend` intrinsic.", but it might have zero uses by the
time your lowering pass runs.  You could end up with code getting inserted
before llvm.experimental.coro.fork.
2. If you need some sort of unusual control flow construct, make it a
proper terminator instruction; don't try to glue your intrinsic to a normal
branch instruction.  A new kind of terminator instruction might be
appropriate.  Or maybe you can make "invoke" work for you.
3. Maybe you could reduce the amount of magic involved by splitting the
initialization into a separate function?  So the initialization is always
just something like "f(int x) { frame = init(); f_inner(x, frame); return
frame; }", and you don't have to deal with the weirdness of fork().

-Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160609/5794ab5f/attachment.html>

Gor Nishanov via llvm-dev

2016-Jun-09 14:41 UTC

head link

[llvm-dev] [RFC] LLVM Coroutines

Hi David:
>>   has anyone in WG21 yet come up with a sane proposal about how
coroutines interact with thread-local storage?
There is a discussion in SG1 subgroup (Concurrency and Parallelism) on
the meaning of TLS in the lightweight thread of execution. I don't
think there is a resolution yet. Whether to think of a coroutine of
this kind as a thread of execution or just an "unusual" function is
not decided either.
>> Does this mean that, for correctness, we must regard these suspension
points as barriers that prevent loads or stores of thread-local variables and
prevent caching of loads of addresses of thread-local variables?
You are right! P0057 implies (probably need to make a note about it)
that when a coroutine is executing, you get thread_local data for the
current thread. For your first point (caching TLS loads), I think I am
getting it for free. coro.suspend intrinsic, from llvm standpoint, may
read and write any memory it can get access to, therefore, if a value
was loaded from TLS before coro.suspend, it would have to be reloaded
after a call to coro.suspend.

With respect to caching of addresses of TLS variables.

1) if a user does the caching, we should let him:

  thread_local int a;
  task f() {
     auto x = &a; // captures the address of a in Thread 1
     co_await some_async_call();
     *x = *x + 1; // updates the value in Thread1, whatever the
current thread is
  }

2) If llvm is does it, than, absolutely, we should prevent "prevent
caching of loads of addresses across suspend points". I poked around a
little and was not able to come up with the case where it would, but,
I'll keep looking. If you know of such a case, let me know.
Coroutines are split into normal functions in the middle optimizer, so
if there is any caching done during lowering, coroutines won't be
affected.
>> Do we have to treat every function call as a suspension point, as no
function can know whether it is executing in
>> a coroutine or whether a function that it calls may trigger a thread
migration?  This sounds as if
>> it would cause noticeable performance regressions in code that makes
heavy use of thread-local variables.
That should not be necessary. The proposed coroutines are "stackless",
they can only get suspended when control is in the body of the
coroutine at explicitly marked suspend points. Thus thread migration
can only happen at suspend point boundaries.

With stackful coroutines (aka Fibers, Green Threads, User-Mode
threads, etc), it is a big concern, since none of the functions that
runs on a fiber are aware that they run on a fiber and potentially can
get a thread switch at every call. Yep. TLS is pretty much busted
there.

Thank you very much for your comments!!!

On Thu, Jun 9, 2016 at 1:25 AM, David Chisnall
<David.Chisnall at cl.cam.ac.uk> wrote:> On 9 Jun 2016, at 06:57, Gor Nishanov via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>>
>> Though this
>> proposal is motivated primarily by the desire to support C++ Coroutines
[1],
>> the llvm representation is language neutral and can be used to support
>> coroutines in other languages as well.
>
> I’m sorry if this is off-topic, but has anyone in WG21 yet come up with a
sane proposal about how coroutines interact with thread-local storage?  In
LLVM-compiled code, it is possible that the address of a thread-local variable
will be cached to avoid repeated lookups.  In an environment that contains both
coroutines and TLS, it is possible that the coroutine will be migrated to a
different thread between suspension and resumption.  Does this mean that, for
correctness, we must regard these suspension points as barriers that prevent
loads or stores of thread-local variables and prevent caching of loads of
addresses of thread-local variables?  Do we have to treat every function call as
a suspension point, as no function can know whether it is executing in a
coroutine or whether a function that it calls may trigger a thread migration? 
This sounds as if it would cause noticeable performance regressions in code that
makes heavy use of thread-local variables.
>
> David
>

Gor Nishanov via llvm-dev

2016-Jun-09 17:16 UTC

head link

[llvm-dev] Fwd: [RFC] LLVM Coroutines

Hi Eli:

Thank you very much for your comments!
>> If you need some sort of unusual control flow construct, make it a
>> proper terminator instruction
I would love to. I was going by the advice in
"docs/ExtendingLLVM.rst":

       "WARNING: Adding instructions changes the bitcode format, and it
will
        take some effort to maintain compatibility with the previous version.
        Only add an instruction if it is absolutely necessary.

Having coro.fork and coro.suspend to be proper terminators (as they are) will
be much cleaner. Something like:

  corobegin to label %start
         suspend label %retblock

   corosuspend [final] [save %token]
           resume label %resume
            cleanup label %cleanup

I did consider "repurposing" 'invoke', but felt it is too much
of an abuse.
If there is a support for making corobegin/corosuspend instructions, I would
love to do it. Also, if there is a support for invoke abuse :-), I will be
glad to try it out and come back with the report on how it went.
>> Maybe you could reduce the amount of magic involved by splitting the
>> initialization into a separate function?  So the initialization is
always
>> just something like "f(int x) { frame = init(); f_inner(x, frame);
return
>> frame; }", and you don't have to deal with the weirdness of
fork().
I think outlining the "start" part will be sufficient (starting at the
coro.fork
and cutting all of the branches ending in coro.end). In this case llvm won't
see
coro.fork and coro.end as the clang will do the outlining work. Something like

 T f(int x, A y) {
   auto mem = coro.elide();
   if (!mem) mem = malloc(llvm.frame.size(&f_inner));
  auto hdl = coro.init(mem);
  T result = <whatever>;

  f_inner(hdl, x, &y);

  return result;
 }

f will be a normal function and f_inner will be magic function that will require
to be split at suspend points. It will be processed in the IPO order, so by the
time f_inner is considered for inlining into f, it will be a normal function.

I think this can work, but we are trading off one complexity for another.

As presented:

 coroutine is defined as a magic function (that has weird intrinsics) and
 requires splitting during IPO

With coro.fork / coro.end removed

 coroutine is defined as a combination of normal function + magic function that
 requires splitting during IPO. If there are other languages that acquire
 coroutines they would need to have the same outlining pass as clang.

I need to think about it more, but, I am currently leaning toward the first
option (with coro.fork and coro.end). The "f_inner" is still part of
the f,
but nicely delineated with corobegin and coro.end.

  corobegin to label %coro.start
         suspend label %retblock

   corosuspend [final] [save %token] resume label %resume
            cleanup label %cleanup

    call void @llvm.coro.end();

Does it look better?

Gor

On Thu, Jun 9, 2016 at 1:33 AM, Eli Friedman <eli.friedman at gmail.com>
wrote:> On Wed, Jun 8, 2016 at 10:57 PM, Gor Nishanov via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi all:
>>
>> Below is a proposal to add experimental coroutine support to LLVM.
Though
>> this
>> proposal is motivated primarily by the desire to support C++ Coroutines
>> [1],
>> the llvm representation is language neutral and can be used to support
>> coroutines in other languages as well.
>
>
> I looked over the proposal... a few comments:
>
> 1. This adds a lot of intrinsics, and they have weird semantics.  This is
> bad in general; the more weirdness you add to the IR, the more you'll
> struggle to make the optimizer actually understand what you're doing. 
And
> magic pattern matching *will* break.  You've already pointed out the
issue
> with branches.  (It's actually worse than you think: there isn't
any
> guarantee there will even be a branch when you've thrown the optimizer
at
> it.)  The save intrinsic says "Its return value should be consumed by
> exactly one `coro.suspend` intrinsic.", but it might have zero uses by
the
> time your lowering pass runs.  You could end up with code getting inserted
> before llvm.experimental.coro.fork.
> 2. If you need some sort of unusual control flow construct, make it a
proper
> terminator instruction; don't try to glue your intrinsic to a normal
branch
> instruction.  A new kind of terminator instruction might be appropriate. 
Or
> maybe you can make "invoke" work for you.
> 3. Maybe you could reduce the amount of magic involved by splitting the
> initialization into a separate function?  So the initialization is always
> just something like "f(int x) { frame = init(); f_inner(x, frame);
return
> frame; }", and you don't have to deal with the weirdness of
fork().
>
> -Eli

Josh Klontz via llvm-dev

2016-Jun-11 00:05 UTC

head link

[llvm-dev] Fwd: [RFC] LLVM Coroutines

Sorry for the novice question, but how are coroutines lowered in the
backend? It requires making operating system calls to something like
pthreads right?

On Wed, Jun 8, 2016 at 11:57 PM, Gor Nishanov via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all:
>
> Below is a proposal to add experimental coroutine support to LLVM. Though
> this
> proposal is motivated primarily by the desire to support C++ Coroutines
> [1],
> the llvm representation is language neutral and can be used to support
> coroutines in other languages as well.
>
> Clang + llvm coroutines allows you to take this code:
>
>   generator<int> range(int from, int to) {
>      for(int i = from; i < to; ++n)
>         co_yield i;
>   }
>   int main() {
>      int sum = 0;
>      for (auto v: range(1,100))
>         sum += v;
>      return sum;
>   }
>
> And translate it down to this:
>
>   define i32 @main() #5 {
>   entry:
>     ret i32 4950
>   }
>
> I prototyped llvm changes to support this proposal and extended clang
> coroutine
> implementation [2] to emit proposed intrinsics to smoke test the proposed
> design and to get simple generator and async task style coroutines working.
> See "More Attention Required" in the doc/Coroutines.rst for
details on what
> addition work is required beyond cleanup and bug fixes.
>
> I would like to start the discussion on the overall design, representation
> and
> a roadmap to start upstreaming the changes with the intention to continue
> the
> development in-tree incrementally.
>
> Roadmap:
> --------
> 1) Get agreement on coroutine representation and overall direction (this
> mail).
>   .. repeat 1) until happy
> 2) Get agreement on how coroutine transformation passes integrate into the
>    optimizer pipeline.
>   .. repeat 2) until happy
> 3) update IR/Intrinsics.td + doc/Coroutines.rst + doc/index.rst
> 4) trickle in coroutine transformation passes + tests in digestible chunks.
> 5) get clang changes in (sometime after step 3).
> 6) fix bugs / remove limitations / add functionality to coroutine passes
>  .. repeat 6) until happy
>
> Background
> =========> Coroutine representation is the result of several offline
discussions
> with Richard Smith,
> Reid Kleckner, and David Majnemer. All of the good came from the
> collective effort.
> All the yuckiness was sneaked in by me after those discussions.
>
> I'll start with a quick overview.
>
> LLVM coroutines are functions that have one or more `suspend points`.
> When a suspend point is reached, the execution of a coroutine is suspended
> and
> control is returned back to its caller. A suspended coroutine can be
> resumed
> to continue execution from the last suspend point or be destroyed.
>
> In the following example, function `f` (which may or may not be a
> coroutine itself)
> returns a handle to a suspended coroutine (**coroutine handle**) that is
> used
> by `main` to resume the coroutine twice and then destroy it:
>
> .. code-block:: llvm
>
>   define i32 @main() {
>   entry:
>     %hdl = call i8* @f(i32 4)
>     call void @llvm.experimental.coro.resume(i8* %hdl)
>     call void @llvm.experimental.coro.resume(i8* %hdl)
>     call void @llvm.experimental.coro.destroy(i8* %hdl)
>     ret i32 0
>   }
>
> In addition to the function stack frame which exists when a coroutine
> is executing,
> there is an additional region of storage that contains values that keep the
> coroutine state when a coroutine is suspended. This region of storage
> is called **coroutine frame**. It is created when a coroutine is called.
> It is
> destroyed when a coroutine runs to completion or destroyed by a call to
> the `coro.destroy`_ intrinsic. Unlike stackful coroutines [3] which
> maintains a
> stack per coroutine, an llvm coroutine frame only contains the values that
> need
> to persist across suspend points (there is a path from a use to the
> definition
> that crosses a suspend point).
>
> Overall idea is that a coroutine is presented to an llvm as an ordinary
> function
> with suspension points marked up with intrinsics. We let the optimizer
> party
> on the coroutine for awhile. Shortly before the coroutine is eligible to be
> inlined into its callers, we outline parts of the coroutine that
> correspond to
> the code that needs to get executed after the coroutine is resumed or
> destroyed.
> The coroutine resumption intrinsics get replaced with direct calls to those
> outlined functions where possible. Then inliner can inline much leaner and
> nicer
> coroutine or any of the outlined parts as desired.
>
> If we discover that the lifetime of a coroutine is fully enclosed in
> the lifetime
> of the caller, we remove dynamic allocation for coroutine frame and
> replace it
> with an `alloca` on the caller's frame.
>
> Please see doc/Coroutines.rst for more details:
>
> doc/Coroutines.rst: http://reviews.llvm.org/D21170
> IR/Intrinsics.td:   http://reviews.llvm.org/D21169
>
> Concerns:
> ========> (This section assumes that you looked at doc/Coroutins.rst and
> IR/Intrinsics.td)
>
> Invoke like intrinsics via call + conditional branch:
> -----------------------------------------------------
>
> The `llvm.experimental.coro.fork` and `llvm.experimental.coro.suspend`
> intrinsics model behavior somewhat similar to the invoke instruction.
> Namely,
> they are used to show a normal continuation branch and an alternative
> branch.
>
> They are used in the following pattern:
>
>     %where.to = call i1 @llvm.experimental.coro.XXXX()
>     br i1 %where.to, label %normal, label %alternative
>
> My concern is that this is somewhat fragile. After optimizations some code
> can
> sneak in between the intrinsic and the branch and cause trouble. I am
> looking
> for suggestions on how to make it more robust. One thought I had was to
> put some
> intrinsics at the beginning of the %normal and %alternative blocks so as to
> prevent code motion above them. Something like:
>
>   ...
>     %where.to = call i1 @llvm.experimental.coro.XXXX()
>     br i1 %where.to, label %normal, label %alternative
>   normal:
>    <phi>
>    call i1 @llvm.experimental.coro.normal();
>   ...
>   alternative:
>    <phi>
>    call i1 @llvm.experimental.coro.alt();
>   ...
>
> After coro.xxx is lowered, coro.normal and coro.alt are removed.
>
> Looking up outlined coroutine parts
> -----------------------------------
>
> Last parameter of the `llvm.experimental.coro.init" intrinsic is a
pointer
> to
> a coroutine function. To get to the outlined parts, I get the name of that
> function and glue ".resume", ".destroy" or
".cleanup" suffixes to it and
> then
> look them up by name.
>
> Probably a better alternative is for a coroutine f to create a constant
> containing an array of pointers to f.resume, f.destroy and other parts of
> the
> coroutine and use the last parameter to refer to that constant.
>
> Other questions
> ---------------
>
> I have more questions about the best way of plugging in coroutine
> optimization
> passes into the PassManager, but, I will defer those for later discussion.
>
> All the feedback and comments is greatly appreciated!!!
>
> Thank you,
> Gor
>
> References:
> ==========>
> [1] http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0057r4.pdf
> [2] http://llvmweekly.org/issue/95 (Coroutine support added to clang)
> [3]
> http://www.boost.org/doc/libs/1_61_0/libs/coroutine/doc/html/index.html
>
> Review Links:
> ============>
> doc/Coroutines.rst: http://reviews.llvm.org/D21170
> IR/Intrinsics.td:   http://reviews.llvm.org/D21169
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160610/e1975b50/attachment.html>

Gor Nishanov via llvm-dev

2016-Jun-11 00:42 UTC

head link

[llvm-dev] Fwd: [RFC] LLVM Coroutines

Hi Josh:
> Sorry for the novice question, but how are coroutines lowered in the
> backend? It requires making operating system calls to something like
> pthreads right?
No OS, no target specific code required. Just simple rewriting of a function
during the middle optimizer. One of the first examples in the RFC shows
that a coroutine is optimized down to a constant. Try doing that with
pthreads :-)
>>   generator<int> range(int from, int to) {
>>      for(int i = from; i < to; ++n)
>>         co_yield i;
>>   }
>>   int main() {
>>      int sum = 0;
>>      for (auto v: range(1,100))
>>         sum += v;
>>      return sum;
>>   }
>>
>> And translate it down to this:
>>
>>   define i32 @main() #5 {
>>   entry:
>>     ret i32 4950
>>   }

Very briefly:

LLVM coroutines are functions that have one or more `suspend points`_.
When a suspend point is reached, the execution of a coroutine is suspended and
control is returned back to its caller. A suspended coroutine can be resumed
to continue execution from the last suspend point or be destroyed.

Let's say you get coroutine that looks something like this

  void *f(int n) {
     for(;;) {
       bar(n++);
       <suspend> // magic: returns a coroutine handle on first suspend
     }
  }

We will split it up into start, resume and destroy functions and will create
a coroutine frame that will keep values that need to be live across suspension
points.

  struct f.frame { int n; }

  void* f(int n) {
    auto s = new f.frame{n};
    bar(s->n++);
    return s;
  }

  void f.resume(f.frame* s) {
    bar(s->n++);
  }

  void f.destroy(f.frame* s) {
    delete s;
  }

You can read more details in the doc/Coroutines.rst up for review at

http://reviews.llvm.org/D21170

or in the RFC that started this thread.

Cheers
Gor

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Jun 2016 - Fwd: [RFC] LLVM Coroutines

[llvm-dev] Fwd: [RFC] LLVM Coroutines

[llvm-dev] [RFC] LLVM Coroutines

[llvm-dev] Fwd: [RFC] LLVM Coroutines

[llvm-dev] [RFC] LLVM Coroutines

[llvm-dev] Fwd: [RFC] LLVM Coroutines

[llvm-dev] Fwd: [RFC] LLVM Coroutines

[llvm-dev] Fwd: [RFC] LLVM Coroutines

Reasonably Related Threads