thr3ads.net - llvm dev - [llvm-dev] RFC: LLVM Coroutine Representation, Round 2 [Jul 2016]

If this information is useful, please help other people find it:
Share via:

Vadim Chugunov via llvm-dev

2016-Jul-21 16:58 UTC

[llvm-dev] RFC: LLVM Coroutine Representation, Round 2

cc llvm-dev

On Thu, Jul 21, 2016 at 9:57 AM, Vadim Chugunov <vadimcn at gmail.com>
wrote:
> Hi Gor,
> Does you design support resumption with parameter(s)?  (such as
Python's
> generator.send(x)).  I suppose the "promise" could be used for
passing data
> both ways, but if that's the plan, please mention this explicitly in
the
> design doc.
> Also, how is loading/storing to promise going to be lowered?
>
> Vadim
>
> On Mon, Jul 11, 2016 at 6:47 AM, Gor Nishanov via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all:
>>
>> Thank you very much for the feedback during the first round! Special
>> thanks to
>> Eli, Sanjoy, Hal and Chandler for their detailed public and private
>> comments.
>> The model is simpler now and the intrinsics have nice symmetry to them:
>> coro.begin + coro.end, coro.alloc + coro.free. Compared to the previous
>> RFC,
>> coro.fork is gone, coro.init and coro.start are collapsed into one
>> coro.begin,
>> coro.suspend is three-way as opposed to two-way. coro.elide and
>> coro.delete
>> renamed coro.alloc and coro.free.
>>
>> All the changes implemented and tested. Full document, (nicely
formatted
>> by
>> github is at:
>>
>>   https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst
>>
>> Below is a summary of a proposal to add coroutine support to LLVM.
>> The proposal is motivated primarily by the desire to support C++
>> Coroutines [1],
>> however the llvm representation is language neutral and can be used
>> to support coroutines in other languages as well.
>>
>> Clang + llvm coroutines allows you to take this code:
>>
>>   generator<int> range(int from, int to) {
>>      for(int i = from; i < to; ++n)
>>         co_yield i;
>>   }
>>   int main() {
>>      int sum = 0;
>>      for (auto v: range(1,100))
>>         sum += v;
>>      return sum;
>>   }
>>
>> And translate it down to this:
>>
>>   define i32 @main() #5 {
>>   entry:
>>     ret i32 4950
>>   }
>>
>> You can also use coroutines in plain C, by calling the builtins mapping
>> to the
>> intrinsics described by this proposal, so that your coroutine can look
>> like:
>>
>>     #include "Inputs/coro.h"
>>     void print(int v);
>>
>>     void* f(int n) {
>>       CORO_BEGIN(malloc);
>>
>>       while (n-- > 0) {
>>         print(n+1);
>>         CORO_SUSPEND();
>>       }
>>
>>       CORO_END(free);
>>     }
>>
>>     // CHECK-LABEL: @main
>>     int main() {
>>       void* coro = f(4);
>>       CORO_RESUME(coro);
>>       CORO_RESUME(coro);
>>       CORO_DESTROY(coro);
>>     }
>>
>>
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/coro.c
>>
>>
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coro.h
>>
>> I prototyped llvm changes to support this proposal and extended clang
>> coroutine
>> implementation [2] to emit proposed intrinsics to smoke test the
proposed
>> design. See "More Attention Required" in the
doc/Coroutines.rst [4]for
>> details
>> on what additional work is required beyond cleanup and bug fixes.
>>
>> I would like to continue the discussion on the overall design,
>> representation
>> and a roadmap to start upstreaming the changes with the intention to
>> continue
>> the development in tree incrementally.
>>
>> Roadmap:
>> --------
>> 1) Get agreement on coroutine representation and overall direction
(this
>> mail).
>>   .. repeat 1) until happy <=== ARE ARE HERE
>> 2) Get agreement on how coroutine transformation passes integrate into
the
>>    optimizer pipeline.
>>   .. repeat 2) until happy
>> 3) update IR/Intrinsics.td + doc/Coroutines.rst + doc/LangRef.rst
>> 4) trickle in coroutine transformation passes + tests in digestible
>> chunks.
>> 5) get clang changes in (sometime after step 3).
>> 6) fix bugs / remove limitations / add functionality to coroutine
passes
>>  .. repeat 6) until happy
>>
>> Background
>> =========>> LLVM coroutines are functions that have one or more
`suspend points`.
>> When a suspend point is reached, the execution of a coroutine is
>> suspended and
>> control is returned back to its caller. A suspended coroutine can be
>> resumed
>> to continue execution from the last suspend point or it can be
destroyed.
>>
>> In the following example, function `f` (which may or may not be a
>> coroutine
>> itself) returns a handle to a suspended coroutine (**coroutine
handle**)
>> that is
>> used by `main` to resume the coroutine twice and then destroy it:
>>
>> .. code-block:: llvm
>>
>>   define i32 @main() {
>>   entry:
>>     %hdl = call i8* @f(i32 4)
>>     call void @llvm.coro.resume(i8* %hdl)
>>     call void @llvm.coro.resume(i8* %hdl)
>>     call void @llvm.coro.destroy(i8* %hdl)
>>     ret i32 0
>>   }
>>
>> In addition to the stack frame which exists when a coroutine is
executing,
>> there is an additional region of storage that contains values that keep
>> the
>> coroutine state when a coroutine is suspended. This region of storage
>> is called **coroutine frame**. It is created when a coroutine is called
>> and
>> destroyed when a coroutine runs to completion or destroyed by a call to
>> the `coro.destroy`_ intrinsic. Unlike stackful coroutines [3] which
>> maintains a
>> stack per coroutine, an llvm coroutine frame only contains the values
>> that need
>> to persist across suspend points (there is a path from a use to the
>> definition
>> that crosses a suspend point).
>>
>> Overall idea is that a coroutine is presented to an llvm as an ordinary
>> function
>> with suspension points marked up with intrinsics. We let the optimizer
>> party
>> on the coroutine for awhile. Shortly before the coroutine is eligible
to
>> be
>> inlined into its callers, we outline parts of the coroutine that
>> correspond to
>> the code that needs to get executed after the coroutine is resumed or
>> destroyed.
>> The coroutine resumption intrinsics get replaced with direct calls to
>> those
>> outlined functions where possible. Then inliner can inline much leaner
>> and nicer
>> coroutine or any of the outlined parts as desired.
>>
>> If we discover that the lifetime of a coroutine is fully enclosed in
the
>> lifetime of its caller, we remove dynamic allocation for coroutine
frame
>> and
>> replace it with a static `alloca` on the caller's frame.
>>
>> Please see doc/Coroutines.rst for more details:
>>
>> https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst
>>
>> Concerns:
>> ========>> (This section assumes that you looked at
doc/Coroutins.rst)
>>
>> coro.begin: 4 arguments or 5?
>> -----------------------------
>>
>> Background:
>>
>> For standard allocation functions recognized by LLVM, coroutine frame
>> allocation
>> code looks like:
>>
>>   entry:
>>     %size = call i32 @llvm.coro.size.i32(i8* null)
>>     %alloc = call i8* @malloc(i32 %size)
>>     %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8*
null,
>>                                                                 i8*
null)
>>
>> To enable coroutine heap elision with custom allocation functions,
>> `coro.alloc`
>> intrinsic is used to exclude the allocation code when not needed.
>>
>>   entry:
>>     %elide = call i8* @llvm.coro.alloc()
>>     %0 = icmp ne i8* %elide, null
>>     br i1 %0, label %coro.begin, label %coro.alloc
>>
>>   coro.alloc:
>>     %frame.size = call i32 @llvm.coro.size()
>>     %alloc = call i8* @MyAlloc(i32 %frame.size)
>>     br label %coro.begin
>>
>>   coro.begin:
>>     %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
>>     %frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8*
>> null)
>>
>> When using a static alloca for coroutine frame, coro.begin and
coro.alloc
>> intrinsics are replaced with that very alloca (bitcasted to i8* to
match
>> the
>> type).
>>
>> To find a coro.alloc matching a particular coro.begin, I hunt through
the
>> %phi
>> until I find definition of coro.alloc. Probably more robust
implementation
>> need to use alias analysis to check whether the %phi and %elide may
alias.
>>
>> A simpler approach would be just to add another parameter to directly
>> point to
>> `coro.alloc`. With this change, coro.begin block above would look like:
>>
>>   coro.begin:
>>     %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
>>     %frame = call i8* @llvm.coro.begin(i8* %phi, i8 %elide, i32 0, i8*
>> null,
>>                                                  ^^^^^^^^^         i8*
>> null)
>>
>> All feedback and comments are greatly appreciated!!!
>>
>> Thank you,
>> Gor
>>
>> References:
>> ==========>>
>> [1] http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0057r4.pdf
>> [2] http://llvmweekly.org/issue/95 (Coroutine support added to clang)
>> [3]
>> http://www.boost.org/doc/libs/1_61_0/libs/coroutine/doc/html/index.html
>> [4]
https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160721/5d3bd168/attachment.html>

Gor Nishanov via llvm-dev

2016-Jul-21 18:43 UTC

head link

[llvm-dev] RFC: LLVM Coroutine Representation, Round 2

Hi Vadim:
> Does you design support resumption with parameter(s)?  (such as
Python's
> generator.send(x)).  I suppose the "promise" could be used for
passing data
> both ways,
Correct.

The docs/Coroutines.rst states that:

   "A coroutine author or a frontend may designate a distinguished `alloca`
    that can be used to communicate with the coroutine."

What kind of communication happens is determined by the source language
coroutine constructs. An example later in the section shows how, as a consumer
of a coroutine, you can read the data from the promise. But that communication
can go the other way if you, instead, will be storing into the promise in main,
and loading from it in the coroutine.

coro.promise intrinsic gives you the address of the promise associated with a
particular coroutine instance and you (as frontend writer, or coroutine library
designer is free to do whatever you want with it).
> please mention this explicitly in the design doc.
Would you suggest changes to the exact wording to make it clearer? I put it up
for review at: https://reviews.llvm.org/D22603, so you can just mark up the
changes you would like to see.
> Also, how is loading/storing to promise going to be lowered?
The coro.promise intrinsics just gives you an address of the coroutine promise.
You, as frontend / library writer, know what is the promise and you emit
appropriate load or stores instructions.

If you have a synchronous scenario and a promise is plain int, you would use
plain loads and stores. If your promise is some complicated data structure
which has atomics and you use it from different threads, you would use
appropriate loads, stores, fences and synchronization primitives.

But all this is out of scope for docs/Coroutines.rst . It is up to the frontend,
library writer to decide how it wants to use the promise. The only special
handling that LLVM does for the coroutine promise is:

1) places the promise at deterministic offset in the coroutine frame
2) allows you to get an address of the promise given a coroutine handle and
   vice versa.

Cheers,
Gor

P.S. If you'd like to see a library implementation of a generator, please
see:

Generator itself:
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/generator.h

coroutine_handle (a C++ level abstraction mapping to llvm intrinsics)
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coroutine.h

and the use:

https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/generator.cpp

Well, if start looking at those tests, you may enjoy coroutines in C via macros,
lowered to the same coroutine intrinsics :-).

https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/coro.c

and

https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coro.h

Vadim Chugunov via llvm-dev

2016-Jul-21 20:08 UTC

head link

[llvm-dev] RFC: LLVM Coroutine Representation, Round 2

What I am getting at, is that ideally the parameters/return value of
generator.send() should become parameters/return value of @f.Resume:
    define <return type> @f.Resume(%f.Frame* %FramePtr, <input type>
%Param);

They don't need to be allocated in the coroutine frame at all.  Do you
think that LLVM will be able to perform such an optimization?

On Thu, Jul 21, 2016 at 11:43 AM, Gor Nishanov <gornishanov at gmail.com>
wrote:
> Hi Vadim:
>
> > Does you design support resumption with parameter(s)?  (such as
Python's
> > generator.send(x)).  I suppose the "promise" could be used
for passing
> data
> > both ways,
>
> Correct.
>
> The docs/Coroutines.rst states that:
>
>    "A coroutine author or a frontend may designate a distinguished
`alloca`
>     that can be used to communicate with the coroutine."
>
> What kind of communication happens is determined by the source language
> coroutine constructs. An example later in the section shows how, as a
> consumer
> of a coroutine, you can read the data from the promise. But that
> communication
> can go the other way if you, instead, will be storing into the promise in
> main,
> and loading from it in the coroutine.
>
> coro.promise intrinsic gives you the address of the promise associated
> with a
> particular coroutine instance and you (as frontend writer, or coroutine
> library
> designer is free to do whatever you want with it).
>
> > please mention this explicitly in the design doc.
>
> Would you suggest changes to the exact wording to make it clearer? I put
> it up
> for review at: https://reviews.llvm.org/D22603, so you can just mark up
> the
> changes you would like to see.
>
> > Also, how is loading/storing to promise going to be lowered?
>
> The coro.promise intrinsics just gives you an address of the coroutine
> promise.
> You, as frontend / library writer, know what is the promise and you emit
> appropriate load or stores instructions.
>
> If you have a synchronous scenario and a promise is plain int, you would
> use
> plain loads and stores. If your promise is some complicated data structure
> which has atomics and you use it from different threads, you would use
> appropriate loads, stores, fences and synchronization primitives.
>
> But all this is out of scope for docs/Coroutines.rst . It is up to the
> frontend,
> library writer to decide how it wants to use the promise. The only special
> handling that LLVM does for the coroutine promise is:
>
> 1) places the promise at deterministic offset in the coroutine frame
> 2) allows you to get an address of the promise given a coroutine handle and
>    vice versa.
>
> Cheers,
> Gor
>
> P.S. If you'd like to see a library implementation of a generator,
please
> see:
>
> Generator itself:
>
>
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/generator.h
>
> coroutine_handle (a C++ level abstraction mapping to llvm intrinsics)
>
>
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coroutine.h
>
> and the use:
>
>
>
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/generator.cpp
>
> Well, if start looking at those tests, you may enjoy coroutines in C via
> macros,
> lowered to the same coroutine intrinsics :-).
>
> https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/coro.c
>
> and
>
>
>
https://github.com/GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coro.h
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160721/52be5fc6/attachment.html>

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Jul 2016 - RFC: LLVM Coroutine Representation, Round 2

[llvm-dev] RFC: LLVM Coroutine Representation, Round 2

[llvm-dev] RFC: LLVM Coroutine Representation, Round 2

[llvm-dev] RFC: LLVM Coroutine Representation, Round 2

Maybe Matching Threads