thr3ads.net - llvm dev - [llvm-dev] [RFC] LLVM Coroutines [Jun 2016]

If this information is useful, please help other people find it:
Share via:

Gor Nishanov via llvm-dev

2016-Jun-12 04:11 UTC

[llvm-dev] [RFC] LLVM Coroutines

Hi Eli:
>>   Block1:
>>     %0 = call i8 coro.suspend()
>>     switch i8 %0, label suspend1 [i8 0 %return] ; or icmp + br i1
>>   Suspend1:
>>     switch i8 %0, label %resume1 [i8 1 %destroy1] ; or icmp + br i1
>>
>> This doesn't look right: intuitively the suspend happens after the
return
>> block runs.
Perhaps, but, that is not the intended semantics.

The following two things are distinct concepts:

  * Coroutine is considered suspended

  * Coroutine returns control back to its caller
    (initial caller, resumer, or destroyer)

Coroutine is considered suspended means that it is legal to use coro.done,
coro.resume, coro.destroy intrinsics (either in this thread or in a different
one). It may seem strange, but, consider this example:

  task<void> f() {
    char buf[N];
    ... fill buff with stuff
    await async_send(buf); // <suspend-point>
    ... do more
  }

IR for await expression will be (conceptually):

  %0 = coro.save()
  async_send(%buf)
  coro.suspend(%0)

coro.save is the point when coroutine is considered suspended.
This allows async_send to resume the coroutine even before async_send
returned control back to f.

To make it safe, absolutely nothing should sneak-in between coro.save
and coro.suspend beyond what frontend/user put there.

Aside:
-----
Based on what we discussed earlier, I suspect that optimizer may try
to sneak stuff in between coro.save and coro.suspend. If it is the case,
I have a solution :-), Richard Smith suggested it earlier to me.
Use a distinct function (Just like you said with coro.fork :-)).

There will be no distinct coro.save. Instead, coro.suspend will take an
optional function pointer parameter and frontend will generate a little
function for all the code that needs to go in between coro.save and
coro.suspend.

  f.await(%frame) {
    %buf = get from %frame
    async_send(%buf)
  }

  in f()

    coro.suspend([...], &f.await)

CoroEarly will do the extraction. Frontend will keep emitting
coro.save/coro.suspend pairs. After CoroEarly, coro.save's will be removed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  P A U S E
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Okay getting back to the switch in the return block.
>> The switch answers the question of where the control flow actually goes
>> after the return block runs.
Since control can reenter the coroutine at any point after coro.save
and before, I don't think there is any point where putting the switch can
help.

We already told the optimizer that we have three possible paths out of suspend:
return (first time), resume and destroy. It think it is as close to the
truth as we can get.
>> Another way to put it is that it should be possible to lower a
coroutine
>> to a thread rather than performing the state machine transformation.
That is very good point! It is a great way to think about the semantics.

Unfortunately, I cannot avoid doing state machine transformation in general
case.

For the coroutines with "initial_suspend" point, i.e. it suspends
before
entering user authored body, it is very easy.

  future<void> f(Params) { // <initial suspend point> at open curly
    char buf[N];
    ... fill buff with stuff
    await async_send(buf); // <suspend-point>
    ... do more
  }

Will be transformed into equivalent of:

  future<void> f(Params) {
    promise<void> prom;
    future<void> result = prom.get_future();

    std::thread([prom = std::move(prom), Params = std::move(params)] {
      char buf[N];
      ... fill buff with stuff
      BLOCK_WAIT async_send(buf); // block on event instead of suspend
      ... do more
    }).detach();

    return result;
  }

No state machine transformation whatsoever, just capture the parameters and
promise and good to go. [No need to involve llvm either :-) it is a simple
lambda like transformation that can be done easily in frontend]

What if we don't have an initial suspend point? That means that the
coroutine executes in the caller thread until it hits the suspend point.
Well, we need to build up a coroutine frame, we need to chop up the function
at coro.suspends. We need to put an index in a coro frame and a switch in an
entry block of resume and suspend (if more than one suspend point).

The only difference is what coro.suspend is replaced with:

In start: coro.suspend launches a thread and gives f.resume as a start
routine and a frame pointer as a context. So, f.resume will be executing
concurrently with f slowly finishing up and returning back to the caller.
(Btw, that is exactly the semantics we are trying to capture)

In resume, coro.suspend is replaced with a blocking call waiting on
whatever will signal that the thread can resume.

====================Two function approach
====================
Even though I pushing back against it, I might prototype it and see how it
looks anyways.

I will outline in CoroEarly and during CoroSplit inline it back so that
I can build the coroutine frame properly.

But even as I am typing it, it just feels too much.
Look what we are trying to solve is this.

        %0 = coro.suspend()
        ; A thing should not be hoisted here
        %first = icmp %0, 0
        ; or here
        br %first, %return, %suspend ; <=== please don't move stuff above
me

      suspend:
        %1 = icmp %0, 1
        br %1, label %resume, label %destroy

      resume:
        A Thing That Can Be Hoisted (same as in destroy)
        [...]

      destroy:
        A Thing That Can Be Hoisted (same as in resume)
        [...]

      return:
        Something Else

Or maybe

        %0 = coro.suspend()
        ; A thing should not be hoisted here
        %first = coro.is.first(%0)
        ; or here
        br %first, %return, %suspend ; <=== please don't move stuff above
me

      suspend:
        br %0, label %resume, label %destroy

      resume:
        A Thing That Can Be Hoisted (same as in destroy)
        [...]

      destroy:
        A Thing That Can Be Hoisted (same as in resume)
        [...]

      return:
        Something Else

Would the optimizer really try to stick 'A thing' above br %first?

As I mentioned before, I am very new to this and probably missing a lot,
but, (apart from outlining a portion of f), is there no simple way of
preventing instructions with side effects from resume or destroy blocks
being moved above 'br %first branch'?

(and if there is no good answer, two function it is. I will stop
torturing you with this :-))

Gor

Gor Nishanov via llvm-dev

2016-Jun-12 04:16 UTC

head link

[llvm-dev] [RFC] LLVM Coroutines

Typo:

"and before" should not be in this sentence:

Since control can reenter the coroutine at any point after
coro.save>>> and before <<<<, I don't think there is any point
where putting the switch canhelp

This sentence should read:

Since control can reenter the coroutine at any point after coro.save,
I don't think
there is any point where putting the switch can help

On Sat, Jun 11, 2016 at 9:11 PM, Gor Nishanov <gornishanov at gmail.com>
wrote:> Hi Eli:
>
>>>   Block1:
>>>     %0 = call i8 coro.suspend()
>>>     switch i8 %0, label suspend1 [i8 0 %return] ; or icmp + br i1
>>>   Suspend1:
>>>     switch i8 %0, label %resume1 [i8 1 %destroy1] ; or icmp + br i1
>>>
>>> This doesn't look right: intuitively the suspend happens after
the return
>>> block runs.
>
> Perhaps, but, that is not the intended semantics.
>
> The following two things are distinct concepts:
>
>   * Coroutine is considered suspended
>
>   * Coroutine returns control back to its caller
>     (initial caller, resumer, or destroyer)
>
> Coroutine is considered suspended means that it is legal to use coro.done,
> coro.resume, coro.destroy intrinsics (either in this thread or in a
different
> one). It may seem strange, but, consider this example:
>
>   task<void> f() {
>     char buf[N];
>     ... fill buff with stuff
>     await async_send(buf); // <suspend-point>
>     ... do more
>   }
>
> IR for await expression will be (conceptually):
>
>   %0 = coro.save()
>   async_send(%buf)
>   coro.suspend(%0)
>
> coro.save is the point when coroutine is considered suspended.
> This allows async_send to resume the coroutine even before async_send
> returned control back to f.
>
> To make it safe, absolutely nothing should sneak-in between coro.save
> and coro.suspend beyond what frontend/user put there.
>
> Aside:
> -----
> Based on what we discussed earlier, I suspect that optimizer may try
> to sneak stuff in between coro.save and coro.suspend. If it is the case,
> I have a solution :-), Richard Smith suggested it earlier to me.
> Use a distinct function (Just like you said with coro.fork :-)).
>
> There will be no distinct coro.save. Instead, coro.suspend will take an
> optional function pointer parameter and frontend will generate a little
> function for all the code that needs to go in between coro.save and
> coro.suspend.
>
>   f.await(%frame) {
>     %buf = get from %frame
>     async_send(%buf)
>   }
>
>   in f()
>
>     coro.suspend([...], &f.await)
>
> CoroEarly will do the extraction. Frontend will keep emitting
> coro.save/coro.suspend pairs. After CoroEarly, coro.save's will be
removed.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   P A U S E
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Okay getting back to the switch in the return block.
>
>>> The switch answers the question of where the control flow actually
goes
>>> after the return block runs.
>
> Since control can reenter the coroutine at any point after coro.save
> and before, I don't think there is any point where putting the switch
can
> help.
>
> We already told the optimizer that we have three possible paths out of
suspend:
> return (first time), resume and destroy. It think it is as close to the
> truth as we can get.
>
>>> Another way to put it is that it should be possible to lower a
coroutine
>>> to a thread rather than performing the state machine
transformation.
>
> That is very good point! It is a great way to think about the semantics.
>
> Unfortunately, I cannot avoid doing state machine transformation in general
> case.
>
> For the coroutines with "initial_suspend" point, i.e. it suspends
before
> entering user authored body, it is very easy.
>
>   future<void> f(Params) { // <initial suspend point> at open
curly
>     char buf[N];
>     ... fill buff with stuff
>     await async_send(buf); // <suspend-point>
>     ... do more
>   }
>
> Will be transformed into equivalent of:
>
>   future<void> f(Params) {
>     promise<void> prom;
>     future<void> result = prom.get_future();
>
>     std::thread([prom = std::move(prom), Params = std::move(params)] {
>       char buf[N];
>       ... fill buff with stuff
>       BLOCK_WAIT async_send(buf); // block on event instead of suspend
>       ... do more
>     }).detach();
>
>     return result;
>   }
>
> No state machine transformation whatsoever, just capture the parameters and
> promise and good to go. [No need to involve llvm either :-) it is a simple
> lambda like transformation that can be done easily in frontend]
>
> What if we don't have an initial suspend point? That means that the
> coroutine executes in the caller thread until it hits the suspend point.
> Well, we need to build up a coroutine frame, we need to chop up the
function
> at coro.suspends. We need to put an index in a coro frame and a switch in
an
> entry block of resume and suspend (if more than one suspend point).
>
> The only difference is what coro.suspend is replaced with:
>
> In start: coro.suspend launches a thread and gives f.resume as a start
> routine and a frame pointer as a context. So, f.resume will be executing
> concurrently with f slowly finishing up and returning back to the caller.
> (Btw, that is exactly the semantics we are trying to capture)
>
> In resume, coro.suspend is replaced with a blocking call waiting on
> whatever will signal that the thread can resume.
>
> ====================> Two function approach
> ====================>
> Even though I pushing back against it, I might prototype it and see how it
> looks anyways.
>
> I will outline in CoroEarly and during CoroSplit inline it back so that
> I can build the coroutine frame properly.
>
> But even as I am typing it, it just feels too much.
> Look what we are trying to solve is this.
>
>         %0 = coro.suspend()
>         ; A thing should not be hoisted here
>         %first = icmp %0, 0
>         ; or here
>         br %first, %return, %suspend ; <=== please don't move stuff
above me
>
>       suspend:
>         %1 = icmp %0, 1
>         br %1, label %resume, label %destroy
>
>       resume:
>         A Thing That Can Be Hoisted (same as in destroy)
>         [...]
>
>       destroy:
>         A Thing That Can Be Hoisted (same as in resume)
>         [...]
>
>       return:
>         Something Else
>
> Or maybe
>
>         %0 = coro.suspend()
>         ; A thing should not be hoisted here
>         %first = coro.is.first(%0)
>         ; or here
>         br %first, %return, %suspend ; <=== please don't move stuff
above me
>
>       suspend:
>         br %0, label %resume, label %destroy
>
>       resume:
>         A Thing That Can Be Hoisted (same as in destroy)
>         [...]
>
>       destroy:
>         A Thing That Can Be Hoisted (same as in resume)
>         [...]
>
>       return:
>         Something Else
>
> Would the optimizer really try to stick 'A thing' above br %first?
>
> As I mentioned before, I am very new to this and probably missing a lot,
> but, (apart from outlining a portion of f), is there no simple way of
> preventing instructions with side effects from resume or destroy blocks
> being moved above 'br %first branch'?
>
> (and if there is no good answer, two function it is. I will stop
> torturing you with this :-))
>
> Gor

Gor Nishanov via llvm-dev

2016-Jun-12 16:31 UTC

head link

[llvm-dev] [RFC] LLVM Coroutines

I think I got it. Original model (with coro.fork and two-way coro.suspend)
will work with a tiny tweak.

In the original model, I was replacing coro.suspend with br %return in original
function. The problem was coming from potential phi-nodes introduces into
return block during optimizations.

Let's make sure that there is only entry into the return block.
In the original model, ReturnBB had two entries, one from coro.fork,
another from fall-through from the delete block.

  T coro() {
    %mem = malloc(...);
    if (!coro.fork()) {
      Body Of The Coroutine
      With Suspends and All The Fun
  DeleteBB:
      free(%mem)
      coro.end()
    }
  ReturnBB:
    return some_code_creating_T();
  }

In the original model,
  coro.end is replaced with `no-op` in start, 'ret void' in clones (*)
  suspend is replaced with `br %return` in start, 'ret void' in clones

Let's tweak it. Add an i1 %fallthru parameter to coro.end. Now the behavior
is:
  in clones coro.end behaves as before: replaced with 'ret void' (*)
  in original:
    coro.end(false) => no-op (as before). (*)
    coro.end(true) => 'br %return'

Now the body of the coroutine will look like this:

T coro() {
    %mem = malloc(...);
    if (coro.fork()) {
ReturnBB:
      return some_code_creating_T();
    }
    Body Of The Coroutine
    With Suspends and All The Fun
DeleteBB:
    free(%mem)
    coro.end(true)
    UNREACHABLE
  }

Now I don't think any code that has side effects can sneak up from the
"Body of The Coroutine" into ReturnBB. So we are good, right?

Gor

-----------------
(*) There are two uses for coro.end. One is for the fallthrough point in
the delete block. Another, in unwind code responsible for cleanup in
the original function if there is an exception **BEFORE** coroutine had a
chance to suspend.

coro.end in unwind blocks is replaced with appropriate 'unwind to
caller'
instruction, cutting the rest out.

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - Jun 2016 - [RFC] LLVM Coroutines

[llvm-dev] [RFC] LLVM Coroutines

[llvm-dev] [RFC] LLVM Coroutines

[llvm-dev] [RFC] LLVM Coroutines

Reasonably Related Threads