Hi Eli:>> Block1: >> %0 = call i8 coro.suspend() >> switch i8 %0, label suspend1 [i8 0 %return] ; or icmp + br i1 >> Suspend1: >> switch i8 %0, label %resume1 [i8 1 %destroy1] ; or icmp + br i1 >> >> This doesn't look right: intuitively the suspend happens after the return >> block runs.Perhaps, but, that is not the intended semantics. The following two things are distinct concepts: * Coroutine is considered suspended * Coroutine returns control back to its caller (initial caller, resumer, or destroyer) Coroutine is considered suspended means that it is legal to use coro.done, coro.resume, coro.destroy intrinsics (either in this thread or in a different one). It may seem strange, but, consider this example: task<void> f() { char buf[N]; ... fill buff with stuff await async_send(buf); // <suspend-point> ... do more } IR for await expression will be (conceptually): %0 = coro.save() async_send(%buf) coro.suspend(%0) coro.save is the point when coroutine is considered suspended. This allows async_send to resume the coroutine even before async_send returned control back to f. To make it safe, absolutely nothing should sneak-in between coro.save and coro.suspend beyond what frontend/user put there. Aside: ----- Based on what we discussed earlier, I suspect that optimizer may try to sneak stuff in between coro.save and coro.suspend. If it is the case, I have a solution :-), Richard Smith suggested it earlier to me. Use a distinct function (Just like you said with coro.fork :-)). There will be no distinct coro.save. Instead, coro.suspend will take an optional function pointer parameter and frontend will generate a little function for all the code that needs to go in between coro.save and coro.suspend. f.await(%frame) { %buf = get from %frame async_send(%buf) } in f() coro.suspend([...], &f.await) CoroEarly will do the extraction. Frontend will keep emitting coro.save/coro.suspend pairs. After CoroEarly, coro.save's will be removed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ P A U S E ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Okay getting back to the switch in the return block.>> The switch answers the question of where the control flow actually goes >> after the return block runs.Since control can reenter the coroutine at any point after coro.save and before, I don't think there is any point where putting the switch can help. We already told the optimizer that we have three possible paths out of suspend: return (first time), resume and destroy. It think it is as close to the truth as we can get.>> Another way to put it is that it should be possible to lower a coroutine >> to a thread rather than performing the state machine transformation.That is very good point! It is a great way to think about the semantics. Unfortunately, I cannot avoid doing state machine transformation in general case. For the coroutines with "initial_suspend" point, i.e. it suspends before entering user authored body, it is very easy. future<void> f(Params) { // <initial suspend point> at open curly char buf[N]; ... fill buff with stuff await async_send(buf); // <suspend-point> ... do more } Will be transformed into equivalent of: future<void> f(Params) { promise<void> prom; future<void> result = prom.get_future(); std::thread([prom = std::move(prom), Params = std::move(params)] { char buf[N]; ... fill buff with stuff BLOCK_WAIT async_send(buf); // block on event instead of suspend ... do more }).detach(); return result; } No state machine transformation whatsoever, just capture the parameters and promise and good to go. [No need to involve llvm either :-) it is a simple lambda like transformation that can be done easily in frontend] What if we don't have an initial suspend point? That means that the coroutine executes in the caller thread until it hits the suspend point. Well, we need to build up a coroutine frame, we need to chop up the function at coro.suspends. We need to put an index in a coro frame and a switch in an entry block of resume and suspend (if more than one suspend point). The only difference is what coro.suspend is replaced with: In start: coro.suspend launches a thread and gives f.resume as a start routine and a frame pointer as a context. So, f.resume will be executing concurrently with f slowly finishing up and returning back to the caller. (Btw, that is exactly the semantics we are trying to capture) In resume, coro.suspend is replaced with a blocking call waiting on whatever will signal that the thread can resume. ====================Two function approach ==================== Even though I pushing back against it, I might prototype it and see how it looks anyways. I will outline in CoroEarly and during CoroSplit inline it back so that I can build the coroutine frame properly. But even as I am typing it, it just feels too much. Look what we are trying to solve is this. %0 = coro.suspend() ; A thing should not be hoisted here %first = icmp %0, 0 ; or here br %first, %return, %suspend ; <=== please don't move stuff above me suspend: %1 = icmp %0, 1 br %1, label %resume, label %destroy resume: A Thing That Can Be Hoisted (same as in destroy) [...] destroy: A Thing That Can Be Hoisted (same as in resume) [...] return: Something Else Or maybe %0 = coro.suspend() ; A thing should not be hoisted here %first = coro.is.first(%0) ; or here br %first, %return, %suspend ; <=== please don't move stuff above me suspend: br %0, label %resume, label %destroy resume: A Thing That Can Be Hoisted (same as in destroy) [...] destroy: A Thing That Can Be Hoisted (same as in resume) [...] return: Something Else Would the optimizer really try to stick 'A thing' above br %first? As I mentioned before, I am very new to this and probably missing a lot, but, (apart from outlining a portion of f), is there no simple way of preventing instructions with side effects from resume or destroy blocks being moved above 'br %first branch'? (and if there is no good answer, two function it is. I will stop torturing you with this :-)) Gor
Typo: "and before" should not be in this sentence: Since control can reenter the coroutine at any point after coro.save>>> and before <<<<, I don't think there is any point where putting the switch canhelp This sentence should read: Since control can reenter the coroutine at any point after coro.save, I don't think there is any point where putting the switch can help On Sat, Jun 11, 2016 at 9:11 PM, Gor Nishanov <gornishanov at gmail.com> wrote:> Hi Eli: > >>> Block1: >>> %0 = call i8 coro.suspend() >>> switch i8 %0, label suspend1 [i8 0 %return] ; or icmp + br i1 >>> Suspend1: >>> switch i8 %0, label %resume1 [i8 1 %destroy1] ; or icmp + br i1 >>> >>> This doesn't look right: intuitively the suspend happens after the return >>> block runs. > > Perhaps, but, that is not the intended semantics. > > The following two things are distinct concepts: > > * Coroutine is considered suspended > > * Coroutine returns control back to its caller > (initial caller, resumer, or destroyer) > > Coroutine is considered suspended means that it is legal to use coro.done, > coro.resume, coro.destroy intrinsics (either in this thread or in a different > one). It may seem strange, but, consider this example: > > task<void> f() { > char buf[N]; > ... fill buff with stuff > await async_send(buf); // <suspend-point> > ... do more > } > > IR for await expression will be (conceptually): > > %0 = coro.save() > async_send(%buf) > coro.suspend(%0) > > coro.save is the point when coroutine is considered suspended. > This allows async_send to resume the coroutine even before async_send > returned control back to f. > > To make it safe, absolutely nothing should sneak-in between coro.save > and coro.suspend beyond what frontend/user put there. > > Aside: > ----- > Based on what we discussed earlier, I suspect that optimizer may try > to sneak stuff in between coro.save and coro.suspend. If it is the case, > I have a solution :-), Richard Smith suggested it earlier to me. > Use a distinct function (Just like you said with coro.fork :-)). > > There will be no distinct coro.save. Instead, coro.suspend will take an > optional function pointer parameter and frontend will generate a little > function for all the code that needs to go in between coro.save and > coro.suspend. > > f.await(%frame) { > %buf = get from %frame > async_send(%buf) > } > > in f() > > coro.suspend([...], &f.await) > > CoroEarly will do the extraction. Frontend will keep emitting > coro.save/coro.suspend pairs. After CoroEarly, coro.save's will be removed. > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > P A U S E > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Okay getting back to the switch in the return block. > >>> The switch answers the question of where the control flow actually goes >>> after the return block runs. > > Since control can reenter the coroutine at any point after coro.save > and before, I don't think there is any point where putting the switch can > help. > > We already told the optimizer that we have three possible paths out of suspend: > return (first time), resume and destroy. It think it is as close to the > truth as we can get. > >>> Another way to put it is that it should be possible to lower a coroutine >>> to a thread rather than performing the state machine transformation. > > That is very good point! It is a great way to think about the semantics. > > Unfortunately, I cannot avoid doing state machine transformation in general > case. > > For the coroutines with "initial_suspend" point, i.e. it suspends before > entering user authored body, it is very easy. > > future<void> f(Params) { // <initial suspend point> at open curly > char buf[N]; > ... fill buff with stuff > await async_send(buf); // <suspend-point> > ... do more > } > > Will be transformed into equivalent of: > > future<void> f(Params) { > promise<void> prom; > future<void> result = prom.get_future(); > > std::thread([prom = std::move(prom), Params = std::move(params)] { > char buf[N]; > ... fill buff with stuff > BLOCK_WAIT async_send(buf); // block on event instead of suspend > ... do more > }).detach(); > > return result; > } > > No state machine transformation whatsoever, just capture the parameters and > promise and good to go. [No need to involve llvm either :-) it is a simple > lambda like transformation that can be done easily in frontend] > > What if we don't have an initial suspend point? That means that the > coroutine executes in the caller thread until it hits the suspend point. > Well, we need to build up a coroutine frame, we need to chop up the function > at coro.suspends. We need to put an index in a coro frame and a switch in an > entry block of resume and suspend (if more than one suspend point). > > The only difference is what coro.suspend is replaced with: > > In start: coro.suspend launches a thread and gives f.resume as a start > routine and a frame pointer as a context. So, f.resume will be executing > concurrently with f slowly finishing up and returning back to the caller. > (Btw, that is exactly the semantics we are trying to capture) > > In resume, coro.suspend is replaced with a blocking call waiting on > whatever will signal that the thread can resume. > > ====================> Two function approach > ====================> > Even though I pushing back against it, I might prototype it and see how it > looks anyways. > > I will outline in CoroEarly and during CoroSplit inline it back so that > I can build the coroutine frame properly. > > But even as I am typing it, it just feels too much. > Look what we are trying to solve is this. > > %0 = coro.suspend() > ; A thing should not be hoisted here > %first = icmp %0, 0 > ; or here > br %first, %return, %suspend ; <=== please don't move stuff above me > > suspend: > %1 = icmp %0, 1 > br %1, label %resume, label %destroy > > resume: > A Thing That Can Be Hoisted (same as in destroy) > [...] > > destroy: > A Thing That Can Be Hoisted (same as in resume) > [...] > > return: > Something Else > > Or maybe > > %0 = coro.suspend() > ; A thing should not be hoisted here > %first = coro.is.first(%0) > ; or here > br %first, %return, %suspend ; <=== please don't move stuff above me > > suspend: > br %0, label %resume, label %destroy > > resume: > A Thing That Can Be Hoisted (same as in destroy) > [...] > > destroy: > A Thing That Can Be Hoisted (same as in resume) > [...] > > return: > Something Else > > Would the optimizer really try to stick 'A thing' above br %first? > > As I mentioned before, I am very new to this and probably missing a lot, > but, (apart from outlining a portion of f), is there no simple way of > preventing instructions with side effects from resume or destroy blocks > being moved above 'br %first branch'? > > (and if there is no good answer, two function it is. I will stop > torturing you with this :-)) > > Gor
I think I got it. Original model (with coro.fork and two-way coro.suspend) will work with a tiny tweak. In the original model, I was replacing coro.suspend with br %return in original function. The problem was coming from potential phi-nodes introduces into return block during optimizations. Let's make sure that there is only entry into the return block. In the original model, ReturnBB had two entries, one from coro.fork, another from fall-through from the delete block. T coro() { %mem = malloc(...); if (!coro.fork()) { Body Of The Coroutine With Suspends and All The Fun DeleteBB: free(%mem) coro.end() } ReturnBB: return some_code_creating_T(); } In the original model, coro.end is replaced with `no-op` in start, 'ret void' in clones (*) suspend is replaced with `br %return` in start, 'ret void' in clones Let's tweak it. Add an i1 %fallthru parameter to coro.end. Now the behavior is: in clones coro.end behaves as before: replaced with 'ret void' (*) in original: coro.end(false) => no-op (as before). (*) coro.end(true) => 'br %return' Now the body of the coroutine will look like this: T coro() { %mem = malloc(...); if (coro.fork()) { ReturnBB: return some_code_creating_T(); } Body Of The Coroutine With Suspends and All The Fun DeleteBB: free(%mem) coro.end(true) UNREACHABLE } Now I don't think any code that has side effects can sneak up from the "Body of The Coroutine" into ReturnBB. So we are good, right? Gor ----------------- (*) There are two uses for coro.end. One is for the fallthrough point in the delete block. Another, in unwind code responsible for cleanup in the original function if there is an exception **BEFORE** coroutine had a chance to suspend. coro.end in unwind blocks is replaced with appropriate 'unwind to caller' instruction, cutting the rest out.