Gor Nishanov via llvm-dev
2016-Jul-15 02:52 UTC
[llvm-dev] RFC: Coroutine Optimization Passes
Hi all: I've included below a brief description of coroutine related optimization passes and some questions/thoughts related to them. Looking forward to your feedback, comments and questions. Thank you! Roadmap: =======1) Get agreement on coroutine representation and overall direction. .. repeat 1) until happy http://lists.llvm.org/pipermail/llvm-dev/2016-June/100838.html (Initial) http://lists.llvm.org/pipermail/llvm-dev/2016-July/102133.html (Round 2) 2) Get agreement on how coroutine transformation passes integrate into the optimizer pipeline. (this mail) <=== WE ARE HERE .. repeat 2) until happy 3) update IR/Intrinsics.td + doc/Coroutines.rst + doc/LangRef.rst https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst 4) trickle in coroutine transformation passes + tests in digestible chunks. 5) get clang changes in (sometime after step 3). 6) fix bugs / remove limitations / add functionality to coroutine passes .. repeat 6) until happy Overview: ========LLVM coroutines are functions that have one or more `suspend points`. When a suspend point is reached, the execution of a coroutine is suspended and control is returned back to its caller. A suspended coroutine can be resumed to continue execution from the last suspend point or it can be destroyed. CoroSplit Pass: (CGSCC Pass @ new extension point EP_CGSCCOptimizerLate) --------------- Overall idea is that a coroutine is presented to an llvm as an ordinary function with suspension points marked up with intrinsics. We let the optimizer party on the coroutine as a single function for as long as possible. Shortly before the coroutine is eligible to be inlined into its callers, we outline parts of the coroutine that correspond to the code that needs to get executed after the coroutine is resumed or destroyed. We also create a struct that will keep the objects that need to persist across suspend points. This transformation is done by CoroSplit CGSCC pass run as a part of the IPO optimizations pipeline. CoroElide Pass: (Function Pass @ EP_ScalarOptimizerLate extension point) --------------- The coroutine resumption intrinsics get replaced with direct calls to those outlined functions where possible. Then inliner can inline much leaner and nicer coroutine or any of the outlined parts as desired. If we discover that the lifetime of a coroutine is fully enclosed in the lifetime of the caller, we remove dynamic allocation of coroutine state and replace it with an `alloca` on the caller's frame. These optimizations are done by a function pass CoroElide run by the main IPO optimization at EP_ScalarOptimizerLate extension point. CoroEarly Pass: (Function Pass @ EP_EarlyAsPossible) --------------- Pretty boring pass that lowers coroutine intrinsics that are not needed for later coroutine passes. CoroCleanup Pass: (Function Pass @ EP_OptimizerLast) -------------- Another boring pass that lowers all remaining coroutine intrinsics that were not processed by earlier coroutine passes. Questions / Concerns / Thoughts: =============================== Coroutine attribute or not. --------------------------- I added a AttrKind::coroutine to flag the pre-split coroutine. The intention is to lessen the impact of CoroSpit pass since it can simply check for an attribute to learn if there is any work to be done on a function or not. Without the attribute, it would need to examine every functions body to see if it has an llvm.coro.begin. On the other hand, three other coroutine passes have to look at every function body anyway to lower other coroutine related intrinsics, so we only saving a little. Another negative aspect of having this attribute is that there is a possibility of other passes checking if a function is a coroutine or not a doing and doing something differently if it is. I'd like to avoid this development and keep all coroutine logic in the coroutine passes and not affect other optimizations passes. Currently I am leaning towards removing this attribute. Integration with CGSCC pipeline. -------------------------------- As mentioned in the introduction, I would like to run entire IPO pipeline on pre-split coroutine, then, split it, add the new functions to the SCC and repeat the pipeline on the updated SCC before proceeding to the next SCC. To facilitate that, CoroSplit pass ignores the coroutine the first time it sees it, and requests restart of the pipeline and splits the coroutine when it sees it the second time around. There are at least four different ways how I can request a restart of the pipeline, and I would like to solicit your advice on which one to pick. Option 1: Force revisit of the SCC if there are any coroutine in it. This makes CGPassManager aware of a coroutine attribute and keep restarting the pipeline until no functions in SCC has coroutine attribute on them. https://reviews.llvm.org/D21569 Option 2: Add a ref parameter bool& Devirt parameter to runSCC virtual method. If a pass sets Devirt to true, it indicates to CGPassManager that the changes were made that require restart of the pipeline. https://reviews.llvm.org/D21570 Option 3: Similar to Option 2, but instead of an extra parameter to runSCC, I add another virtual function to CGSCC pass: virtual bool restartRequested() const { return false; } a pass can override it and return true, if restart is required. Option 4: CoroSplit can add a fake indirect call to be replaced later with a direct call in a CoroElide function pass. CGPassManager::RefreshCallGraph will detect devirtualization and restart the pipeline. Hacky, but, no changes to CGPassManager required. Out of these four options, I have a slight preference for 2 or 3, but, I would like to hear your opinion. Thoughts on ThinLTO and LTO ========================== CoroEarly can run during regular compilation. CoroSplit, CoroEarly and CoroCleanup would run during the link optimization step to take advantage of more inlining opportunities. This is all for now. All the feedback and comments are greatly appreciated!!! Thank you, Gor
David Majnemer via llvm-dev
2016-Jul-15 03:39 UTC
[llvm-dev] RFC: Coroutine Optimization Passes
On Thu, Jul 14, 2016 at 7:52 PM, Gor Nishanov via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi all: > > I've included below a brief description of coroutine related optimization > passes and some questions/thoughts related to them. Looking forward to your > feedback, comments and questions. > > Thank you! > > Roadmap: > =======> 1) Get agreement on coroutine representation and overall direction. > .. repeat 1) until happy > > http://lists.llvm.org/pipermail/llvm-dev/2016-June/100838.html (Initial) > http://lists.llvm.org/pipermail/llvm-dev/2016-July/102133.html (Round 2) > > 2) Get agreement on how coroutine transformation passes integrate into the > optimizer pipeline. (this mail) <=== WE ARE HERE > .. repeat 2) until happy > > 3) update IR/Intrinsics.td + doc/Coroutines.rst + doc/LangRef.rst > > https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst > > 4) trickle in coroutine transformation passes + tests in digestible chunks. > 5) get clang changes in (sometime after step 3). > 6) fix bugs / remove limitations / add functionality to coroutine passes > .. repeat 6) until happy > > Overview: > ========> LLVM coroutines are functions that have one or more `suspend points`. > When a suspend point is reached, the execution of a coroutine is suspended > and > control is returned back to its caller. A suspended coroutine can be > resumed > to continue execution from the last suspend point or it can be destroyed. > > CoroSplit Pass: (CGSCC Pass @ new extension point EP_CGSCCOptimizerLate) > --------------- > Overall idea is that a coroutine is presented to an llvm as an ordinary > function > with suspension points marked up with intrinsics. We let the optimizer > party > on the coroutine as a single function for as long as possible. Shortly > before > the coroutine is eligible to be inlined into its callers, we outline parts > of > the coroutine that correspond to the code that needs to get executed after > the > coroutine is resumed or destroyed.Seems reasonable. How do you deal with basic blocks which appear to be used by multiple parts of the coroutine? We handled this in WinEHPrepare by cloning any BBs which were shared.> We also create a struct that will keep the > objects that need to persist across suspend points. This transformation is > done by CoroSplit CGSCC pass run as a part of the IPO optimizations > pipeline. > > CoroElide Pass: (Function Pass @ EP_ScalarOptimizerLate extension point) > --------------- > The coroutine resumption intrinsics get replaced with direct calls to those > outlined functions where possible. Then inliner can inline much leaner and > nicer > coroutine or any of the outlined parts as desired. > > If we discover that the lifetime of a coroutine is fully enclosed in the > lifetime of the caller, we remove dynamic allocation of coroutine state and > replace it with an `alloca` on the caller's frame. > > These optimizations are done by a function pass CoroElide run by the main > IPO > optimization at EP_ScalarOptimizerLate extension point. > > CoroEarly Pass: (Function Pass @ EP_EarlyAsPossible) > --------------- > Pretty boring pass that lowers coroutine intrinsics that are not needed for > later coroutine passes. > > CoroCleanup Pass: (Function Pass @ EP_OptimizerLast) > -------------- > Another boring pass that lowers all remaining coroutine intrinsics that > were not > processed by earlier coroutine passes. > > Questions / Concerns / Thoughts: > ===============================> > Coroutine attribute or not. > --------------------------- > I added a AttrKind::coroutine to flag the pre-split coroutine. The > intention is > to lessen the impact of CoroSpit pass since it can simply check for an > attribute > to learn if there is any work to be done on a function or not. Without the > attribute, it would need to examine every functions body to see if it has > an > llvm.coro.begin. On the other hand, three other coroutine passes have to > look at > every function body anyway to lower other coroutine related intrinsics, so > we > only saving a little. Another negative aspect of having this attribute is > that > there is a possibility of other passes checking if a function is a > coroutine or > not a doing and doing something differently if it is. I'd like to avoid > this > development and keep all coroutine logic in the coroutine passes and not > affect > other optimizations passes. Currently I am leaning towards removing this > attribute. >I would remove the attribute. There are all sorts of tricks you can do to avoid scanning the function for calls to the intrinsic. For example, you can see if a declaration of your intrinsic exists and, if so, if it has an users in the function in question (under the assumption that there are few).> > Integration with CGSCC pipeline. > -------------------------------- > > As mentioned in the introduction, I would like to run entire IPO pipeline > on > pre-split coroutine, then, split it, add the new functions to the SCC and > repeat the pipeline on the updated SCC before proceeding to the next SCC. > > To facilitate that, CoroSplit pass ignores the coroutine the first time it > sees > it, and requests restart of the pipeline and splits the coroutine when it > sees > it the second time around. There are at least four different ways how I can > request a restart of the pipeline, and I would like to solicit your advice > on > which one to pick. > > Option 1: Force revisit of the SCC if there are any coroutine in it. This > makes > CGPassManager aware of a coroutine attribute and keep restarting the > pipeline > until no functions in SCC has coroutine attribute on them. > > https://reviews.llvm.org/D21569 > > Option 2: Add a ref parameter bool& Devirt parameter to runSCC virtual > method. > If a pass sets Devirt to true, it indicates to CGPassManager that the > changes > were made that require restart of the pipeline. > > https://reviews.llvm.org/D21570 > > Option 3: Similar to Option 2, but instead of an extra parameter to > runSCC, I > add another virtual function to CGSCC pass: > > virtual bool restartRequested() const { return false; } > > a pass can override it and return true, if restart is required. > > Option 4: CoroSplit can add a fake indirect call to be replaced later with > a > direct call in a CoroElide function pass. > CGPassManager::RefreshCallGraph will > detect devirtualization and restart the pipeline. Hacky, but, no changes > to > CGPassManager required. > > Out of these four options, I have a slight preference for 2 or 3, but, I > would > like to hear your opinion. > > Thoughts on ThinLTO and LTO > ==========================> > CoroEarly can run during regular compilation. CoroSplit, CoroEarly and > CoroCleanup would run during the link optimization step to take advantage > of > more inlining opportunities. > > This is all for now. > All the feedback and comments are greatly appreciated!!! > > Thank you, > Gor > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160714/1966955e/attachment.html>
Gor Nishanov via llvm-dev
2016-Jul-15 04:28 UTC
[llvm-dev] RFC: Coroutine Optimization Passes
Hi David:>> How do you deal with basic blocks which appear to be used by multiple parts >> of the coroutine? We handled this in WinEHPrepare by cloning any BBs which >> were shared.I experimented with several approaches, but, cloning ended up being the simplest and most reliable. Suspend points express three different control flows that can happen at the suspend point: a suspension (default), resumption (0) and destruction (1). %0 = call i8 @llvm.coro.suspend([..]) switch i8 %0, label %suspend [i8 0, label %resume, i8 1, label %destroy] I slap a switch that jumps to all suspend points in the function. Then I clone the function twice (one will become resume clone, and another destroy clone). This switch becomes the entry block in the clones. Then, I RAUW coro.suspends with -1, 0, or 1 (in original, resume and destroy clones respectively) and let SimplifyCFG do the rest. (This is slightly simplified explanation, but it should give the idea).>> I would remove the attribute. There are all sorts of tricks you can do to >> avoid scanning the function for calls to the intrinsic. For example, you >> can see if a declaration of your intrinsic exists and, if so, if it has an >> users in the function in question (under the assumption that there are few).Aye-aye. Will remove the attribute. With respect to lessening the impact of coroutine passes, one approach I tried was to look during doInitialize whether there are any uses of coroutine intrinsics and set a flag if there are any, or maybe build a set of functions with coroutines intrinsics in doInitialize, so that in runOnFunction, I can just check whether the function is in the set and skip if it is not. Then, I scared myself silly that some optimization passes can materialize new functions or new function bodies and I will miss them. So I stopped doing that. I think your approach takes care of my "materialization" concern. (BTW, I don't even know if that is a real concern). But may not be profitable if there are a lot of small functions that are faster to scan for coroutine intrinsics then to scan potentially longer list of coroutine intrinsics users. BTW, Do you have a preference on how to restart CGSCC pipeline? One of the four options I listed (repeated in P.S), or even some better way I did not think about? Thank you, Gor P.S. Option 1: https://reviews.llvm.org/D21569 (no longer relevant, since we are removing AttrKind::Coroutine) Option 2: https://reviews.llvm.org/D21570 (bool& Devirt in runSCC) Option 3: https://reviews.llvm.org/D21572 (virtual bool restatedRequested()) Option 4: Fake devirtualized call in a function pass, so that RefreshSCC will detect devirtualization and restart the pipeline by itself.