(Moving this discussion on list as this could be of general interest.) My current work-in-progress implementation is attempting to map out the blocks used by a landing pad before it starts outlining. It creates a table of catch and cleanup handlers with the block at which each one starts. During outlining I intend to have another mechanism to check to see if we’ve already outlined the handler at the specified start block (which handles the case of handlers shared between landing pads) and if not starts the outlining process at that block. For cleanup handlers I’m treating any catch dispatch block (identified by looking for the eh_typeid_for+cmp+conditional branch pattern) as a stopping point for the cloning process. Here’s a snippet of comments explaining how I am doing the block mapping. // The landing pad sequence will have this basic shape: // // <cleanup handler> // <selector comparison> // <catch handler> // <cleanup handler> // <selector comparison> // <catch handler> // <cleanup handler> // ... // // Any of the cleanup slots may be absent. The cleanup slots may be occupied by // any arbitrary control flow, but all paths through the cleanup code must // eventually reach the next selector comparison and no path can skip to a // different selector comparisons, though some paths may terminate abnormally. // Therefore, we will use a depth first search from the start of any given // cleanup block and stop searching when we find the next selector comparison. // // The catch handlers may also have any control structure, but we are only // interested in the start of the catch handlers, so we don't need to actually // follow the flow of the catch handlers. The start of the catch handlers can // be located from the compare instructions, but they can be skipped in the // flow by following the contrary branch. // This has a flaw in that cleanup code could contain blocks that are shared with control flows outside the cleanup block and use phi-dependent branching. If the shared flow isn’t part of another cleanup handler this is only an efficiency problem as the non-cleanup flow will reach a landingpad or a function terminator before it hits a selector. However, if the shared block is shared with another cleanup handler, it could lead to a catch dispatch that has nothing to do with the current block. It turns out that the cloning code handles this problem of shared blocks really well. It just clones everything and prunes the unwanted code paths based on phi resolution and subsequent instruction simplification. I suppose I could clone the entire landing pad to do the block mapping and just discard the clone when I’m done, but that seems really horrific as a way of resolving this corner case problem. What I’d like is to have a way to determine whether the successors of a block are phi-dependent and if so find a way to limit the search to paths that result from predecessors I’ve already seen. I can do this easily enough for simple phi dependence (like constant phi values and a single comparison to choose a branch target) but I haven’t tried to see if the process I have in mind work would for arbitrarily complex cases. If you have any suggestions on this (like maybe a “isSuccessorPhiDependent()” function tucked away somewhere that I haven’t noticed), I’d love to hear them. -Andy From: Reid Kleckner [mailto:rnk at google.com] Sent: Friday, February 13, 2015 12:53 PM To: Kaylor, Andrew Cc: David Majnemer Subject: Re: C++ exception handling review One thing I'm curious about is how are we going to identify cleanup regions? Will we figure that out organically with an up front analysis, or what? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150213/b2580b88/attachment.html>
On Fri, Feb 13, 2015 at 1:37 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:> (Moving this discussion on list as this could be of general interest.) > > > > My current work-in-progress implementation is attempting to map out the > blocks used by a landing pad before it starts outlining. It creates a > table of catch and cleanup handlers with the block at which each one > starts. During outlining I intend to have another mechanism to check to > see if we’ve already outlined the handler at the specified start block > (which handles the case of handlers shared between landing pads) and if not > starts the outlining process at that block. For cleanup handlers I’m > treating any catch dispatch block (identified by looking for the > eh_typeid_for+cmp+conditional branch pattern) as a stopping point for the > cloning process. > > > > Here’s a snippet of comments explaining how I am doing the block mapping. > > > > // The landing pad sequence will have this basic shape: > > // > > // <cleanup handler> > > // <selector comparison> > > // <catch handler> > > // <cleanup handler> > > // <selector comparison> > > // <catch handler> > > // <cleanup handler> > > // ... > > // > > // Any of the cleanup slots may be absent. The cleanup slots may be > occupied by > > // any arbitrary control flow, but all paths through the cleanup code must > > // eventually reach the next selector comparison and no path can skip to a > > // different selector comparisons, though some paths may terminate > abnormally. > > // Therefore, we will use a depth first search from the start of any given > > // cleanup block and stop searching when we find the next selector > comparison. > > // > > // The catch handlers may also have any control structure, but we are only > > // interested in the start of the catch handlers, so we don't need to > actually > > // follow the flow of the catch handlers. The start of the catch handlers > can > > // be located from the compare instructions, but they can be skipped in the > > // flow by following the contrary branch. > > // > > > > This has a flaw in that cleanup code could contain blocks that are shared > with control flows outside the cleanup block and use phi-dependent > branching. If the shared flow isn’t part of another cleanup handler this > is only an efficiency problem as the non-cleanup flow will reach a > landingpad or a function terminator before it hits a selector. However, if > the shared block is shared with another cleanup handler, it could lead to a > catch dispatch that has nothing to do with the current block. > > > > It turns out that the cloning code handles this problem of shared blocks > really well. It just clones everything and prunes the unwanted code paths > based on phi resolution and subsequent instruction simplification. I > suppose I could clone the entire landing pad to do the block mapping and > just discard the clone when I’m done, but that seems really horrific as a > way of resolving this corner case problem. What I’d like is to have a way > to determine whether the successors of a block are phi-dependent and if so > find a way to limit the search to paths that result from predecessors I’ve > already seen. I can do this easily enough for simple phi dependence (like > constant phi values and a single comparison to choose a branch target) but > I haven’t tried to see if the process I have in mind work would for > arbitrarily complex cases. If you have any suggestions on this (like maybe > a “isSuccessorPhiDependent()” function tucked away somewhere that I haven’t > noticed), I’d love to hear them. >I think complex cases are unlikely and should be handled conservatively by cloning. A good example of the simple case is the way we lower __finally. Internally Clang has the invariant that it cannot double-emit local declarations twice while emitting a function. So for this test case, we had to make the exceptional cleanup path branch into normal path with a phi, and then have a conditional branch back over to the exceptional path. // C++: void f() { __try { might_crash(); } __finally { mylabel: // better not emit twice! int r; // better not emit twice! r = do_cleanup(); if (!r) goto mylabel; } } // Simplified IR: define void @f() { %abnormal_termination = alloca i8 invoke void @might_crash() to label %cont unwind label %lpad cont: store i8 0, i8* %abnormal_termination br label %__finally lpad: landingpad { i8*, i32 } personality i32 (...)* @__C_specific_handler cleanup store i8 1, i8* %abnormal_termination br label %__finally __finally: ; code simplified for brevity %r = call i1 @do_cleanup() br i1 %r, label %__finally, label %done done: %ab = load i8* %abnormal_termination %tobool = icmp eq i8 %ab, i8 0 br i1 %tobool, label %ret, label %resume ret: ret void resume: resume } You can see at -O0 we have these loads and stores, but at -O1 we'll have a phi-dependent branch that's really easy to analyze. It'd be good if the preparation pass could understand this much, and no more. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150213/f00fac15/attachment.html>
I’m not sure I understand what you are suggesting in terms of how this should be handled. I agree that the cleanup code is unlikely to be complex, but obviously we still need some way to handle any cases that unlikely but possible. I hadn’t even thought about what might be in a __finally block. I was thinking about destructors being inlined. Obviously that puts a practical limit on what will be there, but not really a theoretical limit. So I’m looking for an implementation that will handle the typical cases efficiently but still not break if the very rare case turns up. It occurs to me that maybe the depth first search I was planning is exactly wrong in this case. Given a cleanup handler that goes into a shared block and can either branch back out to a block that belongs to the landing pad or branch to a block at some arbitrary point in the normal flow, the path we’re looking for is almost certainly going to meet the selector comparison or a resume within a block or two. So a breadth-first search would handle this case better and probably is reasonable for the common case (which usually won’t have any branches at all). There are two scenarios I have in mind here. 1) A shared block that might branch into normal code: lpad: landingpad … cleanup catch … … br label %shared.block shared.block: %dst = phi i1 [1, %normal.cont], [0, %lpad] ; do something that happens in cleanup and in regular code … br i1 %dst, label %normal.cont2, label %catch.dispatch catch.dispatch: That sort of PHI dependence is very easy to handle, but can we count on it always being that simple? It seems like in theory there could be an arbitrary use-def chain between the PHI node and the branch condition that we may or may not be able to resolve at compile-time. I wouldn’t be happy with whoever created such an atrocity, but I didn’t feel like we could just assume it can’t happen. If we follow the branch to “normal.cont2” it could conceivably take us through the entire CFG for the function. Obviously we want to avoid that. 2) A block shared between unconnected landing pads lpad1: landingpad … cleanup catch <some type> … … br label %shared.block lpad2: landingpad … cleanup catch <some other type> … … br label %shared.block shared.block: %dst = phi i1 [1, %lpad1], [0, %lpad2] ; do something that happens in both cleanup handlers … br i1 %dst, label %lpad1.catch.dispatch, label %lpad2.catch.dispatch lpad1.catch.dispatch: … lpad2.catch.dispatch: … I haven’t tried to devise code that would lead to this, but my intuition is that this kind of sharing is more likely than the other and if it does happen it could be problematic. Again, the PHI-dependence here is easy to sort out here and it would be trivial to determine which branch we should follow. But if a case arises where we can’t figure out which predecessor leads to which successor, what I had planned just won’t work. I’d like to believe that all PHI-dependent control flow that arises from block merging will always be simple and that if anyone tried to do something more complex someone would ask them not to, but I worry that you never know what is going to happen over the course of many transformation passes. Like, imagine: shared.block: %foo = phi i1 [1, %lpad1], [0, %lpad2] … %fubar = call i1 @bar(i1 %foo) … br i1 %fubar, label %dispatch1, label %dispatch2 Am I overthinking this? -Andy From: Reid Kleckner [mailto:rnk at google.com] Sent: Friday, February 13, 2015 2:06 PM To: Kaylor, Andrew Cc: David Majnemer; LLVM Developers Mailing List Subject: Re: C++ exception handling On Fri, Feb 13, 2015 at 1:37 PM, Kaylor, Andrew <andrew.kaylor at intel.com<mailto:andrew.kaylor at intel.com>> wrote: (Moving this discussion on list as this could be of general interest.) My current work-in-progress implementation is attempting to map out the blocks used by a landing pad before it starts outlining. It creates a table of catch and cleanup handlers with the block at which each one starts. During outlining I intend to have another mechanism to check to see if we’ve already outlined the handler at the specified start block (which handles the case of handlers shared between landing pads) and if not starts the outlining process at that block. For cleanup handlers I’m treating any catch dispatch block (identified by looking for the eh_typeid_for+cmp+conditional branch pattern) as a stopping point for the cloning process. Here’s a snippet of comments explaining how I am doing the block mapping. // The landing pad sequence will have this basic shape: // // <cleanup handler> // <selector comparison> // <catch handler> // <cleanup handler> // <selector comparison> // <catch handler> // <cleanup handler> // ... // // Any of the cleanup slots may be absent. The cleanup slots may be occupied by // any arbitrary control flow, but all paths through the cleanup code must // eventually reach the next selector comparison and no path can skip to a // different selector comparisons, though some paths may terminate abnormally. // Therefore, we will use a depth first search from the start of any given // cleanup block and stop searching when we find the next selector comparison. // // The catch handlers may also have any control structure, but we are only // interested in the start of the catch handlers, so we don't need to actually // follow the flow of the catch handlers. The start of the catch handlers can // be located from the compare instructions, but they can be skipped in the // flow by following the contrary branch. // This has a flaw in that cleanup code could contain blocks that are shared with control flows outside the cleanup block and use phi-dependent branching. If the shared flow isn’t part of another cleanup handler this is only an efficiency problem as the non-cleanup flow will reach a landingpad or a function terminator before it hits a selector. However, if the shared block is shared with another cleanup handler, it could lead to a catch dispatch that has nothing to do with the current block. It turns out that the cloning code handles this problem of shared blocks really well. It just clones everything and prunes the unwanted code paths based on phi resolution and subsequent instruction simplification. I suppose I could clone the entire landing pad to do the block mapping and just discard the clone when I’m done, but that seems really horrific as a way of resolving this corner case problem. What I’d like is to have a way to determine whether the successors of a block are phi-dependent and if so find a way to limit the search to paths that result from predecessors I’ve already seen. I can do this easily enough for simple phi dependence (like constant phi values and a single comparison to choose a branch target) but I haven’t tried to see if the process I have in mind work would for arbitrarily complex cases. If you have any suggestions on this (like maybe a “isSuccessorPhiDependent()” function tucked away somewhere that I haven’t noticed), I’d love to hear them. I think complex cases are unlikely and should be handled conservatively by cloning. A good example of the simple case is the way we lower __finally. Internally Clang has the invariant that it cannot double-emit local declarations twice while emitting a function. So for this test case, we had to make the exceptional cleanup path branch into normal path with a phi, and then have a conditional branch back over to the exceptional path. // C++: void f() { __try { might_crash(); } __finally { mylabel: // better not emit twice! int r; // better not emit twice! r = do_cleanup(); if (!r) goto mylabel; } } // Simplified IR: define void @f() { %abnormal_termination = alloca i8 invoke void @might_crash() to label %cont unwind label %lpad cont: store i8 0, i8* %abnormal_termination br label %__finally lpad: landingpad { i8*, i32 } personality i32 (...)* @__C_specific_handler cleanup store i8 1, i8* %abnormal_termination br label %__finally __finally: ; code simplified for brevity %r = call i1 @do_cleanup() br i1 %r, label %__finally, label %done done: %ab = load i8* %abnormal_termination %tobool = icmp eq i8 %ab, i8 0 br i1 %tobool, label %ret, label %resume ret: ret void resume: resume } You can see at -O0 we have these loads and stores, but at -O1 we'll have a phi-dependent branch that's really easy to analyze. It'd be good if the preparation pass could understand this much, and no more. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150213/474cdfe2/attachment.html>