Sebastian Roland via llvm-dev
2019-Jan-29 20:23 UTC
[llvm-dev] Finding label of basic block where a conditional branch merges
Hi, is there any standard way to figure out the label of the basic block where a conditional branch merges? If we have a branch statement without an else part this is straightforward: br i1 %cmp, label %if.then, label %if.end, !dbg !24 We only need to check the operand names whether they contain "end". However things are getting more complex when there is an else branch and in particular when we have nested branches. Take for example the following C code: if (x == NULL || y == NULL) { do { // ... } while (0); } This yields the following IR: br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31 lor.lhs.false: ; preds = %entry %1 = load %struct.s1*, %struct.s1** %s1.addr, align 8, !dbg !32 %cmp1 = icmp eq %struct.s1* %1, null, !dbg !33 br i1 %cmp1, label %if.then, label %if.end, !dbg !34 if.then: ; preds = %lor.lhs.false, %entry br label %do.body, !dbg !35, !llvm.loop !37 do.body: ; preds = %if.then br label %do.end, !dbg !39 do.end: ; preds = %do.body br label %if.end, !dbg !41 if.end: ; preds = %do.end, %lor.lhs.false call void @llvm.dbg.declare(metadata i32* %a, metadata !42, metadata !24), !dbg !43 The question now is how to obtain "if.end" given br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31. My current algorithm basically works for a lot of cases but fails here. What I am doing is to push the branch instruction to a stack and then always use the left most label of the next terminator instruction to move forward. Whenever I encounter a basic block that contains an "end" in its label I pop from the stack, whenever there is a conditional branch I push. If the stack is empty I have reached the correct basic block. For the example given it would however yield "do.end" which is not correct. I am wondering whether there is any known graph algorithm I could use to solve the problem or even better something that is already implemented within LLVM? --Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3992 bytes Desc: S/MIME Cryptographic Signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190129/d982ce5e/attachment.bin>
David Greene via llvm-dev
2019-Jan-29 20:40 UTC
[llvm-dev] Finding label of basic block where a conditional branch merges
Sebastian Roland via llvm-dev <llvm-dev at lists.llvm.org> writes:> is there any standard way to figure out the label of the basic block > where a conditional branch merges?What do you need to use this block for? The immediate post-dominator will be the block where control essentially re-converges. It is the "nearest" block guaranteed to execute if the block containing the branch executes. See PostDominatorTreeAnalysis. Depending on where you are in the pass pipeline, the CFG maty bear little resemblence to high-level control flow.> If we have a branch statement without an else part this is straightforward: > > br i1 %cmp, label %if.then, label %if.end, !dbg !24 > > We only need to check the operand names whether they contain > "end".This seems extremely brittle. Post-dominators is the compiler-theory-y way to determine this.> However things are getting more complex when there is an else branch > and in particular when we have nested branches. > > Take for example the following C code: > > if (x == NULL || y == NULL) { > do { > // ... > } while (0); > } > > This yields the following IR: > > br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31 > > lor.lhs.false: ; preds = %entry > %1 = load %struct.s1*, %struct.s1** %s1.addr, align 8, !dbg !32 > %cmp1 = icmp eq %struct.s1* %1, null, !dbg !33 > br i1 %cmp1, label %if.then, label %if.end, !dbg !34 > > if.then: ; preds > %lor.lhs.false, %entry > br label %do.body, !dbg !35, !llvm.loop !37 > > do.body: ; preds = %if.then > br label %do.end, !dbg !39 > > do.end: ; preds = %do.body > br label %if.end, !dbg !41 > > if.end: ; preds = %do.end, > %lor.lhs.false > call void @llvm.dbg.declare(metadata i32* %a, metadata !42, metadata > !24), !dbg !43 > > > The question now is how to obtain "if.end" given br i1 %cmp, label > %if.then, label %lor.lhs.false, !dbg !31.if.end post-dominates the block containing the branch in question and it is also the immediate post-dominator (it does not post-dominate any other block that post-dominates the block containing the branch).> I am wondering whether there is any known graph algorithm I could use > to solve the problem or even better something that is already > implemented within LLVM?Post-dominators. :) -David
Cranmer, Joshua via llvm-dev
2019-Jan-29 21:16 UTC
[llvm-dev] Finding label of basic block where a conditional branch merges
The answer to this question depends a lot as to what you mean by "where a conditional branch merges." The immediate post-dominator of a basic block is the first point where all possible paths from that basic block must execute. When your basic block in question is the immediate dominator (or indeed, just any dominator), then that means that you can say that you are the head of an if statement and the immediate post-dominator is the tail of the same if statement. However, when gotos, or goto-like statements (break, continue, return, throw, etc.) are involved, then the tail of an if statement is no longer a postdominator. Note that such statements can be introduced by optimizations such as jump threading. if (...) { return true; } else { ... } // This node is no longer a postdominator because of the return. The question here is what you want to do. If your goal is akin to decompilation, where you want to "ignore" these branches for the purposes of making simple graphs, then the approach is going to be a heuristic-filled approach for which there are no obvious answers. If your goal is instead mere correctness--you want to undo something you did before the if statement--then you're going to have to adapt your algorithm to account for the possibility of these kinds of CFGs. Also note that LLVM does not guarantee the existence of value names--clang, in release builds, does not generate any value names whatsoever, for blocks or values. Any approach relying on clang generating these names is not going to work in such scenarios. -----Original Message----- From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Sebastian Roland via llvm-dev Sent: Tuesday, January 29, 2019 15:23 To: llvm-dev at lists.llvm.org Subject: [llvm-dev] Finding label of basic block where a conditional branch merges Hi, is there any standard way to figure out the label of the basic block where a conditional branch merges? If we have a branch statement without an else part this is straightforward: br i1 %cmp, label %if.then, label %if.end, !dbg !24 We only need to check the operand names whether they contain "end". However things are getting more complex when there is an else branch and in particular when we have nested branches. Take for example the following C code: if (x == NULL || y == NULL) { do { // ... } while (0); } This yields the following IR: br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31 lor.lhs.false: ; preds = %entry %1 = load %struct.s1*, %struct.s1** %s1.addr, align 8, !dbg !32 %cmp1 = icmp eq %struct.s1* %1, null, !dbg !33 br i1 %cmp1, label %if.then, label %if.end, !dbg !34 if.then: ; preds = %lor.lhs.false, %entry br label %do.body, !dbg !35, !llvm.loop !37 do.body: ; preds = %if.then br label %do.end, !dbg !39 do.end: ; preds = %do.body br label %if.end, !dbg !41 if.end: ; preds = %do.end, %lor.lhs.false call void @llvm.dbg.declare(metadata i32* %a, metadata !42, metadata !24), !dbg !43 The question now is how to obtain "if.end" given br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31. My current algorithm basically works for a lot of cases but fails here. What I am doing is to push the branch instruction to a stack and then always use the left most label of the next terminator instruction to move forward. Whenever I encounter a basic block that contains an "end" in its label I pop from the stack, whenever there is a conditional branch I push. If the stack is empty I have reached the correct basic block. For the example given it would however yield "do.end" which is not correct. I am wondering whether there is any known graph algorithm I could use to solve the problem or even better something that is already implemented within LLVM? --Sebastian
Sebastian Roland via llvm-dev
2019-Jan-29 22:03 UTC
[llvm-dev] Finding label of basic block where a conditional branch merges
Joshua, David much appreciate your quick help! What I am actually doing is statically tracing values. If one of the traced values is part of a condition (e.g. in an if statement) all instructions in the then and else part are also traced. The automatic tracing of instructions however needs to stop when I hit the first instruction after the if statement (same for switch). Dominators seem to be a good starting point! Sebastian> On 29. Jan 2019, at 22:16, Cranmer, Joshua <joshua.cranmer at intel.com> wrote: > > The answer to this question depends a lot as to what you mean by "where a conditional branch merges." > > The immediate post-dominator of a basic block is the first point where all possible paths from that basic block must execute. When your basic block in question is the immediate dominator (or indeed, just any dominator), then that means that you can say that you are the head of an if statement and the immediate post-dominator is the tail of the same if statement. > > However, when gotos, or goto-like statements (break, continue, return, throw, etc.) are involved, then the tail of an if statement is no longer a postdominator. Note that such statements can be introduced by optimizations such as jump threading. > > if (...) { > return true; > } else { > ... > } > // This node is no longer a postdominator because of the return. > > The question here is what you want to do. If your goal is akin to decompilation, where you want to "ignore" these branches for the purposes of making simple graphs, then the approach is going to be a heuristic-filled approach for which there are no obvious answers. If your goal is instead mere correctness--you want to undo something you did before the if statement--then you're going to have to adapt your algorithm to account for the possibility of these kinds of CFGs. > > Also note that LLVM does not guarantee the existence of value names--clang, in release builds, does not generate any value names whatsoever, for blocks or values. Any approach relying on clang generating these names is not going to work in such scenarios. > > -----Original Message----- > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Sebastian Roland via llvm-dev > Sent: Tuesday, January 29, 2019 15:23 > To: llvm-dev at lists.llvm.org > Subject: [llvm-dev] Finding label of basic block where a conditional branch merges > > Hi, > > is there any standard way to figure out the label of the basic block where a conditional branch merges? > > If we have a branch statement without an else part this is straightforward: > > br i1 %cmp, label %if.then, label %if.end, !dbg !24 > > We only need to check the operand names whether they contain "end". > However things are getting > more complex when there is an else branch and in particular when we have nested branches. > > Take for example the following C code: > > if (x == NULL || y == NULL) { > do { > // ... > } while (0); > } > > This yields the following IR: > > br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31 > > lor.lhs.false: ; preds = %entry > %1 = load %struct.s1*, %struct.s1** %s1.addr, align 8, !dbg !32 > %cmp1 = icmp eq %struct.s1* %1, null, !dbg !33 > br i1 %cmp1, label %if.then, label %if.end, !dbg !34 > > if.then: ; preds = %lor.lhs.false, %entry > br label %do.body, !dbg !35, !llvm.loop !37 > > do.body: ; preds = %if.then > br label %do.end, !dbg !39 > > do.end: ; preds = %do.body > br label %if.end, !dbg !41 > > if.end: ; preds = %do.end, %lor.lhs.false > call void @llvm.dbg.declare(metadata i32* %a, metadata !42, metadata !24), !dbg !43 > > > The question now is how to obtain "if.end" given br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31. > > My current algorithm basically works for a lot of cases but fails here. > What I am doing is to push the branch instruction to a stack and then always use the left most label of the next terminator instruction to move forward. Whenever I encounter a basic block that contains an "end" > in its label I pop from the stack, whenever there is a conditional branch I push. If the stack is empty I have reached the correct basic block. For the example given it would however yield "do.end" which is not correct. > > I am wondering whether there is any known graph algorithm I could use to solve the problem or even better something that is already implemented within LLVM? > > --Sebastian >