章明 via llvm-dev
2017-Nov-11 02:00 UTC
[llvm-dev] Update control flow graph when splitting a machine basic block?
> The right way to update the CFG very much depends on how you're > transforming it.I would like to export the CFG for control flow checking. Theoretically, it should be possible for a compiler to know every target of every control flow instruction, except for computed targets that are not known at compile-time. When a machine basic block is split between two branches, as shown below: BB: ; Old machine basic block shortened by the split ... branch to A ------------------- ; Split here BB': ; New machine basic block created by the split branch to B ... BB' is a successor of BB (if there is no control flow barrier near the end of BB). In addition, targets of terminators like A should be added to the set of successors of BB and possibly removed from the set of successors of BB'. It is not always safe to remove B from the successors of BB', because A and B may be the same machine basic block. If edges in the CFG are interpreted as possible control flow transfers, rather than definite control flow transfers, the most conservative way to update the CFG might be: 1) Let BB' have all successors of the original machine basic block before the split. 2) Add BB' and all successors of BB' to the set of successors of BB. However, this introduce false control flow transfers into the CFG. False control flow transfers are harmful to control flow checking, because different basic blocks have to be assigned the same signature, which is called aliasing. To update the CFG without introducing any false control flow transfers, one needs to determine the target(s) of each terminator. That is what I desire. I am not quite familiar with LLVM, so I am asking whether it is possible to achieve that with the current LLVM API.> Your pseudo-code looks similar to ARMBaseInstrInfo::analyzeBranch.ARMBaseInstrInfo::analyzeBranch also attempts to determine targets of terminators. However, that method is not adequate to achieve what I desire, because it only works for simple conditional/unconditional branches. When it sees something else, e.g., a jump table, it simply reports that the machine basic block cannot be analyzed.
Friedman, Eli via llvm-dev
2017-Nov-11 02:57 UTC
[llvm-dev] Update control flow graph when splitting a machine basic block?
On 11/10/2017 6:00 PM, 章明 wrote:>> The right way to update the CFG very much depends on how you're >> transforming it. > I would like to export the CFG for control flow checking. > Theoretically, it should be possible for a compiler to know every target of every control flow instruction, except for computed targets that are not known at compile-time.Every MachineBasicBlock has a list of successors; you can access it with the successors() accessor. That's what you should be using for any CFG analysis.> When a machine basic block is split between two branches, as shown below: > > BB: ; Old machine basic block shortened by the split > ... > branch to A > ------------------- ; Split here > BB': ; New machine basic block created by the split > branch to B > ... > > BB' is a successor of BB (if there is no control flow barrier near the end of BB). > In addition, targets of terminators like A should be added to the set of successors of BB and possibly removed from the set of successors of BB'. > It is not always safe to remove B from the successors of BB', because A and B may be the same machine basic block. > > If edges in the CFG are interpreted as possible control flow transfers, rather than definite control flow transfers, the most conservative way to update the CFG might be: > 1) Let BB' have all successors of the original machine basic block before the split. > 2) Add BB' and all successors of BB' to the set of successors of BB. > However, this introduce false control flow transfers into the CFG. False control flow transfers are harmful to control flow checking, because different basic blocks have to be assigned the same signature, which is called aliasing. > > To update the CFG without introducing any false control flow transfers, one needs to determine the target(s) of each terminator. > That is what I desire. I am not quite familiar with LLVM, so I am asking whether it is possible to achieve that with the current LLVM API.I don't think this actually has any impact in practice; I mean, I guess it's an issue in theory, but in practice we don't stick branches into the middle of basic blocks. -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
章明 via llvm-dev
2017-Nov-11 08:20 UTC
[llvm-dev] Update control flow graph when splitting a machine basic block?
Thank you for your reply!> Every MachineBasicBlock has a list of successors; you can access it with > the successors() accessor. That's what you should be using for any CFG > analysis.I am aware of these methods of class MachineBasicBlock, which allows one to access a MachineBasicBlock's successors and predecessors in the CFG. But the CFG itself may no longer be valid if a MachineBasicBlock is split between two control flow instructions. The accessors of class MachineBasicBlock do not automatically update the CFG. So there is no way to access the up-to-date CFG.> I don't think this actually has any impact in practice; I mean, I guess > it's an issue in theory, but in practice we don't stick branches into > the middle of basic blocks.I did not expect a branch in the middle of a basic block either, until yesterday LLVM Release 4.0.0 produced the following machine basic block before the pass ARMConstantIslands is run: bb.1.if.end: successors: %bb.3.for.body(0x80000000) liveins: %r4 %r0 = tMOVr %r4, 14, _, debug-location !23 tBL 14, _, $__aeabi_i2d, csr_aapcs, implicit-def dead %lr, implicit %sp, implicit %r0, implicit-def %sp, implicit-def %r0, implicit-def %r1, debug-location !23 tBL 14, _, @sqrt, csr_aapcs, implicit-def dead %lr, implicit %sp, implicit %r0, implicit %r1, implicit-def %sp, implicit-def %r0, implicit-def %r1, debug-location !24 tBL 14, _, $__aeabi_d2iz, csr_aapcs, implicit-def dead %lr, implicit %sp, implicit %r0, implicit %r1, implicit-def %sp, implicit-def %r0, debug-location !25 DBG_VALUE 2, 0, !17, !18, debug-location !27 DBG_VALUE debug-use %r0, debug-use _, !16, !18, debug-location !26 tCMPi8 %r0, 2, 14, _, implicit-def %cpsr, debug-location !32 t2IT 11, 28, implicit-def %itstate %r0 = tMOVi8 _, 1, 11, %cpsr, implicit %r0, implicit %itstate tPOP_RET 11, %cpsr, def %r4, def %r6, def %r7, def %pc, implicit %r0, implicit %r4, implicit killed %itstate, debug-location !44 %r1 = t2MOVi 2, 14, _, _ t2B %bb.3.for.body, 14, _ Note that a terminator tPOP_RET is before a non-terminator t2MOVi. The command line to produce this is as follows: llc -mtriple=thumbv7m-none-none-eabi -mcpu=cortex-m3 -O1 -stop-before=arm-cp-islands -o prime-factorize.mir prime-factorize.ll Attached are the input file prime-factorize.ll and output file prime-factorize.mir. The machine basic block above violates my previous assumption that only terminators and debug instructions may appear after the first terminator in a machine basic block. I don't know how "bad" this could be, i.e., how many non-terminators in a program could be generated between two terminators. So I have to consider split such basic blocks before I could instrument the program with control flow checks. I don't know whether this is a bug or not. If it is not a bug, this apparently isn't a purely theoretical issue. Best regards! Ming Zhang