Bjorn De Sutter
2012-Oct-31 20:13 UTC
[LLVMdev] : Predication on SIMD architectures and LLVM
Hi all, I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support or, and, and conditional predicates (see Scott Mahlke's papers on this issue) as supported by the OpenIMPACT compiler (=Trimaran). If anyone is interested, I can show some of the code. It is rather ad-hoc, however, so it is not at all ready for integration in the trunk (I think). The problem we are still facing is that this predication works post instruction selection and post register allocation. This is problematic because some of the earlier optimizations such as loop unrolling should ideally be applied on if-converted code, on which it is easier to judge the opportunities for, e.g., modulo scheduling and initiation interval constraints (such as ResMII, RecMII). In my view, the ideal would be to have very generic, full (OpenIMPACT-like) predication support throughout LLVM, with the option of enabling/skipping early if-conversion just like one can enable or disable aggressive inlining. Best, Bjorn De Sutter Computer Systems Lab Ghent University
On Wed, Oct 31, 2012 at 09:13:43PM +0100, Bjorn De Sutter wrote:> Hi all, > > I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support or, and, and conditional predicates (see Scott Mahlke's papers on this issue) as supported by the OpenIMPACT compiler (=Trimaran). If anyone is interested, I can show some of the code. It is rather ad-hoc, however, so it is not at all ready for integration in the trunk (I think). >I would be interested in looking at the code, since Southern Islands GPUs also require predicates for nested conditions. -Tom> The problem we are still facing is that this predication works post instruction selection and post register allocation. This is problematic because some of the earlier optimizations such as loop unrolling should ideally be applied on if-converted code, on which it is easier to judge the opportunities for, e.g., modulo scheduling and initiation interval constraints (such as ResMII, RecMII). > > In my view, the ideal would be to have very generic, full (OpenIMPACT-like) predication support throughout LLVM, with the option of enabling/skipping early if-conversion just like one can enable or disable aggressive inlining. > > Best, > > Bjorn De Sutter > Computer Systems Lab > Ghent University > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Bjorn De Sutter
2012-Nov-01 15:15 UTC
[LLVMdev] : Predication on SIMD architectures and LLVM
Hi Tom, here is the main part of the patch. It requires some additional helper methods in the backend, but the meaning of those will be clear from this example I think. By the way, this patch has only been tested on our own backend, of which we know where it can and cannot insert predicates. For one thing, we only insert predicates for if-conversion, in which case the predicate registers are only used locally. My code might be buggy under different assumptions. Best, Bjorn De Sutter Computer Systems Lab Ghent University Index: IfConversion.cpp ==================================================================--- IfConversion.cpp (revision 167230) +++ IfConversion.cpp (working copy) @@ -64,6 +64,9 @@ STATISTIC(NumDupBBs, "Number of duplicated blocks"); STATISTIC(NumUnpred, "Number of true blocks of diamonds unpredicated"); +//#undef DEBUG +//#define DEBUG(x) x + namespace { class IfConverter : public MachineFunctionPass { enum IfcvtKind { @@ -654,6 +657,9 @@ BBI.ExtraCost = 0; BBI.ExtraCost2 = 0; BBI.ClobbersPred = false; + + std::vector<MachineOperand> PredUses; // predicates used in the block ADDED BY BJORN + std::vector<MachineOperand> PredDefs; // predicates certainly defined in the block ADDED BY BJORN for (MachineBasicBlock::iterator I = BBI.BB->begin(), E = BBI.BB->end(); I != E; ++I) { if (I->isDebugValue()) @@ -662,7 +668,8 @@ if (I->isNotDuplicable()) BBI.CannotBeCopied = true; - bool isPredicated = TII->isPredicated(I); + PredUses.clear(); + bool isPredicated = TII->isPredicatedRecordUse(I,PredUses); // add guarding predicate to PredUses ADDED BY BJORN bool isCondBr = BBI.IsBrAnalyzable && I->isConditionalBranch(); if (!isCondBr) { @@ -675,11 +682,53 @@ BBI.ExtraCost += NumCycles-1; BBI.ExtraCost2 += ExtraPredCost; } else if (!AlreadyPredicated) { - // FIXME: This instruction is already predicated before the - // if-conversion pass. It's probably something like a conditional move. - // Mark this block unpredicable for now. - BBI.IsUnpredicable = true; - return; + if (PredUses.empty()) { // TEST ADDED BY BJORN + // ORIGINAL CODE: + + // FIXME: This instruction is already predicated before the + // if-conversion pass. It's probably something like a conditional move. + // Mark this block unpredicable for now. + BBI.IsUnpredicable = true; + return; + } + else { // ELSE CASE ADDED BY BJORN + // this is to support nested conditionals + // rationale: if a guarding predicate (of an ordinary instruction) is defined + // by an instruction in the block to be predicated itself, + // it will suffice to predicate the predicate defining instruction + // using the correct type of or, and, or cond predicate + // and to leave the guarding predicate unchanged + // example code fragment ( with dest operand on the right, guard left of the |): + // cmpeq_ct R1, R2, P1 + // P1 | add R1,R3,R4 + // can be predicated with a guarding predicate (suppose P2) as follows: + // P2 | cmpeq_ut R1, R2, P1 + // P1 | add R1, R3, R4 + // where _ct and _ut have the meaning depicted in Table 1 of the paper titled + // "Accurate and Efficient Predicate Analysis with Binary Decision Diagrams" by + // Sias, Hwo, and August in Micro 2000. + + // this is not a complete fix yet, as we cannot handle predicate + // generating instructions that are guarded with the same predicate + // but in the backend we have an assertion that will alert us if this situation + // ever occurs in our code base, so for the time being this is fine + for (unsigned i = 0; i < PredUses.size() ; i++){ + + unsigned used_reg = PredUses[i].getReg(); + bool use_after_def = false; + for (unsigned j = 0; j < PredDefs.size() ; j++) { + unsigned defined_reg = PredDefs[j].getReg(); + if (used_reg == defined_reg) { + use_after_def = true; + break; + } + } + if (!use_after_def) { + BBI.IsUnpredicable = true; + return; + } + } + } } } @@ -693,13 +742,13 @@ // Predicate may have been modified, the subsequent (currently) // unpredicated instructions cannot be correctly predicated. - BBI.IsUnpredicable = true; - return; + // BBI.IsUnpredicable = true; BY BJORN + // return; BY BJORN } // FIXME: Make use of PredDefs? e.g. ADDC, SUBC sets predicates but are // still potentially predicable. - std::vector<MachineOperand> PredDefs; + // std::vector<MachineOperand> PredDefs; // BY BJORN if (TII->DefinesPredicate(I, PredDefs)) BBI.ClobbersPred = true; @@ -736,7 +785,7 @@ return false; } if (TII->ReverseBranchCondition(RevPred) || - !TII->SubsumesPredicate(Cond, RevPred)) + (!TII->SubsumesPredicate(Cond, RevPred) && !TII->CanPredicateAlreadyPredicated(Cond,RevPred))) return false; } @@ -1116,6 +1165,15 @@ if (TII->ReverseBranchCondition(Cond)) llvm_unreachable("Unable to reverse branch condition!"); + // added by BJORN + bool HasEarlyExit = CvtBBI->FalseBB != NULL; + if (HasEarlyExit) { + SmallVector<MachineOperand, 4> RevCond(CvtBBI->BrCond.begin(), + CvtBBI->BrCond.end()); + if (TII->ReverseBranchCondition(RevCond)) + return false; + } + if (Kind == ICTriangleRev || Kind == ICTriangleFRev) { if (ReverseBranchCondition(*CvtBBI)) { // BB has been changed, modify its predecessors (except for this @@ -1140,7 +1198,7 @@ InitPredRedefs(CvtBBI->BB, Redefs, TRI); InitPredRedefs(NextBBI->BB, Redefs, TRI); - bool HasEarlyExit = CvtBBI->FalseBB != NULL; + // bool HasEarlyExit = CvtBBI->FalseBB != NULL; // BY BJORN if (CvtBBI->BB->pred_size() > 1) { BBI.NonPredSize -= TII->RemoveBranch(*BBI.BB); // Copy instructions in the true block, predicate them, and add them to On 01 Nov 2012, at 15:06, Tom Stellard <tom at stellard.net> wrote:> On Wed, Oct 31, 2012 at 09:13:43PM +0100, Bjorn De Sutter wrote: >> Hi all, >> >> I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support or, and, and conditional predicates (see Scott Mahlke's papers on this issue) as supported by the OpenIMPACT compiler (=Trimaran). If anyone is interested, I can show some of the code. It is rather ad-hoc, however, so it is not at all ready for integration in the trunk (I think). >> > > I would be interested in looking at the code, since Southern Islands > GPUs also require predicates for nested conditions. > > -Tom > >> The problem we are still facing is that this predication works post instruction selection and post register allocation. This is problematic because some of the earlier optimizations such as loop unrolling should ideally be applied on if-converted code, on which it is easier to judge the opportunities for, e.g., modulo scheduling and initiation interval constraints (such as ResMII, RecMII). >> >> In my view, the ideal would be to have very generic, full (OpenIMPACT-like) predication support throughout LLVM, with the option of enabling/skipping early if-conversion just like one can enable or disable aggressive inlining. >> >> Best, >> >> Bjorn De Sutter >> Computer Systems Lab >> Ghent University >> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Oct 31, 2012, at 1:13 PM, Bjorn De Sutter <bjorn.desutter at elis.ugent.be> wrote:> Hi all, > > I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support or, and, and conditional predicates (see Scott Mahlke's papers on this issue) as supported by the OpenIMPACT compiler (=Trimaran). If anyone is interested, I can show some of the code. It is rather ad-hoc, however, so it is not at all ready for integration in the trunk (I think). > > The problem we are still facing is that this predication works post instruction selection and post register allocation. This is problematic because some of the earlier optimizations such as loop unrolling should ideally be applied on if-converted code, on which it is easier to judge the opportunities for, e.g., modulo scheduling and initiation interval constraints (such as ResMII, RecMII). > > In my view, the ideal would be to have very generic, full (OpenIMPACT-like) predication support throughout LLVM, with the option of enabling/skipping early if-conversion just like one can enable or disable aggressive inlining.In theory, MachineInstrs can be predicated before register allocation (in SSA), but the machine code will be full of false dependencies and liveness will not be predicate-aware. (Predicated instructions would need implicit use operand for any virtual register defs). You would basically lose reaching defs after predication. You could implement a loop unroller for SSA MachineInstrs. It's conceptually similar to early tail duplication. "Predicating" at IR level would have to take the form of an analysis that indicates which blocks *will* be predicated. You would have to leave the original CFG and phi nodes in tact to preserve control dependence and dataflow. Then you just have the problem of preserving that analysis across any changes to the CFG. -Andy