Andrea Di Biagio via llvm-dev
2018-May-10 15:58 UTC
[llvm-dev] [RFC] MC support for variant scheduling classes.
Hi all,

The goal of this RFC is to make information related to variant scheduling classes accessible at MC level. This would help tools like llvm-mca understand and resolve variant scheduling classes.

To achieve this goal, I plan to introduce a new class of scheduling predicates named MCSchedPredicate. An MCSchedPredicate allows the definition of boolean expressions with well-known semantics that can be used to generate code for both MachineInstr and MCInst.

The new predicates are designed to be completely optional. Scheduling models can use a combination of SchedPredicate and MCSchedPredicate to describe variant reads and writes. Old scheduling predicate definitions would still be valid. New MCSchedPredicates would behave like normal scheduling predicates.

A bit of background
-------------------
Variant scheduling classes model situations where the instruction profile depends on the value of certain operands.

For example, modern x86 processors know that a register-register XOR is a zero-idiom if both operands are the same register. That means the XOR is optimized out at the register renaming stage, and no opcode is issued to the pipelines. A variant scheduling class can be used to describe this case (see example below):

```
def ZeroIdiomWrite : SchedWriteRes<[]> {
  let Latency = 0;
}

def ZeroIdiom : SchedPredicate<[{
  MI->getOpcode() == X86::XORrr &&
  MI->getOperand(0).getReg() == MI->getOperand(1).getReg()
}]>;

def WriteXOR : SchedWriteVariant<[
  SchedVar<ZeroIdiom, [ZeroIdiomWrite]>,
  SchedVar<NoSchedPred, [WriteALU]>]>;
```

Problems with the current design
--------------------------------
A SchedPredicate is essentially a custom block of C++ code used by the SubtargetEmitter to generate a condition through a boolean expression.

A SchedPredicate sees all the definitions that are "captured" by the `PredicateProlog` (another block of C++ code). It can also access public members of TargetSchedule.
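The "copy-paste" step can be pictured with the following C++ sketch. It uses hypothetical mock types in place of `MachineInstr` and a hand-written `resolveWriteXOR()` in place of the generated resolver; the opcode value, class IDs and function name are all made up for illustration. The point is that the ZeroIdiom body becomes the `if` condition verbatim: it only compiles because `MI` happens to be a MachineInstr-like pointer, and nothing in the build ever inspects what the condition means.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the real LLVM types: only the accessors that the
// ZeroIdiom predicate body uses are modelled here.
namespace X86 {
enum Opcode : unsigned { XORrr = 1, ADDrr = 2 };
} // namespace X86

struct MockOperand {
  unsigned Reg;
  unsigned getReg() const { return Reg; }
};

struct MockMachineInstr {
  unsigned Opcode;
  std::vector<MockOperand> Ops;
  unsigned getOpcode() const { return Opcode; }
  const MockOperand &getOperand(unsigned I) const {
    assert(I < Ops.size() && "operand index out of range");
    return Ops[I];
  }
};

// Hypothetical IDs for the two SchedVar alternatives of WriteXOR.
enum : unsigned { ZeroIdiomWriteID = 10, WriteALUID = 11 };

// Approximates what a generated resolver does with a SchedPredicate: the
// predicate's C++ block is pasted verbatim as the condition.
unsigned resolveWriteXOR(const MockMachineInstr *MI) {
  if (MI->getOpcode() == X86::XORrr &&
      MI->getOperand(0).getReg() == MI->getOperand(1).getReg())
    return ZeroIdiomWriteID; // Latency 0, no pipeline resources consumed.
  return WriteALUID;         // NoSchedPred: the fallback alternative.
}
```

Passing an MCInst-like object here would simply fail to compile, which is the portability problem described next.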
A common pattern used by the ARM scheduling models to define predicates is:
 - PredicateProlog "captures" the TargetInstrInfo object from the TargetSchedule object.
 - Each predicate uses the "captured" TargetInstrInfo object (TII) to call helpers exposed by the (target-specific) InstrInfo interface.

Note that TargetSchedule and TargetInstrInfo are both CodeGen concepts.

SchedPredicate definitions only work on MachineInstr objects. Therefore, the C++ code block is not portable (i.e. it doesn't work if the input instruction is an MCInst). The `MI` used by the ZeroIdiom definition from the previous example is a `const MachineInstr *`.

The main problem with this design is that predicates don't have "portable" semantics. A predicate is essentially an opaque block of code, and its semantics are unknown to tablegen. Tablegen can only trust the user, and just "copy-paste" code blocks from the various predicates into an auto-generated `XXXGenSubtargetInfo::resolveSchedClass()` function.

This limits our ability to reason about predicates. In particular, it makes it extremely hard (if not impossible) for tools that can only access the MC layer to reuse predicate definitions to resolve variant scheduling classes.

If instead we expose the semantics of predicates to tablegen, we can then teach tablegen how to generate an equivalent code block that works on MCInst.

In the next section I show how I plan to expose the semantics of scheduling predicates to tablegen. I will then go through a couple of examples describing how the new predicate syntax can be used, and finally I will describe the patches required to implement this feature.

A new class of scheduling predicates
------------------------------------
MCSchedPredicate allows the definition of scheduling predicates that have well-defined, portable semantics. They can be used in place of SchedPredicate in SchedReadVariant and SchedWriteVariant definitions in tablegen.
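To illustrate why exposing the semantics matters, here is a minimal C++ sketch of what becomes possible once a predicate is data rather than an opaque code block: the same predicate tree can be evaluated directly and also lowered into C++ source text for either `MI->` (a `const MachineInstr *`) or `MI.` (a `const MCInst &`). The node kinds mirror MCPredicate classes named in this RFC (CheckOpcode, CheckRegOperand, CheckRegOperandValue, CheckNot, CheckAllOf), and the example is the Exynos M3 BLR check discussed in this section. The `Pred` type, the `evaluate`/`emit` functions and `MockInst` are hypothetical and are not tablegen's actual implementation; the real emitter also produces slightly different text (for example `!=` rather than a negated `==`).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical instruction model; registers are identified by name.
struct MockOperand {
  bool IsReg;
  std::string Reg;
};
struct MockInst {
  std::string Opcode;
  std::vector<MockOperand> Ops;
};

// One node kind per predicate class; each kind has a fixed meaning.
struct Pred {
  enum Kind { Opcode, RegOperand, RegOperandValue, Not, AllOf };
  Kind K;
  std::string Name;   // Opcode name or register name.
  unsigned Index = 0; // Operand index, where relevant.
  std::vector<std::shared_ptr<const Pred>> Kids;
};
using PredPtr = std::shared_ptr<const Pred>;

PredPtr checkOpcode(std::string Op) {
  return std::make_shared<Pred>(Pred{Pred::Opcode, std::move(Op), 0, {}});
}
PredPtr checkRegOperand(unsigned Idx) {
  return std::make_shared<Pred>(Pred{Pred::RegOperand, "", Idx, {}});
}
PredPtr checkRegOperandValue(unsigned Idx, std::string Reg) {
  return std::make_shared<Pred>(
      Pred{Pred::RegOperandValue, std::move(Reg), Idx, {}});
}
PredPtr checkNot(PredPtr P) {
  return std::make_shared<Pred>(Pred{Pred::Not, "", 0, {std::move(P)}});
}
PredPtr checkAllOf(std::vector<PredPtr> Ps) {
  return std::make_shared<Pred>(Pred{Pred::AllOf, "", 0, std::move(Ps)});
}

// Interprets the tree directly against an instruction.
bool evaluate(const Pred &P, const MockInst &MI) {
  switch (P.K) {
  case Pred::Opcode:
    return MI.Opcode == P.Name;
  case Pred::RegOperand:
    return P.Index < MI.Ops.size() && MI.Ops[P.Index].IsReg;
  case Pred::RegOperandValue:
    return P.Index < MI.Ops.size() && MI.Ops[P.Index].IsReg &&
           MI.Ops[P.Index].Reg == P.Name;
  case Pred::Not:
    return !evaluate(*P.Kids[0], MI);
  case Pred::AllOf:
    for (const PredPtr &Kid : P.Kids)
      if (!evaluate(*Kid, MI))
        return false;
    return true;
  }
  return false;
}

// Lowers the tree to a C++ expression. `Access` is "MI->" for a pointer or
// "MI." for a reference; `Ns` is the target namespace for opcodes/registers.
std::string emit(const Pred &P, const std::string &Access,
                 const std::string &Ns) {
  switch (P.K) {
  case Pred::Opcode:
    return Access + "getOpcode() == " + Ns + "::" + P.Name;
  case Pred::RegOperand:
    return Access + "getOperand(" + std::to_string(P.Index) + ").isReg()";
  case Pred::RegOperandValue:
    return Access + "getOperand(" + std::to_string(P.Index) +
           ").getReg() == " + Ns + "::" + P.Name;
  case Pred::Not:
    return "!(" + emit(*P.Kids[0], Access, Ns) + ")";
  case Pred::AllOf: {
    std::string S = "(";
    for (std::size_t I = 0; I < P.Kids.size(); ++I) {
      if (I != 0)
        S += " && ";
      S += emit(*P.Kids[I], Access, Ns);
    }
    return S + ")";
  }
  }
  return "false";
}
```

For the BLR check, `emit(*BLRFast, "MI.", "AArch64")` yields `(MI.getOpcode() == AArch64::BLR && MI.getOperand(0).isReg() && !(MI.getOperand(0).getReg() == AArch64::LR))`, and passing `"MI->"` yields the MachineInstr spelling of the same condition.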
An MCSchedPredicate definition is built on top of an MCPredicate. MCPredicate definitions can be composed together to form complex boolean expressions.

To better understand how these new predicates work, let's have a look at the following example:

```
def M3BranchLinkFastPred : SchedPredicate<[{
  MI->getOpcode() == AArch64::BLR &&
  MI->getOperand(0).isReg() &&
  MI->getOperand(0).getReg() != AArch64::LR}]>;
```

This tablegen code snippet has been taken from AArch64/AArch64SchedExynosM3.td.

Predicate `M3BranchLinkFastPred` can be rewritten using an MCSchedPredicate definition as follows:

```
def M3BranchLinkFastPred : MCSchedPredicate<
  CheckAllOf<[
    CheckOpcode<[BLR]>,
    CheckRegOperand<0>,
    CheckNot<CheckRegOperandValue<0, LR>>]>
>;
```

The MCSchedPredicate uses a `CheckAllOf`, which is a "composition of predicates" that returns true only if every predicate in the composition returns true. Note that `CheckAllOf`, `CheckOpcode`, `CheckRegOperand`, `CheckRegOperandValue` and `CheckNot` are all MCPredicate classes.

Each predicate class has well-known semantics. For example, `CheckOpcode` is only used to check if the opcode of an instruction is part of a set of opcodes. In this example, `CheckOpcode` is used to check if the instruction is a BLR.

This new syntax allows the definition of predicates in a declarative way. These new predicates don't require custom blocks of C++, and can be used to define conditions without being bound to a particular representation (i.e. MachineInstr vs MCInst).

It also means that tablegen backends are now able to parse and understand the logic of each predicate check. More importantly, tablegen backends gain the ability to "lower" scheduling predicates into code that works on MCInst too.

A more complicated example involving TII method calls.
------------------------------------------------------
This code is taken from the AArch64 Cyclone scheduling model:

```
def WriteZPred : SchedPredicate<[{TII->isGPRZero(*MI)}]>;
def WriteImmZ : SchedWriteVariant<[
    SchedVar<WriteZPred, [WriteX]>,
    SchedVar<NoSchedPred, [WriteImm]>]>;
```

Predicate WriteZPred is used to check if a GPR instruction is a zero-idiom. The rationale is that zero-idioms have zero latency and don't consume processor resources.

The predicate logic is defined by method `isGPRZero()`, which is accessible through the TII object (i.e. a `const AArch64InstrInfo *`).

Below is the definition of `isGPRZero` in AArch64/AArch64InstrInfo.cpp:

```
// Return true if this instruction simply sets its single destination register
// to zero. This is equivalent to a register rename of the zero-register.
bool AArch64InstrInfo::isGPRZero(const MachineInstr &MI) {
  switch (MI.getOpcode()) {
  default:
    break;
  case AArch64::MOVZWi:
  case AArch64::MOVZXi: // movz Rd, #0 (LSL #0)
    if (MI.getOperand(1).isImm() && MI.getOperand(1).getImm() == 0) {
      assert(MI.getDesc().getNumOperands() == 3 &&
             MI.getOperand(2).getImm() == 0 && "invalid MOVZi operands");
      return true;
    }
    break;
  case AArch64::ANDWri: // and Rd, Rzr, #imm
    return MI.getOperand(1).getReg() == AArch64::WZR;
  case AArch64::ANDXri:
    return MI.getOperand(1).getReg() == AArch64::XZR;
  case TargetOpcode::COPY:
    return MI.getOperand(1).getReg() == AArch64::WZR;
  }
  return false;
}
```

That logic can be replaced by the following MCPredicate definitions:

```
def CheckMOVZ : CheckAllOf<[
  CheckOpcode<[MOVZWi, MOVZXi]>,
  CheckNumOperands<3>,
  CheckImmOperand<1>,
  CheckZeroOperand<1>,
  CheckImmOperand<2>,
  CheckZeroOperand<2>
]>;

def CheckANDW : CheckAllOf<[
  CheckOpcode<[ANDWri]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, WZR>
]>;

def CheckANDX : CheckAllOf<[
  CheckOpcode<[ANDXri]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, XZR>
]>;

def CheckCOPY : CheckAllOf<[
  CheckPseudo<[COPY]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, WZR>
]>;

// Return true if this instruction simply sets its single destination register
// to zero. This is equivalent to a register rename of the zero-register.
def IsGPRZero : TIIPredicate<"AArch64", "isGPRZero",
  AnyOfMCPredicates<[CheckMOVZ, CheckANDW, CheckANDX, CheckCOPY]>>;
```

TIIPredicate definitions are used to model calls to the target-specific InstrInfo. A TIIPredicate definition is treated specially by the InstrInfoEmitter tablegen backend, which uses it to automatically generate a definition in the target-specific `GenInstrInfo` class. Basically, we can tell tablegen to generate that definition for us.

Now that the description of IsGPRZero is available in the form of an MCPredicate, we can modify the original SchedWriteVariant WriteImmZ as follows:

```
def WriteZPred : MCSchedPredicate<IsGPRZero>;
def WriteImmZ : SchedWriteVariant<[
    SchedVar<WriteZPred, [WriteX]>,
    SchedVar<SchedDefault, [WriteImm]>]>;
```

How to resolve scheduling classes from MC
-----------------------------------------
MCSubtargetInfo will gain a new method:

```
/// Resolve a variant scheduling class for the given MCInst and CPU.
virtual unsigned resolveVariantSchedClass(unsigned SchedClass,
                                          const MCInst *MI,
                                          unsigned CPUID) const {
  return 0;
}
```

The SubtargetEmitter is responsible for processing scheduling classes and generating an override for that method.
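The default/override split is plain virtual dispatch: the base class answers 0 ("don't know") and a generated subclass returns concrete class IDs. Below is a minimal sketch of that pattern together with one plausible client loop for a tool such as llvm-mca. `MockMCSubtargetInfo`, `MockGenMCSubtargetInfo`, the `isVariant()` query (standing in for the scheduling class descriptor) and all class/CPU IDs are hypothetical; this is not the code from the llvm-mca patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for MCInst: only what the sketch needs.
struct MockMCInst {
  unsigned Opcode;
  long long Imm; // Immediate operand of a MOVZ-like instruction.
};

// Hypothetical base class mirroring the proposed MCSubtargetInfo hook.
// Returning 0 means "don't know how to resolve this scheduling class".
class MockMCSubtargetInfo {
public:
  virtual ~MockMCSubtargetInfo() = default;
  virtual unsigned resolveVariantSchedClass(unsigned /*SchedClass*/,
                                            const MockMCInst * /*MI*/,
                                            unsigned /*CPUID*/) const {
    return 0;
  }
  // Hypothetical query; in LLVM this comes from the scheduling class
  // descriptor (MCSchedClassDesc::isVariant()).
  virtual bool isVariant(unsigned /*SchedClass*/) const { return false; }
};

// Hypothetical stand-in for a tablegen'd subclass; IDs are arbitrary.
class MockGenMCSubtargetInfo : public MockMCSubtargetInfo {
public:
  unsigned resolveVariantSchedClass(unsigned SchedClass, const MockMCInst *MI,
                                    unsigned CPUID) const override {
    switch (SchedClass) {
    case 386:           // A variant class for MOVZ-like instructions.
      if (CPUID == 3) { // The only CPU model that defines this variant.
        if (MI->Imm == 0)
          return 930; // Zero-idiom write.
        return 962;   // Default write.
      }
      break;
    }
    return 0; // Unknown class or CPU: cannot resolve.
  }
  bool isVariant(unsigned SchedClass) const override {
    return SchedClass == 386;
  }
};

// Client side: keep resolving while the class is a variant; an unresolvable
// variant is reported as std::nullopt (a tool would emit an error here).
std::optional<unsigned> resolveSchedClass(const MockMCSubtargetInfo &STI,
                                          unsigned SchedClass,
                                          const MockMCInst &MI,
                                          unsigned CPUID) {
  while (STI.isVariant(SchedClass)) {
    SchedClass = STI.resolveVariantSchedClass(SchedClass, &MI, CPUID);
    if (SchedClass == 0)
      return std::nullopt;
  }
  return SchedClass;
}
```

Looping rather than resolving once allows a variant to resolve to another variant class, which is also how CodeGen's `TargetSchedModel::resolveSchedClass()` treats variants.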
This is what the SubtargetEmitter generates for the Cyclone and Exynos M3 models if we implement the changes described in the previous sections:

```
unsigned resolveVariantSchedClass(unsigned SchedClass, const MCInst *MI,
                                  unsigned CPUID) const override {
  switch (SchedClass) {
  case 117: // BLR
    if (CPUID == 5) { // ExynosM3Model
      if (( ( MI->getOpcode() == AArch64::BLR )
          && MI->getOperand(0).isReg()
          && MI->getOperand(0).getReg() != AArch64::LR ))
        return 934; // M3WriteAB
      if (true)
        return 935; // M3WriteAC
    }
    break;
  case 386: // MOVZWi_MOVZXi
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  case 387: // ANDWri_ANDXri
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  case 695: // ANDWri
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  }
  // Don't know how to resolve this scheduling class.
  return 0;
}
```

Note that this override will become a member of a new tablegen'd class named AArch64GenMCSubtargetInfo. That class would directly extend MCSubtargetInfo. Class AArch64GenMCSubtargetInfo is what will get instantiated by method `Target::createMCSubtargetInfo()`.

----

Let's go back to the definition of IsGPRZero using a TIIPredicate.
```
def IsGPRZero : TIIPredicate<"AArch64", "isGPRZero",
  AnyOfMCPredicates<[CheckMOVZ, CheckANDW, CheckANDX, CheckCOPY]>>;
```

This is how the InstrInfoEmitter expands the method in the tablegen'd class AArch64GenInstrInfo:

```
static bool isGPRZero(const MachineInstr &MI) {
  return (
    (
      ( MI.getOpcode() == AArch64::MOVZWi || MI.getOpcode() == AArch64::MOVZXi )
      && MI.getNumOperands() == 3
      && MI.getOperand(1).isImm()
      && MI.getOperand(1).getImm() == 0
      && MI.getOperand(2).isImm()
      && MI.getOperand(2).getImm() == 0
    )
    || (
      ( MI.getOpcode() == AArch64::ANDWri )
      && MI.getOperand(1).isReg()
      && MI.getOperand(1).getReg() == AArch64::WZR
    )
    || (
      ( MI.getOpcode() == AArch64::ANDXri )
      && MI.getOperand(1).isReg()
      && MI.getOperand(1).getReg() == AArch64::XZR
    )
    || (
      ( MI.getOpcode() == TargetOpcode::COPY )
      && MI.getOperand(1).isReg()
      && MI.getOperand(1).getReg() == AArch64::WZR
    )
  );
}
```

Another variant of function `isGPRZero` is expanded in the AArch64_MC namespace (see below):

```
#ifdef GET_GENINSTRINFO_MC_DECL
#undef GET_GENINSTRINFO_MC_DECL

namespace llvm {
class MCInst;

namespace AArch64_MC {

bool isGPRZero(const MCInst &MI);

} // end AArch64_MC namespace
} // end llvm namespace

#endif // GET_GENINSTRINFO_MC_DECL

#ifdef GET_GENINSTRINFO_MC_HELPERS
#undef GET_GENINSTRINFO_MC_HELPERS

namespace llvm {
namespace AArch64_MC {

bool isGPRZero(const MCInst &MI) {
  return (
    (
      ( MI.getOpcode() == AArch64::MOVZWi || MI.getOpcode() == AArch64::MOVZXi )
      && <...snip...>
    )
  );
}

} // end AArch64_MC namespace
} // end llvm namespace

#endif // GET_GENINSTRINFO_MC_HELPERS
```

Function isGPRZero would live in namespace AArch64_MC. The declaration of AArch64_MC::isGPRZero has to be made visible to AArch64MCTargetDesc.h, so that it becomes known to the new `resolveVariantSchedClass()` method.

As a side note: all this code is guarded by macro definitions. This lets us control its expansion (if we decide that we don't want it).
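Both expansions above share one function body; only the parameter type differs. That works because `MachineInstr`/`MachineOperand` and `MCInst`/`MCOperand` expose accessors with the same names (`getOpcode()`, `getNumOperands()`, `getOperand()`, `isReg()`, `getReg()`, `isImm()`, `getImm()`). The sketch below illustrates that point with a C++ function template over two hypothetical mock representations with different internal layouts. The template is only a stand-in for "emit the same expression text twice"; it is not how the InstrInfoEmitter works, and the opcode and register numbers are made up (COPY is really `TargetOpcode::COPY`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcode and register numbers.
namespace AArch64 {
enum : unsigned { MOVZWi = 1, MOVZXi, ANDWri, ANDXri, COPY, WZR = 100, XZR };
} // namespace AArch64

// Hypothetical MachineInstr-like representation: tagged operands.
struct MockMachineOperand {
  enum Kind { RegKind, ImmKind } K;
  std::int64_t V;
  bool isReg() const { return K == RegKind; }
  bool isImm() const { return K == ImmKind; }
  unsigned getReg() const { return static_cast<unsigned>(V); }
  std::int64_t getImm() const { return V; }
};
struct MockMachineInstr {
  unsigned Opc;
  std::vector<MockMachineOperand> Ops;
  unsigned getOpcode() const { return Opc; }
  unsigned getNumOperands() const { return static_cast<unsigned>(Ops.size()); }
  const MockMachineOperand &getOperand(unsigned I) const {
    assert(I < Ops.size() && "operand index out of range");
    return Ops[I];
  }
};

// Hypothetical MCInst-like representation: different layout, same accessors.
struct MockMCOperand {
  bool HasReg;
  unsigned RegNo;
  std::int64_t ImmVal;
  bool isReg() const { return HasReg; }
  bool isImm() const { return !HasReg; }
  unsigned getReg() const { return RegNo; }
  std::int64_t getImm() const { return ImmVal; }
};
struct MockMCInst {
  unsigned Opcode;
  std::vector<MockMCOperand> Operands;
  unsigned getOpcode() const { return Opcode; }
  unsigned getNumOperands() const {
    return static_cast<unsigned>(Operands.size());
  }
  const MockMCOperand &getOperand(unsigned I) const {
    assert(I < Operands.size() && "operand index out of range");
    return Operands[I];
  }
};

// The expression from the generated isGPRZero, written once.
template <typename InstT> bool isGPRZero(const InstT &MI) {
  return ((MI.getOpcode() == AArch64::MOVZWi ||
           MI.getOpcode() == AArch64::MOVZXi) &&
          MI.getNumOperands() == 3 && MI.getOperand(1).isImm() &&
          MI.getOperand(1).getImm() == 0 && MI.getOperand(2).isImm() &&
          MI.getOperand(2).getImm() == 0) ||
         (MI.getOpcode() == AArch64::ANDWri && MI.getOperand(1).isReg() &&
          MI.getOperand(1).getReg() == AArch64::WZR) ||
         (MI.getOpcode() == AArch64::ANDXri && MI.getOperand(1).isReg() &&
          MI.getOperand(1).getReg() == AArch64::XZR) ||
         (MI.getOpcode() == AArch64::COPY && MI.getOperand(1).isReg() &&
          MI.getOperand(1).getReg() == AArch64::WZR);
}
```

Because the declarative form adds explicit `isReg()`/operand-count checks, it agrees with the hand-written switch on well-formed instructions, where the original relied on an `assert` instead.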
What to do next
---------------
I have a series of three patches ready to be sent upstream for review.

The first patch is mostly a non-functional change. It introduces the new scheduling predicate class in tablegen, and it teaches the InstrInfoEmitter and the SubtargetEmitter how to expand MCSchedPredicate definitions.
The first patch is up for review here: https://reviews.llvm.org/D46695.

The second patch would teach the SubtargetEmitter how to generate method resolveVariantSchedClass().

The last patch of the sequence will teach llvm-mca how to use method `resolveVariantSchedClass()` to resolve variant classes. llvm-mca will generate an error if the variant scheduling class cannot be resolved.

Review https://reviews.llvm.org/D46697 is the union of patch1 and patch2 only. It is not meant to be reviewed at this stage, since it contains the code changes related to patch1.

The third patch is available here: https://reviews.llvm.org/D46698. D46698 requires patch1 and patch2.

Bonus (optional) patches:
 1) [X86] Teach scheduling models how to recognize zero-idioms. This would make the llvm-mca change easier to review.
 2) [X86] Add variant scheduling classes for LEA instructions.
 3) [AArch64] Rewrite the predicates mentioned by this RFC.

People who are interested in seeing how to implement "optional" patch 3 can have a look at the review here: https://reviews.llvm.org/D46701

Please let me know what you think.

Thanks,
Andrea
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180510/f969d70d/attachment.html>
Andrew Trick via llvm-dev
2018-May-10 20:58 UTC
[llvm-dev] [RFC] MC support for variant scheduling classes.
> On May 10, 2018, at 8:58 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>
> Hi all,
>
> The goal of this RFC is to make information related to variant scheduling
> classes accessible at MC level. This would help tools like llvm-mca
> understand/resolve variant scheduling classes.
>
> To achieve this goal, I plan to introduce a new class of scheduling predicates
> named MCSchedPredicate. An MCSchedPredicate allows the definition of boolean
> expressions with a well-known semantic, that can be used to generate code for
> both MachineInstr and MCInst.
>
> The new predicates are designed to be completely optional. Scheduling models
> can use a combination of SchedPredicate and MCSchedPredicate to describe
> variant reads and writes. Old scheduling predicate definitions would still be
> valid. New MCSchedPredicates would behave like normal scheduling predicates.
>
> <snip>
>
> What to do next
> ---------------
> I have a series of three patches ready to be sent upstream for review.
>
> The first patch is mostly a no functional change. It introduces the new
> scheduling predicate class in tablegen, and it teaches the
> InstructionInfoEmitter and the SubtargetEmitter how to expand MCSchedPredicate
> definitions.
> The first patch is up for review here: https://reviews.llvm.org/D46695.
>
> The second patch would teach the SubtargetEmitter how to generate method
> resolveVariantSchedClass().
>
> The last patch of the sequence will teach llvm-mca how to use method
> `resolveVariantSchedClass()` to resolve variant classes. llvm-mca will generate
> an error if the variant scheduling class cannot be resolved.
>
> Review https://reviews.llvm.org/D46697 is the union of patch1 and patch2 only.
> It is not meant to be reviewed at this stage, since it contains the code
> changes related to patch1.
>
> The third patch is available here: https://reviews.llvm.org/D46698.
> D46698 requires patch1 and patch2.
>
> Bonus (optional) patches:
> 1) [X86] Teach scheduling models how to recognize zero-idioms.
> This would make easier to review the llvm-mca change.
> 2) [X86] Add variant scheduling classes for LEA instructions.
> 3) [AArch64] Rewrite the predicates mentioned by this RFC.
>
> People that are interested in seeing how to implement "optional" patch 3 can
> have a look at the review here: https://reviews.llvm.org/D46701
>
> Please let me know what you think.
>
> Thanks,
> Andrea

Fantastic writeup! It’s great to see so much progress on fundamental infrastructure.

My time for LLVM code review is extremely limited. Can someone work with Andrea to get these patches in?

-Andy
Renato Golin via llvm-dev
2018-May-10 21:24 UTC
[llvm-dev] [RFC] MC support for variant scheduling classes.
On 10 May 2018 at 21:58, Andrew Trick <atrick at apple.com> wrote:
> Fantastic writeup! It’s great to see so much progress on fundamental
> infrastructure.
>
> My time for LLVM code review is extremely limited. Can someone work with
> Andrea to get these patches in?

Hi Andrew,

Same here, but this has been a long goal for me, too, so I'll do my best.

--
cheers,
--renato