Alex Bradbury via llvm-dev
2018-Oct-03 09:27 UTC
[llvm-dev] [RFC] Implementing a general purpose 64-bit target (RISC-V 64-bit) with i64 as the only legal integer type
# Purpose of this RFC This RFC describes the challenges of modelling the 64-bit RISC-V target (RV64) and details the two most obvious implementation choices: 1) Having i64 as the only legal integer type 2) Introducing i32 subregisters I've worked on implementing both approaches and fleshed out a pretty complete implementation of 1), which is my preferred option. With this RFC, I would welcome further feedback and insight, as well as suggestions or comments on the target-independent modifications (e.g. TargetInstrInfo hooks) I suggest as worthwhile. # Background: RV64 The RISC-V instruction set is structured as a set of bases (RV32I, RV32E, RV64I, RV128I) with a series of optional extensions (e.g. M for multiply/divide, A for atomics, F+D for single+double precision floating point). It's important to note that RV64I is not just RV32I with some additional instructions, it's a completely different base where operations work on 64-bit rather than 32-bit values. RV64I also introduces 10 new instructions: ld/sd (64-bit load/store), addiw, slliw, srliw, sraiw, addw, subw, sllw, srlw, sraw. The `*W` instructions all produce a sign-extended result and take the lower 32-bits of their operands as inputs. Unlike MIPS64, there is no requirement that inputs to these `*W` are sign-extended in order to avoid unpredictable behaviour. # Background: RISC-V backend implementation. Other backends aiming to support both 32-bit and 64-bit architecture variants handle this by defining two versions of each instruction with overlapping encodings, with one marked as isCodeGenOnly. This leads to unwanted duplication, both in terms of tablegen descriptions and throughout the C++ implementation of the backend (e.g. any code checking for RISCV::ADD would also want to check for RISCV::ADD64). Fortunately we can avoid this thanks to the work Krzysztof Parzyszek contributed to support variable-sized register classes <http://lists.llvm.org/pipermail/llvm-dev/2016-September/105027.html>. The in-tree RISC-V backend exploits this, parameterising the base instruction definitions by XLEN (the size of the general purpose registers). # Option 1: Have i64 as the only legal type ## Approach Every register class in RISCVRegisterInfo.td is parameterised by XLenVT, which is i32 for RV32 and i64 for RV64. No subregisters are defined, meaning i32 is not a legal type. Patterns for the `*W` instructions tend to look something like: def : Pat<(sext_inreg (add GPR:$rs1, GPR:$rs2), i32), (ADDW GPR:$rs1, GPR:$rs2)>; Essentially all patterns for RV32I are also valid for RV64I. ## Changes needed * Introduction of new patterns, RV64I-specific immediate materialisation * A number of SelectionDAG nodes generated from LLVM intrinsics take i32 arguments and the DAG legalizer doesn't currently know how to legalize them. Promoting these arguments is trivial but requires additions to LegalizeIntegerTypes.cpp. So far I've had to do this for frameaddr/returnaddr/prefetch, but there are likely more. * The shift amount type is i64. If the shift amount operand is smaller than this, SelectionDAGBuilder will zero-extend it (changed from any-extend in rL125457). i32->i64 zero-extension is more expensive than sign-extension, but it's unnecessary anyway as only the lower 6 bits are used. Introduce TargetLowering::getExtendForShiftAmount which is called during SelectionDAGBuilder::visitShift. * When promoting setcc operands, DAGTypeLegalizer::PromoteSetCCOperands makes the arbitrary choice to zero-extend. It is cheaper to sign-extend from i32 to i64, so introduce TargetLowering::isSExtCheaperThanZExt(FromTY, ToTy). For now this is only used through PromoteSetCCOperands, but perhaps there are other cases where it would be useful? * When 32-bit srl is legalized, the dag combiner will try to reduce the bits in the mask in: (srl (and val, 0xffffffff), imm) based on the knowledge of the lower bits that will be shifted out. This means a tablegen pattern matching 0xffffff won't work. Custom selection code in RISCVDAGToDAGISel can recognize when this has happened and produce SRLIW. * New i64 versions of the target-specific intrinsics added to aid the lowering of part-word atomicrmw must be defined. * RV64F (single-precision floating point) requires a little extra work due to the fact i32 is not a legal type. When call lowering happens post-legalisation (e.g. when an intrinsic was inserted during legalisation). A bitcast from f32 to i32 can't be introduced. There's a similar challenge for RV32D. Introduce target-specific DAG nodes that perform bitcast+sext for f32->i64 and trunc+bitcast for i64->f32. Custom-lower ISD::BITCAST to ensure these nodes are selected. ## Questions Does anyone have any reservations about this approach of having i64 as the only legal type? Some of the target hooks could perhaps be replaced with more heroics in the backend. What are people's feelings here? # Option 2: Model 32-bit subregs ## Approach Define 32-bit subregisters for the GPRs that can be used in patterns and instruction definitions. The following node types are potentially useful: * `EXTRACT_SUBREG`: Supports getting the lower 32-bits of a 64-bit register * `INSERT_SUBREG`: Assumes only the lower bits are modified. Can be used with `IMPLICIT_DEF` to indicate that the upper bits are undefined. You can't directly represent sign-extension, but you can do what Mips64 does and define extra patterns to catch redundant sign-extension after one of the `*W` instructions. * `SUBREG_TO_REG`: a constant argument asserts the value of the bits left in the upper portion of the register. This is perfect for zero-extension, and not much good for the sign-extension RISC-V performs. You end up with patterns like: def : Pat<(anyext GPR32:$reg), (SUBREG_TO_REG (i64 0), GPR32:$reg, sub_32)>; def : Pat<(trunc GPR:$reg), (EXTRACT_SUBREG GPR:$reg, sub_32)>; def : Pat<(add GPR32:$src, GPR32:$src2), (ADDW GPR32:$src, GPR32:$src2)>; def : Pat<(add GPR32:$rs1, simm12_i32:$imm12), (ADDIW GPR32:$rs1, simm12_i32:$imm12)>; ## Changes needed * 32-bit subregisters must be defined. Some register classes need GPR32 versions, e.g. GPR, GPRNoX0, GPRC. * The RISCVAsmParser and RISCVDisassembler must be modified to support the new register classes used for the 32-bit subregs. * The calling convention implementation must handle promotion of i32 arguments/returns to i64. * The `*W` instructions must be defined using GPR32. * New `Operand<i32>` types must be defined and used in the `*W` instructions. * When defining a variable-sized register class you specify a DefaultMode. This must be set to i64 to avoid breaking RV32 compilation. * This gives enough to define working support for the `*W` operations, but to enable codegen for the other integer instructions requires either duplication or smarts. To write patterns using i32 you need to define a new variant of the instruction. TableGen changes might remove the need for this. Even with such support, it's not particularly desirable to write a bunch of new patterns for instructions other than the `*W` ones. I'm sure solutions are possible, but given that the i64-only approach seems to work very well, I'm not sure it's worth pushing further. # Conclusion Taking full advantage of support for variable-sized register classes and sticking with i64 as the only legal integer type seems very workable and is definitely my preference based on the work I've done. I'd be really interested if anyone has any particular concerns or advice, or feedback on the suggested new target hooks. Best, Alex Bradbury, lowRISC CIC