Dan
2013-Jul-09 17:17 UTC
[LLVMdev] Optimization issue for target's offset field of load operation in DAGSelection
I am working on an experimental target and trying to make sure that the load offset field is used as effectively as possible. There appears to be some control over the architecture's offset range, and over whether an offset is too large and needs to be lowered/converted into a separate sequence of operations in DAGSelection.

Can someone point me to where this is decided?

For example, the difference between index=63 and 64 determines whether the address+offset is generated as a separate operation or built into the load operation itself. My architecture supports larger offsets, so 63 and 64 should not be the dividing line. Is there a limit on the offset range that applies to all targets, or is some constraint set for my target causing this?

Suggestions?

    long long array[100];

    long long func()
    {
      return array[63];
      // return array[64];
    }

Here is the difference in the .ll code with 63 or 64 as the index:

    %0 = load i64* getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 63), align 8
    ret i64 %0

    %0 = load i64* getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 64), align 8
    ret i64 %0

Here is the instruction selection for index 63:

    ISEL: Starting pattern match on root node: 0x3d9ad80: i64,ch = load 0x3d866f8, 0x3d9aa80, 0x3d9ac80<LD8[getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 63)]> [ORD=2] [ID=6]

      Initial Opcode index to 813
      Skipped scope entry (due to false predicate) at index 822, continuing at 876
      Skipped scope entry (due to false predicate) at index 877, continuing at 931
      TypeSwitch[i64] from 934 to 937
      Morphed node: 0x3d9ad80: i64,ch = LDWri 0x3d9a880, 0x3d9ab80, 0x3d866f8<Mem:LD8[getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 63)]> [ORD=2]

    ISEL: Match complete!
    ===== Instruction selection ends:

Here is the instruction selection for index 64:

    ISEL: Match complete!
    ISEL: Starting pattern match on root node: 0x2d2cda0: i64,ch = load 0x2d18718, 0x2d2caa0, 0x2d2cca0<LD8[getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 64)]> [ORD=2] [ID=6]

      Initial Opcode index to 813
      Skipped scope entry (due to false predicate) at index 822, continuing at 876
      Skipped scope entry (due to false predicate) at index 877, continuing at 931
      TypeSwitch[i64] from 934 to 937
      Morphed node: 0x2d2cda0: i64,ch = LDWri 0x2d2caa0, 0x2d2cba0, 0x2d18718<Mem:LD8[getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 64)]> [ORD=2]

    ISEL: Match complete!
    ISEL: Starting pattern match on root node: 0x2d2caa0: i64 = add 0x2d2c8a0, 0x2d2c9a0 [ORD=1] [ID=5]

      Initial Opcode index to 1473
      Match failed at index 1482
      Continuing at 1498
      Morphed node: 0x2d2caa0: i64 = ADD 0x2d2c8a0, 0x2d2c9a0 [ORD=1]

    etc
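A note on the arithmetic behind this example: each array element is an i64 (8 bytes), so index 63 is a byte offset of 504 and index 64 is a byte offset of 512. A boundary at 512 is what a 10-bit signed (or 9-bit unsigned) offset field would produce, so one possible reading is that something in the target's own description limits the foldable offset to that range. This is only an inference from the numbers; the logs above do not confirm it. The sketch below just checks the arithmetic. The helpers byteOffset and fitsSigned are hypothetical names, and the 10-bit width is an assumption.

```cpp
#include <cstdint>

// Byte offset of element `index` in an array of `elemSize`-byte elements.
// (Hypothetical helper, not an LLVM API.)
constexpr int64_t byteOffset(int64_t index, int64_t elemSize) {
  return index * elemSize;
}

// True if `value` fits in a signed immediate field that is `bits` wide.
// Valid for 1 <= bits <= 63. (Hypothetical helper, not an LLVM API.)
constexpr bool fitsSigned(int64_t value, unsigned bits) {
  return value >= -(int64_t(1) << (bits - 1)) &&
         value <= (int64_t(1) << (bits - 1)) - 1;
}

// ASSUMPTION: a 10-bit signed offset field, i.e. the range [-512, 511].
static_assert(fitsSigned(byteOffset(63, 8), 10), "offset 504 fits");
static_assert(!fitsSigned(byteOffset(64, 8), 10), "offset 512 does not fit");
```

If the target really accepts larger offsets, the check that rejects 512 would be the place to look; where that check lives depends on how the target describes its addressing modes.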
Krzysztof Parzyszek
2013-Jul-09 17:46 UTC
[LLVMdev] Optimization issue for target's offset field of load operation in DAGSelection
On 7/9/2013 12:17 PM, Dan wrote:
> I am working on an experimental target and trying to make sure that
> the load offset field is used to the best way. There appears to be
> some control over the architecture's offset range and whether the
> offset is too large and needs to be lowered/converted into a separate
> sequence of operations in DAGSelection?
>
> Can someone point me to what might be the case?

Instruction patterns can have predicates on each operand to make sure that the operand meets the required criteria. For example, in lib/Target/PowerPC/PPCInstrInfo.td, there is a definition of ADDI:

    def ADDI   : DForm_2<14, (outs gprc:$rD), (ins gprc_nor0:$rA, symbolLo:$imm),
                         "addi $rD, $rA, $imm", IntSimple,
                         [(set i32:$rD, (add i32:$rA, immSExt16:$imm))]>;

The "immSExt16" is a predicate, and it's defined in the same file:

    def immSExt16  : PatLeaf<(imm), [{
      // immSExt16 predicate - True if the immediate fits in a 16-bit
      // sign extended field.  Used by instructions like 'addi'.
      if (N->getValueType(0) == MVT::i32)
        return (int32_t)N->getZExtValue() == (short)N->getZExtValue();
      else
        return (int64_t)N->getZExtValue() == (short)N->getZExtValue();
    }]>;

In this case, the ADDI will be generated only if the immediate operand satisfies the predicate. Otherwise, the ADDI pattern won't match, and the instruction selector will attempt to match other patterns. In this case it will most likely be the immediate by itself (loaded into a register), and then the pattern for ADD (register+register) will match on the result.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
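The fallback behaviour described above can be sketched as a small standalone C++ program. This is a simplified model, not LLVM code: Selection, isSExt16 and selectAdd are hypothetical names, and the real selector works from a generated matcher table rather than a single conditional. isSExt16 expresses the 64-bit branch of the quoted immSExt16 predicate as a range check.

```cpp
#include <cstdint>

// The two outcomes modelled in this sketch. (Hypothetical names.)
enum class Selection { FoldIntoAddi, MaterializeThenAdd };

// Range-check form of the 64-bit branch of immSExt16: true if the value
// is unchanged by truncation to a 16-bit signed field.
constexpr bool isSExt16(int64_t imm) {
  return imm >= INT16_MIN && imm <= INT16_MAX;
}

// Simplified model of the selector: use the immediate form when the
// predicate holds, otherwise fall back to loading the immediate into a
// register and matching the register+register ADD.
constexpr Selection selectAdd(int64_t imm) {
  return isSExt16(imm) ? Selection::FoldIntoAddi
                       : Selection::MaterializeThenAdd;
}

static_assert(selectAdd(32767) == Selection::FoldIntoAddi,
              "largest 16-bit signed immediate folds");
static_assert(selectAdd(32768) == Selection::MaterializeThenAdd,
              "one past the range falls back to register+register");
```

The same shape applies to a load with an offset operand: if the predicate on the offset rejects a value, the offset is computed by a separate ADD and the load uses the result, which matches the extra add node in the index 64 log.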