Alex Bradbury via llvm-dev
2018-May-16 09:36 UTC
[llvm-dev] GlobalAddress lowering strategy
I've been looking at GlobalAddress lowering strategy recently and was wondering if anyone had any ideas, insights, or experience to share.

## Background

When lowering a global address, which is typically done in FooTargetLowering::LowerGlobalAddress, you have the option of folding the offset into the GlobalAddress node or else emitting the base GlobalAddress and a separate ADD node for the offset. Which is best depends on the references to the base GlobalAddress within the function.

AArch64 recently gained a DAGCombine for folding offsets into addresses where all users are of the form (globaladdr + constant) <https://reviews.llvm.org/rL330630>. We've been looking at the best GlobalAddress lowering strategy for RISC-V here <https://reviews.llvm.org/D45748> (thanks Sameer!), and that discussion has prompted me to reach out here on llvm-dev.

For a RISC target I'd suggest that the ideal strategy would be:

1. If the global base has only a single reference across the whole function, or every reference has the same offset, then combine the global with the offset.
2. If the global base has multiple references with different offsets, then never combine the global with the offset. MachineCSE can then remove the redundant instructions that recompute the base address.

It isn't straightforward to implement such a strategy due to the basic-block granularity of the SelectionDAG and the lack of use information for GlobalAddress values.

I was wondering whether anybody has looked into this sort of issue for an in-tree or out-of-tree backend, or had any thoughts on addressing it. The numbers for introducing this combine to the AArch64 backend certainly indicate that performing it is a net win (a 46KB reduction in the .text size of Chromium), but it would be interesting to look at addressing the cases where the combine is counterproductive.
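A minimal sketch of the two-rule strategy above, assuming a simplified, hypothetical model in which a function is a list of basic blocks and each block lists its (global, constant byte offset) references. None of the types or functions here (`GlobalRef`, `decide`, `decideFunctionWide`, `decidePerBlock`) are LLVM APIs. The sketch contrasts the decision made with function-wide information against the same rule applied one block at a time, which is all a single SelectionDAG can see.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model (not LLVM's API): a reference to a global at a
// constant byte offset, i.e. (globaladdr @Global) + Offset.
struct GlobalRef {
  std::string Global;
  long Offset;
};

// A function is modelled as a list of basic blocks; each block lists the
// global references that appear in it.
using BlockRefs = std::vector<GlobalRef>;
using FunctionRefs = std::vector<BlockRefs>;

// Per-global decision: true = fold the offset into the global address,
// false = emit the base address plus a separate ADD and rely on MachineCSE.
using FoldDecision = std::map<std::string, bool>;

// The proposed rule applied to whatever references it is given: fold only
// if every reference to a global uses the same offset (a single reference
// trivially qualifies).
FoldDecision decide(const std::vector<GlobalRef> &Refs) {
  std::map<std::string, std::set<long>> OffsetsPerGlobal;
  for (const GlobalRef &R : Refs)
    OffsetsPerGlobal[R.Global].insert(R.Offset);

  FoldDecision Fold;
  for (const auto &Entry : OffsetsPerGlobal) {
    // Every entry was created by an insert above, so it is never empty.
    assert(!Entry.second.empty());
    Fold[Entry.first] = Entry.second.size() == 1;
  }
  return Fold;
}

// Function-wide view: gather the references from all blocks first.
FoldDecision decideFunctionWide(const FunctionRefs &F) {
  std::vector<GlobalRef> All;
  for (const BlockRefs &BB : F)
    All.insert(All.end(), BB.begin(), BB.end());
  return decide(All);
}

// Block-local view: the rule applied with only one block's references,
// giving one decision map per basic block.
std::vector<FoldDecision> decidePerBlock(const FunctionRefs &F) {
  std::vector<FoldDecision> Result;
  for (const BlockRefs &BB : F)
    Result.push_back(decide(BB));
  return Result;
}
```

Modelling example 2 from the appendix as four blocks, each referencing `@a` once at byte offsets 0, 4, 8 and 12, the block-local view sees a single reference per block and would fold every offset, whereas the function-wide view sees four distinct offsets and keeps the base separate. Example 1 (one reference to `@foo` at byte offset 8) folds under either view.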
## Appendix: example 1

For the following code snippet, folding the offset into the global is ideal, and AArch64 will choose to do so:

```llvm
@foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0], align 2

define i32 @main() nounwind {
entry:
  %0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]* @foo, i32 0, i32 4), align 2
  %cmp = icmp eq i16 %0, 140
  br i1 %cmp, label %if.end, label %if.then

if.then:
  tail call void @abort()
  unreachable

if.end:
  ret i32 0
}

declare void @abort()
```

## Appendix: example 2

For this example, you produce fewer instructions if you don't fold the offset into the global and instead rely on MachineCSE to remove the redundant instructions that form the base global address. AArch64's performGlobalAddressCombine will nonetheless fold in the offset.

```llvm
@a = global [4 x i32] zeroinitializer, align 4

; Function Attrs: noreturn nounwind
define i32 @main() nounwind {
entry:
  %0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 0), align 4
  %cmp = icmp eq i32 %0, 0
  br i1 %cmp, label %if.end, label %if.then

if.then: ; preds = %entry
  tail call void @abort() #3
  unreachable

if.end: ; preds = %entry
  %1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 1), align 4
  %cmp1 = icmp eq i32 %1, 3
  br i1 %cmp1, label %if.end3, label %if.then2

if.then2: ; preds = %if.end
  tail call void @abort() #3
  unreachable

if.end3: ; preds = %if.end
  %2 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 2), align 4
  %cmp4 = icmp eq i32 %2, 2
  br i1 %cmp4, label %if.end6, label %if.then5

if.then5: ; preds = %if.end3
  tail call void @abort() #3
  unreachable

if.end6: ; preds = %if.end3
  %3 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 3), align 4
  %cmp7 = icmp eq i32 %3, 1
  br i1 %cmp7, label %if.end9, label %if.then8

if.then8: ; preds = %if.end6
  tail call void @abort() #3
  unreachable

if.end9: ; preds = %if.end6
  tail call void @exit(i32 0) #3
  unreachable
}

declare void @abort()

declare void @exit(i32)

attributes #3 = { noreturn nounwind }
```
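To make the instruction-count argument for example 2 concrete, here is a rough cost model for RV32, the target discussed in D45748. The per-access sequences are assumptions about typical RISC-V code generation rather than `llc` output: folding gives a `lui`/`lw` pair per access with a different `%hi(a+off)` each time, while keeping the base separate gives one `lui`/`addi` pair shared via MachineCSE plus a single `lw` per access. Only address formation and the loads are counted; `costFolded` and `costUnfolded` are hypothetical names.

```cpp
#include <cassert>

// Rough RV32 instruction-count model for N loads from distinct constant
// offsets of one global. Assumptions (not taken from LLVM sources): every
// offset fits in a load's 12-bit signed immediate, the first reference
// dominates the others so MachineCSE can reuse the base, and the
// compare/branch code is identical under both strategies, so it is ignored.

// Offset folded into the global: each access is
//   lui rd, %hi(sym+off)
//   lw  rt, %lo(sym+off)(rd)
// The lui operands differ per offset, so MachineCSE cannot merge them.
unsigned costFolded(unsigned NumDistinctOffsets) {
  return 2 * NumDistinctOffsets;
}

// Base address kept separate: one shared pair after MachineCSE,
//   lui  rd, %hi(sym)
//   addi rd, rd, %lo(sym)
// then one `lw rt, off(rd)` per access.
unsigned costUnfolded(unsigned NumDistinctOffsets) {
  assert(NumDistinctOffsets > 0 && "model assumes at least one access");
  return 2 + NumDistinctOffsets;
}
```

With the four distinct offsets of example 2 this model gives 8 instructions when folding versus 6 when not, while a single reference gives 2 versus 3, which matches rule 1 preferring the fold. In this model the break-even point is two distinct offsets.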