Dilan Manatunga via llvm-dev
2016-Jul-06 17:36 UTC
[llvm-dev] Automatically scaled offset Load/stores for arrays
Hi,

I have a question on how I would support load/store instructions where the offset is automatically scaled by the type. Simply put, any array index would be scaled by the access width, so that there no longer needs to be a separate instruction to update the offset. For example, for

  int32_t arr[] = {...};
  for (int i = 0; i < 100; i++) {
    x += arr[i];
  }

the LLVM code would be roughly the following (an approximation of what would show up):

  %a = ld i32 %arr_ptr, %offset
  %x = add i32 %x, %a
  %offset = %offset + 4
  %i = %i + 1
  %cond = icmp %i 100 ICMP_ULT
  br for.body

and it would lower to:

  %a = ld i32 %arr_ptr, %i
  %x = add i32 %x, %a
  %i = %i + 1
  %cond = icmp %i 100 ICMP_ULT
  br for.body

Right now, my only idea is to replace getelementptr instructions with an intrinsic that computes the offset and pass its result to the load instruction. Then, during instruction selection, I would look for loads fed by that intrinsic and convert them into a scaled_arr load.

Sorry if what I am asking is somewhat confusing.

-Dilan
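As a rough illustration of the semantics being asked for, here is a standalone C++ model (not LLVM code). loadByteOffset stands in for an ordinary load that takes a byte offset, and loadScaled for the proposed load whose index the hardware scales by the access width; both names are hypothetical. The two loops compute the same sum, but only the first needs the separate "offset += 4" update.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Ordinary load: the caller supplies a byte offset from the base pointer.
static int32_t loadByteOffset(const unsigned char *base, std::size_t byteOffset) {
    assert(base != nullptr);
    int32_t value;
    std::memcpy(&value, base + byteOffset, sizeof value);  // well-defined even if unaligned
    return value;
}

// Hypothetical scaled load: the caller supplies an element index and the
// "hardware" multiplies it by the access width (sizeof(int32_t) == 4).
static int32_t loadScaled(const unsigned char *base, std::size_t index) {
    return loadByteOffset(base, index * sizeof(int32_t));
}

// Loop shape before the change: a separate byte offset advances by 4.
int32_t sumWithByteOffset(const int32_t *arr, std::size_t n) {
    const unsigned char *base = reinterpret_cast<const unsigned char *>(arr);
    int32_t x = 0;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < n; i++) {
        x += loadByteOffset(base, offset);
        offset += sizeof(int32_t);  // the extra add that a scaled load removes
    }
    return x;
}

// Loop shape after the change: only the induction variable i is updated.
int32_t sumWithScaledLoad(const int32_t *arr, std::size_t n) {
    const unsigned char *base = reinterpret_cast<const unsigned char *>(arr);
    int32_t x = 0;
    for (std::size_t i = 0; i < n; i++) {
        x += loadScaled(base, i);
    }
    return x;
}
```

For example, with const int32_t data[4] = {1, 2, 3, 4}, both sumWithByteOffset(data, 4) and sumWithScaledLoad(data, 4) return 10.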
Tim Northover via llvm-dev
2016-Jul-06 17:58 UTC
On 6 July 2016 at 10:36, Dilan Manatunga via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I have a question on how I would support load/store instructions where the
> offset is automatically scaled by the type. Simply put, any array index
> would be scaled by the width, so that there no longer needs to be a separate.

This is a reasonably common pattern. Typically, earlier LLVM passes (Loop Strength Reduction in particular) wrangle equivalent induction variables and offsets into whatever form is best for your machine (using callbacks like TargetTransformInfo::getScalingFactorCost, by the looks of it). In this case, I'd expect that after setting scale-4 to free and scale-1 to expensive, it will produce something like:

  %ptr = %arr_ptr + 4 * %i
  %a = load i32 %ptr
  %x = %x + %a
  %i = %i + 1
  loop!

At that point you have the much simpler task of looking for patterns like "(load (add $base, (mul $offset, 4)))" during ISel. Quite a few other targets do this (they usually find it's actually simpler to do it in C++ using ComplexPatterns; see AArch64's "SelectAddrModeXRO" functions for example).

Let me know if I've been unclear anywhere.

Tim.
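The ISel step described above can be modelled loosely outside LLVM. The sketch below uses a hypothetical mini expression tree (Node, makeNode, matchScaledIndex and selectScaledAddr are all made-up names, not SelectionDAG API) and plays the role of a ComplexPattern selection function: given an address, it splits (add base, (mul index, 4)) into a base and an index for a scaled load. It also accepts (shl index, 2), on the assumption that the DAG combiner will usually have turned a multiply by a power of two into a shift before selection runs.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical mini expression tree for an address computation.
// This is NOT the SelectionDAG API; it only carries enough structure
// to show the matching idea.
struct Node {
    enum Kind { Leaf, Add, Mul, Shl, Const };
    Kind kind = Leaf;
    int64_t value = 0;               // only meaningful for Const
    std::shared_ptr<Node> lhs, rhs;  // operands of Add/Mul/Shl
};
using NodeRef = std::shared_ptr<Node>;

NodeRef makeNode(Node::Kind kind, NodeRef lhs = nullptr, NodeRef rhs = nullptr,
                 int64_t value = 0) {
    assert(kind == Node::Const || value == 0);  // only constants carry a value
    auto n = std::make_shared<Node>();
    n->kind = kind;
    n->lhs = lhs;
    n->rhs = rhs;
    n->value = value;
    return n;
}

// Does `n` compute index * scale? Accepts (mul index, scale) and, for a
// power-of-two scale, (shl index, log2(scale)).
static bool matchScaledIndex(const NodeRef &n, int64_t scale, NodeRef &index) {
    if (!n || !n->rhs || n->rhs->kind != Node::Const)
        return false;
    int64_t c = n->rhs->value;
    if (n->kind == Node::Mul && c == scale) {
        index = n->lhs;
        return true;
    }
    if (n->kind == Node::Shl && c >= 0 && c < 63 && (int64_t(1) << c) == scale) {
        index = n->lhs;
        return true;
    }
    return false;
}

// Plays the role of a ComplexPattern selection function: split an address
// of the form base + index * scale into (base, index) for a scaled load.
bool selectScaledAddr(const NodeRef &addr, int64_t scale, NodeRef &base,
                      NodeRef &index) {
    if (!addr || addr->kind != Node::Add)
        return false;
    // The scaled operand may sit on either side of the add.
    if (matchScaledIndex(addr->rhs, scale, index)) {
        base = addr->lhs;
        return true;
    }
    if (matchScaledIndex(addr->lhs, scale, index)) {
        base = addr->rhs;
        return true;
    }
    return false;
}
```

For (add base, (mul i, 4)) with scale 4, selectScaledAddr returns true and sets base and index to the two leaves; for a plain (add base, i) it returns false, and a normal unscaled addressing mode would be used instead. In a real backend, the cost hook mentioned above is what steers LSR toward producing the matchable form in the first place.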