Sanjoy Das
2014-Dec-04 00:15 UTC
[LLVMdev] RFC: inductive range check elimination via iteration space splitting
Hi all, I've been working on a pass that tries to eliminate range checks like this: void foo(int *a, int n, int a_len) { assert(a_len >= 0); for (int i = 0; i < n; i++ /* wrapping add */) { if (0 <= (i - 2) && (i - 2) < a_len) { a[i - 2] = 42; } else { out_of_bounds_error(i + 2); // noreturn unreachable; } } } Since n is unrelated to a_len in the above loop, we cannot eliminate the range checks in the loop as-is. But if we split the iteration space for `i` into: [0, min(n, 2)) \cup [min(n, 2), min(a_len + 2, n)) \cup [min(a_len + 2, n), n). (with the convention that for "a >= b", [a, b) is an empty set) we can elide the range checks in the middle split. We realize this split iteration space by breaking up the original loop into three loops; and only have the first and third loop have range checks. This approach can be extended to range checks on expressions like (P + Q * I) where I is the canonical induction variable -- there will be a set of disjoint contiguous ranges for which (P + Q * I) is provably within bounds. If one of those contiguous ranges can be "naturally split out" from I's normal iteration space, we can do so and elide the range checks in that subspace. It can also be extended to multiple range checks in the loop body. The first question I have: am I reinventing the wheel here? Has this approach been tried in LLVM and discarded because of fundamental issues? Are there better ways to eliminate range checks on affine functions on the canonical induction variable? The closest thing I could find to this is the LoopIndexSplit pass which was removed in 2010. Implementing this requires some design choices I'd like input on: 1. computing the splits in the iteration range -- the math is slightly tricky because we have to deal with wrapping addition, but in principle this can be done. The output of this phase is to split the iteration space of the canonical IV into [b0, e0) \cup [e0, b2) \cup [b2, e2). 2. code to transform the loop -- I could not find any helper functions in llvm to base this on. As far as I can tell there are two approaches: a. write a helper function that creates three (maybe two in some cases) copies of the loop and constrains their ranges as needed. b. abuse the loopunroll pass -- instead of creating the three loops "by hand", wrap the loop to be transformed into an outer loop that runs exactly thrice, add an exit from the inner to the outer loop on (outer_i == 0 && inner_i \in [b0, e0)) || (outer_i == 1 && inner_i \in [e0, b2)) || (outer_i == 2 && inner_i \in [b2, e2)) (the expression can be optimized, but semantically this is what we want) and ask llvm::UnrollLoop to completely unroll this outer loop. The advantage of (a) is that it feels cleaner; and potential code-sharing with loop unrolling and loop unswitching. (b)'s advantages are simplicity, but it feels like somewhat of a hack; and we still have the problem of transforming (a < x && a < y) to (a < min(x, y)). I'm personally leaning towards (a). 3. eliding the range checks -- once the transform has been done, we need to teach llvm to remove the now redundant range checks. LLVM does a lot of this already, but it may need some more help. In any case, there is a design choice: a. run a pass (passes?) that eliminates the redundant range checks from loop bodies b. have the transform elide the range checks itself (b) is faster while (a) is safer. What is the idiomatic solution here? Thanks, -- Sanjoy