Florian Hahn via llvm-dev
2021-Mar-24 17:47 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 3/24/21 9:06 AM, Clement Courbet wrote: >> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert < >> johannesdoerfert at gmail.com> wrote: >> >>> I really like encoding more (range) information in the IR, >>> more thoughts inlined. >>> >>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote: >>>> Hi everyone, >>>> >>>> tl;dr: I would like to teach clang to output range metadata so that LLVM >>>> can do better alias analysis. I have a proposal as D99248 >>>> <https://reviews.llvm.org/D99248> (clang part) and D99247 >>>> <https://reviews.llvm.org/D99247> (llvm part). But there are other >>> possible >>>> options that I'm detailing below. >>>> >>>> Consider the following code, adapted from brotli >>>> <https://en.wikipedia.org/wiki/Brotli>: >>>> >>>> ``` >>>> >>>> struct Histogram { >>>> >>>> int values[256]; >>>> >>>> int total; >>>> >>>> }; >>>> >>>> Histogram DoIt(const int* image, int size) { >>>> >>>> Histogram histogram; >>>> >>>> for (int i = 0; i < size; ++i) { >>>> >>>> ++histogram.values[image[i]]; // (A) >>>> >>>> ++histogram.total; // (B) >>>> >>>> } >>>> >>>> return histogram; >>>> >>>> } >>>> ``` >>>> >>>> In this code, the compiler does not know anything about the values of >>>> images[i], so it assumes that 256 is a possible value for it. In that >>> case, >>>> (A) would change the value of histogram.total, so (B) has to load, add >>> one >>>> and store [godbolt <https://godbolt.org/z/KxE343>]. >>>> >>>> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use >>>> values to form a pointer to total and dereference it. What valid C/C++ >>> code >>>> is allowed to do with values is: >>>> - Form any pointer in [values, values + 256]. >>>> - Form and dereference any pointer in [values, values + 256) >>>> >>>> Note that the LLVM memory model is much laxer than that of C/C++. It has >>> no >>>> notion of types. In particular, given an LLVM aggregate definition: >>>> >>>> ``` >>>> %struct.S = type { [42 x i32], i32, i32 } >>>> ``` >>>> >>>> It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep >>>> reference] representing indexing into the [42 x i32] array to load the >>> i32 >>>> member at index 2. It is also valid for %i to be 43 (though not 44 if an >>>> inbound GEP is used). >>>> So clang has to give LLVM more information about the C/C++ rules. >>>> >>>> *IR representation:* >>>> LLVM has several ways of representing ranges of values: >>>> - *!range* metadata can be attached to integer call and load >>> instructions >>>> to indicate the allowed range of values of the result. LLVM's >>> ValueTracking >>>> provides a function for querying the range for any llvm::Variable. >>>> - The *llvm.assume* intrinsic takes a boolean condition that can also >>> be >>>> used by ValueTracking to infer range of values. >>>> - The *inrange* attribute of GEP can be used to indicate C-like >>> semantics >>>> for the structure field marked with the inrange attribute. It can only be >>>> used for GEP constantexprs (ie.e. GEPs defined inline), but not for >>>> standalone GEPs defining instructions. relevant discussion >>>> <https://reviews.llvm.org/D22793?id=65626#inline-194653>. >>>> >>>> Alternatives: >>>> *(1) *Annotate each array subscript index value with a range, e.g.: >>>> ``` >>>> %i = i64 … >>>> %ri = call i64 @llvm.annotation.i64(%index), !range !0 >>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >>> i32 >>>> %ri >>>> ... >>>> !0 = !{i64 0, i64 42} >>>> ``` >>>> *(2) *(variant of 1) relax the constraint that !range metadata can only >>> be >>>> set on call and load instructions, and set the !range metadata on the >>> index >>>> expression. We still need annotations for function parameters though: >>>> ``` >>>> %i = i64 … , !range !0 >>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >>> i32 >>>> %i >>>> ... >>>> !0 = !{i64 0, i64 42} >>>> ``` >>>> This is slightly more compact. >>>> >>>> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations. >>>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this >>> will >>>> interfere with optimizations. >>> I would very much like not to introduce another way to encode >>> assumptions other than `llvm.assume`. If you want to avoid the extra >>> instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`, >>> which is in line with our move towards operand bundle use. >>> >> Thanks, I did not know about that. I've just tried it but it appears that >> tags have to be attribute names, and `!range` is not a valid attribute, >> it's a metadata node. Is there a way to encode this ? > > We need to get rid of that assertion. There are other non-attributes > to be used in assume operand bundles in the (near) future, so the this > work has to be done anyway.+1 on trying to use assume, rather than adding another way. But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users? Cheers, Florian -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210324/bf9fb888/attachment.html>
Johannes Doerfert via llvm-dev
2021-Mar-24 19:32 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
On 3/24/21 12:47 PM, Florian Hahn wrote:> >> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> >> On 3/24/21 9:06 AM, Clement Courbet wrote: >>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert < >>> johannesdoerfert at gmail.com> wrote: >>> >>>> I really like encoding more (range) information in the IR, >>>> more thoughts inlined. >>>> >>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote: >>>>> Hi everyone, >>>>> >>>>> tl;dr: I would like to teach clang to output range metadata so that LLVM >>>>> can do better alias analysis. I have a proposal as D99248 >>>>> <https://reviews.llvm.org/D99248> (clang part) and D99247 >>>>> <https://reviews.llvm.org/D99247> (llvm part). But there are other >>>> possible >>>>> options that I'm detailing below. >>>>> >>>>> Consider the following code, adapted from brotli >>>>> <https://en.wikipedia.org/wiki/Brotli>: >>>>> >>>>> ``` >>>>> >>>>> struct Histogram { >>>>> >>>>> int values[256]; >>>>> >>>>> int total; >>>>> >>>>> }; >>>>> >>>>> Histogram DoIt(const int* image, int size) { >>>>> >>>>> Histogram histogram; >>>>> >>>>> for (int i = 0; i < size; ++i) { >>>>> >>>>> ++histogram.values[image[i]]; // (A) >>>>> >>>>> ++histogram.total; // (B) >>>>> >>>>> } >>>>> >>>>> return histogram; >>>>> >>>>> } >>>>> ``` >>>>> >>>>> In this code, the compiler does not know anything about the values of >>>>> images[i], so it assumes that 256 is a possible value for it. In that >>>> case, >>>>> (A) would change the value of histogram.total, so (B) has to load, add >>>> one >>>>> and store [godbolt <https://godbolt.org/z/KxE343>]. >>>>> >>>>> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use >>>>> values to form a pointer to total and dereference it. What valid C/C++ >>>> code >>>>> is allowed to do with values is: >>>>> - Form any pointer in [values, values + 256]. >>>>> - Form and dereference any pointer in [values, values + 256) >>>>> >>>>> Note that the LLVM memory model is much laxer than that of C/C++. It has >>>> no >>>>> notion of types. In particular, given an LLVM aggregate definition: >>>>> >>>>> ``` >>>>> %struct.S = type { [42 x i32], i32, i32 } >>>>> ``` >>>>> >>>>> It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep >>>>> reference] representing indexing into the [42 x i32] array to load the >>>> i32 >>>>> member at index 2. It is also valid for %i to be 43 (though not 44 if an >>>>> inbound GEP is used). >>>>> So clang has to give LLVM more information about the C/C++ rules. >>>>> >>>>> *IR representation:* >>>>> LLVM has several ways of representing ranges of values: >>>>> - *!range* metadata can be attached to integer call and load >>>> instructions >>>>> to indicate the allowed range of values of the result. LLVM's >>>> ValueTracking >>>>> provides a function for querying the range for any llvm::Variable. >>>>> - The *llvm.assume* intrinsic takes a boolean condition that can also >>>> be >>>>> used by ValueTracking to infer range of values. >>>>> - The *inrange* attribute of GEP can be used to indicate C-like >>>> semantics >>>>> for the structure field marked with the inrange attribute. It can only be >>>>> used for GEP constantexprs (ie.e. GEPs defined inline), but not for >>>>> standalone GEPs defining instructions. relevant discussion >>>>> <https://reviews.llvm.org/D22793?id=65626#inline-194653>. >>>>> >>>>> Alternatives: >>>>> *(1) *Annotate each array subscript index value with a range, e.g.: >>>>> ``` >>>>> %i = i64 … >>>>> %ri = call i64 @llvm.annotation.i64(%index), !range !0 >>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >>>> i32 >>>>> %ri >>>>> ... >>>>> !0 = !{i64 0, i64 42} >>>>> ``` >>>>> *(2) *(variant of 1) relax the constraint that !range metadata can only >>>> be >>>>> set on call and load instructions, and set the !range metadata on the >>>> index >>>>> expression. We still need annotations for function parameters though: >>>>> ``` >>>>> %i = i64 … , !range !0 >>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >>>> i32 >>>>> %i >>>>> ... >>>>> !0 = !{i64 0, i64 42} >>>>> ``` >>>>> This is slightly more compact. >>>>> >>>>> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations. >>>>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this >>>> will >>>>> interfere with optimizations. >>>> I would very much like not to introduce another way to encode >>>> assumptions other than `llvm.assume`. If you want to avoid the extra >>>> instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`, >>>> which is in line with our move towards operand bundle use. >>>> >>> Thanks, I did not know about that. I've just tried it but it appears that >>> tags have to be attribute names, and `!range` is not a valid attribute, >>> it's a metadata node. Is there a way to encode this ? >> We need to get rid of that assertion. There are other non-attributes >> to be used in assume operand bundles in the (near) future, so the this >> work has to be done anyway. > > +1 on trying to use assume, rather than adding another way. > > But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users?It would make users explicit and we will have non-attribute bundles anyway. I find it also "conceptually nicer", would you prefer explicit instructions? ~ Johannes> Cheers, > Florian