Johannes Doerfert via llvm-dev
2021-Mar-24 15:16 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
On 3/24/21 9:06 AM, Clement Courbet wrote:> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert < > johannesdoerfert at gmail.com> wrote: > >> I really like encoding more (range) information in the IR, >> more thoughts inlined. >> >> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote: >>> Hi everyone, >>> >>> tl;dr: I would like to teach clang to output range metadata so that LLVM >>> can do better alias analysis. I have a proposal as D99248 >>> <https://reviews.llvm.org/D99248> (clang part) and D99247 >>> <https://reviews.llvm.org/D99247> (llvm part). But there are other >> possible >>> options that I'm detailing below. >>> >>> Consider the following code, adapted from brotli >>> <https://en.wikipedia.org/wiki/Brotli>: >>> >>> ``` >>> >>> struct Histogram { >>> >>> int values[256]; >>> >>> int total; >>> >>> }; >>> >>> Histogram DoIt(const int* image, int size) { >>> >>> Histogram histogram; >>> >>> for (int i = 0; i < size; ++i) { >>> >>> ++histogram.values[image[i]]; // (A) >>> >>> ++histogram.total; // (B) >>> >>> } >>> >>> return histogram; >>> >>> } >>> ``` >>> >>> In this code, the compiler does not know anything about the values of >>> images[i], so it assumes that 256 is a possible value for it. In that >> case, >>> (A) would change the value of histogram.total, so (B) has to load, add >> one >>> and store [godbolt <https://godbolt.org/z/KxE343>]. >>> >>> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use >>> values to form a pointer to total and dereference it. What valid C/C++ >> code >>> is allowed to do with values is: >>> - Form any pointer in [values, values + 256]. >>> - Form and dereference any pointer in [values, values + 256) >>> >>> Note that the LLVM memory model is much laxer than that of C/C++. It has >> no >>> notion of types. In particular, given an LLVM aggregate definition: >>> >>> ``` >>> %struct.S = type { [42 x i32], i32, i32 } >>> ``` >>> >>> It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep >>> reference] representing indexing into the [42 x i32] array to load the >> i32 >>> member at index 2. It is also valid for %i to be 43 (though not 44 if an >>> inbound GEP is used). >>> So clang has to give LLVM more information about the C/C++ rules. >>> >>> *IR representation:* >>> LLVM has several ways of representing ranges of values: >>> - *!range* metadata can be attached to integer call and load >> instructions >>> to indicate the allowed range of values of the result. LLVM's >> ValueTracking >>> provides a function for querying the range for any llvm::Variable. >>> - The *llvm.assume* intrinsic takes a boolean condition that can also >> be >>> used by ValueTracking to infer range of values. >>> - The *inrange* attribute of GEP can be used to indicate C-like >> semantics >>> for the structure field marked with the inrange attribute. It can only be >>> used for GEP constantexprs (ie.e. GEPs defined inline), but not for >>> standalone GEPs defining instructions. relevant discussion >>> <https://reviews.llvm.org/D22793?id=65626#inline-194653>. >>> >>> Alternatives: >>> *(1) *Annotate each array subscript index value with a range, e.g.: >>> ``` >>> %i = i64 … >>> %ri = call i64 @llvm.annotation.i64(%index), !range !0 >>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >> i32 >>> %ri >>> ... >>> !0 = !{i64 0, i64 42} >>> ``` >>> *(2) *(variant of 1) relax the constraint that !range metadata can only >> be >>> set on call and load instructions, and set the !range metadata on the >> index >>> expression. We still need annotations for function parameters though: >>> ``` >>> %i = i64 … , !range !0 >>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >> i32 >>> %i >>> ... >>> !0 = !{i64 0, i64 42} >>> ``` >>> This is slightly more compact. >>> >>> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations. >>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this >> will >>> interfere with optimizations. >> I would very much like not to introduce another way to encode >> assumptions other than `llvm.assume`. If you want to avoid the extra >> instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`, >> which is in line with our move towards operand bundle use. >> > Thanks, I did not know about that. I've just tried it but it appears that > tags have to be attribute names, and `!range` is not a valid attribute, > it's a metadata node. Is there a way to encode this ?We need to get rid of that assertion. There are other non-attributes to be used in assume operand bundles in the (near) future, so the this work has to be done anyway.> >> SCEV should be thought about this (as well), unsure what the problem >> is you describe below. If BasicAA needs to know, sure. >> > scev-aa already knows how to use range information. If we add a !range > metadata in clang right now and use SCEV, there is nothing to do on the > LLVM side. My point was that scev-aa is not enabled in the default > pipeline, so we might as well teach BasicAA about this cheap case. > > Actually I think it makes sense to teach BasicAA about range information > anyway (D99247) given that it could already be useful in cases like: > > ``` > define dso_local void @DoIt(%struct.Histogram* noalias nocapture sret( > %struct.Histogram) align 4 %0, i32 %1, *i8 zeroext %2*) local_unnamed_addr > #0 { > ... > *%6 = zext i8 %2 to i64* > %7 = getelementptr inbounds %struct.Histogram, %struct.Histogram* %0, i64 0 > , i32 0, *i64 %6* > ``` > > Where ValueTracking could easily deduce that %6 is in [0,255].Sounds reasonable. ~ Johannes> > > >> ~ Johannes >> >> >>> *On the clang side*: >>> The clang part is quite trivial as the infrastructure is already in place >>> to emit dynamic ubsan guards: D99248 <https://reviews.llvm.org/D99248> >>> >>> *On the LLVM Side:* >>> Alternatives: >>> *(A)* - (annotation or assume options) Simply enable scev-aa which knows >>> how to handle value ranges in general. IIUC it's not enabled in clang >>> because it has issues with invalidation when code changes, and is >> therefore >>> not cacheable. This makes it too slow to be practical. >>> *(B) *- (annotation or assume options) Teach BasicAA to honor !range >>> metadata (D99247 <https://reviews.llvm.org/D99247>) >>> *(C)* - (inrange option) Teach BasicAA to honor inrange attributes of >> GEP. >>> I was leaning towards (1) and (B) because: >>> - BasicAA already has basic support for value range analysis >>> (zero/nonzero), this is a small and natural extension. >>> - The BasicAA improvement already benefits some existing code (as >>> evidenced by the test changes in D99247 <https://reviews.llvm.org/D99247 >>> ) >>> - Using range metadata rather than the `inrange` attribute means that >>> BasicAA will automatically benefit from improvements in value tracking in >>> the future. >>> >>> Opinions ? >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Florian Hahn via llvm-dev
2021-Mar-24 17:47 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 3/24/21 9:06 AM, Clement Courbet wrote: >> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert < >> johannesdoerfert at gmail.com> wrote: >> >>> I really like encoding more (range) information in the IR, >>> more thoughts inlined. >>> >>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote: >>>> Hi everyone, >>>> >>>> tl;dr: I would like to teach clang to output range metadata so that LLVM >>>> can do better alias analysis. I have a proposal as D99248 >>>> <https://reviews.llvm.org/D99248> (clang part) and D99247 >>>> <https://reviews.llvm.org/D99247> (llvm part). But there are other >>> possible >>>> options that I'm detailing below. >>>> >>>> Consider the following code, adapted from brotli >>>> <https://en.wikipedia.org/wiki/Brotli>: >>>> >>>> ``` >>>> >>>> struct Histogram { >>>> >>>> int values[256]; >>>> >>>> int total; >>>> >>>> }; >>>> >>>> Histogram DoIt(const int* image, int size) { >>>> >>>> Histogram histogram; >>>> >>>> for (int i = 0; i < size; ++i) { >>>> >>>> ++histogram.values[image[i]]; // (A) >>>> >>>> ++histogram.total; // (B) >>>> >>>> } >>>> >>>> return histogram; >>>> >>>> } >>>> ``` >>>> >>>> In this code, the compiler does not know anything about the values of >>>> images[i], so it assumes that 256 is a possible value for it. In that >>> case, >>>> (A) would change the value of histogram.total, so (B) has to load, add >>> one >>>> and store [godbolt <https://godbolt.org/z/KxE343>]. >>>> >>>> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use >>>> values to form a pointer to total and dereference it. What valid C/C++ >>> code >>>> is allowed to do with values is: >>>> - Form any pointer in [values, values + 256]. >>>> - Form and dereference any pointer in [values, values + 256) >>>> >>>> Note that the LLVM memory model is much laxer than that of C/C++. It has >>> no >>>> notion of types. In particular, given an LLVM aggregate definition: >>>> >>>> ``` >>>> %struct.S = type { [42 x i32], i32, i32 } >>>> ``` >>>> >>>> It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep >>>> reference] representing indexing into the [42 x i32] array to load the >>> i32 >>>> member at index 2. It is also valid for %i to be 43 (though not 44 if an >>>> inbound GEP is used). >>>> So clang has to give LLVM more information about the C/C++ rules. >>>> >>>> *IR representation:* >>>> LLVM has several ways of representing ranges of values: >>>> - *!range* metadata can be attached to integer call and load >>> instructions >>>> to indicate the allowed range of values of the result. LLVM's >>> ValueTracking >>>> provides a function for querying the range for any llvm::Variable. >>>> - The *llvm.assume* intrinsic takes a boolean condition that can also >>> be >>>> used by ValueTracking to infer range of values. >>>> - The *inrange* attribute of GEP can be used to indicate C-like >>> semantics >>>> for the structure field marked with the inrange attribute. It can only be >>>> used for GEP constantexprs (ie.e. GEPs defined inline), but not for >>>> standalone GEPs defining instructions. relevant discussion >>>> <https://reviews.llvm.org/D22793?id=65626#inline-194653>. >>>> >>>> Alternatives: >>>> *(1) *Annotate each array subscript index value with a range, e.g.: >>>> ``` >>>> %i = i64 … >>>> %ri = call i64 @llvm.annotation.i64(%index), !range !0 >>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >>> i32 >>>> %ri >>>> ... >>>> !0 = !{i64 0, i64 42} >>>> ``` >>>> *(2) *(variant of 1) relax the constraint that !range metadata can only >>> be >>>> set on call and load instructions, and set the !range metadata on the >>> index >>>> expression. We still need annotations for function parameters though: >>>> ``` >>>> %i = i64 … , !range !0 >>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, >>> i32 >>>> %i >>>> ... >>>> !0 = !{i64 0, i64 42} >>>> ``` >>>> This is slightly more compact. >>>> >>>> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations. >>>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this >>> will >>>> interfere with optimizations. >>> I would very much like not to introduce another way to encode >>> assumptions other than `llvm.assume`. If you want to avoid the extra >>> instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`, >>> which is in line with our move towards operand bundle use. >>> >> Thanks, I did not know about that. I've just tried it but it appears that >> tags have to be attribute names, and `!range` is not a valid attribute, >> it's a metadata node. Is there a way to encode this ? > > We need to get rid of that assertion. There are other non-attributes > to be used in assume operand bundles in the (near) future, so the this > work has to be done anyway.+1 on trying to use assume, rather than adding another way. But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users? Cheers, Florian -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210324/bf9fb888/attachment.html>