Johannes Doerfert via llvm-dev
2021-Mar-24 13:20 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
I really like encoding more (range) information in the IR, more thoughts inlined. On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:> Hi everyone, > > tl;dr: I would like to teach clang to output range metadata so that LLVM > can do better alias analysis. I have a proposal as D99248 > <https://reviews.llvm.org/D99248> (clang part) and D99247 > <https://reviews.llvm.org/D99247> (llvm part). But there are other possible > options that I'm detailing below. > > Consider the following code, adapted from brotli > <https://en.wikipedia.org/wiki/Brotli>: > > ``` > > struct Histogram { > > int values[256]; > > int total; > > }; > > Histogram DoIt(const int* image, int size) { > > Histogram histogram; > > for (int i = 0; i < size; ++i) { > > ++histogram.values[image[i]]; // (A) > > ++histogram.total; // (B) > > } > > return histogram; > > } > ``` > > In this code, the compiler does not know anything about the values of > images[i], so it assumes that 256 is a possible value for it. In that case, > (A) would change the value of histogram.total, so (B) has to load, add one > and store [godbolt <https://godbolt.org/z/KxE343>]. > > Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use > values to form a pointer to total and dereference it. What valid C/C++ code > is allowed to do with values is: > - Form any pointer in [values, values + 256]. > - Form and dereference any pointer in [values, values + 256) > > Note that the LLVM memory model is much laxer than that of C/C++. It has no > notion of types. In particular, given an LLVM aggregate definition: > > ``` > %struct.S = type { [42 x i32], i32, i32 } > ``` > > It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep > reference] representing indexing into the [42 x i32] array to load the i32 > member at index 2. It is also valid for %i to be 43 (though not 44 if an > inbound GEP is used). > So clang has to give LLVM more information about the C/C++ rules. > > *IR representation:* > LLVM has several ways of representing ranges of values: > - *!range* metadata can be attached to integer call and load instructions > to indicate the allowed range of values of the result. LLVM's ValueTracking > provides a function for querying the range for any llvm::Variable. > - The *llvm.assume* intrinsic takes a boolean condition that can also be > used by ValueTracking to infer range of values. > - The *inrange* attribute of GEP can be used to indicate C-like semantics > for the structure field marked with the inrange attribute. It can only be > used for GEP constantexprs (ie.e. GEPs defined inline), but not for > standalone GEPs defining instructions. relevant discussion > <https://reviews.llvm.org/D22793?id=65626#inline-194653>. > > Alternatives: > *(1) *Annotate each array subscript index value with a range, e.g.: > ``` > %i = i64 … > %ri = call i64 @llvm.annotation.i64(%index), !range !0 > %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, i32 > %ri > ... > !0 = !{i64 0, i64 42} > ``` > *(2) *(variant of 1) relax the constraint that !range metadata can only be > set on call and load instructions, and set the !range metadata on the index > expression. We still need annotations for function parameters though: > ``` > %i = i64 … , !range !0 > %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, i32 > %i > ... > !0 = !{i64 0, i64 42} > ``` > This is slightly more compact. > > *(3)* Same as (1), with llvm.assume. This feels inferior to annotations. > *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this will > interfere with optimizations.I would very much like not to introduce another way to encode assumptions other than `llvm.assume`. If you want to avoid the extra instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`, which is in line with our move towards operand bundle use. SCEV should be thought about this (as well), unsure what the problem is you describe below. If BasicAA needs to know, sure. ~ Johannes> > *On the clang side*: > The clang part is quite trivial as the infrastructure is already in place > to emit dynamic ubsan guards: D99248 <https://reviews.llvm.org/D99248> > > *On the LLVM Side:* > Alternatives: > *(A)* - (annotation or assume options) Simply enable scev-aa which knows > how to handle value ranges in general. IIUC it's not enabled in clang > because it has issues with invalidation when code changes, and is therefore > not cacheable. This makes it too slow to be practical. > *(B) *- (annotation or assume options) Teach BasicAA to honor !range > metadata (D99247 <https://reviews.llvm.org/D99247>) > *(C)* - (inrange option) Teach BasicAA to honor inrange attributes of GEP. > > I was leaning towards (1) and (B) because: > - BasicAA already has basic support for value range analysis > (zero/nonzero), this is a small and natural extension. > - The BasicAA improvement already benefits some existing code (as > evidenced by the test changes in D99247 <https://reviews.llvm.org/D99247>) > - Using range metadata rather than the `inrange` attribute means that > BasicAA will automatically benefit from improvements in value tracking in > the future. > > Opinions ? > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Clement Courbet via llvm-dev
2021-Mar-24 14:06 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert < johannesdoerfert at gmail.com> wrote:> I really like encoding more (range) information in the IR, > more thoughts inlined. > > On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote: > > Hi everyone, > > > > tl;dr: I would like to teach clang to output range metadata so that LLVM > > can do better alias analysis. I have a proposal as D99248 > > <https://reviews.llvm.org/D99248> (clang part) and D99247 > > <https://reviews.llvm.org/D99247> (llvm part). But there are other > possible > > options that I'm detailing below. > > > > Consider the following code, adapted from brotli > > <https://en.wikipedia.org/wiki/Brotli>: > > > > ``` > > > > struct Histogram { > > > > int values[256]; > > > > int total; > > > > }; > > > > Histogram DoIt(const int* image, int size) { > > > > Histogram histogram; > > > > for (int i = 0; i < size; ++i) { > > > > ++histogram.values[image[i]]; // (A) > > > > ++histogram.total; // (B) > > > > } > > > > return histogram; > > > > } > > ``` > > > > In this code, the compiler does not know anything about the values of > > images[i], so it assumes that 256 is a possible value for it. In that > case, > > (A) would change the value of histogram.total, so (B) has to load, add > one > > and store [godbolt <https://godbolt.org/z/KxE343>]. > > > > Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use > > values to form a pointer to total and dereference it. What valid C/C++ > code > > is allowed to do with values is: > > - Form any pointer in [values, values + 256]. > > - Form and dereference any pointer in [values, values + 256) > > > > Note that the LLVM memory model is much laxer than that of C/C++. It has > no > > notion of types. In particular, given an LLVM aggregate definition: > > > > ``` > > %struct.S = type { [42 x i32], i32, i32 } > > ``` > > > > It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep > > reference] representing indexing into the [42 x i32] array to load the > i32 > > member at index 2. It is also valid for %i to be 43 (though not 44 if an > > inbound GEP is used). > > So clang has to give LLVM more information about the C/C++ rules. > > > > *IR representation:* > > LLVM has several ways of representing ranges of values: > > - *!range* metadata can be attached to integer call and load > instructions > > to indicate the allowed range of values of the result. LLVM's > ValueTracking > > provides a function for querying the range for any llvm::Variable. > > - The *llvm.assume* intrinsic takes a boolean condition that can also > be > > used by ValueTracking to infer range of values. > > - The *inrange* attribute of GEP can be used to indicate C-like > semantics > > for the structure field marked with the inrange attribute. It can only be > > used for GEP constantexprs (ie.e. GEPs defined inline), but not for > > standalone GEPs defining instructions. relevant discussion > > <https://reviews.llvm.org/D22793?id=65626#inline-194653>. > > > > Alternatives: > > *(1) *Annotate each array subscript index value with a range, e.g.: > > ``` > > %i = i64 … > > %ri = call i64 @llvm.annotation.i64(%index), !range !0 > > %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, > i32 > > %ri > > ... > > !0 = !{i64 0, i64 42} > > ``` > > *(2) *(variant of 1) relax the constraint that !range metadata can only > be > > set on call and load instructions, and set the !range metadata on the > index > > expression. We still need annotations for function parameters though: > > ``` > > %i = i64 … , !range !0 > > %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, > i32 > > %i > > ... > > !0 = !{i64 0, i64 42} > > ``` > > This is slightly more compact. > > > > *(3)* Same as (1), with llvm.assume. This feels inferior to annotations. > > *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this > will > > interfere with optimizations. > > I would very much like not to introduce another way to encode > assumptions other than `llvm.assume`. If you want to avoid the extra > instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`, > which is in line with our move towards operand bundle use. >Thanks, I did not know about that. I've just tried it but it appears that tags have to be attribute names, and `!range` is not a valid attribute, it's a metadata node. Is there a way to encode this ?> SCEV should be thought about this (as well), unsure what the problem > is you describe below. If BasicAA needs to know, sure. >scev-aa already knows how to use range information. If we add a !range metadata in clang right now and use SCEV, there is nothing to do on the LLVM side. My point was that scev-aa is not enabled in the default pipeline, so we might as well teach BasicAA about this cheap case. Actually I think it makes sense to teach BasicAA about range information anyway (D99247) given that it could already be useful in cases like: ``` define dso_local void @DoIt(%struct.Histogram* noalias nocapture sret( %struct.Histogram) align 4 %0, i32 %1, *i8 zeroext %2*) local_unnamed_addr #0 { ... *%6 = zext i8 %2 to i64* %7 = getelementptr inbounds %struct.Histogram, %struct.Histogram* %0, i64 0 , i32 0, *i64 %6* ``` Where ValueTracking could easily deduce that %6 is in [0,255].> > ~ Johannes > > > > > > *On the clang side*: > > The clang part is quite trivial as the infrastructure is already in place > > to emit dynamic ubsan guards: D99248 <https://reviews.llvm.org/D99248> > > > > *On the LLVM Side:* > > Alternatives: > > *(A)* - (annotation or assume options) Simply enable scev-aa which knows > > how to handle value ranges in general. IIUC it's not enabled in clang > > because it has issues with invalidation when code changes, and is > therefore > > not cacheable. This makes it too slow to be practical. > > *(B) *- (annotation or assume options) Teach BasicAA to honor !range > > metadata (D99247 <https://reviews.llvm.org/D99247>) > > *(C)* - (inrange option) Teach BasicAA to honor inrange attributes of > GEP. > > > > I was leaning towards (1) and (B) because: > > - BasicAA already has basic support for value range analysis > > (zero/nonzero), this is a small and natural extension. > > - The BasicAA improvement already benefits some existing code (as > > evidenced by the test changes in D99247 <https://reviews.llvm.org/D99247 > >) > > - Using range metadata rather than the `inrange` attribute means that > > BasicAA will automatically benefit from improvements in value tracking in > > the future. > > > > Opinions ? > > > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210324/024fedfe/attachment.html>