thr3ads.net - llvm dev - [llvm-dev] [RFC] Adding range metadata to array subscripts. [Mar 2021]

If this information is useful, please help other people find it:
Share via:

Florian Hahn via llvm-dev

2021-Mar-24 17:47 UTC

[llvm-dev] [RFC] Adding range metadata to array subscripts.

> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
> On 3/24/21 9:06 AM, Clement Courbet wrote:
>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <
>> johannesdoerfert at gmail.com> wrote:
>> 
>>> I really like encoding more (range) information in the IR,
>>> more thoughts inlined.
>>> 
>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
>>>> Hi everyone,
>>>> 
>>>> tl;dr: I would like to teach clang to output range metadata so
that LLVM
>>>> can do better alias analysis. I have a proposal as D99248
>>>> <https://reviews.llvm.org/D99248> (clang part) and D99247
>>>> <https://reviews.llvm.org/D99247> (llvm part). But there
are other
>>> possible
>>>> options that I'm detailing below.
>>>> 
>>>> Consider the following code, adapted from brotli
>>>> <https://en.wikipedia.org/wiki/Brotli>:
>>>> 
>>>> ```
>>>> 
>>>> struct Histogram {
>>>> 
>>>>    int values[256];
>>>> 
>>>>    int total;
>>>> 
>>>> };
>>>> 
>>>> Histogram DoIt(const int* image, int size) {
>>>> 
>>>>    Histogram histogram;
>>>> 
>>>>    for (int i = 0; i < size; ++i) {
>>>> 
>>>>      ++histogram.values[image[i]];  // (A)
>>>> 
>>>>      ++histogram.total;             // (B)
>>>> 
>>>>    }
>>>> 
>>>>    return histogram;
>>>> 
>>>> }
>>>> ```
>>>> 
>>>> In this code, the compiler does not know anything about the
values of
>>>> images[i], so it assumes that 256 is a possible value for it.
In that
>>> case,
>>>> (A) would change the value of histogram.total, so (B) has to
load, add
>>> one
>>>> and store [godbolt <https://godbolt.org/z/KxE343>].
>>>> 
>>>> Fortunately, C/C++ has a rule that it is invalid (actually, UB)
to use
>>>> values to form a pointer to total and dereference it. What
valid C/C++
>>> code
>>>> is allowed to do with values is:
>>>>   - Form any pointer in [values, values + 256].
>>>>   - Form and dereference any pointer in [values, values + 256)
>>>> 
>>>> Note that the LLVM memory model is much laxer than that of
C/C++. It has
>>> no
>>>> notion of types. In particular, given an LLVM aggregate
definition:
>>>> 
>>>> ```
>>>> %struct.S = type { [42 x i32], i32, i32 }
>>>> ```
>>>> 
>>>> It is perfectly valid to use an address derived from a
GEP(0,0,%i)  [gep
>>>> reference] representing indexing into the [42 x i32] array to
load the
>>> i32
>>>> member at index 2. It is also valid for %i to be 43 (though not
44 if an
>>>> inbound GEP is used).
>>>> So clang has to give LLVM more information about the C/C++
rules.
>>>> 
>>>> *IR representation:*
>>>> LLVM has several ways of representing ranges of values:
>>>>   - *!range* metadata can be attached to integer call and load
>>> instructions
>>>> to indicate the allowed range of values of the result.
LLVM's
>>> ValueTracking
>>>> provides a function for querying the range for any
llvm::Variable.
>>>>   - The *llvm.assume* intrinsic takes a boolean condition that
can also
>>> be
>>>> used by ValueTracking to infer range of values.
>>>>   - The *inrange* attribute of GEP can be used to indicate
C-like
>>> semantics
>>>> for the structure field marked with the inrange attribute. It
can only be
>>>> used for GEP constantexprs (ie.e. GEPs defined inline), but not
for
>>>> standalone GEPs defining instructions.  relevant discussion
>>>> <https://reviews.llvm.org/D22793?id=65626#inline-194653>.
>>>> 
>>>> Alternatives:
>>>> *(1) *Annotate each array subscript index value with a range,
e.g.:
>>>> ```
>>>> %i = i64 …
>>>> %ri =  call i64 @llvm.annotation.i64(%index), !range !0
>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0,
i32 0,
>>> i32
>>>> %ri
>>>> ...
>>>> !0 = !{i64 0, i64 42}
>>>> ```
>>>> *(2) *(variant of 1) relax the constraint that !range metadata
can only
>>> be
>>>> set on call and load instructions, and set the !range metadata
on the
>>> index
>>>> expression. We still need annotations for function parameters
though:
>>>> ```
>>>> %i = i64 … , !range !0
>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0,
i32 0,
>>> i32
>>>> %i
>>>> ...
>>>> !0 = !{i64 0, i64 42}
>>>> ```
>>>> This is slightly more compact.
>>>> 
>>>> *(3)* Same as (1), with llvm.assume. This feels inferior to
annotations.
>>>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear
how this
>>> will
>>>> interfere with optimizations.
>>> I would very much like not to introduce another way to encode
>>> assumptions other than `llvm.assume`. If you want to avoid the
extra
>>> instructions, use `llvm.assume(i1 true) ["range"(%val,
%lb, %ub)]`,
>>> which is in line with our move towards operand bundle use.
>>> 
>> Thanks, I did not know about that. I've just tried it but it
appears that
>> tags have to be attribute names, and `!range` is not a valid attribute,
>> it's a metadata node. Is there a way to encode this ?
> 
> We need to get rid of that assertion. There are other non-attributes
> to be used in assume operand bundles in the (near) future, so the this
> work has to be done anyway.

+1 on trying to use assume, rather than adding another way. 

But are value ranges special for assumes, so that we need to handle them in a
bundle? Is that just so we can easier skip ‘artificial’ assume users?

Cheers,
Florian
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210324/bf9fb888/attachment.html>

Johannes Doerfert via llvm-dev

2021-Mar-24 19:32 UTC

head link

[llvm-dev] [RFC] Adding range metadata to array subscripts.

On 3/24/21 12:47 PM, Florian Hahn wrote:>
>> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
>>
>>
>> On 3/24/21 9:06 AM, Clement Courbet wrote:
>>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <
>>> johannesdoerfert at gmail.com> wrote:
>>>
>>>> I really like encoding more (range) information in the IR,
>>>> more thoughts inlined.
>>>>
>>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
>>>>> Hi everyone,
>>>>>
>>>>> tl;dr: I would like to teach clang to output range metadata
so that LLVM
>>>>> can do better alias analysis. I have a proposal as D99248
>>>>> <https://reviews.llvm.org/D99248> (clang part) and
D99247
>>>>> <https://reviews.llvm.org/D99247> (llvm part). But
there are other
>>>> possible
>>>>> options that I'm detailing below.
>>>>>
>>>>> Consider the following code, adapted from brotli
>>>>> <https://en.wikipedia.org/wiki/Brotli>:
>>>>>
>>>>> ```
>>>>>
>>>>> struct Histogram {
>>>>>
>>>>>     int values[256];
>>>>>
>>>>>     int total;
>>>>>
>>>>> };
>>>>>
>>>>> Histogram DoIt(const int* image, int size) {
>>>>>
>>>>>     Histogram histogram;
>>>>>
>>>>>     for (int i = 0; i < size; ++i) {
>>>>>
>>>>>       ++histogram.values[image[i]];  // (A)
>>>>>
>>>>>       ++histogram.total;             // (B)
>>>>>
>>>>>     }
>>>>>
>>>>>     return histogram;
>>>>>
>>>>> }
>>>>> ```
>>>>>
>>>>> In this code, the compiler does not know anything about the
values of
>>>>> images[i], so it assumes that 256 is a possible value for
it. In that
>>>> case,
>>>>> (A) would change the value of histogram.total, so (B) has
to load, add
>>>> one
>>>>> and store [godbolt <https://godbolt.org/z/KxE343>].
>>>>>
>>>>> Fortunately, C/C++ has a rule that it is invalid (actually,
UB) to use
>>>>> values to form a pointer to total and dereference it. What
valid C/C++
>>>> code
>>>>> is allowed to do with values is:
>>>>>    - Form any pointer in [values, values + 256].
>>>>>    - Form and dereference any pointer in [values, values +
256)
>>>>>
>>>>> Note that the LLVM memory model is much laxer than that of
C/C++. It has
>>>> no
>>>>> notion of types. In particular, given an LLVM aggregate
definition:
>>>>>
>>>>> ```
>>>>> %struct.S = type { [42 x i32], i32, i32 }
>>>>> ```
>>>>>
>>>>> It is perfectly valid to use an address derived from a
GEP(0,0,%i)  [gep
>>>>> reference] representing indexing into the [42 x i32] array
to load the
>>>> i32
>>>>> member at index 2. It is also valid for %i to be 43 (though
not 44 if an
>>>>> inbound GEP is used).
>>>>> So clang has to give LLVM more information about the C/C++
rules.
>>>>>
>>>>> *IR representation:*
>>>>> LLVM has several ways of representing ranges of values:
>>>>>    - *!range* metadata can be attached to integer call and
load
>>>> instructions
>>>>> to indicate the allowed range of values of the result.
LLVM's
>>>> ValueTracking
>>>>> provides a function for querying the range for any
llvm::Variable.
>>>>>    - The *llvm.assume* intrinsic takes a boolean condition
that can also
>>>> be
>>>>> used by ValueTracking to infer range of values.
>>>>>    - The *inrange* attribute of GEP can be used to indicate
C-like
>>>> semantics
>>>>> for the structure field marked with the inrange attribute.
It can only be
>>>>> used for GEP constantexprs (ie.e. GEPs defined inline), but
not for
>>>>> standalone GEPs defining instructions.  relevant discussion
>>>>>
<https://reviews.llvm.org/D22793?id=65626#inline-194653>.
>>>>>
>>>>> Alternatives:
>>>>> *(1) *Annotate each array subscript index value with a
range, e.g.:
>>>>> ```
>>>>> %i = i64 …
>>>>> %ri =  call i64 @llvm.annotation.i64(%index), !range !0
>>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s,
i64 0, i32 0,
>>>> i32
>>>>> %ri
>>>>> ...
>>>>> !0 = !{i64 0, i64 42}
>>>>> ```
>>>>> *(2) *(variant of 1) relax the constraint that !range
metadata can only
>>>> be
>>>>> set on call and load instructions, and set the !range
metadata on the
>>>> index
>>>>> expression. We still need annotations for function
parameters though:
>>>>> ```
>>>>> %i = i64 … , !range !0
>>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s,
i64 0, i32 0,
>>>> i32
>>>>> %i
>>>>> ...
>>>>> !0 = !{i64 0, i64 42}
>>>>> ```
>>>>> This is slightly more compact.
>>>>>
>>>>> *(3)* Same as (1), with llvm.assume. This feels inferior to
annotations.
>>>>> *(4)* Extend inrange to non-constantexprs GEPs. It is
unclear how this
>>>> will
>>>>> interfere with optimizations.
>>>> I would very much like not to introduce another way to encode
>>>> assumptions other than `llvm.assume`. If you want to avoid the
extra
>>>> instructions, use `llvm.assume(i1 true)
["range"(%val, %lb, %ub)]`,
>>>> which is in line with our move towards operand bundle use.
>>>>
>>> Thanks, I did not know about that. I've just tried it but it
appears that
>>> tags have to be attribute names, and `!range` is not a valid
attribute,
>>> it's a metadata node. Is there a way to encode this ?
>> We need to get rid of that assertion. There are other non-attributes
>> to be used in assume operand bundles in the (near) future, so the this
>> work has to be done anyway.
>
> +1 on trying to use assume, rather than adding another way.
>
> But are value ranges special for assumes, so that we need to handle them in
a bundle? Is that just so we can easier skip ‘artificial’ assume users?
It would make users explicit and we will have non-attribute bundles anyway.
I find it also "conceptually nicer", would you prefer explicit
instructions?

~ Johannes

> Cheers,
> Florian

llvm dev - Mar 2021 - [RFC] Adding range metadata to array subscripts.

[llvm-dev] [RFC] Adding range metadata to array subscripts.

[llvm-dev] [RFC] Adding range metadata to array subscripts.