James Courtier-Dutton via llvm-dev
2021-Mar-28 08:49 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
Hi,

char* test_fill(int size) {
  char *test1 = malloc(size);
  for (int n = 0; n <= size; n++) {
    test1[n] = 'A';
  }
  return test1;
}

Would it be worth making the "range" information a little richer, so
that it can use algebraic expressions as well as numeric ranges?
Note: the above example code has an off-by-one overflow, and it would
be helpful if one could catch that at compile time.
In this case, it could catch that n must be less than size, and not
less than or equal to size.
That means putting the range on the test1 pointer as running from the
address of test1 to test1 + (size - 1).

This can only be achieved if algebraic expressions are used for
ranges, and not just constant values.
Actual use cases can get much more complicated, for example with
non-contiguous ranges, e.g. 0,1,4,5 ok, but 2,3,6,7 not ok.

Another useful thing to catch at compile time would be a pointer that
is being dereferenced when we were not able to apply a range
expression to it, i.e. warn about unbounded dereferences.

I think it would be useful to at least consider how we would capture
this more complex range information/metadata in LLVM IR.

Kind Regards

James

> >>>> On 3/24/21 9:06 AM, Clement Courbet wrote:
> >>>>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <
> >>>>> johannesdoerfert at gmail.com> wrote:
> >>>>>
> >>>>>> I really like encoding more (range) information in the IR,
> >>>>>> more thoughts inlined.
> >>>>>>
> >>>>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
> >>>>>>> struct Histogram {
> >>>>>>>   int values[256];
> >>>>>>>   int total;
> >>>>>>> };
> >>>>>>>
> >>>>>>> Histogram DoIt(const int* image, int size) {
> >>>>>>>   Histogram histogram;
> >>>>>>>   for (int i = 0; i < size; ++i) {
> >>>>>>>     ++histogram.values[image[i]]; // (A)
> >>>>>>>     ++histogram.total; // (B)
> >>>>>>>   }
> >>>>>>>   return histogram;
> >>>>>>> }
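
Editorial sketch: the point about algebraic ranges in James's message can be illustrated with a small C model. It assumes that every bound has the form coeff * size + offset, where size is unknown at compile time. The names sym_bound, check_upper and loop_max are hypothetical and are not part of LLVM; this is only one way such a comparison might look, not how any LLVM analysis is implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* A bound of the form coeff * size + offset, where `size` is a value that
   is unknown at compile time. Hypothetical representation for illustration. */
struct sym_bound {
    long coeff;
    long offset;
};

enum verdict { IN_BOUNDS, OUT_OF_BOUNDS, UNKNOWN };

/* Compare the largest index an access can reach with the largest valid
   index. If both bounds have the same coefficient, the `size` terms cancel
   and the verdict holds for every size; otherwise this sketch gives up. */
static enum verdict check_upper(struct sym_bound max_index,
                                struct sym_bound max_valid) {
    if (max_index.coeff != max_valid.coeff)
        return UNKNOWN;
    return max_index.offset <= max_valid.offset ? IN_BOUNDS : OUT_OF_BOUNDS;
}

/* Largest n reached by `for (n = 0; n <= size; n++)` (inclusive) or by
   `for (n = 0; n < size; n++)` (exclusive), assuming the loop runs at all. */
static struct sym_bound loop_max(bool inclusive) {
    struct sym_bound b = { 1, inclusive ? 0 : -1 };
    return b;
}

/* Usage: the loop from the message versus the corrected loop. */
void demo_symbolic_check(void) {
    struct sym_bound last_valid = { 1, -1 };  /* malloc(size): size - 1 */
    assert(check_upper(loop_max(true), last_valid) == OUT_OF_BOUNDS);
    assert(check_upper(loop_max(false), last_valid) == IN_BOUNDS);
}
```

Because the verdict depends only on the constant offsets once the size terms cancel, the off-by-one is visible without knowing size. A constant-only range could not express the bound size - 1 at all.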
Johannes Doerfert via llvm-dev
2021-Mar-28 15:44 UTC
[llvm-dev] [RFC] Adding range metadata to array subscripts.
On 3/28/21 3:49 AM, James Courtier-Dutton wrote:
> Hi,
>
> char* test_fill(int size) {
>   char *test1 = malloc(size);
>   for (int n = 0; n <= size; n++) {
>     test1[n] = 'A';
>   }
>   return test1;
> }
>
> Would it be worth making the "range" information a little richer, so
> that it can use algebraic expressions as well as numeric ranges?
> Note: the above example code has an off-by-one overflow, and it would
> be helpful if one could catch that at compile time.
> In this case, it could catch that n must be less than size, and not
> less than or equal to size.
> That means putting the range on the test1 pointer as running from the
> address of test1 to test1 + (size - 1).
>
> This can only be achieved if algebraic expressions are used for
> ranges, and not just constant values.
> Actual use cases can get much more complicated, for example with
> non-contiguous ranges, e.g. 0,1,4,5 ok, but 2,3,6,7 not ok.
>
> Another useful thing to catch at compile time would be a pointer that
> is being dereferenced when we were not able to apply a range
> expression to it, i.e. warn about unbounded dereferences.
>
> I think it would be useful to at least consider how we would capture
> this more complex range information/metadata in LLVM IR.

I think what you want is the max object extent attribute (formerly
known as max object size, when we only wanted to track an upper bound;
the next revision shall also include the lower one):
https://reviews.llvm.org/D87975

If we allow values instead of only constants you can "properly"
generate warnings, using SCEV to determine the range of `n` above.

That said, in operand bundles we can generally allow non-constant
values, e.g., `["range"(%p, i32 0, i32 %N)]`

~ Johannes

> Kind Regards
>
> James
>
>>>>>> On 3/24/21 9:06 AM, Clement Courbet wrote:
>>>>>>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <
>>>>>>> johannesdoerfert at gmail.com> wrote:
>>>>>>>
>>>>>>>> I really like encoding more (range) information in the IR,
>>>>>>>> more thoughts inlined.
>>>>>>>>
>>>>>>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
>>>>>>>>> struct Histogram {
>>>>>>>>>   int values[256];
>>>>>>>>>   int total;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> Histogram DoIt(const int* image, int size) {
>>>>>>>>>   Histogram histogram;
>>>>>>>>>   for (int i = 0; i < size; ++i) {
>>>>>>>>>     ++histogram.values[image[i]]; // (A)
>>>>>>>>>     ++histogram.total; // (B)
>>>>>>>>>   }
>>>>>>>>>   return histogram;
>>>>>>>>> }
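
Editorial sketch: Johannes's bundle example `["range"(%p, i32 0, i32 %N)]` pairs a pointer with bounds that need not be constants. The C model below is a run-time stand-in for that idea, not the static check a compiler would perform with SCEV. It assumes the bundle means that offsets from %p are valid in the half-open interval [0, %N); the message does not say whether the upper bound is inclusive. The names ranged_ptr, offset_ok and fill_counting_violations are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical C mirror of a bundle such as ["range"(%p, i32 0, i32 %N)]:
   a pointer plus an offset interval whose bounds are run-time values. */
struct ranged_ptr {
    char *base;
    long lo;  /* first valid offset */
    long hi;  /* one past the last valid offset (half-open is an assumption) */
};

static bool offset_ok(const struct ranged_ptr *p, long off) {
    return off >= p->lo && off < p->hi;
}

/* Runs the loop `for (n = 0; n <= last; n++)` against a ranged pointer.
   Offsets outside the range are counted instead of being written. */
static long fill_counting_violations(struct ranged_ptr p, long last, char c) {
    long violations = 0;
    for (long n = 0; n <= last; n++) {
        if (offset_ok(&p, n))
            p.base[n] = c;
        else
            violations++;
    }
    return violations;
}

/* Usage: the loop from James's message versus the corrected loop. */
void demo_ranged_fill(void) {
    long size = 8;
    char *buf = malloc((size_t)size);
    assert(buf != NULL);
    struct ranged_ptr p = { buf, 0, size };
    assert(fill_counting_violations(p, size, 'A') == 1);      /* n <= size */
    assert(fill_counting_violations(p, size - 1, 'A') == 0);  /* n < size */
    assert(buf[0] == 'A' && buf[size - 1] == 'A');
    free(buf);
}
```

A static analysis would aim to prove the violation count is zero for every size, for instance by comparing the induction variable's range with the bundle's bounds, rather than counting at run time.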