Nagurne, James via llvm-dev
2021-Mar-26 18:41 UTC
[llvm-dev] Unsigned integer underflow in HardwareLoops pass (PPC, perhaps ARM)
Our team is developing a downstream target that uses the HardwareLoops
pass, and we have found that it generates unexpected code for one of our
regression tests. I've not 100% vetted the test itself against the
specifics of the C standard, but logically it makes sense.
I have the test up on Compiler Explorer, and the offending code can be
reproduced with a stock trunk clang on PowerPC: https://godbolt.org/z/KzW3nYjra
The test itself intends to ensure that small-width loop counters are not
promoted. It does this by constructing a loop with an unsigned 8-bit counter
and purposefully underflowing it on line 20 with '--count'. What is expected
to happen is that the 8-bit value wraps around to 0xFF and the loop goes on to
execute 256 times in total, then exits and returns 0. In the failure case,
where p increments past the end of buffer, the test returns 1. I believe this
failure case is optimized out as undefined behavior.
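For readers without access to the Compiler Explorer link, the expected behaviour can be sketched as below. This is a hypothetical reconstruction based only on the description above, not the actual test: the names `buffer`, `BUFFER_SIZE` and `run_narrow_counter_loop` are assumptions, and the iteration counter is added purely for illustration.

```c
#include <stdint.h>

#define BUFFER_SIZE 256 /* assumed size; one byte per expected iteration */

static unsigned char buffer[BUFFER_SIZE];

/* Zeroes bytes of buffer using an 8-bit loop counter and returns how
   many times the loop body ran. */
unsigned run_narrow_counter_loop(uint8_t count)
{
    unsigned char *p = buffer;
    unsigned iterations = 0;

    do {
        *p++ = 0;
        ++iterations;
    } while (--count); /* count is 8 bits wide: 0 wraps to 0xFF here */

    return iterations;
}
```

Called with a count of 0, the counter wraps to 0xFF on the first decrement, so the body runs 256 times and p ends exactly one past the end of the buffer, which is still a valid pointer value.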
In the PowerPC disassembly of the compiled test:
    mr 30, 3
    ...
    mtctr 30
  .LBB0_1: # =>This Inner Loop Header: Depth=1
    bdnz .LBB0_1
1. r3 (count) is placed into r30
2. The memset (*p++ = 0) portion of the loop is factored out into an actual
call to memset
3. r30 (count) is placed into the CTR
4. The CTR is used in bdnz
   a. With a quick glance at the definition of that instruction, the
      decrement happens before the compare. This means that the CTR may
      underflow, and will end up as either 0xffffffff or 0xffffffffffffffff
   b. The CTR will be compared to 0 and, now being a large positive value,
      will not be 0
   c. The branch will occur, repeating a-c a finite but undesirable number
      of times
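The arithmetic behind that "finite but undesirable" count can be sketched with a small model of a decrement-then-test loop branch on a 32-bit counter. This is only an illustration of the behaviour described above, not a model of the full instruction; the function names are made up for this sketch.

```c
#include <stdint.h>

/* Closed form: number of times a decrement-then-branch-if-nonzero loop
   runs for a given initial counter value. ctr - 1 wraps modulo 2^32
   when ctr == 0. */
uint64_t bdnz_body_executions(uint32_t ctr)
{
    return (uint64_t)(uint32_t)(ctr - 1u) + 1u;
}

/* Brute-force cross-check of the closed form. Only sensible for small
   nonzero values, since an initial value of 0 loops 2^32 times. */
uint64_t bdnz_body_executions_slow(uint32_t ctr)
{
    uint64_t n = 0;

    do {
        ++n;            /* one pass through the loop */
        --ctr;          /* decrement happens first... */
    } while (ctr != 0); /* ...then the compare against zero */

    return n;
}
```

Under this model an initial counter of 0 gives 4294967296 passes, whereas the 8-bit source loop is expected to run 256 times.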
Digging slightly deeper into the pass itself, the inserted intrinsics don't
seem to care about the original counter type past the point where the counter is
used in the hardware loop count initialization intrinsic:
  entry:
    br label %do.body
  do.body: ; preds = %do.body, %entry
    %count.addr.0 = phi i8 [ %count, %entry ], [ %dec, %do.body ]
    %dec = add i8 %count.addr.0, -1
    %cmp1.not = icmp eq i8 %dec, 0
    br i1 %cmp1.not, label %do.end, label %do.body, !llvm.loop !2
Becomes
  entry:
    %0 = zext i8 %count to i32
    call void @llvm.set.loop.iterations.i32(i32 %0)
    br label %do.body
  do.body: ; preds = %do.body, %entry
    %1 = call i1 @llvm.loop.decrement.i32(i32 1)
    br i1 %1, label %do.body, label %do.end, !llvm.loop !2
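The mismatch between the two forms can be modelled in C. The sketch below compares the trip count implied by the original i8 loop with the value that the transformed IR hands to the loop-count initialization. The function names are hypothetical, and this is not a description of how the pass computes the count or of any eventual fix.

```c
#include <stdint.h>

/* Trip count of the original loop: the counter is decremented in 8 bits
   and the loop exits when the decremented value is 0, so a start value
   of 0 wraps to 0xFF and yields 256 iterations. */
uint32_t original_trip_count(uint8_t count)
{
    return (uint32_t)(uint8_t)(count - 1) + 1u;
}

/* Value passed to the loop-count initialization in the transformed IR:
   a plain zero-extension of the 8-bit counter. */
uint32_t transformed_loop_init(uint8_t count)
{
    return (uint32_t)count; /* zext i8 -> i32 */
}

/* Nonzero when both forms describe the same number of iterations. */
int counts_agree(uint8_t count)
{
    return original_trip_count(count) == transformed_loop_init(count);
}
```

In this model the two agree for every start value from 1 to 255 and differ only at 0, where the original loop runs 256 times but the hardware counter is initialized to 0.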
This seems like an oversight, albeit a very edge-casey one.
J.B. Nagurne
Code Generation
Texas Instruments
Sjoerd Meijer via llvm-dev
2021-Mar-29 17:45 UTC
[llvm-dev] Unsigned integer underflow in HardwareLoops pass (PPC, perhaps ARM)
Yep, that doesn't look good and deserves a PR and some more looking into.
Wondering why we haven't seen this before: I guess at higher optimisation
levels this problem is hidden by the iteration count checks generated by the
vectoriser or loop unroller.
It is a bit of a funny test, as the code produced at a higher opt level also
shows, but I don't think that should be an excuse.
Cheers,
Sjoerd.