thr3ads.net - llvm dev - [llvm-dev] Possible AVX512 codegen bug in LLVM 10.0.1? [Sep 2020]

TB Schardl via llvm-dev

2020-Sep-05 03:49 UTC

[llvm-dev] Possible AVX512 codegen bug in LLVM 10.0.1?

Hey LLVMDev,

Perhaps I'm missing something, but I think I've stumbled across a codegen
bug in LLVM 10.0.1 related to AVX512.  I've attached a small LLVM IR
testcase and generated x86_64 assembly file that shows the bug.

The test case is small, but not quite minimal, mostly because of driver
code included in the test case so one can compile and run the program.  The
program does a simple vectorizable computation two ways — once with a
vectorized loop, and then with a recursive function that contains a
vectorized loop at its base case — and then compares the results of those
two computations.  If it behaves correctly, both computations should
produce the same result, and the program should produce no output.  But
right now it seems that the recursive-function version produces roughly
half incorrect results, in a repeating pattern of 4 correct results
followed by 4 incorrect results.  (There are also some commented-out lines
in the LLVM file, from my own testing of alternative implementations to
confirm that the recursive-function code is otherwise correct.)

The crux seems to be that the recursive function, _Z7loopdacllPjl, takes a
vector of 8 64-bit integers as one of its arguments.  There's no issue with
such an argument in LLVM IR, but the generated assembly seems to be
incorrect.  Examining the assembly file, it seems that _Z7loopdacllPjl
loads this vector argument off the stack with a 64-byte reload (notably on
line 78).  But before the call to _Z7loopdacllPjl from main (line 595), I
only see a single 32-byte spill corresponding to this vector argument.
Hence, it seems that the vectorized loop in _Z7loopdacllPjl gets a vector
half-filled with garbage values, leading to the observed misbehavior.
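
The mismatch described above can be mimicked in portable C, without any
AVX512 hardware. This is only a sketch of the symptom, not the actual
generated code: the 64-byte `slot` buffer stands in for the stack argument
slot, and the helper names are hypothetical. The "caller" stores 32 bytes
and the "callee" reloads 64, so only the low 4 of 8 lanes carry the
intended values.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8 /* an <8 x i64> vector: 8 lanes of 8 bytes, 64 bytes total */

/* Hypothetical stand-in for the caller: it treats the vector as split in
   half and stores only the low 32 bytes into the argument slot. */
static void caller_store_half(unsigned char *slot, const uint64_t *vec) {
    memcpy(slot, vec, 32);
}

/* Hypothetical stand-in for the callee: it reloads all 64 bytes. */
static void callee_load_full(uint64_t *out, const unsigned char *slot) {
    memcpy(out, slot, 64);
}

/* Returns how many lanes the callee sees exactly as the caller intended. */
int count_matching_lanes(void) {
    uint64_t intended[LANES], seen[LANES];
    unsigned char slot[64];
    int ok = 0;

    memset(slot, 0xAA, sizeof slot); /* stale stack contents */
    for (int i = 0; i < LANES; i++)
        intended[i] = (uint64_t)i + 1;

    caller_store_half(slot, intended);
    callee_load_full(seen, slot);

    for (int i = 0; i < LANES; i++)
        ok += (seen[i] == intended[i]);
    return ok; /* lanes 0-3 match, lanes 4-7 hold the stale bytes */
}
```

Calling `count_matching_lanes()` returns 4, which lines up with the
repeating pattern of 4 correct results followed by 4 incorrect results.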

I'm not familiar enough with LLVM's x86_64 backend to understand why it
generates this particular assembly.  But the generated assembly seems
incorrect to me.  Am I missing something?

Please let me know if there's any other information you need from me.

Cheers,
TB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: avx512_codegen_bug.ll
Type: application/octet-stream
Size: 40283 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200904/53671b02/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: avx512_codegen_bug.s
Type: application/octet-stream
Size: 32126 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200904/53671b02/attachment-0003.obj>

Craig Topper via llvm-dev

2020-Sep-05 04:11 UTC

I believe this is an interaction with our method for avoiding zmm registers
on skylake-avx512 by default. The clang frontend adds a function attribute
"min-legal-vector-width" to tell the backend about any explicit vectors used
in function arguments, returns, inline assembly, or x86intrin.h intrinsics
in the C code. The backend uses this to know whether any 512-bit vectors it
sees came from user code or from the auto vectorizers. If they came from
user code we need to use zmm, but if they came from the auto vectorizers
we're allowed to split them into smaller vectors.

In your case your main function has the "min-legal-vector-width" attribute
set to 0 which means the original C code was all scalar. None of the other
functions have the attribute. So the backend thinks any vectors it sees in
main came from the auto vectorizers and are allowed to be split. Lack of
attribute is treated conservatively. We assume that the vector widths
weren't checked. So any 512-bit vectors will use zmm in the other functions.

I notice in the ll file that the call to main has been modified to use a
vector when it didn't originally. So clang didn't see the vector when it
generated the code. I think you can remove the min-legal-vector-width
attribute to fix your issue.
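
For contrast, here is a small C sketch in which the 512-bit vector is
explicit in the source, so the frontend can see it. The type `u64x8` and
the function names are hypothetical and are not taken from the attached
test case. Going by the description above, clang would be expected to
record a "min-legal-vector-width" of 512 for a function like `sum_lanes`
rather than 0. One way to check is to inspect the attributes in the output
of `clang -O2 -march=skylake-avx512 -S -emit-llvm`.

```c
#include <stdint.h>

/* GCC/Clang vector extension: 8 lanes x 64 bits = 512 bits. */
typedef uint64_t u64x8 __attribute__((vector_size(64)));

/* Hypothetical helper: the explicit 512-bit vector parameter is visible
   to the frontend, unlike a vector introduced by editing the IR later. */
uint64_t sum_lanes(u64x8 v) {
    uint64_t total = 0;
    for (int i = 0; i < 8; i++)
        total += v[i]; /* lane access via the vector-extension subscript */
    return total;
}

/* Hypothetical caller: it also uses the explicit vector type in C. */
uint64_t sum_first_eight(void) {
    u64x8 v = {1, 2, 3, 4, 5, 6, 7, 8};
    return sum_lanes(v);
}
```

When built for a target without AVX512 enabled, some compilers warn that
passing such a vector by value changes the ABI; the sketch is still valid C
with the vector extension.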

Hope that helps. Let me know if you have any questions.

~Craig


Craig Topper via llvm-dev

2020-Sep-05 04:16 UTC

I forgot, another option is to compile your main with
-mprefer-vector-width=512 which will add another attribute
"prefer-vector-width" to main that will tell the backend to not split
512-bit vectors either.
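
As a rough illustration of where this flag matters, the following C sketch
contains only scalar source code, so any vectors in the compiled output
would come from the auto vectorizers. The function names and the file name
in the comment are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar source: no explicit vector types or intrinsics appear here.
 *
 * Possible builds to compare (file name is hypothetical):
 *   clang -O2 -march=skylake-avx512 -S add_one.c
 *   clang -O2 -march=skylake-avx512 -mprefer-vector-width=512 -S add_one.c
 */
void add_one(uint32_t *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[i] += 1;
}

/* Small driver: increments 16 zeroed elements and returns their sum. */
uint32_t add_one_demo_sum(void) {
    uint32_t buf[16] = {0};
    uint32_t sum = 0;

    add_one(buf, 16);
    for (size_t i = 0; i < 16; i++)
        sum += buf[i];
    return sum;
}
```

Comparing the register names (ymm versus zmm) in the two assembly outputs
is one way to see whether the preference took effect; the result itself is
the same either way.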

~Craig


