Cameron McInally via llvm-dev
2016-Oct-20 15:54 UTC
[llvm-dev] [AVX512BW] Nasty KAND issue
Hey guys,

I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits.

In the IR, we represent a 4b vector mask as <4 x i1>. This assumes
that the storage container for this type is also 4b, but it's not. The
smallest mask register on SKX is 8b. This also implies that the
smallest load/store moves 8b.

We run into problems when we try to optimize ANDs (full test case attached):

    %r1 = and <4 x i1> %r0, <i1 -1, i1 -1, i1 -1, i1 -1>

At the IR level the all-ones mask looks like the Identity for this
operation, so LLVM will remove it. But it is not the Identity, since
this operation should clear the top 4 bits of the 8-bit hardware
register in play. E.g.

    kmovb -4(%rsp), %k0
    kandb %k0, %k1, %k0
    kmovb %k0, -4(%rsp)

I began tracking down this issue and found that InstCombine will
incorrectly remove the AND. Then I noticed that the Reassociate pass
would also remove the AND if InstCombine did not. That made me
nervous. My current thinking is that this might be a larger problem
that shouldn't be patched up. Or maybe I made a faulty assumption with
the IR I chose for this operation.

Any help would be greatly appreciated.

Thanks,
Cam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ll
Type: application/octet-stream
Size: 396 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161020/75b6020c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.s
Type: application/octet-stream
Size: 436 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161020/75b6020c/attachment-0001.obj>
> On Oct 20, 2016, at 8:54 AM, Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hey guys,
>
> I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits.
>
> In the IR, we represent a 4b vector mask as <4 x i1>. This assumes
> that the storage container for this type is also 4b, but it's not.

The storage type is not relevant; these bits are “unreachable” from the IR point of view.
The backend is supposed to lower the operation in a safe way when these bits need to be cleared.

For example, if you were to perform some arithmetic operation on these, it is likely that they would get zero-extended to 8 bits first, and this is where the upper bits would be cleared.

> The
> smallest mask register on SKX is 8b. This also implies that the
> smallest load/store moves 8b.
>
> We run into problems when we try to optimize ANDs (full test case attached):
>
> %r1 = and <4 x i1> %r0, <i1 -1, i1 -1, i1 -1, i1 -1>
>
> At the IR level the all1s mask looks like the Identity for this
> operation, so LLVM will remove it. But it is not the Identity since
> this operation should clear the top 4 bits of the 8 bit hardware
> register in play. E.g.

No, this operation alone does not need to clear the upper bits; they are undefined before and after.

>
> kmovb -4(%rsp), %k0
> kandb %k0, %k1, %k0
> kmovb %k0, -4(%rsp)
>
> I began tracking down this issue and found that InstCombine will
> incorrectly remove the AND. Then I noticed that the Reassociate pass
> would also remove the AND if InstCombine did not. That made me
> nervous. My current thinking is that this might be a larger problem
> that shouldn't be patched up. Or maybe I made a faulty assumption with
> the IR I choose for this operation.

There might be a legitimate issue, but your example falls short of illustrating it right now: you're not showing how these upper bits leak into the computation anywhere.

--
Mehdi
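The distinction between the IR-visible value and its storage container can be modelled in a few lines. The following is only a sketch in Python, not LLVM or hardware code: the helper names (`ir_value`, `kandb`) and the sample register contents are made up for illustration, and the only facts assumed are the ones stated in the thread (4 live bits held in an 8-bit mask register).

```python
LIVE_BITS = 4        # lanes in <4 x i1>
CONTAINER_BITS = 8   # smallest mask register width mentioned in the thread

LIVE_MASK = (1 << LIVE_BITS) - 1            # 0b0000_1111
CONTAINER_MASK = (1 << CONTAINER_BITS) - 1  # 0b1111_1111


def ir_value(reg):
    """Hypothetical helper: the part of the register the IR can observe."""
    return reg & LIVE_MASK


def kandb(a, b):
    """Toy model of an 8-bit mask AND over the whole container."""
    return (a & b) & CONTAINER_MASK


# Sample register: low nibble holds the mask, high nibble is leftover garbage.
reg = 0b1010_0110
all_ones = 0b0000_1111  # <i1 -1, i1 -1, i1 -1, i1 -1> stored in 8 bits

with_and = kandb(reg, all_ones)  # AND kept
without_and = reg                # AND removed by the optimizer

# At the IR level the two are indistinguishable, so removing the AND is legal.
same_at_ir_level = ir_value(with_and) == ir_value(without_and)

# In the container they differ, but only in bits the IR never defined.
same_in_container = with_and == without_and
```

In this model `same_at_ir_level` holds while `same_in_container` does not, which is the sense in which the upper bits are undefined before and after the AND: any consumer that reads beyond the low 4 bits has to clear them itself.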
Cameron McInally via llvm-dev
2016-Oct-20 16:26 UTC
[llvm-dev] [AVX512BW] Nasty KAND issue
On Thu, Oct 20, 2016 at 12:05 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Oct 20, 2016, at 8:54 AM, Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hey guys,
>>
>> I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits.
>>
>> In the IR, we represent a 4b vector mask as <4 x i1>. This assumes
>> that the storage container for this type is also 4b, but it's not.
>
> The storage type is not relevant; these bits are “unreachable” from the IR point of view.
> The backend is supposed to lower the operation in a safe way when these bits need to be cleared.
>
> For example, if you were to perform some arithmetic operation on these, it is likely that they would get zero-extended to 8 bits first, and this is where the upper bits would be cleared.
>
>
>> The
>> smallest mask register on SKX is 8b. This also implies that the
>> smallest load/store moves 8b.
>>
>> We run into problems when we try to optimize ANDs (full test case attached):
>>
>> %r1 = and <4 x i1> %r0, <i1 -1, i1 -1, i1 -1, i1 -1>
>>
>> At the IR level the all1s mask looks like the Identity for this
>> operation, so LLVM will remove it. But it is not the Identity since
>> this operation should clear the top 4 bits of the 8 bit hardware
>> register in play. E.g.
>
> No, this operation alone does not need to clear the upper bits; they are undefined before and after.
>
>
>>
>> kmovb -4(%rsp), %k0
>> kandb %k0, %k1, %k0
>> kmovb %k0, -4(%rsp)
>>
>> I began tracking down this issue and found that InstCombine will
>> incorrectly remove the AND. Then I noticed that the Reassociate pass
>> would also remove the AND if InstCombine did not. That made me
>> nervous. My current thinking is that this might be a larger problem
>> that shouldn't be patched up. Or maybe I made a faulty assumption with
>> the IR I choose for this operation.
>
> There might be a legitimate issue, but your example falls short of illustrating it right now: you're not showing how these upper bits leak into the computation anywhere.
>
> --
> Mehdi

Hi Mehdi,

Yes, good point. Updated test case exhibiting the dirty bits attached.
Notice that the kortest will operate on the dirty bits that should
have been zeroed.

Perhaps the problem is that the zext of the i4 to i16 does not get
generated correctly.

-Cam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ll
Type: application/octet-stream
Size: 376 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161020/8d31fe9d/attachment.obj>
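The failure mode described here, a zero test that sees bits the IR never defined, can be sketched with a small Python model. This is not a reconstruction of the attached test.ll, which is not available in the archive. The function names and the sample register value are hypothetical, and only the zero-flag behaviour of a 16-bit kortest (ZF is set when the OR of the two operands is all zeros) is modelled.

```python
MASK16 = 0xFFFF
LIVE_MASK = 0b1111  # the four lanes of <4 x i1>


def kortestw_zf(k1, k2):
    """Toy model of the zero flag of a 16-bit kortest: set when (k1 | k2) == 0."""
    return ((k1 | k2) & MASK16) == 0


def zext_live_bits(reg):
    """Hypothetical correct lowering of the zero-extension: keep only the live lanes."""
    return reg & LIVE_MASK


# Sample register: all four live lanes are 0, but the upper bits hold garbage.
dirty = 0b1011_0000

# Testing the raw register: the garbage bits make the mask look non-zero.
raw_zf = kortestw_zf(dirty, dirty)

# Testing after a zero-extension that clears the undefined bits first.
clean = zext_live_bits(dirty)
clean_zf = kortestw_zf(clean, clean)
```

Here `raw_zf` is false while `clean_zf` is true, so the two lowerings disagree about whether the mask is empty. Under this model the removed AND is not at fault; the widening step that feeds the kortest is the one that has to clear the upper bits, which matches the suspicion about the zext.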