Blank, Guy via llvm-dev
2017-Jan-24 11:54 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
Hi All,
AVX-512 introduced the K mask registers and masked operations which make a
natural choice for legalizing vectors of i1's.
For example,
define <8 x i32> @foo(<8 x i32> %a, <8 x i32*> %p) {
  %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4,
         <8 x i1> <i1 true, i1 true, i1 true, i1 true,
                   i1 true, i1 true, i1 true, i1 true>,
         <8 x i32> undef)
  ret <8 x i32> %r
}
Can be lowered to
# BB#0:
kxnorw %k0, %k0, %k1
vpgatherqd (,%zmm1), %ymm0 {%k1}
retq
Legal vectors of i1 require support for BUILD_VECTOR(i1, i1, ..., i1), i1
EXTRACT_VEC_ELEMENT(...) and INSERT_VEC_ELEMENT(i1, ...), so making i1 legal
seemed like a sensible decision, and this is the current state at the top of
trunk.
However, making i1 legal affected instruction selection of scalar code as well.
Currently, there are cases where operations producing or consuming i1's are
selected (sub-optimally) to instructions that act on K-regs.
PR28650<https://llvm.org/bugs/show_bug.cgi?id=28650> is an example showing
that i1's live-in or live-out of basic-blocks are being selected to K
register classes, even though we don't want this to happen. This problem
does not happen on subtargets without the AVX-512 feature enabled.
The following is the AVX-512 code from the bug report:
# BB#0: # %entry
testb $1, %dil
je .LBB0_1
# BB#2: # %if
pushq %rax
callq bar
# kill: %AL<def> %AL<kill> %EAX<def>
andl $1, %eax
kmovw %eax, %k0
addq $8, %rsp
jmp .LBB0_3
.LBB0_1:
kxnorw %k0, %k0, %k0
kshiftrw $15, %k0, %k0
.LBB0_3: # %else
kmovw %k0, %eax
# kill: %AL<def> %AL<kill> %EAX<kill>
retq
The kmovw, kxnorw and kshiftrw instructions here operate on K registers.
They are undesirable in purely scalar input code.
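To make the cost of the K-register sequence concrete, here is a small Python model of the two instructions in the .LBB0_1 block. It is only a sketch: the function names are made up for illustration, and the semantics assumed are a 16-bit bitwise XNOR for kxnorw and a 16-bit logical right shift for kshiftrw. Under those assumptions the pair does nothing more than place the value 1 in %k0, which scalar code could get with a single move into a GPR.

```python
MASK16 = 0xFFFF  # kxnorw/kshiftrw are modelled on 16-bit mask values


def kxnorw(a: int, b: int) -> int:
    """Bitwise XNOR of two 16-bit mask values."""
    return ~(a ^ b) & MASK16


def kshiftrw(imm: int, a: int) -> int:
    """Logical right shift of a 16-bit mask; counts above 15 give 0."""
    return (a >> imm) & MASK16 if imm <= 15 else 0


def materialize_one(k0: int) -> int:
    """Model of: kxnorw %k0, %k0, %k0 ; kshiftrw $15, %k0, %k0."""
    all_ones = kxnorw(k0, k0)      # x XNOR x is all ones for any x
    return kshiftrw(15, all_ones)  # only the top bit survives, landing in bit 0


if __name__ == "__main__":
    # The result does not depend on the previous contents of %k0.
    print(materialize_one(0), materialize_one(0x1234))
```

The same kxnorw idiom with a different destination register is what produces the all-true gather mask in the first example.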
Having a type that can be possibly legalized to two different register classes
exposes a fundamental limitation of the current instruction selection framework:
we cannot always make the right decision about live-in/live-out i1's because we
cannot see beyond the boundary of the basic block we are currently visiting. As
a side note, with GlobalISel this can be solved, since we see the entire use-def
chain at the function level.
Our initial thought was to write a pass that runs after ISel to correct bad
selections. The pass would examine the use-def chains containing values that
were selected to K-register classes and, when profitable, re-assign the values
to GPR register classes (and replace the producing/consuming instructions
accordingly). But even with this fix-up pass, we would still miss many ISel
pattern-matching rules, because the instruction set acting on GPRs is richer
than the instruction set acting on K-regs. For example, a test trying to match
the sbb instruction:
define i32 @test2(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {
entry:
%cmp = icmp ugt i32 %x, %y
%dec = sext i1 %cmp to i32
%dec.res = add nsw i32 %dec, %res
ret i32 %dec.res
}
Generates the following with AVX2:
# BB#0: # %entry
cmpl %edi, %esi
sbbl $0, %edx
movl %edx, %eax
retq
While AVX512 produces:
# BB#0: # %entry
xorl %ecx, %ecx
cmpl %esi, %edi
movl $-1, %eax
cmovbel %ecx, %eax
addl %edx, %eax
retq
So we would still end up with cases where, when the AVX-512 feature is enabled,
instruction selection for scalar code becomes inferior.
Finally, we propose to resolve the above issues caused by legalizing i1 by making
i1 illegal. This would make instruction selection of scalar code identical
whether the AVX-512 feature is on or off. As for supporting
BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we believe we can
support these operations even when i1 is illegal and the vectors of i1 *are*
legal by using the i8 type instead of i1, as it should be implicitly
truncated/extended to the element type of the vNi1 vectors.
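The idea of feeding i8 scalars to vNi1 vector operations can be pictured with a toy Python model. This is a sketch of the intended semantics only, not the patch: a vNi1 vector is represented as an integer bitmask, the function names are hypothetical, and the extracted element is returned zero-extended as a modelling choice (SelectionDAG leaves the high bits of a widened extract result undefined).

```python
def build_vector_vNi1(elems_i8):
    """Pack i8 operands into a mask; each operand is implicitly truncated to bit 0."""
    mask = 0
    for i, e in enumerate(elems_i8):
        mask |= (e & 1) << i
    return mask


def extract_vec_element(mask, idx):
    """Return element idx as an i8 value (high bits chosen as zero in this model)."""
    return (mask >> idx) & 1


def insert_vec_element(mask, elem_i8, idx):
    """Replace element idx with the low bit of an i8 operand."""
    return (mask & ~(1 << idx)) | ((elem_i8 & 1) << idx)


if __name__ == "__main__":
    m = build_vector_vNi1([1, 0, 0xFF, 2])   # low bits are 1, 0, 1, 0
    print(bin(m), extract_vec_element(m, 2), bin(insert_vec_element(m, 1, 1)))
```

With this shape, no scalar value of type i1 is ever needed: the scalar side only sees i8, and the K-register side only sees whole masks.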
I am now working on a patch that will implement this approach.
I would appreciate feedback and comments.
Thanks,
Guy
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
Martin J. O'Riordan via llvm-dev
2017-Jan-24 13:07 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
I can't comment specifically on the impact on your target and 'i1', but
there is an issue with LLVM that I do have a concern about.
> Having a type that can be possibly legalized to two different register
> classes exposes a fundamental limitation of the current instruction
> selection framework
One of the problems I have encountered with LLVM is that I "do" want to be
able to legalise and optimise for 2 (or more) register classes for the same
type, and LLVM does not really cope with this well. But it is not with 'i1'
scalar versus vector that I run into the limitation, but with small vectors
and large vectors.
In our architecture, we have two register files that can be used for SIMD
operations: one is 32 bits wide and the other is 128 bits wide. Quite often,
due to register pressure or simply to reduce moving information, I would like
to be able to place something like 'v2i16' or 'v4i8' vectors into either the
32-bit SIMD capable register class or into the low bits of a 128-bit SIMD
capable register class. I expect that other chip architectures have similar
capabilities.
Your statement above is true, but making it illegal means that these kinds
of SIMD transformations also become illegal.
MartinO
Hal Finkel via llvm-dev
2017-Jan-24 15:34 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
On 01/24/2017 05:54 AM, Blank, Guy via llvm-dev wrote:
> [...]
> Having a type that can be possibly legalized to two different register
> classes exposes a fundamental limitation of the current instruction
> selection framework, and that is we cannot always make the right
> decision about live-in/live-out i1's because we cannot see beyond the
> boundary of the current basic-block we are visiting. As a side-note,
> with GlobalISel this can be solved, since we see the entire use-def
> chain at the function level.
Exactly. I certainly hope we'll be able to address this sensibly with
GlobalISel.
> Our initial thought was to write a pass that will be run after ISel to
> correct bad selections. The pass would examine the use-def chains
> containing values that were selected to K-register classes, and, when
> profitable, re-assign the values to GPR register classes (and replace
> the producing/consuming instructions accordingly). But even with this
> fix-up pass, we would still be losing many ISel pattern-matching rules
> that will be missed because the instruction set acting on GPR is
> richer than the instruction set acting on K-regs. For example, a test
> trying to match the sbb instruction:
I think you'd want to do the fixup for these before/during isel, not
afterward. PowerPC does some of this (see lib/Target/PowerPC/PPCBoolRetToInt.cpp
and DAGCombineTruncBoolExt/DAGCombineExtBoolTrunc in
lib/Target/PowerPC/PPCISelLowering.cpp). That code should trivially generalize
to other targets.
There are some places where we do this kind of thing after isel as well (e.g.
lib/Target/AArch64/AArch64AdvSIMDScalarPass.cpp).
That having been said, if you don't have actual i1 registers in which you'd
like to keep and manipulate boolean values, marking i1 as illegal makes sense
to me.
-Hal
> [...]
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Gerolf Hoflehner via llvm-dev
2017-Jan-25 01:38 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
What is a good way to collect test cases for GISel expectations (in this case,
handling i1 efficiently)? It would be great to build up a repository of tests
as opportunities/potentials pop up.
Thanks
Gerolf
Blank, Guy via llvm-dev
2017-Feb-02 08:03 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
Hi Martin,
> Your statement above is true, but making it illegal means that
> these kinds of SIMD transformations also become illegal.
Not sure I understand what you mean here.
Even with i1 illegal, we would still be able to copy from GPR to K registers
when profitable.
Regards,
Guy
Blank, Guy via llvm-dev
2017-Feb-02 11:34 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
Hi Hal,
Thanks for the pointers to existing passes.
About doing the fixup before ISel, I think it will be difficult to determine
the profitability of switching from GPR to K registers that early, except in
relatively simple cases like the PowerPC example you gave.
As for doing it during ISel, we will not be able to solve any cross-BB issues
that way.
That being said, I do agree that if there are things that can be solved early,
we should attempt to do so.
Thanks,
Guy
From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Tuesday, January 24, 2017 17:35
To: Blank, Guy <guy.blank at intel.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
On 01/24/2017 05:54 AM, Blank, Guy via llvm-dev wrote:
Hi All,
AVX-512 introduced the K mask registers and masked operations which make a
natural choice for legalizing vectors of i1’s.
For example,
define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) {
%r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32
4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1
true, i1 true>, <8 x i32> undef)
ret 8 x i32>%r
}
Can be lowered to
# BB#0:
kxnorw %k0, %k0, %k1
vpgatherqd (,%zmm1), %ymm0 {%k1}
retq
Legal vectors of i1’s require support for BUILD_VECTOR(i1, i1, .., i1), i1
EXTRACT_VEC_ELEMENT (…) and INSERT_VEC_ELEMENT(i1, …) , so making i1 legal
seemed like a sensible decision, and this is the current state in the top of
trunk.
However, making i1 legal affected instruction selection of scalar code as well.
Currently, there are cases where operations producing or consuming i1’s are
selected (sub-optimally) to instructions that act on K-regs.
PR28650<https://llvm.org/bugs/show_bug.cgi?id=28650> is an example showing
that i1’s live-in or live-out of basic-blocks are being selected to K register
classes, even though we don’t want this to happen. This problem does not happen
on subtargets without the AVX-512 feature enabled.
The following is the AVX-512 code from the bug report:
# BB#0: # %entry
testb $1, %dil
je .LBB0_1
# BB#2: # %if
pushq %rax
callq bar
# kill: %AL<def> %AL<kill> %EAX<def>
andl $1, %eax
kmovw %eax, %k0
addq $8, %rsp
jmp .LBB0_3
.LBB0_1:
kxnorw %k0, %k0, %k0
kshiftrw $15, %k0, %k0
.LBB0_3: # %else
kmovw %k0, %eax
# kill: %AL<def> %AL<kill> %EAX<kill>
retq
The kmovw, kxnorw and kshiftrw instructions here operate on K registers, which
is undesirable because the input code is purely scalar.
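As a rough illustration of what the .LBB0_1 block above computes, here is a minimal Python sketch that models the two mask instructions on 16-bit values. The helper names (kxnorw, kshiftrw, materialize_true) are only illustrative stand-ins for the instructions, and the model assumes the usual semantics of a 16-bit XNOR and a logical right shift.

```python
MASK16 = 0xFFFF  # the "w" forms operate on 16-bit mask values


def kxnorw(a: int, b: int) -> int:
    """Model of a 16-bit bitwise XNOR of two mask values."""
    return ~(a ^ b) & MASK16


def kshiftrw(imm: int, a: int) -> int:
    """Model of a 16-bit logical right shift by an immediate."""
    return (a & MASK16) >> imm if imm < 16 else 0


def materialize_true(k0: int) -> int:
    """Mirror the .LBB0_1 block; the incoming value of k0 is irrelevant."""
    k0 = kxnorw(k0, k0)    # XNOR of a value with itself sets all 16 bits
    k0 = kshiftrw(15, k0)  # shifting right by 15 leaves only bit 0 set
    return k0
```

Under these assumptions the two K-register instructions, plus the later kmovw back to %eax, do nothing more than produce the scalar constant 1, which is why they look wasteful in scalar code.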
Having a type that can be legalized to two different register classes exposes a
fundamental limitation of the current instruction selection framework: we cannot
always make the right decision about live-in/live-out i1’s, because we cannot
see beyond the boundary of the basic block we are currently visiting. As a
side note, this can be solved with GlobalISel, since we see the entire use-def
chain at the function level.
Exactly. I certainly hope we'll be able to address this sensibly with
GlobalISel.
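To make the cross-block limitation discussed above concrete, here is a toy Python model. It is not how SelectionDAG or GlobalISel is implemented; the Value class, the "scalar"/"mask" use kinds and the default of "K" for values with no visible use are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    name: str
    def_block: str
    # Each use is (block name, kind), where kind is "scalar" or "mask".
    uses: list = field(default_factory=list)


def choose_block_local(v: Value, default: str = "K") -> str:
    """Pick a register class while seeing only the defining block."""
    local = [kind for blk, kind in v.uses if blk == v.def_block]
    if not local:
        # Live-out value with no local use: the selector has to guess.
        return default
    return "K" if any(kind == "mask" for kind in local) else "GPR"


def choose_function_wide(v: Value) -> str:
    """Pick a register class with the whole use-def chain visible."""
    kinds = [kind for _, kind in v.uses]
    return "K" if any(kind == "mask" for kind in kinds) else "GPR"


# An i1 defined in block "if" whose only use is a scalar use in "else".
cond = Value("cond", "if", [("else", "scalar")])
```

In this toy setting choose_block_local(cond) falls back to the guessed default, while choose_function_wide(cond) sees the scalar use and picks GPR.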
Our initial thought was to write a pass that runs after ISel to correct bad
selections. The pass would examine the use-def chains containing values that
were selected to K-register classes and, when profitable, re-assign the values
to GPR register classes (replacing the producing/consuming instructions
accordingly). But even with this fix-up pass, we would still miss many ISel
pattern-matching rules, because the instruction set acting on GPRs is richer
than the instruction set acting on K-regs. For example, a test trying to match
the sbb instruction:
I think you'd want to do the fixup for these before/during isel, not
afterward. PowerPC does some of this (see lib/Target/PowerPC/PPCBoolRetToInt.cpp
and DAGCombineTruncBoolExt/DAGCombineExtBoolTrunc in
lib/Target/PowerPC/PPCISelLowering.cpp). That code should trivially generalize
to other targets.
There are some places where we do this kind of thing after isel as well (e.g.
lib/Target/AArch64/AArch64AdvSIMDScalarPass.cpp).
That having been said, if you don't have actual i1 registers in which
you'd like to keep and manipulate boolean values, marking i1 as illegal
makes sense to me.
-Hal
define i32 @test2(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {
entry:
%cmp = icmp ugt i32 %x, %y
%dec = sext i1 %cmp to i32
%dec.res = add nsw i32 %dec, %res
ret i32 %dec.res
}
Generates the following with AVX2:
# BB#0: # %entry
cmpl %edi, %esi
sbbl $0, %edx
movl %edx, %eax
retq
While AVX512 produces:
# BB#0: # %entry
xorl %ecx, %ecx
cmpl %esi, %edi
movl $-1, %eax
cmovbel %ecx, %eax
addl %edx, %eax
retq
So we would still end up with cases where instruction selection for scalar code
becomes inferior when the AVX-512 feature is enabled.
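As a sanity check that the two sequences above compute the same result and differ only in cost, here is a small Python sketch that models them on unsigned 32-bit values. The function names are hypothetical, and the model assumes the standard flag semantics (cmp sets CF on unsigned borrow, sbb subtracts CF, cmovbe moves when the first operand of the comparison is below or equal).

```python
M32 = 0xFFFFFFFF  # all arithmetic is done modulo 2**32


def ir_reference(x: int, y: int, res: int) -> int:
    """Direct reading of @test2: sext(icmp ugt x, y) + res."""
    dec = M32 if x > y else 0  # sext i1 -> i32: true becomes all ones (-1)
    return (dec + res) & M32


def avx2_sequence(x: int, y: int, res: int) -> int:
    # cmpl %edi, %esi computes y - x; CF is set when y < x (unsigned)
    cf = 1 if y < x else 0
    # sbbl $0, %edx computes edx - 0 - CF
    return (res - 0 - cf) & M32


def avx512_sequence(x: int, y: int, res: int) -> int:
    ecx = 0    # xorl %ecx, %ecx
    eax = M32  # movl $-1, %eax
    # cmpl %esi, %edi compares x with y; cmovbel moves when x <= y
    if x <= y:
        eax = ecx
    return (eax + res) & M32  # addl %edx, %eax
```

For example, with x=5, y=3, res=10 all three functions return 9, and with x=3, y=5, res=10 they return 10. The cmp/sbb form reaches that result in two instructions before the final move, against five for the cmov form.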
Finally, we suggest resolving the above issues, which are caused by legalizing
i1, by making i1 illegal. This would make instruction selection of scalar code
identical whether the AVX-512 feature is on or off. As for supporting
BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we believe we can
support these operations even when i1 is illegal and the vectors of i1 *are*
legal, by using the i8 type instead of i1, as it should be implicitly
truncated/extended to the element type of the vNi1 vectors.
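The idea of using i8 scalars with vNi1 vectors can be sketched in Python by modelling a vNi1 vector as an integer bitmask. This is only an illustration of the implicit truncate/extend behaviour described above, not the planned patch. The function names are hypothetical, and treating the extension on extract as a zero-extension is an assumption of this sketch.

```python
def build_vector_vNi1(elems):
    """Model BUILD_VECTOR with i8 operands: keep only bit 0 of each element."""
    mask = 0
    for i, e in enumerate(elems):
        mask |= (e & 1) << i  # implicit truncate from i8 to i1
    return mask


def extract_vec_element(mask: int, idx: int) -> int:
    """Model EXTRACT_VEC_ELEMENT returning an i8 (assumed zero-extended)."""
    return (mask >> idx) & 1


def insert_vec_element(mask: int, elem: int, idx: int) -> int:
    """Model INSERT_VEC_ELEMENT with an i8 operand truncated to i1."""
    return (mask & ~(1 << idx)) | ((elem & 1) << idx)
```

For instance, build_vector_vNi1([1] * 8) gives 0xFF, the all-true v8i1 mask used in the gather example, and only the low bit of each i8 operand affects the result.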
I am now working on a patch that will implement this approach.
I would appreciate feedback and comments.
Thanks,
Guy
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Mehdi Amini via llvm-dev
2017-Feb-02 18:49 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
> On Jan 24, 2017, at 3:54 AM, Blank, Guy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> Finally, we suggest resolving the above issues, which are caused by legalizing
> i1, by making i1 illegal. This would make instruction selection of scalar code
> identical whether the AVX-512 feature is on or off.
> As for supporting BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we
> believe we can support these operations even when i1 is illegal and the
> vectors of i1 *are* legal, by using the i8 type instead of i1, as it should be
> implicitly truncated/extended to the element type of the vNi1 vectors.
FWIW this makes sense to me: using a vector of i8 to represent the boolean
values and making sure to select the right pattern to use the K register seems
reasonable.
How are you planning to implement the selection?
Thanks,
--
Mehdi
> I am now working on a patch that will implement this approach.
> I would appreciate feedback and comments.
> Thanks,
> Guy
Blank, Guy via llvm-dev
2017-Feb-05 16:51 UTC
[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
Actually, the K registers are currently represented by vectors of i1, except for
the VK1 register class, which is used with scalar i1. That is the only one I
intend to change (to be a vector of i1 as well).
From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Thursday, February 02, 2017 20:50
To: Blank, Guy <guy.blank at intel.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
On Jan 24, 2017, at 3:54 AM, Blank, Guy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Finally, we suggest resolving the above issues, which are caused by legalizing
i1, by making i1 illegal. This would make instruction selection of scalar code
identical whether the AVX-512 feature is on or off. As for supporting
BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we believe we can
support these operations even when i1 is illegal and the vectors of i1 *are*
legal, by using the i8 type instead of i1, as it should be implicitly
truncated/extended to the element type of the vNi1 vectors.
FWIW this makes sense to me: using a vector of i8 to represent the boolean
values and making sure to select the right pattern to use the K register seems
reasonable.
How are you planning to implement the selection?
Thanks,
--
Mehdi