thr3ads.net - llvm dev - [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding [Nov 2016]

Haber, Gadi via llvm-dev

2016-Nov-23 11:50 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hi All.

This is an RFC for a proposed target-specific X86 optimization that reduces code
size by choosing a shorter encoding for AVX-512 instructions when possible.

When the AVX512F instruction set was introduced in X86, it added 32 registers of
512 bits each, ZMM0-ZMM31, as well as 16 more XMM registers, XMM16-XMM31, and
16 more YMM registers, YMM16-YMM31.
To encode registers 16-31 and the additional instructions, a new encoding prefix
called EVEX, which extends the existing VEX encoding, was introduced as shown
below:

The EVEX encoding format:
            EVEX Opcode ModR/M [SIB] [Disp32] / [Disp8*N] [Immediate]
# of bytes: 4    1      1      1      4       / 1         1

The existing VEX encoding format:
            [VEX]   OPCODE ModR/M [SIB] [DISP]   [IMM]
# of bytes: 0,2,3   1      1      0,1   0,1,2,4  0,1

Note that the EVEX prefix always takes 4 bytes, whereas the VEX prefix takes at
most 3 bytes.
Consequently, for the SKX architecture, many instructions that use only the
lower registers XMM0-XMM15 or YMM0-YMM15 can be encoded in either the EVEX or
the VEX format. In such cases, using the VEX encoding reduces code size by
~2 bytes per instruction, even when the code is compiled with the
AVX512F/AVX512VL features enabled.

For example: "vmovss  %xmm0, 32(%rsp,%rax,4)", has the following 2
possible encodings:

EVEX encoding (8 bytes long):
            62 f1 7e 08 11 44 84 08         vmovss  %xmm0, 32(%rsp,%rax,4)

VEX encoding (6 bytes long):
            c5 fa 11 44 84 20               vmovss  %xmm0, 32(%rsp,%rax,4)
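
As a hedged illustration of where the two bytes go, the sketch below splits both
byte sequences from the example into prefix, opcode, ModR/M, SIB and
displacement. It only handles this one addressing form (mod=01 with an SIB
byte). The function name `split_encoding` and the disp8*N factor of 4 for a
32-bit scalar store are assumptions of the sketch, not part of the proposal.

```python
# Byte sequences copied from the vmovss example above.
EVEX_BYTES = bytes.fromhex("62 f1 7e 08 11 44 84 08")
VEX_BYTES = bytes.fromhex("c5 fa 11 44 84 20")

# The first byte selects the prefix: 0x62 starts a 4-byte EVEX prefix,
# 0xC5 a 2-byte VEX prefix, 0xC4 a 3-byte VEX prefix.
PREFIX_LENGTHS = {0x62: 4, 0xC5: 2, 0xC4: 3}


def split_encoding(raw: bytes, disp8_scale: int = 1) -> dict:
    """Split one instruction laid out as prefix/opcode/ModRM/SIB/disp8.

    disp8_scale is the EVEX disp8*N factor (assumed to be 4 for a 32-bit
    scalar such as vmovss); VEX uses the displacement byte unscaled.
    """
    prefix_len = PREFIX_LENGTHS[raw[0]]
    opcode, modrm, sib = raw[prefix_len:prefix_len + 3]
    # This sketch only understands mod=01 (disp8 follows) with rm=100 (SIB).
    assert modrm >> 6 == 0b01 and modrm & 0b111 == 0b100
    disp8 = int.from_bytes(raw[prefix_len + 3:prefix_len + 4], "little", signed=True)
    return {
        "length": len(raw),
        "prefix_len": prefix_len,
        "opcode": opcode,
        "index_scale": 1 << (sib >> 6),
        "displacement": disp8 * disp8_scale,
    }


evex = split_encoding(EVEX_BYTES, disp8_scale=4)
vex = split_encoding(VEX_BYTES)
print(evex)
print(vex)
print("bytes saved:", evex["length"] - vex["length"])
```

Both forms share the opcode, ModR/M and SIB bytes and address the same
32-byte offset; the size difference comes entirely from the prefix.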

See reported Bugzilla bugs about this proposed optimization:
https://llvm.org/bugs/show_bug.cgi?id=23376
https://llvm.org/bugs/show_bug.cgi?id=29162

The proposed implementation adds a table of all EVEX opcodes that can be encoded
via VEX, in a new header file placed under lib/Target/X86.
A new pass, added at the pre-emit stage, performs the replacement.
No special optimization flags are needed, as it is always better to use the
shorter VEX encoding when possible.
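
A minimal sketch of the per-instruction check such a pass might make, modeled in
Python rather than in backend C++. Everything named here (`EVEX_TO_VEX`,
`Instr`, `try_compress`, `run_pass`) is hypothetical, the two opcode names are
illustrative only, and the real table would be far larger.

```python
from dataclasses import dataclass, replace

# Hypothetical two-entry table; the opcode names are illustrative only.
EVEX_TO_VEX = {
    "VMOVSSZmr": "VMOVSSmr",
    "VADDPSZ128rr": "VADDPSrr",
}


@dataclass(frozen=True)
class Instr:
    opcode: str
    vector_regs: tuple  # numbers of the XMM/YMM operands, e.g. (0,) for xmm0


def try_compress(instr: Instr) -> Instr:
    """Return the VEX form of instr when that is legal, else instr unchanged."""
    vex_opcode = EVEX_TO_VEX.get(instr.opcode)
    if vex_opcode is None:
        return instr  # no VEX equivalent listed in the table
    if any(reg >= 16 for reg in instr.vector_regs):
        return instr  # XMM16-31 / YMM16-31 can only be encoded with EVEX
    return replace(instr, opcode=vex_opcode)


def run_pass(instrs):
    """Apply the rewrite to every instruction in a list."""
    return [try_compress(i) for i in instrs]


if __name__ == "__main__":
    program = [Instr("VMOVSSZmr", (0,)), Instr("VMOVSSZmr", (17,))]
    for before, after in zip(program, run_pass(program)):
        print(before.opcode, "->", after.opcode)
```

The second instruction keeps its EVEX opcode because it uses register 17.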

Thank you for any comments or questions that you may have.

Sincerely,

Gadi.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Hal Finkel via llvm-dev

2016-Nov-23 13:01 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

----- Original Message -----
> From: "Gadi via llvm-dev Haber" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Wednesday, November 23, 2016 5:50:42 AM
> Subject: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX
> with VEX encoding
> The proposed optimization implementation is to add a table of all
> EVEX opcodes that can be encoded via VEX in a new header file placed
> under lib/Target/X86.
> A new pass is to be added at the pre-emit stage.

It might be better to have TableGen generate the mapping table for you instead
of manually making a table yourself. TableGen has a feature that is specifically
designed to make mapping tables like this. For examples, grep for InstrMapping
in:

lib/Target/Hexagon/Hexagon.td 
lib/Target/Mips/MipsDSPInstrFormats.td 
lib/Target/Mips/MipsInstrFormats.td 
lib/Target/Mips/Mips32r6InstrFormats.td 
lib/Target/PowerPC/PPC.td 
lib/Target/AMDGPU/SIInstrInfo.td 
lib/Target/AMDGPU/R600Instructions.td 
lib/Target/SystemZ/SystemZInstrFormats.td 
lib/Target/Lanai/LanaiInstrInfo.td 

I've used this feature a few times in the PowerPC backend, and it's
quite convenient.
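
Roughly, an InstrMapping definition names the fields that must match for two
instructions to belong to the same row (RowFields), the field that
distinguishes the columns (ColFields), and which column is the key and which
are the values (KeyCol, ValueCols). The Python sketch below models that
grouping on made-up records. The field names `BaseName` and `Enc`, the helper
`build_mapping` and the opcode names are assumptions for illustration; this is
not TableGen syntax and not the generated code.

```python
from collections import defaultdict

# Made-up instruction records standing in for TableGen instruction defs.
RECORDS = [
    {"name": "VADDPSZ128rr", "BaseName": "VADDPS128rr", "Enc": "EVEX"},
    {"name": "VADDPSrr", "BaseName": "VADDPS128rr", "Enc": "VEX"},
    {"name": "VADDPSZrr", "BaseName": "VADDPS512rr", "Enc": "EVEX"},
]


def build_mapping(records, row_fields, col_field, key_col, value_col):
    """Group records into rows, then map the key column to the value column."""
    rows = defaultdict(dict)
    for rec in records:
        row_key = tuple(rec[field] for field in row_fields)
        rows[row_key][rec[col_field]] = rec["name"]
    # Rows that lack either column (e.g. 512-bit forms with no VEX twin)
    # simply produce no entry.
    return {
        cols[key_col]: cols[value_col]
        for cols in rows.values()
        if key_col in cols and value_col in cols
    }


EVEX_TO_VEX = build_mapping(RECORDS, ["BaseName"], "Enc", "EVEX", "VEX")
print(EVEX_TO_VEX)
```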

-Hal 
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 

Craig Topper via llvm-dev

2016-Nov-23 16:12 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

I would like a command line option to disable this optimization. That way
tests can still verify that EVEX instructions came out of isel by using
-show-mc-encoding.


Haber, Gadi via llvm-dev

2016-Nov-24 07:22 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Thanks for the tip.
Indeed, the EVEX opcodes in X86 follow a convenient naming convention that helps
with this.
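
To illustrate what a convenient naming convention could buy, here is a sketch
that derives a candidate VEX opcode name from an EVEX one. It assumes the X86
backend convention in which 128-bit EVEX opcodes carry a `Z128` infix and
256-bit ones carry `Z256` (with the VEX counterpart using `Y`). The function
name and sample opcodes are hypothetical and should be checked against the
actual .td files. Names with a bare `Z` are deliberately rejected, since that
infix is shared by 512-bit and scalar forms and the name alone cannot tell them
apart.

```python
import re
from typing import Optional

# mnemonic + Z128/Z256 infix + operand-form suffix such as "rr" or "rm".
_PATTERN = re.compile(
    r"^(?P<mnemonic>[A-Z0-9]+?)(?P<infix>Z128|Z256)(?P<form>[a-z]\w*)$"
)


def vex_name_candidate(evex_name: str) -> Optional[str]:
    """Guess the VEX opcode name for a 128/256-bit EVEX opcode name."""
    m = _PATTERN.match(evex_name)
    if m is None:
        return None  # bare "Z" or unknown shape: cannot decide from the name
    vex_infix = "" if m["infix"] == "Z128" else "Y"
    return m["mnemonic"] + vex_infix + m["form"]


for name in ("VADDPSZ128rr", "VADDPSZ256rm", "VADDPSZrr"):
    print(name, "->", vex_name_candidate(name))
```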

Sincerely,
Gadi.

Rackover, Zvi via llvm-dev

2016-Nov-28 14:50 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hal, that’s a good point. There are more manually maintained tables in the X86
backend that should probably be generated by TableGen: the memory-folding tables
and ReplaceableInstrs, to name a couple.
If you have ideas on how to get these auto-generated, please let us know.

