thr3ads.net - llvm dev - [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding [Nov 2016]

Rackover, Zvi via llvm-dev

2016-Nov-28 14:50 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hal, that’s a good point. There are more manually-maintained tables in the X86
backend that should probably be tablegened: the memory-folding tables and
ReplaceableInstrs, to name a couple.
If you have ideas on how to get these auto-generated, please let us know.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Hal
Finkel via llvm-dev
Sent: Wednesday, November 23, 2016 15:01
To: Haber, Gadi <gadi.haber at intel.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with
VEX encoding

________________________________
From: "Gadi Haber via llvm-dev" <llvm-dev at lists.llvm.org>
To: llvm-dev at lists.llvm.org
Sent: Wednesday, November 23, 2016 5:50:42 AM
Subject: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with
VEX encoding
Hi All.

This is an RFC for a proposed target-specific X86 optimization that reduces the
code size of AVX-512 instructions by using a shorter encoding when possible.

When the AVX512F instruction set was introduced in X86, it included 32 registers
of 512 bits each (ZMM0-ZMM31), as well as 16 additional XMM registers
(XMM16-XMM31) and 16 additional YMM registers (YMM16-YMM31).
To encode registers 16-31 and the new instructions, a new encoding prefix
called EVEX, which extends the existing VEX encoding, was introduced:

The EVEX encoding format:
            EVEX  Opcode  ModR/M  [SIB]  [Disp32] / [Disp8*N]  [Immediate]
# of bytes: 4     1       1       1      4        / 1          1

The existing VEX encoding format:
            [VEX]  Opcode  ModR/M  [SIB]  [Disp]   [Imm]
# of bytes: 0,2,3  1       1       0,1    0,1,2,4  0,1

Note that the EVEX prefix always takes 4 bytes, whereas the VEX prefix takes at
most 3 bytes (2 bytes in its short form).
Consequently, on the SKX architecture, many instructions that use only the
lower registers XMM0-XMM15 or YMM0-YMM15 can be encoded in either the EVEX or
the VEX format. In such cases, using the VEX encoding saves about 2 bytes per
instruction (the difference between the prefix sizes), even when compiling with
the AVX512F/AVX512VL
features enabled.
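
A minimal sketch of that eligibility test, assuming a hypothetical, simplified instruction description (the `EvexInstr` and `Operand` types below are illustrations, not LLVM's MachineInstr API). Besides the register check described above, it also rejects 512-bit vectors, opmasks, broadcasts, and embedded rounding, since VEX has no way to express those EVEX-only features. Whether a VEX opcode with the same semantics exists at all is left to the opcode table.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of one operand.
struct Operand {
  bool IsVectorReg; // XMM/YMM/ZMM register operand
  unsigned RegNum;  // 0-31 for vector registers
};

// Hypothetical, simplified view of an EVEX-encoded instruction.
struct EvexInstr {
  unsigned VectorBits;       // 128, 256 or 512
  bool UsesMask;             // writes under an opmask other than k0
  bool UsesBroadcast;        // {1toN} memory broadcast
  bool UsesEmbeddedRounding; // {er} / {sae}
  std::vector<Operand> Operands;
};

// Returns true if nothing in the instruction requires the EVEX prefix.
bool fitsInVex(const EvexInstr &I) {
  assert(I.VectorBits == 128 || I.VectorBits == 256 || I.VectorBits == 512);
  if (I.VectorBits == 512) // VEX has no ZMM form
    return false;
  if (I.UsesMask || I.UsesBroadcast || I.UsesEmbeddedRounding)
    return false;
  for (const Operand &Op : I.Operands)
    if (Op.IsVectorReg && Op.RegNum >= 16) // XMM16-31 / YMM16-31 need EVEX
      return false;
  return true;
}
```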

For example, “vmovss  %xmm0, 32(%rsp,%rax,4)” has the following two possible
encodings:

EVEX encoding (8 bytes long):
            62 f1 7e 08 11 44 84 08         vmovss  %xmm0, 32(%rsp,%rax,4)

VEX encoding (6 bytes long):
           c5 fa 11 44 84 20                      vmovss  %xmm0, 32(%rsp,%rax,4)
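
The byte counts above can be checked by adding up the fields from the format tables. The helpers below are hypothetical (not an LLVM API) and cover only this shape of instruction: a memory operand with a SIB byte, no immediate, and the 2-byte VEX prefix. They also model EVEX's compressed Disp8*N displacement; for this `vmovss` N is 4, which is why the EVEX bytes end in `08` (8 × 4 = 32) while the VEX bytes end in `20` (32).

```cpp
#include <cassert>
#include <cstdint>

// Prefix lengths from the format tables above.
constexpr unsigned kEvexPrefixBytes = 4;
constexpr unsigned kVex2PrefixBytes = 2; // C5 form; the C4 form takes 3

// Bytes needed for a displacement encoded as an 8-bit value scaled by N
// (N == 1 for VEX, N > 1 for EVEX Disp8*N), falling back to a 4-byte disp32.
unsigned dispBytes(int32_t Disp, int32_t N) {
  assert(N > 0);
  if (Disp == 0)
    return 0; // assumes a base register that allows no displacement
  if (Disp % N == 0 && Disp / N >= -128 && Disp / N <= 127)
    return 1;
  return 4;
}

// Prefix + opcode + ModR/M + SIB + displacement (no immediate).
unsigned memInstrBytes(unsigned PrefixBytes, int32_t Disp, int32_t N) {
  return PrefixBytes + 1 /*opcode*/ + 1 /*ModR/M*/ + 1 /*SIB*/ +
         dispBytes(Disp, N);
}

unsigned evexBytes(int32_t Disp, int32_t N) {
  return memInstrBytes(kEvexPrefixBytes, Disp, N);
}

unsigned vexBytes(int32_t Disp) {
  return memInstrBytes(kVex2PrefixBytes, Disp, 1);
}
```

With the example's displacement of 32, `evexBytes(32, 4)` gives 8 and `vexBytes(32)` gives 6. The comparison does depend on the displacement, though: 256 still fits in EVEX's Disp8*N (256 / 4 = 64) but needs a 4-byte displacement with VEX, so for that operand `evexBytes(256, 4)` is 8 while `vexBytes(256)` is 9.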

See the Bugzilla reports about this proposed optimization:
https://llvm.org/bugs/show_bug.cgi?id=23376
https://llvm.org/bugs/show_bug.cgi?id=29162

The proposed implementation is to add a table of all EVEX opcodes that can be
encoded with VEX, in a new header file under lib/Target/X86, and a new pass
that runs at the pre-emit stage.

It might be better to have TableGen generate the mapping table for you instead
of manually making a table yourself. TableGen has a feature that is specifically
designed to make mapping tables like this. For examples, grep for InstrMapping
in:

lib/Target/Hexagon/Hexagon.td
lib/Target/Mips/MipsDSPInstrFormats.td
lib/Target/Mips/MipsInstrFormats.td
lib/Target/Mips/Mips32r6InstrFormats.td
lib/Target/PowerPC/PPC.td
lib/Target/AMDGPU/SIInstrInfo.td
lib/Target/AMDGPU/R600Instructions.td
lib/Target/SystemZ/SystemZInstrFormats.td
lib/Target/Lanai/LanaiInstrInfo.td

I've used this feature a few times in the PowerPC backend, and it's
quite convenient.

 -Hal

No special opt flags are needed; it is always better to use the reduced VEX
encoding when possible.
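
The table-plus-pass design described above can be sketched as follows. Everything here is a placeholder: the opcode numbers are made up rather than real `X86::` enumerators, `Instr` stands in for a MachineInstr, and its `CanUseVex` flag stands in for the register and feature checks. The table is kept sorted by EVEX opcode so each lookup is a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// One row of a hypothetical EVEX -> VEX opcode table.
struct EvexToVexEntry {
  uint16_t EvexOpcode;
  uint16_t VexOpcode;
};

// Sorted by EvexOpcode so it can be binary searched.
// The numbers are placeholders, not real X86:: opcode enumerators.
static const EvexToVexEntry EvexToVexTable[] = {
    {100, 10},
    {101, 11},
    {205, 25},
};

// Returns the VEX opcode for EvexOpcode, or -1 if the table has no entry.
int lookupVexOpcode(uint16_t EvexOpcode) {
  const EvexToVexEntry *Begin = std::begin(EvexToVexTable);
  const EvexToVexEntry *End = std::end(EvexToVexTable);
  const EvexToVexEntry *It = std::lower_bound(
      Begin, End, EvexOpcode,
      [](const EvexToVexEntry &E, uint16_t Op) { return E.EvexOpcode < Op; });
  if (It == End || It->EvexOpcode != EvexOpcode)
    return -1;
  return It->VexOpcode;
}

// Stand-in for a machine instruction.
struct Instr {
  uint16_t Opcode;
  bool CanUseVex; // result of the register/feature checks
};

// Sketch of the pre-emit pass over one block: rewrite each eligible
// instruction in place and return how many were changed.
unsigned compressEvexToVex(std::vector<Instr> &Block) {
  unsigned Changed = 0;
  for (Instr &I : Block) {
    if (!I.CanUseVex)
      continue;
    int Vex = lookupVexOpcode(I.Opcode);
    if (Vex < 0)
      continue;
    I.Opcode = static_cast<uint16_t>(Vex);
    ++Changed;
  }
  return Changed;
}
```

Keeping the table sorted makes each lookup O(log n), but a hand-written table has to be kept in that order manually, which is one of the maintenance costs a generated table avoids.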

Thank you for any comments or questions that you may have.

Sincerely,

Gadi.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel via llvm-dev

2016-Nov-28 16:55 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

----- Original Message -----

From: "Zvi Rackover" <zvi.rackover at intel.com> 
To: "Hal Finkel" <hfinkel at anl.gov>, "Gadi Haber"
<gadi.haber at intel.com>
Cc: llvm-dev at lists.llvm.org 
Sent: Monday, November 28, 2016 8:50:15 AM 
Subject: RE: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with
VEX encoding

Hal, that’s a good point. There are more manually-maintained tables in the X86
backend that should probably be tablegened: the memory-folding tables and
ReplaceableInstrs, to name a couple.
If you have ideas on how to get these auto-generated, please let us know. 

I'm going to use ReplaceableInstrs as an example. ReplaceableInstrs is more
complicated than some of the other mappings because, as you'll see below,
there are multiple possible "key" instructions for each generated row
of the table (i.e., we need to be able to look up a row in the table by any of
the columns). The EVEX -> VEX mapping seems potentially simpler because it
lacks this requirement. Nevertheless, I hope this is useful. More generally,
I agree with you that we should try to move a number of these
manually-maintained tables in the X86 backend over to TableGen. Also, I'm
writing this up without trying it, so please excuse any mistakes...

There are a few ReplaceableInstrs tables in X86InstrInfo.cpp (used by
getExecutionDomain/setExecutionDomain) that look like this:

static const uint16_t ReplaceableInstrs[][3] = {
  //PackedSingle     PackedDouble    PackedInt
  { X86::MOVAPSmr,   X86::MOVAPDmr,  X86::MOVDQAmr  },
  { X86::MOVAPSrm,   X86::MOVAPDrm,  X86::MOVDQArm  },
  { X86::MOVAPSrr,   X86::MOVAPDrr,  X86::MOVDQArr  },
  // ...
};

The idea is that, given some instruction, we want to know:

1. Whether it is in the table (in any column)
2. The corresponding instruction in some other domain 
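
For comparison, a hand-written table answers both questions by scanning the column for the instruction's current domain. In this sketch, made-up opcode values stand in for the `X86::` enumerators, the columns follow the layout above (0 = PackedSingle, 1 = PackedDouble, 2 = PackedInt), and domains are numbered 1-3 in the same order; `lookupRow` and `replacementInDomain` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder opcodes standing in for X86::MOVAPSmr, X86::MOVAPDmr, etc.
enum : uint16_t {
  MOVAPSmr = 1, MOVAPDmr, MOVDQAmr,
  MOVAPSrm, MOVAPDrm, MOVDQArm,
  MOVAPSrr, MOVAPDrr, MOVDQArr,
  ADDPSrr // not in the table
};

static const uint16_t ReplaceableInstrs[][3] = {
    // PackedSingle  PackedDouble  PackedInt
    {MOVAPSmr, MOVAPDmr, MOVDQAmr},
    {MOVAPSrm, MOVAPDrm, MOVDQArm},
    {MOVAPSrr, MOVAPDrr, MOVDQArr},
};

// Question 1: the row containing Opcode in the column for Domain (1-3),
// or nullptr if the opcode is not in the table.
const uint16_t *lookupRow(uint16_t Opcode, unsigned Domain) {
  assert(Domain >= 1 && Domain <= 3);
  for (const auto &Row : ReplaceableInstrs)
    if (Row[Domain - 1] == Opcode)
      return Row;
  return nullptr;
}

// Question 2: the equivalent instruction in NewDomain, or 0 if none.
uint16_t replacementInDomain(uint16_t Opcode, unsigned Domain,
                             unsigned NewDomain) {
  assert(NewDomain >= 1 && NewDomain <= 3);
  const uint16_t *Row = lookupRow(Opcode, Domain);
  return Row ? Row[NewDomain - 1] : 0;
}
```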

Here's how you might replace this manually-maintained table with a
TableGen-generated table using the InstrMapping feature:

1. In X86InstrInfo.cpp add the line: 

#define GET_INSTRMAP_INFO 

before X86GenInstrInfo.inc is included. 
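
The `#define` matters because the generated `.inc` file is divided into sections, each guarded by an `#ifdef` on a `GET_*` macro, so every file that includes it pulls in only the parts it asks for. The sketch below inlines a stand-in for such a section instead of including the real X86GenInstrInfo.inc; `getSomeMapping` and its body are hypothetical placeholders, not TableGen output.

```cpp
#include <cassert>
#include <cstdint>

// What step 1 adds to X86InstrInfo.cpp.
#define GET_INSTRMAP_INFO

// ---- Stand-in for one section of X86GenInstrInfo.inc ----
#ifdef GET_INSTRMAP_INFO
#undef GET_INSTRMAP_INFO // undefined once used, so this section is emitted once
namespace X86 {
// Hypothetical mapping function; the body is a placeholder.
int getSomeMapping(uint16_t Opcode) { return Opcode == 42 ? 43 : -1; }
} // namespace X86
#endif // GET_INSTRMAP_INFO
// ---- End of stand-in ----

// Compiles only because the section above was enabled; without the #define,
// X86::getSomeMapping would be undeclared here.
bool hasMapping(uint16_t Opcode) { return X86::getSomeMapping(Opcode) >= 0; }
```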

2. Establish a key field in the instruction class, used to relate the
instructions in each row of the table. You might add something like this to
X86Inst:

  // Key used to relate replaceable instructions.
  string ReplaceableInstrKey = "";

3. Establish some field to act as the column identifier:

  string ReplaceableInstrDomain = "";

In practice, for this case, I assume that we'd want to reuse the existing
ExeDomain field to set the value; I think that we can get the name of the
Domain record directly like this:

  string ReplaceableInstrDomain = !cast<string>(d);

where d is the class parameter used to initialize the ExeDomain field.

4. For each instruction, set ReplaceableInstrKey to a name that is unique to
its row; for example, for MOVAPS, MOVAPD, and MOVDQA add:

  let ReplaceableInstrKey = "MOVA" in

before the associated multiclass instantiations (or use any of the other ways
TableGen provides to set the field value).

5. Add a "filter class" definition to X86.td (or wherever else we'd like) like
this:

  // A filter class for the ReplaceableInstrs mapping.
  class ReplaceableInstr;

and make sure that all instructions that will participate in the mapping also
derive from this class:

  defm MOVAPS : foo<bar, ...>, ReplaceableInstr;

In practice, you might want to combine this step with the previous one (i.e.,
have the ReplaceableInstr class take a parameter that sets the key field).

6. Add the definition of some mappings:

  def getReplaceableSSEPackedSingleInstr : InstrMapping {
    let FilterClass = "ReplaceableInstr";
    let RowFields = ["ReplaceableInstrKey", "FormBits"];
    let ColFields = ["ReplaceableInstrDomain"];
    let KeyCol = ["SSEPackedSingle"];
    let ValueCols = [["SSEPackedSingle"], ["SSEPackedDouble"],
                     ["SSEPackedInt"]];
  }

  def getReplaceableSSEPackedDoubleInstr : InstrMapping {
    let FilterClass = "ReplaceableInstr";
    let RowFields = ["ReplaceableInstrKey", "FormBits"];
    let ColFields = ["ReplaceableInstrDomain"];
    let KeyCol = ["SSEPackedDouble"];
    let ValueCols = [["SSEPackedSingle"], ["SSEPackedDouble"],
                     ["SSEPackedInt"]];
  }

  def getReplaceableSSEPackedIntInstr : InstrMapping {
    let FilterClass = "ReplaceableInstr";
    let RowFields = ["ReplaceableInstrKey", "FormBits"];
    let ColFields = ["ReplaceableInstrDomain"];
    let KeyCol = ["SSEPackedInt"];
    let ValueCols = [["SSEPackedSingle"], ["SSEPackedDouble"],
                     ["SSEPackedInt"]];
  }

Note that my use of FormBits above is probably not right. I'm trying to have
each row identified by the value of ReplaceableInstrKey together with the
operand form (rr, rm, mr, etc.). I'll also note that it would be nice not
to have to define three separate mappings here, but I think that changing that
will require some (likely welcome) enhancement to the existing infrastructure.
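
To make the row/column terminology concrete, here is a simplified C++ model of what an InstrMapping definition like the ones above asks TableGen to compute. It is not TableGen's implementation: records are plain structs with made-up opcodes, `Form` stands in for whatever field ends up distinguishing rr/rm/mr, and `InFilterClass` models derivation from ReplaceableInstr. Filtered records that share the RowFields values form one row, and the ColFields value selects the column.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// A hypothetical, flattened view of an instruction record.
struct InstrRecord {
  int Opcode;
  std::string Name;
  bool InFilterClass; // derives from ReplaceableInstr
  std::string Key;    // ReplaceableInstrKey (RowFields, part 1)
  std::string Form;   // stand-in for FormBits (RowFields, part 2)
  std::string Domain; // ReplaceableInstrDomain (ColFields)
};

using RowId = std::tuple<std::string, std::string>; // (Key, Form)
using RowTable = std::map<RowId, std::map<std::string, int>>;

// Groups filtered records into rows: row id -> (column value -> opcode).
RowTable buildRows(const std::vector<InstrRecord> &Records) {
  RowTable Rows;
  for (const InstrRecord &R : Records)
    if (R.InFilterClass)
      Rows[RowId(R.Key, R.Form)][R.Domain] = R.Opcode;
  return Rows;
}

// Lookup for a mapping whose KeyCol is KeyDomain: find Opcode in the key
// column, then return the opcode in column ToDomain, or -1 if absent.
int lookup(const RowTable &Rows, const std::string &KeyDomain, int Opcode,
           const std::string &ToDomain) {
  for (const auto &Row : Rows) {
    auto K = Row.second.find(KeyDomain);
    if (K == Row.second.end() || K->second != Opcode)
      continue;
    auto V = Row.second.find(ToDomain);
    return V == Row.second.end() ? -1 : V->second;
  }
  return -1;
}
```

The model also shows why the form has to be part of RowFields: keyed on ReplaceableInstrKey alone, MOVAPSrr and MOVAPSrm would land in the same row, and the second would overwrite the first in the SSEPackedSingle column.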

7. The code in X86InstrInfo.cpp would then be changed to use the generated
mapping lookup functions, perhaps using some utility functions like this:

// Returns true if MI appears in the mapping for its own execution domain.
static bool isReplaceableInstr(const MachineInstr &MI) {
  // Domain values: 1 = SSEPackedSingle, 2 = SSEPackedDouble, 3 = SSEPackedInt.
  uint16_t domain = (MI.getDesc().TSFlags >> X86II::SSEDomainShift) & 3;
  unsigned opcode = MI.getOpcode();
  // The generated lookups return -1 when the opcode has no entry.
  switch (domain) {
  case 1:
    return X86::getReplaceableSSEPackedSingleInstr(opcode,
               X86::ReplaceableInstrDomain_SSEPackedSingle) >= 0;
  case 2:
    return X86::getReplaceableSSEPackedDoubleInstr(opcode,
               X86::ReplaceableInstrDomain_SSEPackedDouble) >= 0;
  case 3:
    return X86::getReplaceableSSEPackedIntInstr(opcode,
               X86::ReplaceableInstrDomain_SSEPackedInt) >= 0;
  default:
    return false; // GenericDomain: not replaceable.
  }
}

// Returns the equivalent opcode in newDomain, or -1 if there is none.
static int getReplacementInstrInDomain(const MachineInstr &MI,
                                       unsigned newDomain) {
  uint16_t domain = (MI.getDesc().TSFlags >> X86II::SSEDomainShift) & 3;
  unsigned opcode = MI.getOpcode();

  X86::ReplaceableInstrDomain newDomainKey;
  switch (newDomain) {
  case 1: newDomainKey = X86::ReplaceableInstrDomain_SSEPackedSingle; break;
  case 2: newDomainKey = X86::ReplaceableInstrDomain_SSEPackedDouble; break;
  case 3: newDomainKey = X86::ReplaceableInstrDomain_SSEPackedInt; break;
  default: return -1;
  }

  switch (domain) {
  case 1:
    return X86::getReplaceableSSEPackedSingleInstr(opcode, newDomainKey);
  case 2:
    return X86::getReplaceableSSEPackedDoubleInstr(opcode, newDomainKey);
  case 3:
    return X86::getReplaceableSSEPackedIntInstr(opcode, newDomainKey);
  default:
    return -1;
  }
} 
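
Since the generated functions only exist once the mappings are in the .td files, the dispatch logic above can be tried in isolation with stubs. In this sketch the `X86::getReplaceable*Instr` functions and the `ReplaceableInstrDomain` enum are hand-written stand-ins with the signatures the code above assumes (opcode plus target column, returning -1 when there is no entry), the opcodes are made up, the instruction is represented by its TSFlags and opcode rather than a MachineInstr, and `kSSEDomainShift` is an arbitrary placeholder for `X86II::SSEDomainShift`. `replacementForFlags` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

namespace X86 {
// Stand-in for the enum TableGen would generate for the column field.
enum ReplaceableInstrDomain {
  ReplaceableInstrDomain_SSEPackedSingle,
  ReplaceableInstrDomain_SSEPackedDouble,
  ReplaceableInstrDomain_SSEPackedInt
};

// A single row with placeholder opcodes (think MOVAPSrr, MOVAPDrr, MOVDQArr).
static const int StubRow[3] = {1, 2, 3};

// Shared body of the stubs: match Opcode in column KeyCol, return column To.
static int stubLookup(int KeyCol, uint16_t Opcode, ReplaceableInstrDomain To) {
  return StubRow[KeyCol] == Opcode ? StubRow[To] : -1;
}
int getReplaceableSSEPackedSingleInstr(uint16_t Op, ReplaceableInstrDomain To) {
  return stubLookup(0, Op, To);
}
int getReplaceableSSEPackedDoubleInstr(uint16_t Op, ReplaceableInstrDomain To) {
  return stubLookup(1, Op, To);
}
int getReplaceableSSEPackedIntInstr(uint16_t Op, ReplaceableInstrDomain To) {
  return stubLookup(2, Op, To);
}
} // namespace X86

// Arbitrary placeholder for X86II::SSEDomainShift.
constexpr unsigned kSSEDomainShift = 7;

// Same dispatch as getReplacementInstrInDomain, but taking TSFlags and the
// opcode directly instead of a MachineInstr.
int replacementForFlags(uint64_t TSFlags, uint16_t Opcode, unsigned NewDomain) {
  unsigned Domain = (TSFlags >> kSSEDomainShift) & 3;
  X86::ReplaceableInstrDomain Key;
  switch (NewDomain) {
  case 1: Key = X86::ReplaceableInstrDomain_SSEPackedSingle; break;
  case 2: Key = X86::ReplaceableInstrDomain_SSEPackedDouble; break;
  case 3: Key = X86::ReplaceableInstrDomain_SSEPackedInt; break;
  default: return -1;
  }
  switch (Domain) {
  case 1: return X86::getReplaceableSSEPackedSingleInstr(Opcode, Key);
  case 2: return X86::getReplaceableSSEPackedDoubleInstr(Opcode, Key);
  case 3: return X86::getReplaceableSSEPackedIntInstr(Opcode, Key);
  default: return -1; // domain 0 (GenericDomain)
  }
}
```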

-Hal 

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 

Rackover, Zvi via llvm-dev

2016-Nov-29 15:09 UTC

[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Thanks for the elaborate recipe. I created PR31205
<https://llvm.org/bugs/show_bug.cgi?id=31205> to track opportunities for
moving these tables to TableGen.

From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Monday, November 28, 2016 18:56
To: Rackover, Zvi <zvi.rackover at intel.com>
Cc: llvm-dev at lists.llvm.org; Haber, Gadi <gadi.haber at intel.com>
Subject: Re: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with
VEX encoding

