thr3ads.net - search: "ymm16"

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 01

5

...hese flags will be used to limit the vector register size presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation: -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU frequency that may offset the gains from using the wider register size. See section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017. -The vector AL...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

4

...ecific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86, it included 32 additional 512-bit registers, ZMM0-ZMM31, as well as 16 additional XMM registers, XMM16-XMM31, and 16 additional YMM registers, YMM16-YMM31. To encode the new registers 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding format: EVEX Opcode ModR/M [SIB] [Disp32] / [Disp8*N] [Immediate] # of byte...
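The register constraint described in this snippet can be illustrated with a small sketch. This is not LLVM's implementation; `needs_evex` is a hypothetical helper, and it deliberately ignores everything except register names and write-masking (for example, it assumes the opcode has a VEX form at all, which is not true of every AVX-512 instruction).

```python
import re

# Hypothetical helper for illustration only (not part of LLVM).
# Matches xmmN / ymmN / zmmN and captures the width letter and index.
_VECTOR_REG = re.compile(r"^([xyz])mm(\d+)$")


def needs_evex(registers, uses_mask=False):
    """Return True if the operands force the EVEX prefix.

    Simplifying assumptions: the instruction has a VEX-encoded form,
    and only register choice and write-masking are considered.
    """
    if uses_mask:
        # Write-masking a vector instruction with k1-k7 requires EVEX.
        return True
    for name in registers:
        match = _VECTOR_REG.match(name.lower())
        if match is None:
            raise ValueError(f"not a vector register: {name}")
        width, index = match.group(1), int(match.group(2))
        if index > 31:
            raise ValueError(f"register index out of range: {name}")
        # ZMM registers and registers 16-31 cannot be expressed in VEX.
        if width == "z" or index >= 16:
            return True
    return False


if __name__ == "__main__":
    for operands in (["ymm1", "ymm2", "ymm3"], ["ymm16", "ymm1"], ["zmm0"]):
        encoding = "EVEX" if needs_evex(operands) else "VEX"
        print(", ".join(operands), "->", encoding)
```

Under these assumptions, an instruction that touches only XMM0-15 or YMM0-15 without masking is a candidate for the shorter VEX form, while any use of YMM16-31, XMM16-31, or a ZMM register keeps the EVEX prefix.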

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 03

2

...limit the vector register size presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation: -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU frequency that may offset the gains from using the wider register size. See section 15.26 of Intel® 64 and IA-32 Archit...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

2

...educing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86, it included 32 additional 512-bit registers, ZMM0-ZMM31, as well as 16 additional XMM registers, XMM16-XMM31, and 16 additional YMM registers, YMM16-YMM31. To encode the new registers 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding format: EVEX Opcode ModR/M...

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

...
vpgatherqd ymm15 {k2}, zmmword ptr [zmm16]
vinserti64x4 zmm14, zmm15, ymm14, 1
kmovw k2, k1
vpgatherqd ymm15 {k2}, zmmword ptr [zmm19]
kxnorw k2, k0, k0
vpgatherqd ymm16 {k2}, zmmword ptr [zmm18]
vinserti64x4 zmm15, zmm16, ymm15, 1
kmovw k2, k1
vpgatherqd ymm1 {k2}, zmmword ptr [zmm21]
kxnorw k2, k0, k0
vpgatherqd ymm16 {k2}, zmmword ptr [zmm20]
...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 07

2

...ize presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation: -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU frequency that may offset the gains from using the wider...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 24

3

...ecific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86, it included 32 additional 512-bit registers, ZMM0-ZMM31, as well as 16 additional XMM registers, XMM16-XMM31, and 16 additional YMM registers, YMM16-YMM31. To encode the new registers 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding format: EVEX Opcode ModR/M [SIB] [Disp32] / [Disp8*N] [Immediate] # of byte...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 09

2

...The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation: -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 28

2

...ecific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86, it included 32 additional 512-bit registers, ZMM0-ZMM31, as well as 16 additional XMM registers, XMM16-XMM31, and 16 additional YMM registers, YMM16-YMM31. To encode the new registers 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding format: EVEX Opcode ModR/M [SIB] [Disp32] / [Disp8*N] [Immediate] # of byte...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 11

2

...nd will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation: -Using 512-bit operations on...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 12

2

...written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation:...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 13

3

...written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers. Motivation:...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 13

2

...nsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 14

2

...using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers...
