thr3ads.net - search: "zmm0"

2017 Jun 24

4

AVX Scheduling and Parallelism

Hello, After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor is 1024. I wonder if this register allocation allows operations in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? Like, are 2 vmov with differ...

AVX Scheduling and Parallelism

2017 Jun 25

2

AVX Scheduling and Parallelism

...w, it might cause problems by making the instruction encodings large. cc'ing some Intel folks for further comments. -Hal On 06/23/2017 09:02 PM, hameeza ahmed via llvm-dev wrote: Hello, After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor is 1024. I wonder if this register allocation allows operations in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? Like, are 2 vmov with differ...

AVX Scheduling and Parallelism

2017 Jun 25

0

AVX Scheduling and Parallelism

Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding: > vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in > zmm0 > > vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] > ; zmm1<-zmm1+b[401344] The KNL can only deliver 16 bytes per cycle from the icache to the decoder. Essentially all of the instructions...
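
The encoding-size concern in the message above can be made concrete with a little arithmetic. The following is only a sketch: the byte counts are assumptions based on the general EVEX instruction layout (4-byte prefix, 1-byte opcode, 1-byte ModRM, 4-byte displacement), not measured encodings of the quoted instructions, and the helper name is hypothetical.

```python
# Rough fetch-bandwidth estimate. All byte counts are assumptions about the
# general EVEX instruction layout, not measured encodings.
EVEX_PREFIX = 4   # 0x62 plus three payload bytes
OPCODE = 1
MODRM = 1
DISP32 = 4        # a symbol plus a large offset needs a full 32-bit displacement

FETCH_BYTES_PER_CYCLE = 16  # figure quoted in the message above


def instructions_per_cycle(instr_bytes, fetch_bytes=FETCH_BYTES_PER_CYCLE):
    """Upper bound on instructions fetched per cycle for a fixed encoding size."""
    return fetch_bytes / instr_bytes


instr_bytes = EVEX_PREFIX + OPCODE + MODRM + DISP32
print(instr_bytes, instructions_per_cycle(instr_bytes))
```

Under these assumptions, the instruction fetch could supply fewer than two such instructions per cycle, which is the kind of front-end limit the message is worried about.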

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

KNL Assembly Code for Matrix Multiplication

...s step > vpgatherqd ymm14 {k2}, zmmword ptr [zmm15] ; here again issues with index zmm15. it should be [0, 4000, 8000, ..., 28000] but it's different due to above computation. > vinserti64x4 zmm0, zmm14, ymm0, 1 > kmovw k2, k1 > vpgatherqd ymm14 {k2}, zmmword ptr [zmm17] > kxnorw k2, k0, k0 > vpgatherqd ymm15 {k2}, zmmword ptr [zmm16] > vinserti64x4 zmm14, zmm15, ymm14, 1 > ...

RFC: Adding Support For Vectorcall Calling Convention

2016 Nov 30

2

RFC: Adding Support For Vectorcall Calling Convention

...ds the standard x64 calling convention while adding support for HVA and vector types. There are four main differences: - Floating-point types are considered vector types just like __m128, __m256 and __m512. The first 6 vector typed arguments are saved in physical registers XMM0/YMM0/ZMM0 until XMM5/YMM5/ZMM5. - After vector types and integer types are allocated, HVA types are allocated, in ascending order, to unused vector registers XMM0/YMM0/ZMM0 to XMM5/YMM5/ZMM5. - Just like in the default x64 CC, shadow space is allocated for vector/HVA types. The size is...

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

2

[RFC][VECLIB] how should we legalize VECLIB calls?

...vcvtdq2pd %xmm1, %ymm0 vextractf128 $1, %ymm1, %xmm1 vcvtdq2pd %xmm1, %ymm1 callq __svml_sin8 vmovups %ymm1, 32(%r15,%r12,8) vmovups %ymm0, (%r15,%r12,8) Unfortunately, __svml_sin8() doesn't use this form of input/output. It takes zmm0 and returns zmm0. i.e., not legal to use for AVX. What we need to see instead is two calls to __svml_sin4(), like below. vmovaps %ymm0, %ymm1 vcvtdq2pd %xmm1, %ymm0 vextractf128 $1, %ymm1, %xmm1 vcvtdq2pd %xmm1, %ymm1 callq __svml_sin4...
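
The legalization described in the snippet above (one 8-wide call replaced by two 4-wide calls) can be sketched at a high level as follows. This is an illustration only: `sin4` is a hypothetical stand-in written with the standard `math` module, not the real `__svml_sin4` entry point, and Python lists stand in for vector registers.

```python
import math


def sin4(v):
    """Hypothetical stand-in for a 4-wide vector sine (not the real SVML call)."""
    assert len(v) == 4
    return [math.sin(x) for x in v]


def sin8_via_two_sin4(v):
    """Legalize an 8-element call by splitting it into two 4-element calls."""
    assert len(v) == 8
    low, high = v[:4], v[4:]          # analogous to the two ymm halves
    return sin4(low) + sin4(high)     # concatenate the two 4-wide results


values = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(sin8_via_two_sin4(values))
```

The point of the split is that each half fits the register width the narrower routine expects, so no 512-bit register is needed on an AVX-only target.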

[LLVMdev] Intel asm syntax and variable names

2015 Jul 23

2

[LLVMdev] Intel asm syntax and variable names

..."flags" So basically, what I’m seeing with “flags” (which should be a legit variable name) is that the X86AsmParser creates a reference to an implicit register instead of a reference to memory. There are additional issues here as well - what if we compile to SSE, but use a variable named “ZMM0” which is a register in AVX-512? Should this be allowed? We probably need some way to mark the registers (using attributes or predicates?) so that we’d know which ones are part of the legal set of registers that can be referenced in the architecture we’re compiling to. Do you think this is a good...

[LLVMdev] Intel asm syntax and variable names

2015 Jul 23

0

[LLVMdev] Intel asm syntax and variable names

So, there is no prior art for escaping the name of a global symbol with the same name as a register? If there is, I'd rather we just implement it and leave it at that. We can probably fix the 'flags' case easily in LLVM, but I'd rather not bend over backwards to make ZMM0 be a global name when AVX is disabled. On Thu, Jul 23, 2015 at 9:12 AM, Yatsina, Marina <marina.yatsina at intel.com> wrote: > Microsoft assembler treats mov to EAX as a register, even if there is a > global memory also named EAX – meaning the register takes precedence. > > But...

[LLVMdev] Intel asm syntax and variable names

2015 Jul 23

1

[LLVMdev] Intel asm syntax and variable names

...> So, there is no prior art for escaping the name of a global symbol with the same name as a register? If there is, I'd rather we just implement it and leave it at that. > > We can probably fix the 'flags' case easily in LLVM, but I'd rather not bend over backwards to make ZMM0 be a global name when AVX is disabled. > > On Thu, Jul 23, 2015 at 9:12 AM, Yatsina, Marina <marina.yatsina at intel.com <mailto:marina.yatsina at intel.com>> wrote: > Microsoft assembler treats mov to EAX as a register, even if there is a global memory also named EAX – meanin...

[LLVMdev] Intel asm syntax and variable names

2015 Jul 23

2

[LLVMdev] Intel asm syntax and variable names

...o basically, what I'm seeing with "flags" (which should be a legit variable name) is that the X86AsmParser creates a reference to an implicit register instead of a reference to memory. There are additional issues here as well - what if we compile to SSE, but use a variable named "ZMM0" which is a register in AVX-512? Should this be allowed? We probably need some way to mark the registers (using attributes or predicates?) so that we'd know which ones are part of the legal set of registers that can be referenced in the architecture we're compiling to. Do you think t...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

4

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding...
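
As a rough illustration of when the shorter VEX encoding could be chosen, the check might look like the sketch below. The rules are simplified assumptions drawn from the description above (registers 16-31 and 512-bit operands need EVEX) plus the fact that opmask registers are an AVX-512 feature; the function name and parameters are hypothetical, and a real implementation would also have to confirm that a VEX-encoded equivalent of the opcode exists.

```python
# Simplified eligibility check; the rules below are assumptions for illustration.
def can_use_vex(reg_indices, vector_bits, uses_opmask=False):
    """Return True if a (hypothetical) instruction could drop EVEX for VEX."""
    if vector_bits > 256:          # ZMM operands have no VEX form
        return False
    if uses_opmask:                # k-register masking is EVEX-only
        return False
    # VEX can only name registers 0-15; registers 16-31 require EVEX.
    return all(0 <= r <= 15 for r in reg_indices)


print(can_use_vex([0, 1, 2], 256))   # ymm0-ymm2, no mask
print(can_use_vex([0, 17], 128))     # xmm17 needs EVEX
```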

[LLVMdev] Intel asm syntax and variable names

2015 Jul 23

0

[LLVMdev] Intel asm syntax and variable names

...cally, what I’m seeing with “flags” (which should be a legit > variable name) is that the X86AsmParser creates a reference to an implicit > register instead of a reference to memory. > > There are additional issues here as well - what if we compile to SSE, but > use a variable named “ZMM0” which is a register in AVX-512? Should this be > allowed? > > > > We probably need some way to mark the registers (using attributes or > predicates?) so that we’d know which ones are part of the legal set of > registers that can be referenced in the architecture we’re compilin...

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

8

[RFC][VECLIB] how should we legalize VECLIB calls?

...%xmm1, %ymm1 > > callq __svml_sin8 > > vmovups %ymm1, 32(%r15,%r12,8) > > vmovups %ymm0, (%r15,%r12,8) > > Unfortunately, __svml_sin8() doesn’t use this form of > input/output. It takes zmm0 and returns zmm0. > > i.e., not legal to use for AVX. > > > > What we need to see instead is two calls to __svml_sin4(), > like below. > > vmovaps %ymm0, %ymm1 > > vcvtdq2pd ...

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

2

[RFC][VECLIB] how should we legalize VECLIB calls?

...vcvtdq2pd %xmm1, %ymm1 >> >> callq __svml_sin8 >> >> vmovups %ymm1, 32(%r15,%r12,8) >> >> vmovups %ymm0, (%r15,%r12,8) >> >> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It >> takes zmm0 and returns zmm0. >> >> i.e., not legal to use for AVX. >> >> >> >> What we need to see instead is two calls to __svml_sin4(), like below. >> >> vmovaps %ymm0, %ymm1 >> >> vcvtdq2pd %xmm1, %ymm0 >> >>...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

2

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

...> > > > This is an RFC for a proposed target specific X86 optimization for > reducing code size in the encoding of AVX-512 instructions when possible. > > > > When the AVX512F instruction set was introduced in X86 it included > additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as > additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. > > In order to encode the new registers of 16-31 and the additional > instructions, a new encoding prefix called EVEX, which extends the > existing VEX encoding, was introduced as shown b...

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

2

[RFC][VECLIB] how should we legalize VECLIB calls?

...ctf128 $1, %ymm1, %xmm1 > > vcvtdq2pd %xmm1, %ymm1 > > callq __svml_sin8 > > vmovups %ymm1, 32(%r15,%r12,8) > > vmovups %ymm0, (%r15,%r12,8) > > Unfortunately, __svml_sin8() doesn’t use this form of input/output. It > takes zmm0 and returns zmm0. > > i.e., not legal to use for AVX. > > > > What we need to see instead is two calls to __svml_sin4(), like below. > > vmovaps %ymm0, %ymm1 > > vcvtdq2pd %xmm1, %ymm0 > > vextractf128 $1, %ymm1, %xmm1 > >...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 24

3

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

...lacing EVEX with VEX encoding Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 28

2

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

...placing EVEX with VEX encoding Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced as shown below: The EVEX encoding...

LLVM FunctionType cannot be returned as VectorType?

2018 Jul 23

2

LLVM FunctionType cannot be returned as VectorType?

...-------- Jia Yu, Ph.D. Student in Computer Science Arizona State University <http://www.asu.edu/> On Mon, Jul 23, 2018 at 6:50 AM Cranmer, Joshua via llvm-dev < llvm-dev at lists.llvm.org> wrote: > In x86 ABI terms, a result that is a vector is returned in %xmm0 (or > %ymm0/%zmm0 if the size is >128 bits). All other scalar types are returned > via %rax (or some subslice thereof). > > > > The way you’re calling the function is expecting the value to be found in > %rax, where the callee is trying to return it in %xmm0, which means you’re > reading the...
