search for: ymm15

Displaying 20 results from an estimated 25 matches for "ymm15".

2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
...This thread has lots of interesting information: http://software.intel.com/en-us/forums/showthread.php?t=59291 I wasn't able to find a formal Win64 ABI spec, but according to http://www.agner.org/optimize/calling_conventions.pdf, xmm6-xmm15 are callee-saved on Win64, but the high bits in ymm6-ymm15 are not. That's currently not modelled correctly in LLVM. To fix it, create a pseudo-register YMMHI_CLOBBER that aliases ymm6-ymm15. Then add YMMHI_CLOBBER to the registers clobbered by WINCALL64*. /jakob
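A minimal sketch of the hazard Jakob describes (GNU as, AT&T syntax; labels, registers, and offsets are illustrative, not from the thread). Because the Win64 ABI obliges the callee to preserve only xmm6, a 256-bit value kept in ymm6 across a call can silently lose its upper half:

    caller:
        vmovaps (%rdx), %ymm6        # caller keeps a 256-bit value in ymm6
        call    callee               # Win64: callee must preserve xmm6 only
        vmovaps %ymm6, (%rcx)        # upper 128 bits may have been clobbered

    callee:
        subq    $0x18, %rsp
        movaps  %xmm6, (%rsp)        # ABI requires saving only the low 128 bits
        vxorps  %ymm6, %ymm6, %ymm6  # any 256-bit use clobbers the high half
        movaps  (%rsp), %xmm6        # low half restored; high half is lost
        addq    $0x18, %rsp
        ret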
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
...vbroadcasti128 .Lbswap128_mask, %ymm14; + vbroadcasti128 .Lbswap128_mask(%rip), %ymm14; vinserti128 $1, %xmm0, %ymm1, %ymm0; vpshufb %ymm14, %ymm0, %ymm13; vmovdqu %ymm13, 15 * 32(%rax); @@ -1158,7 +1158,7 @@ ENTRY(camellia_ctr_32way) /* inpack32_pre: */ vpbroadcastq (key_table)(CTX), %ymm15; - vpshufb .Lpack_bswap, %ymm15, %ymm15; + vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15; vpxor %ymm0, %ymm15, %ymm0; vpxor %ymm1, %ymm15, %ymm1; vpxor %ymm2, %ymm15, %ymm2; @@ -1242,13 +1242,13 @@ camellia_xts_crypt_32way: subq $(16 * 32), %rsp; movq %rsp, %rax; - vbroadcasti128 .Lxts_gf1...
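The shape of the fix is visible in the fragments above: memory operands that name a label by absolute address become RIP-relative, so the constants stay reachable wherever the PIE kernel is loaded. A minimal before/after sketch (GNU as, AT&T syntax), using one line from the patch:

        # before: absolute address, needs a fixed link-time location
        vbroadcasti128 .Lbswap128_mask, %ymm14
        # after: PC-relative, position-independent
        vbroadcasti128 .Lbswap128_mask(%rip), %ymm14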
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
...lt;"ymm12", [XMM12, XMM12b]>, DwarfRegNum<[29, -2, -2]>; def YMM13: RegisterWithSubRegs<"ymm13", [XMM13, XMM13b]>, DwarfRegNum<[30, -2, -2]>; def YMM14: RegisterWithSubRegs<"ymm14", [XMM14, XMM14b]>, DwarfRegNum<[31, -2, -2]>; def YMM15: RegisterWithSubRegs<"ymm15", [XMM15, XMM15b]>, DwarfRegNum<[32, -2, -2]>; } - Elena -----Original Message----- From: Jakob Stoklund Olesen [mailto:jolesen at apple.com] Sent: Tuesday, January 10, 2012 01:14 To: Demikhovsky, Elena Cc: Bruno Cardoso Lopes; llvmdev at cs.ui...
2012 Jan 09
0
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > I'll explain what we see in the code. > 1. The caller saves XMM registers across the call if needed (according to DEFS definition). > YMMs are not in the set, so the caller does not save them. This is not how the register allocator works. It saves the registers holding values; it doesn't care which alias is clobbered. Are you
2012 Jan 09
2
[LLVMdev] Calling conventions for YMM registers on AVX
I'll explain what we see in the code. 1. The caller saves XMM registers across the call if needed (according to DEFS definition). YMMs are not in the set, so the caller does not save them. 2. The callee preserves XMMs but works with YMMs and clobbers them. 3. So after the call, the upper part of YMM is gone. - Elena -----Original Message----- From: Bruno Cardoso Lopes [mailto:bruno.cardoso at
2016 Nov 23
4
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...OPCODE ModR/M [SIB] [DISP] [IMM] # of bytes: 0,2,3 1 1 0,1 0,1,2,4 0,1 Note that the EVEX prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, for the SKX architecture, many instructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled. For example: "vmovss %xmm0, 32(%rsp,%rax,4)", has the following 2 possible enc...
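To make the ~2-byte saving concrete, here are the two encodings of the RFC's vmovss example, hand-assembled from the prefix layout above (worth re-checking with an assembler; note the EVEX form also compresses the displacement as disp8*N):

        vmovss %xmm0, 32(%rsp,%rax,4)
        # EVEX: 62 f1 7e 08 11 44 84 08   (8 bytes; disp8 = 32/4 under disp8*N)
        # VEX:  c5 fa 11 44 84 20         (6 bytes)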
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...its >>>>> different due to above computation.* >>>>> * vinserti64x4 zmm0, zmm14, ymm0, 1* >>>>> * kmovw k2, k1* >>>>> * vpgatherqd ymm14 {k2}, zmmword ptr [zmm17]* >>>>> * kxnorw k2, k0, k0* >>>>> * vpgatherqd ymm15 {k2}, zmmword ptr [zmm16]* >>>>> * vinserti64x4 zmm14, zmm15, ymm14, 1* >>>>> * kmovw k2, k1* >>>>> * vpgatherqd ymm15 {k2}, zmmword ptr [zmm19]* >>>>> * kxnorw k2, k0, k0* >>>>> * vpgatherqd ymm16 {k2}, zmmword ptr [zmm18]*...
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
..., %ymm11 vmovaps -240(%rbp), %ymm13 vmovaps -208(%rbp), %ymm9 vmovaps -272(%rbp), %ymm7 vmovaps -304(%rbp), %ymm0 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm1 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm0 vmovaps -176(%rbp), %ymm15 vmovaps -144(%rbp), %ymm0 vmovaps -240(%rbp), %ymm0 vmovaps -208(%rbp), %ymm0 vmovaps -272(%rbp), %ymm0 vmovaps -304(%rbp), %ymm0 vmovaps should not access stack if it is not aligned to 32 - Elena -------------- next part -------------- An HTML attachment was sc...
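For context, a sketch of why this is a correctness issue rather than just a performance one (GNU as, AT&T syntax; offsets illustrative): 256-bit vmovaps faults unless its memory operand is 32-byte aligned, so spilling ymm registers from a frame that is only 16-byte aligned requires realigning the stack first:

        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp            # realign the stack pointer to 32 bytes
        subq    $320, %rsp            # carve out 32-byte-aligned spill slots
        vmovaps %ymm15, 160(%rsp)     # OK: slot is 32-byte aligned
        # vmovaps %ymm15, -176(%rbp)  # unsafe: rbp keeps the caller's 16-byte alignment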
2018 Mar 13
32
[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization
Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc -mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce
2017 Oct 04
28
x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. It allows optionally extending the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to
2018 May 23
33
[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization
Changes: - patch v3: - Update commit message to describe the longer-term PIE goal. - Minor change on ftrace if condition. - Changed code using xchgq. - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace
2016 Nov 23
2
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...of bytes: 0,2,3 1 1 0,1 0,1,2,4 0,1 > > > > Note that the EVEX prefix requires 4 bytes whereas the VEX prefix can take > only up to 3 bytes. > > Consequently, for the SKX architecture, many instructions that use only > the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either > the EVEX or the VEX format. For such cases, using the VEX encoding results > in a code size reduction of ~2 bytes even though it is compiled with the > AVX512F/AVX512VL features enabled. > > > > For example: “vmovss %xmm0, 32(%rsp,%rax,4)“, has the...
2017 Oct 11
32
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
Changes: - patch v1: - Simplify ftrace implementation. - Use gcc -mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start of the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
...vmovaps -240(%rbp), %ymm13 vmovaps -208(%rbp), %ymm9 vmovaps -272(%rbp), %ymm7 vmovaps -304(%rbp), %ymm0 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm1 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm0 vmovaps -176(%rbp), %ymm15 vmovaps -144(%rbp), %ymm0 vmovaps -240(%rbp), %ymm0 vmovaps -208(%rbp), %ymm0 vmovaps -272(%rbp), %ymm0 vmovaps -304(%rbp), %ymm0 vmovaps should not access stack if it is not aligned to 32 - Elena -----Original Message----- From: llvmdev-bounces at cs.uiuc...
2012 Mar 01
2
[LLVMdev] Stack alignment in kernel
I'm running in AVX mode, but the stack before the call to the kernel is aligned to 16 bytes. Could you please tell me where it should be specified? Thank you. - Elena
2016 Nov 24
3
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...OPCODE ModR/M [SIB] [DISP] [IMM] # of bytes: 0,2,3 1 1 0,1 0,1,2,4 0,1 Note that the EVEX prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, for the SKX architecture, many instructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled. For example: “vmovss %xmm0, 32(%rsp,%rax,4)“, has the following 2 possible encodings: E...
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
...%rbp), %ymm9 >> vmovaps -272(%rbp), %ymm7 >> vmovaps -304(%rbp), %ymm0 >> vmovaps -112(%rbp), %ymm0 >> vmovaps -80(%rbp), %ymm1 >> vmovaps -112(%rbp), %ymm0 >> vmovaps -80(%rbp), %ymm0 >> vmovaps -176(%rbp), %ymm15 >> vmovaps -144(%rbp), %ymm0 >> vmovaps -240(%rbp), %ymm0 >> vmovaps -208(%rbp), %ymm0 >> vmovaps -272(%rbp), %ymm0 >> vmovaps -304(%rbp), %ymm0 >> >> vmovaps should not access stack if it is not aligned to 32 >> ...