thr3ads.net - search: "ymm14"

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 09

3

[LLVMdev] Calling conventions for YMM registers on AVX

On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

KNL Assembly Code for Matrix Multiplication

...gt;>>>> 16. here it should be zmm14**=[3200,3600,40000,.......54000]. but by >>>>> the above computation these indexes are changes??* >>>>> * kxnorw k2, k0, k0 **; **dont understand the need for this step* >>>>> >>>> * vpgatherqd ymm14 {k2}, zmmword ptr [zmm15] **; **here again issues >>>>> with index zmm15. it should be **[0,4000,8000,.......28000] but its >>>>> different due to above computation.* >>>>> * vinserti64x4 zmm0, zmm14, ymm0, 1* >>>>> * kmovw k2, k1* >&...

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

3

[LLVMdev] Stack alignment on X86 AVX seems incorrect

...; <llvmdev at cs.uiuc.edu> Message-ID: <A0DC88CEB3010344830D52D66533DA8E0C2E7A at HASMSX103.ger.corp.intel.com> Content-Type: text/plain; charset="windows-1252" ./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 vmovaps -208(%rbp), %ymm9 vmovaps -272(%rbp), %ymm7 vmovaps -304(%rbp), %ymm0 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm1 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm0...

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 10

0

[LLVMdev] Calling conventions for YMM registers on AVX

...lt;"ymm11", [XMM11, XMM11b]>, DwarfRegNum<[28, -2, -2]>; def YMM12: RegisterWithSubRegs<"ymm12", [XMM12, XMM12b]>, DwarfRegNum<[29, -2, -2]>; def YMM13: RegisterWithSubRegs<"ymm13", [XMM13, XMM13b]>, DwarfRegNum<[30, -2, -2]>; def YMM14: RegisterWithSubRegs<"ymm14", [XMM14, XMM14b]>, DwarfRegNum<[31, -2, -2]>; def YMM15: RegisterWithSubRegs<"ymm15", [XMM15, XMM15b]>, DwarfRegNum<[32, -2, -2]>; } - Elena -----Original Message----- From: Jakob Stoklund Olesen [mailto:jolesen at apple...

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

0

[LLVMdev] Stack alignment on X86 AVX seems incorrect

./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 vmovaps -208(%rbp), %ymm9 vmovaps -272(%rbp), %ymm7 vmovaps -304(%rbp), %ymm0 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm1 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp),...

[LLVMdev] Stack alignment in kernel

2012 Mar 01

2

[LLVMdev] Stack alignment in kernel

I'm running in AVX mode, but the stack before call to kernel is aligned to 16 bit. Could you, please, tell me where it should be specified? Thank you. - Elena --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

0

[LLVMdev] Stack alignment on X86 AVX seems incorrect

...:A0DC88CEB3010344830D52D66533DA8E0C2E7A at HASMSX103.ger.corp.intel.com>> >> Content-Type: text/plain; charset="windows-1252" >> >> ./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep >> ymm | grep rbp >> vmovaps -176(%rbp), %ymm14 >> vmovaps -144(%rbp), %ymm11 >> vmovaps -240(%rbp), %ymm13 >> vmovaps -208(%rbp), %ymm9 >> vmovaps -272(%rbp), %ymm7 >> vmovaps -304(%rbp), %ymm0 >> vmovaps -112(%rbp), %ymm0 >> vmovaps -80(%rbp), %ymm1 >...

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

0

[LLVMdev] Stack alignment on X86 AVX seems incorrect

On Thu, Mar 1, 2012 at 5:30 PM, Cameron McInally <cameron.mcinally at nyu.edu>wrote: > Aligning the stack to 32 bytes when there are auto AVX vector variables >> present shouldn't necessarily break the x86-64 ABI, as long as smaller auto >> variables remain properly aligned. A similar approach was taken for i386 >> in GCC in order to support SSE vectors. >>

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

3

[LLVMdev] Stack alignment on X86 AVX seems incorrect

Even if you explicitly specify –stack-alignment=16 the aligned movs are still generated. It is not an issue related to ABI. See my original mail: ./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 - Elena From: Cameron McInally [mailto:cameron.mcinally at nyu.edu] Sent: Friday, March 02, 2012 01:04 To: Evandro Menezes Cc: Demikhovsky, Elena; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems...

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

1

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

...ey, x0; \ - vpshufb .Lpack_bswap, x0, x0; \ + vpshufb .Lpack_bswap(%rip), x0, x0; \ \ vpxor x0, y7, y7; \ vpxor x0, y6, y6; \ @@ -1112,7 +1112,7 @@ ENTRY(camellia_ctr_32way) vmovdqu (%rcx), %xmm0; vmovdqa %xmm0, %xmm1; inc_le128(%xmm0, %xmm15, %xmm14); - vbroadcasti128 .Lbswap128_mask, %ymm14; + vbroadcasti128 .Lbswap128_mask(%rip), %ymm14; vinserti128 $1, %xmm0, %ymm1, %ymm0; vpshufb %ymm14, %ymm0, %ymm13; vmovdqu %ymm13, 15 * 32(%rax); @@ -1158,7 +1158,7 @@ ENTRY(camellia_ctr_32way) /* inpack32_pre: */ vpbroadcastq (key_table)(CTX), %ymm15; - vpshufb .Lpack_bswap, %ymm15,...

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

4

[LLVMdev] Stack alignment on X86 AVX seems incorrect

On Thu, Mar 1, 2012 at 4:29 PM, Evandro Menezes <emenezes at codeaurora.org>wrote: ... > Aligning the stack to 32 bytes when there are auto AVX vector variables > present shouldn't necessarily break the x86-64 ABI, as long as smaller auto > variables remain properly aligned. A similar approach was taken for i386 > in GCC in order to support SSE vectors. > > Perhaps

[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?

2014 Feb 21

2

[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?

...ize:256;offset:659;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:29;dwarf:29;#00 > $qRegisterInfo68#b0 > $name:ymm13;bitsize:256;offset:691;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:30;dwarf:30;#00 > $qRegisterInfo69#b1 > $name:ymm14;bitsize:256;offset:723;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:31;dwarf:31;#00 > $qRegisterInfo6a#d9 > $name:ymm15;bitsize:256;offset:755;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:32;dwarf:32;#00 > $qRegisterInfo6b#da > $...

[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?

2014 Feb 20

2

[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?

Thank you, Clayton. This is very helpful. We use the LLDB specific GDB remote extensions, and our debugger server supports "qRegisterInfo" package. "reg 0x3c" is the frame pointer. In the example mentioned above, we have SP = FP - 40 for current call frame. And variable "a" is stored at address (FP + -24) from asm instruction [FP + -24] = R3;; Thus we can conclude

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

x86: PIE support and option to extend KASLR randomization

These patches make the changes necessary to build the kernel as Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. It allows to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler changes, PIE support and KASLR in general. Thanks to

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

x86: PIE support and option to extend KASLR randomization

These patches make the changes necessary to build the kernel as Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. It allows to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler changes, PIE support and KASLR in general. Thanks to

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

2018 May 23

33

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v3: - Update on message to describe longer term PIE goal. - Minor change on ftrace if condition. - Changed code using xchgq. - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

search for: ymm14