thr3ads.net - search: "vmovdqa"

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

1

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

...h/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..46feaea52632 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
 /* \
  * S-function with AES subbytes \
  */ \
- vmovdqa .Linv_shift_row, t4; \
- vbroadcastss .L0f0f0f0f, t7; \
- vmovdqa .Lpre_tf_lo_s1, t0; \
- vmovdqa .Lpre_tf_hi_s1, t1; \
+ vmovdqa .Linv_shift_row(%rip), t4; \
+ vbroadcastss .L0f0f0f0f(%rip), t7; \
+ vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+ vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
 \
 /* AES inverse shif...

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes:
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

2018 May 23

33

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

Changes:
 - patch v3:
   - Update on message to describe longer term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace

[LLVMdev] AVX calling convention?

2013 Sep 05

1

[LLVMdev] AVX calling convention?

...to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually. I tracked this down to the following. The call site looks like
    vmovdqa 24064(%rsp), %ymm0
    vmovdqa %ymm0, (%rsp)
    vzeroupper
    callq __Z14convert_char16Dv16_s
which passes the argument on the stack. The callee, however, begins with
__Z14convert_char16Dv16_s: ## @_Z14convert_char16Dv16_s
    .cfi_startproc
## BB#0: ## %entry
    p...

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

0

[LLVMdev] SIMD for sdiv <2 x i64>

...-----------------------------------------------
# BB#3: # %if.then.i.i.i.i.i.i
    vpsllq $3, %xmm0, %xmm0
    vpextrq $1, %xmm0, %rbx
    movq %rbx, %rdi
    vmovaps %xmm2, 96(%rsp) # 16-byte Spill
    vmovaps %xmm5, 64(%rsp) # 16-byte Spill
    vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
    callq _Znam
    movq %rax, 128(%rsp)
    movq 16(%r12), %rsi
    movq %rax, %rdi
    movq %rbx, %rdx
    callq memmove
    vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
    vmovaps 64(%rsp), %xmm5 # 16-byte Reload
    vmovaps 96...

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

x86: PIE support and option to extend KASLR randomization

These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. This makes it possible to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

1

[LLVMdev] SIMD for sdiv <2 x i64>

...--------
>
> # BB#3: # %if.then.i.i.i.i.i.i
> vpsllq $3, %xmm0, %xmm0
> vpextrq $1, %xmm0, %rbx
> movq %rbx, %rdi
> vmovaps %xmm2, 96(%rsp) # 16-byte Spill
> vmovaps %xmm5, 64(%rsp) # 16-byte Spill
> vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
> callq _Znam
> movq %rax, 128(%rsp)
> movq 16(%r12), %rsi
> movq %rax, %rdi
> movq %rbx, %rdx
> callq memmove
> vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
> vmovaps 64(%rsp), %xmm5...

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

2

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote:
>> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote:
>>
>> It seems that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there are no alternative AVX/2 instructions for int64? The same thing

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes:
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

2

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

...eger lane accessor was used. Output from clang 3.4 for target corei7-avx:
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math -march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004004f0 <main>:
  4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x>
  4004f8: vpsrld $0x17,%xmm0,%xmm0
  4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8>
  400505: vcvtdq2ps %xmm0,%xmm1
  400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18>
  400511:...

invalid code generated on Windows x86_64 using skylake-specific features

2017 Sep 30

2

invalid code generated on Windows x86_64 using skylake-specific features

...,+xsaveopt,-sha,+adx,-avx512pf,+sse3 It successfully creates a binary, but the binary when run crashes with: Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. The disassembly of the crashed instruction is: 00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0 There is no callstack or source in the MSVC debugger. The .pdb produced is 64KB exactly. The file was linked with: lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe -NODEFAULTLIB -ENTRY:_start ./zig-cache/test.obj ./zig-cache/builtin.obj ./zig-cache...

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 01

1

invalid code generated on Windows x86_64 using skylake-specific features

...; > It successfully creates a binary, but the binary when run crashes with: > > Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access > violation reading location 0xFFFFFFFFFFFFFFFF. > > The disassembly of the crashed instruction is: > > 00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0 > > There is no callstack or source in the MSVC debugger. The .pdb produced is > 64KB exactly. The file was linked with: > > lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe > -NODEFAULTLIB -ENTRY:_start ./zig-cache/test.obj ./zig-c...

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 18

2

Vector trunc code generation difference between llvm-3.9 and 4.0

...s through the backend and
> produces worse code even for x86 with AVX2:
> before:
> vmovd %edi, %xmm1
> vpmovzxwq %xmm1, %xmm1
> vpsraw %xmm1, %xmm0, %xmm0
> retq
>
> after:
> vmovd %edi, %xmm1
> vpbroadcastd %xmm1, %ymm1
> vmovdqa LCPI1_0(%rip), %ymm2
> vpshufb %ymm2, %ymm1, %ymm1
> vpermq $232, %ymm1, %ymm1
> vpmovzxwd %xmm1, %ymm1
> vpmovsxwd %xmm0, %ymm0
> vpsravd %ymm1, %ymm0, %ymm0
> vpshufb %ymm2, %ymm0, %ymm0
> vpermq $232, %ymm0, %ymm0
>...

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

2

[LLVMdev] Can LLVM vectorize <2 x i32> type

...> %cmp.zeroS53_D to <2 x i64>
%BCS54_D = bitcast <2 x i64> %sextS54_D to i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
    vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
    vpmuludq %xmm7, %xmm0, %xmm2
    vpsrlq $32, %xmm7, %xmm4
    vpmuludq %xmm4, %xmm0, %xmm4
    vpsllq $32, %xmm4, %xmm4
    vpaddq %xmm4, %xmm2, %xmm2
    vpsrlq $32, %xmm0, %xmm4
    vpmuludq %xmm7, %xmm4, %xmm4
    vpsllq $32, %xmm4, %xmm...

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 17

2

Vector trunc code generation difference between llvm-3.9 and 4.0

Correction in the C snippet:
typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
v8i16_t foo (v8i16_t a, int n) { return a >> n; }
Best regards
Saurabh
On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat
> instructions between llvm-3.9

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Mar 08

2

Vector trunc code generation difference between llvm-3.9 and 4.0

...>> before:
>>> vmovd %edi, %xmm1
>>> vpmovzxwq %xmm1, %xmm1
>>> vpsraw %xmm1, %xmm0, %xmm0
>>> retq
>>>
>>> after:
>>> vmovd %edi, %xmm1
>>> vpbroadcastd %xmm1, %ymm1
>>> vmovdqa LCPI1_0(%rip), %ymm2
>>> vpshufb %ymm2, %ymm1, %ymm1
>>> vpermq $232, %ymm1, %ymm1
>>> vpmovzxwd %xmm1, %ymm1
>>> vpmovsxwd %xmm0, %ymm0
>>> vpsravd %ymm1, %ymm0, %ymm0
>>> vpshufb %ymm2, %ymm0, %ymm0...

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 03

2

invalid code generated on Windows x86_64 using skylake-specific features

...when run crashes with: >>> >>> Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: >>> Access violation reading location 0xFFFFFFFFFFFFFFFF. >>> >>> The disassembly of the crashed instruction is: >>> >>> 00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0 >>> >>> There is no callstack or source in the MSVC debugger. The .pdb produced >>> is 64KB exactly. The file was linked with: >>> >>> lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe >>> -NODEFAU...