Displaying 20 results from an estimated 21 matches for "vmovdqa".
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
...h/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..46feaea52632 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
/* \
* S-function with AES subbytes \
*/ \
- vmovdqa .Linv_shift_row, t4; \
- vbroadcastss .L0f0f0f0f, t7; \
- vmovdqa .Lpre_tf_lo_s1, t0; \
- vmovdqa .Lpre_tf_hi_s1, t1; \
+ vmovdqa .Linv_shift_row(%rip), t4; \
+ vbroadcastss .L0f0f0f0f(%rip), t7; \
+ vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+ vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
\
/* AES inverse shif...
2018 Mar 13
32
[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace implementation.
- Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
- Use --emit-relocs instead of -pie to reduce
2018 May 23
33
[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v3:
- Update on message to describe longer term PIE goal.
- Minor change on ftrace if condition.
- Changed code using xchgq.
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace
2013 Sep 05
1
[LLVMdev] AVX calling convention?
...to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually.
I tracked this down to the following. The call site looks like
vmovdqa 24064(%rsp), %ymm0
vmovdqa %ymm0, (%rsp)
vzeroupper
callq __Z14convert_char16Dv16_s
which passes the argument on the stack. The callee, however, begins with
__Z14convert_char16Dv16_s: ## @_Z14convert_char16Dv16_s
.cfi_startproc
## BB#0: ## %entry
p...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...-----------------------------------------------
# BB#3: # %if.then.i.i.i.i.i.i
vpsllq $3, %xmm0, %xmm0
vpextrq $1, %xmm0, %rbx
movq %rbx, %rdi
vmovaps %xmm2, 96(%rsp) # 16-byte Spill
vmovaps %xmm5, 64(%rsp) # 16-byte Spill
vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
callq _Znam
movq %rax, 128(%rsp)
movq 16(%r12), %rsi
movq %rax, %rdi
movq %rbx, %rdx
callq memmove
vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
vmovaps 64(%rsp), %xmm5 # 16-byte Reload
vmovaps 96...
2017 Oct 04
28
x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. This optionally allows extending the
KASLR randomization range from 1G to 3G.
Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...--------
>
> # BB#3: # %if.then.i.i.i.i.i.i
> vpsllq $3, %xmm0, %xmm0
> vpextrq $1, %xmm0, %rbx
> movq %rbx, %rdi
> vmovaps %xmm2, 96(%rsp) # 16-byte Spill
> vmovaps %xmm5, 64(%rsp) # 16-byte Spill
> vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
> callq _Znam
> movq %rax, 128(%rsp)
> movq 16(%r12), %rsi
> movq %rax, %rdi
> movq %rbx, %rdx
> callq memmove
> vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
> vmovaps 64(%rsp), %xmm5...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
On 07/24/2015 03:42 AM, Benjamin Kramer wrote:
>> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote:
>>
>> It seems that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there are no alternative AVX/2 instructions for int64? The same thing
2017 Oct 11
32
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v1:
- Simplify ftrace implementation.
- Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
- Use --emit-relocs instead of -pie to reduce dynamic relocation space on
mapped memory. It also simplifies the relocation process.
- Move the start of the module section next to the kernel. Remove the need for
-mcmodel=large on modules. Extends
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...eger lane accessor was used.
Output from clang 3.4 for target corei7-avx:
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math
-march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004004f0 <main>:
4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0x17,%xmm0,%xmm0
4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8>
400505: vcvtdq2ps %xmm0,%xmm1
400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18>
400511:...
2017 Sep 30
2
invalid code generated on Windows x86_64 using skylake-specific features
...,+xsaveopt,-sha,+adx,-avx512pf,+sse3
It successfully creates a binary, but the binary when run crashes with:
Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access
violation reading location 0xFFFFFFFFFFFFFFFF.
The disassembly of the crashed instruction is:
00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0
There is no callstack or source in the MSVC debugger. The .pdb produced is
64KB exactly. The file was linked with:
lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe
-NODEFAULTLIB -ENTRY:_start ./zig-cache/test.obj ./zig-cache/builtin.obj
./zig-cache...
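A note on the faulting instruction above: vmovdqa requires its memory operand to be aligned to the access size (16 bytes for an xmm access, 32 bytes for a ymm access) and faults otherwise. Whether misalignment of [rbp-20h] is what happened here is an assumption; the excerpt does not confirm it. The following is a minimal C sketch of the alignment test one might apply to such an address. The helper name is_aligned is hypothetical and not taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): returns 1 if the address p
 * is a multiple of `align`, 0 otherwise. `align` must be a power of two,
 * e.g. 16 for an xmm vmovdqa access or 32 for a ymm one.
 */
static int is_aligned(const void *p, uintptr_t align)
{
    /* For a power-of-two alignment, the low bits must all be zero. */
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

In a debugger, applying the same test to the value of rbp-20h at the faulting instruction would show whether the 16-byte requirement of the xmm store was met.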
2017 Oct 01
1
invalid code generated on Windows x86_64 using skylake-specific features
...;
> It successfully creates a binary, but the binary when run crashes with:
>
> Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access
> violation reading location 0xFFFFFFFFFFFFFFFF.
>
> The disassembly of the crashed instruction is:
>
> 00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0
>
> There is no callstack or source in the MSVC debugger. The .pdb produced is
> 64KB exactly. The file was linked with:
>
> lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe
> -NODEFAULTLIB -ENTRY:_start ./zig-cache/test.obj ./zig-c...
2017 Feb 18
2
Vector trunc code generation difference between llvm-3.9 and 4.0
...s through the backend and
> produces worse code even for x86 with AVX2:
> before:
> vmovd %edi, %xmm1
> vpmovzxwq %xmm1, %xmm1
> vpsraw %xmm1, %xmm0, %xmm0
> retq
>
> after:
> vmovd %edi, %xmm1
> vpbroadcastd %xmm1, %ymm1
> vmovdqa LCPI1_0(%rip), %ymm2
> vpshufb %ymm2, %ymm1, %ymm1
> vpermq $232, %ymm1, %ymm1
> vpmovzxwd %xmm1, %ymm1
> vpmovsxwd %xmm0, %ymm0
> vpsravd %ymm1, %ymm0, %ymm0
> vpshufb %ymm2, %ymm0, %ymm0
> vpermq $232, %ymm0, %ymm0
>...
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...> %cmp.zeroS53_D to <2 x i64>
%BCS54_D = bitcast <2 x i64> %sextS54_D to i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
vpmuludq %xmm7, %xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm...
2017 Feb 17
2
Vector trunc code generation difference between llvm-3.9 and 4.0
Correction in the C snippet:
typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
v8i16_t foo (v8i16_t a, int n)
{
return a >> n;
}
Best regards
Saurabh
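For readers comparing the generated code, the lane-wise meaning of the snippet's `a >> n` can be written as a plain scalar loop. This is only an illustrative sketch under stated assumptions: the name shift_lanes_right and the array-based interface are hypothetical, n is assumed to be in the range 0..15, and right-shifting a negative value is implementation-defined in C (gcc and clang perform an arithmetic shift).

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* eight 16-bit lanes, matching v8i16_t in the snippet */

/*
 * Hypothetical scalar reference (not from the thread): shift every
 * 16-bit lane of `in` right by the same count n and store it in `out`.
 * Assumes 0 <= n <= 15.
 */
static void shift_lanes_right(const int16_t in[LANES], int n,
                              int16_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        /* Implementation-defined for negative values; gcc and clang
           preserve the sign bit (arithmetic shift). */
        out[i] = (int16_t)(in[i] >> n);
    }
}
```

A vectorizing backend is free to implement this loop with a single packed shift or with a longer shuffle sequence; the scalar form only pins down the expected per-lane result.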
On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com>
wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat
> instructions between llvm-3.9
2017 Mar 08
2
Vector trunc code generation difference between llvm-3.9 and 4.0
...>> before:
>>> vmovd %edi, %xmm1
>>> vpmovzxwq %xmm1, %xmm1
>>> vpsraw %xmm1, %xmm0, %xmm0
>>> retq
>>>
>>> after:
>>> vmovd %edi, %xmm1
>>> vpbroadcastd %xmm1, %ymm1
>>> vmovdqa LCPI1_0(%rip), %ymm2
>>> vpshufb %ymm2, %ymm1, %ymm1
>>> vpermq $232, %ymm1, %ymm1
>>> vpmovzxwd %xmm1, %ymm1
>>> vpmovsxwd %xmm0, %ymm0
>>> vpsravd %ymm1, %ymm0, %ymm0
>>> vpshufb %ymm2, %ymm0, %ymm0...
2017 Oct 03
2
invalid code generated on Windows x86_64 using skylake-specific features
...when run crashes with:
>>>
>>> Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005:
>>> Access violation reading location 0xFFFFFFFFFFFFFFFF.
>>>
>>> The disassembly of the crashed instruction is:
>>>
>>> 00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0
>>>
>>> There is no callstack or source in the MSVC debugger. The .pdb produced
>>> is 64KB exactly. The file was linked with:
>>>
>>> lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe
>>> -NODEFAU...