Displaying 18 results from an estimated 18 matches for "vmovdqu".
2013 Aug 28 · 3 · [PATCH] x86: AVX instruction emulation fixes
...if ( cpu_has_mmx )
+ case X86EMUL_FPU_ymm:
+ if ( cpu_has_avx )
break;
default:
return X86EMUL_UNHANDLEABLE;
@@ -629,6 +641,73 @@ int main(int argc, char **argv)
else
printf("skipped\n");
+ printf("%-40s", "Testing vmovdqu %ymm2,(%ecx)...");
+ if ( stack_exec && cpu_has_avx )
+ {
+ extern const unsigned char vmovdqu_to_mem[];
+
+ asm volatile ( "vpcmpeqb %%xmm2, %%xmm2, %%xmm2\n"
+ ".pushsection .test, \"a\", @progbits\n"
+...
2017 Oct 11 · 1 · [PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
...+ vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
@@ -443,7 +443,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vmovdqu .Lshufb_16x16b, a0; \
+ vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -482,7 +482,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \...
2017 Aug 17 · 4 · unable to emit vectorized code in LLVM IR
I assume the compiler knows that you only have 2 input values that you just
added together 1000 times.
Despite the fact that you stored to a[i] and b[i] here, nothing reads them
other than the addition in the same loop iteration, so the compiler easily
removed the a and b arrays. The same goes for 'c': it is not read outside the
loop, so it doesn't need to exist. So the compiler turned your
2015 Jan 29 · 2 · [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...now)
> domain changes such as (xmm5 and 0 are initially integers, and are
> dead after the store):
> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
> vmovdqu %xmm0, 0x20(%rax)
> turning into:
> vshufps $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[2,0],xmm5[0,0]
> vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]
> vmovups %xmm0, 0x20(%rax)
>
All of these stem from what I think is the same c...
2015 Jan 30 · 4 · [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...and 0 are initially integers, and are
>>> dead after the store):
>>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
>>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
>>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>> vmovdqu %xmm0, 0x20(%rax)
>>> turning into:
>>> vshufps $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[2,0],xmm5[0,0]
>>> vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 =
>>> xmm0[0,2],xmm5[1,2]
>>> vmovups %xmm0, 0x20(%rax)
>>&g...
2015 Jan 29 · 0 · [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...anges such as (xmm5 and 0 are initially integers, and are
>> dead after the store):
>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>> vmovdqu %xmm0, 0x20(%rax)
>> turning into:
>> vshufps $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[2,0],xmm5[0,0]
>> vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 =
>> xmm0[0,2],xmm5[1,2]
>> vmovups %xmm0, 0x20(%rax)
>>
>
> All of thes...
2018 Apr 26 · 2 · windows ABI problem with i128?
...mov -0x20(%rbp),%rdx
56: 48 8b 4d e8 mov -0x18(%rbp),%rcx
5a: e8 00 00 00 00 callq 5f <_start+0x4f>
5f: 48 89 55 d8 mov %rdx,-0x28(%rbp)
63: 48 89 45 d0 mov %rax,-0x30(%rbp)
67: c5 fa 6f 45 d0 vmovdqu -0x30(%rbp),%xmm0
6c: c5 fa 6f 4d e0 vmovdqu -0x20(%rbp),%xmm1
71: c5 f9 74 c1 vpcmpeqb %xmm1,%xmm0,%xmm0
75: c5 79 d7 c0 vpmovmskb %xmm0,%r8d
79: 41 81 e8 ff ff 00 00 sub $0xffff,%r8d
80: 44 89 45 cc mov %r8d,-0x34...
2015 Jan 30 · 0 · [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...integers, and are
>>>> dead after the store):
>>>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
>>>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
>>>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>> vmovdqu %xmm0, 0x20(%rax)
>>>> turning into:
>>>> vshufps $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 =
>>>> xmm0[2,0],xmm5[0,0]
>>>> vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 =
>>>> xmm0[0,2],xmm5[1,2]
>>>> vmovu...
2018 Apr 26 · 0 · windows ABI problem with i128?
...dx
> 56: 48 8b 4d e8 mov -0x18(%rbp),%rcx
> 5a: e8 00 00 00 00 callq 5f <_start+0x4f>
> 5f: 48 89 55 d8 mov %rdx,-0x28(%rbp)
> 63: 48 89 45 d0 mov %rax,-0x30(%rbp)
> 67: c5 fa 6f 45 d0 vmovdqu -0x30(%rbp),%xmm0
> 6c: c5 fa 6f 4d e0 vmovdqu -0x20(%rbp),%xmm1
> 71: c5 f9 74 c1 vpcmpeqb %xmm1,%xmm0,%xmm0
> 75: c5 79 d7 c0 vpmovmskb %xmm0,%r8d
> 79: 41 81 e8 ff ff 00 00 sub $0xffff,%r8d
> 80: 44 89 45 cc...
2018 Mar 13 · 32 · [PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace implementation.
- Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
- Use --emit-relocs instead of -pie to reduce
2017 Oct 04 · 28 · x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which allows optionally extending the
KASLR randomization range from 1G to 3G.
Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to
2018 May 23 · 33 · [PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v3:
- Update on message to describe longer term PIE goal.
- Minor change on ftrace if condition.
- Changed code using xchgq.
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace
2018 Apr 26 · 1 · windows ABI problem with i128?
...8b 4d e8 mov -0x18(%rbp),%rcx
> > 5a: e8 00 00 00 00 callq 5f <_start+0x4f>
> > 5f: 48 89 55 d8 mov %rdx,-0x28(%rbp)
> > 63: 48 89 45 d0 mov %rax,-0x30(%rbp)
> > 67: c5 fa 6f 45 d0 vmovdqu -0x30(%rbp),%xmm0
> > 6c: c5 fa 6f 4d e0 vmovdqu -0x20(%rbp),%xmm1
> > 71: c5 f9 74 c1 vpcmpeqb %xmm1,%xmm0,%xmm0
> > 75: c5 79 d7 c0 vpmovmskb %xmm0,%r8d
> > 79: 41 81 e8 ff ff 00 00 sub $0xffff,%r8d
> >...
2017 Oct 11 · 32 · [PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v1:
- Simplify ftrace implementation.
- Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
- Use --emit-relocs instead of -pie to reduce dynamic relocation space on
mapped memory. It also simplifies the relocation process.
- Move the start of the module section next to the kernel. Remove the need for
-mcmodel=large on modules. Extends
2015 Jan 23 · 5 · [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to