search for: vmovdqu

Displaying 18 results from an estimated 18 matches for "vmovdqu".

2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
...if ( cpu_has_mmx )
+    case X86EMUL_FPU_ymm:
+        if ( cpu_has_avx )
             break;
     default:
         return X86EMUL_UNHANDLEABLE;
@@ -629,6 +641,73 @@ int main(int argc, char **argv)
     else
         printf("skipped\n");
+    printf("%-40s", "Testing vmovdqu %ymm2,(%ecx)...");
+    if ( stack_exec && cpu_has_avx )
+    {
+        extern const unsigned char vmovdqu_to_mem[];
+
+        asm volatile ( "vpcmpeqb %%xmm2, %%xmm2, %%xmm2\n"
+                       ".pushsection .test, \"a\", @progbits\n"
+...
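The test above runs the instruction through Xen's emulator from a generated code section. Outside of Xen, the store behaviour being checked can be observed with plain inline assembly; the sketch below is mine, not part of the patch, and assumes an AVX-capable CPU and OS. Note that the VEX.128-encoded vpcmpeqb zeroes bits 255:128 of ymm2, so the second 16 bytes stored should be zero:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t buf[32];
        unsigned int i;

        /* Fill xmm2 with all-ones (a register compared with itself is
         * all-equal); the VEX.128 encoding clears the upper half of
         * ymm2. Then store the full 256-bit register unaligned. */
        asm volatile ( "vpcmpeqb %%xmm2, %%xmm2, %%xmm2\n\t"
                       "vmovdqu %%ymm2, (%0)"
                       :: "r" (buf) : "xmm2", "memory" );

        for ( i = 0; i < 16; i++ )
            if ( buf[i] != 0xff ) { puts("low half wrong"); return 1; }
        for ( ; i < 32; i++ )
            if ( buf[i] != 0x00 ) { puts("high half wrong"); return 1; }
        puts("ok");
        return 0;
    }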
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
...+	vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -443,7 +443,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vmovdqu .Lshufb_16x16b, a0; \
+	vmovdqu .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -482,7 +482,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \...
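The pattern in this patch is mechanical: every memory operand that names a symbol absolutely gains a (%rip) suffix, because a position-independent kernel cannot rely on fixed 32-bit absolute addresses. A stand-alone sketch of the same idea, assuming an ELF/AT&T-syntax toolchain; the symbol shuffle_mask and its values are mine, purely illustrative:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a constant table like .Lshufb_16x16b; 'used' keeps
     * the compiler from discarding it, since only the asm below
     * references it. */
    static const uint8_t shuffle_mask[16]
        __attribute__((aligned(16), used)) =
        { 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15 };

    int main(void)
    {
        uint8_t out[16];
        int i;

        /* "vmovdqu shuffle_mask, %xmm0" would need an absolute-address
         * relocation and break under PIE; the (%rip)-relative form
         * encodes a fixed offset from the instruction pointer instead. */
        asm volatile ( "vmovdqu shuffle_mask(%%rip), %%xmm0\n\t"
                       "vmovdqu %%xmm0, (%0)"
                       :: "r" (out) : "xmm0", "memory" );

        for ( i = 0; i < 16; i++ )
            printf("%d ", out[i]);
        printf("\n");
        return 0;
    }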
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
I assume the compiler knows that you only have 2 input values that you just added together 1000 times. Despite the fact that you stored to a[i] and b[i] here, nothing reads them other than the addition in the same loop iteration, so the compiler easily removed the a and b arrays. The same goes for 'c': it's not read outside the loop, so it doesn't need to exist. So the compiler turned your
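In other words, the loop was dead and deleted, so there was nothing left to vectorize. The usual workaround is to make the inputs depend on runtime values and to consume the output, so the stores stay observable; the sketch below is mine, not the original poster's code, and whether the vectorizer fires still depends on flags and target:

    #include <stdio.h>

    #define N 1000

    /* Externally visible arrays (volatile, or values read from argv,
     * also work) keep the stores observable, so the compiler cannot
     * delete them. */
    float a[N], b[N], c[N];

    int main(int argc, char **argv)
    {
        float sum = 0.0f;
        int i;

        (void)argv;
        for (i = 0; i < N; i++) {
            a[i] = (float)(i + argc);   /* runtime-dependent input */
            b[i] = (float)(i * 2);
        }

        /* With live inputs and a consumed result, clang -O2 (plus
         * -mavx) can vectorize this into vmovups/vaddps sequences. */
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        for (i = 0; i < N; i++)
            sum += c[i];                /* read c outside the loop */
        printf("%f\n", sum);
        return 0;
    }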
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...now)
> domain changes such as (xmm5 and 0 are initially integers, and are
> dead after the store):
>     vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>     vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>     vmovdqu %xmm0, 0x20(%rax)
> turning into:
>     vshufps $0x2, %xmm5, %xmm0, %xmm0   ## xmm0 = xmm0[2,0],xmm5[0,0]
>     vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]
>     vmovups %xmm0, 0x20(%rax)
>
All of these stem from what I think is the same c...
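For reference, the kind of source that exercises this path is a two-input integer shuffle; the function below is my illustration (using clang's __builtin_shufflevector), not code from the thread. Whether the backend emits the integer-domain chain (vpshufd/vpalignr/vmovdqu) or the shorter float-domain pair (vshufps/vmovups) is what the legality flag changes, and the concern quoted above is the bypass-delay cost of crossing domains on some microarchitectures:

    typedef int v4si __attribute__((vector_size(16)));

    /* Mix lanes of two 4 x i32 vectors and store the result. The lane
     * indices are arbitrary; 0-3 pick from x, 4-7 pick from y. */
    void store_shuffled(v4si x, v4si y, v4si *out)
    {
        *out = __builtin_shufflevector(x, y, 2, 0, 4, 5);
    }

Compiling with clang -O2 -S and toggling -mllvm -x86-experimental-vector-shuffle-legality shows the two lowerings side by side.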
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...and 0 are initially integers, and are
>>> dead after the store):
>>>     vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>>>     vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>     vmovdqu %xmm0, 0x20(%rax)
>>> turning into:
>>>     vshufps $0x2, %xmm5, %xmm0, %xmm0   ## xmm0 = xmm0[2,0],xmm5[0,0]
>>>     vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]
>>>     vmovups %xmm0, 0x20(%rax)
>>>...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...anges such as (xmm5 and 0 are initially integers, and are
>> dead after the store):
>>     vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>>     vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>     vmovdqu %xmm0, 0x20(%rax)
>> turning into:
>>     vshufps $0x2, %xmm5, %xmm0, %xmm0   ## xmm0 = xmm0[2,0],xmm5[0,0]
>>     vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]
>>     vmovups %xmm0, 0x20(%rax)
>>
>
> All of thes...
2018 Apr 26
2
windows ABI problem with i128?
...mov    -0x20(%rbp),%rdx
  56: 48 8b 4d e8             mov     -0x18(%rbp),%rcx
  5a: e8 00 00 00 00          callq   5f <_start+0x4f>
  5f: 48 89 55 d8             mov     %rdx,-0x28(%rbp)
  63: 48 89 45 d0             mov     %rax,-0x30(%rbp)
  67: c5 fa 6f 45 d0          vmovdqu -0x30(%rbp),%xmm0
  6c: c5 fa 6f 4d e0          vmovdqu -0x20(%rbp),%xmm1
  71: c5 f9 74 c1             vpcmpeqb %xmm1,%xmm0,%xmm0
  75: c5 79 d7 c0             vpmovmskb %xmm0,%r8d
  79: 41 81 e8 ff ff 00 00    sub     $0xffff,%r8d
  80: 44 89 45 cc             mov     %r8d,-0x34...
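The source behind a dump like this is just a 128-bit integer compare: clang can lower i128 equality to the vmovdqu + vpcmpeqb + vpmovmskb + sub $0xffff sequence shown above. A minimal sketch (mine, not the reporter's test case; clang and gcc both provide __int128 on x86-64):

    #include <stdio.h>

    /* Equality of two 128-bit values; at lower optimization levels the
     * two halves are spilled and reloaded with vmovdqu, compared
     * bytewise with vpcmpeqb, and the 16-bit vpmovmskb mask is checked
     * against 0xffff, as in the disassembly above. */
    static int eq128(__int128 a, __int128 b)
    {
        return a == b;
    }

    int main(void)
    {
        __int128 x = ((__int128)0x0123456789abcdefULL << 64) | 42;
        __int128 y = x;

        printf("%d %d\n", eq128(x, y), eq128(x, y + 1));  /* 1 0 */
        return 0;
    }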
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...integers, and are
>>>> dead after the store):
>>>>     vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>>>>     vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>>     vmovdqu %xmm0, 0x20(%rax)
>>>> turning into:
>>>>     vshufps $0x2, %xmm5, %xmm0, %xmm0   ## xmm0 = xmm0[2,0],xmm5[0,0]
>>>>     vshufps $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]
>>>>     vmovu...
2018 Apr 26
0
windows ABI problem with i128?
...dx
>   56: 48 8b 4d e8             mov     -0x18(%rbp),%rcx
>   5a: e8 00 00 00 00          callq   5f <_start+0x4f>
>   5f: 48 89 55 d8             mov     %rdx,-0x28(%rbp)
>   63: 48 89 45 d0             mov     %rax,-0x30(%rbp)
>   67: c5 fa 6f 45 d0          vmovdqu -0x30(%rbp),%xmm0
>   6c: c5 fa 6f 4d e0          vmovdqu -0x20(%rbp),%xmm1
>   71: c5 f9 74 c1             vpcmpeqb %xmm1,%xmm0,%xmm0
>   75: c5 79 d7 c0             vpmovmskb %xmm0,%r8d
>   79: 41 81 e8 ff ff 00 00    sub     $0xffff,%r8d
>   80: 44 89 45 cc             mov     %r8d,-0x34...
2018 Mar 13
32
[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce
2017 Oct 04
28
x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space, which allows optionally extending the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to
2018 May 23
33
[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
 - patch v3:
   - Update on message to describe longer term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace
2018 Apr 26
1
windows ABI problem with i128?
...8b 4d e8             mov     -0x18(%rbp),%rcx
> >   5a: e8 00 00 00 00          callq   5f <_start+0x4f>
> >   5f: 48 89 55 d8             mov     %rdx,-0x28(%rbp)
> >   63: 48 89 45 d0             mov     %rax,-0x30(%rbp)
> >   67: c5 fa 6f 45 d0          vmovdqu -0x30(%rbp),%xmm0
> >   6c: c5 fa 6f 4d e0          vmovdqu -0x20(%rbp),%xmm1
> >   71: c5 f9 74 c1             vpcmpeqb %xmm1,%xmm0,%xmm0
> >   75: c5 79 d7 c0             vpmovmskb %xmm0,%r8d
> >   79: 41 81 e8 ff ff 00 00    sub     $0xffff,%r8d
> >...
2017 Oct 11
32
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation
     space on mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the
     need for -mcmodel=large on modules. Extends
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to