search for: movdqu

Displaying 20 results from an estimated 28 matches for "movdqu".

2016 Aug 05
3
enabling interleaved access loop vectorization
...cycles per lane (!). This is an overestimate we need to fix, since the vectorized code is actually fairly decent - e.g. forcing vectorization, with SSE4.2, we get:
.LBB0_3: # %vector.body
         # =>This Inner Loop Header: Depth=1
  movdqu (%rdi,%rax,4), %xmm3
  movd %xmm0, %rcx
  movdqu 4(%rdi,%rcx,4), %xmm4
  paddd %xmm3, %xmm4
  movdqu 8(%rdi,%rcx,4), %xmm3
  paddd %xmm4, %xmm3
  movdqa %xmm1, %xmm4
  paddq %xmm4, %xmm4
  movdqa %xmm0, %xmm5
  paddq %xmm5, %xmm5
  movd %xmm5, %rcx
  pextrq $1, %xmm5, %rdx
  movd %xmm4, %r8
  pextrq $1, %xmm4, %r9
  movd (%rd...
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
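To make "interleaving (as loads + shuffles)" concrete, here is a minimal hand-written C sketch for a stride-2 case, assuming an x86-64 target with SSE2. The loop shape and the names pair_sum_scalar/pair_sum_sse2 are hypothetical, and this is not the code LLVM's vectorizer emits; it only shows that every four results cost two unaligned loads (movdqu) plus two shuffles (shufps) to split even and odd elements, which is the kind of shuffle count a cost model would have to price.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128 (movdqu), _mm_add_epi32; pulls in _mm_shuffle_ps */
#include <stddef.h>

/* Scalar reference: out[i] = in[2*i] + in[2*i+1], a stride-2 interleaved access. */
void pair_sum_scalar(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[2 * i] + in[2 * i + 1];
}

/* Vector version: two unaligned loads plus two shuffles per 4 outputs.
   Assumes n is a multiple of 4 to keep the sketch short. */
void pair_sum_sse2(const int *in, int *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128i v0 = _mm_loadu_si128((const __m128i *)(in + 2 * i));     /* in[2i .. 2i+3]   */
        __m128i v1 = _mm_loadu_si128((const __m128i *)(in + 2 * i + 4)); /* in[2i+4 .. 2i+7] */
        /* shufps only moves bits, so reinterpreting the int lanes as float is safe. */
        __m128 f0 = _mm_castsi128_ps(v0);
        __m128 f1 = _mm_castsi128_ps(v1);
        /* Even lanes {0,2} of each load, then odd lanes {1,3}. */
        __m128i even = _mm_castps_si128(_mm_shuffle_ps(f0, f1, _MM_SHUFFLE(2, 0, 2, 0)));
        __m128i odd  = _mm_castps_si128(_mm_shuffle_ps(f0, f1, _MM_SHUFFLE(3, 1, 3, 1)));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(even, odd));
    }
}
```

Both functions should produce identical output for the same input; a larger stride typically needs more shuffles per group, which is why a single fixed cost does not fit every interleave pattern.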
2016 Aug 05
2
enabling interleaved access loop vectorization
...e we need to fix, since the vectorized code is
> actually fairly decent - e.g. forcing vectorization, with SSE4.2, we get:
>
> .LBB0_3: # %vector.body
> # =>This Inner Loop Header: Depth=1
> movdqu (%rdi,%rax,4), %xmm3
> movd %xmm0, %rcx
> movdqu 4(%rdi,%rcx,4), %xmm4
> paddd %xmm3, %xmm4
> movdqu 8(%rdi,%rcx,4), %xmm3
> paddd %xmm4, %xmm3
> movdqa %xmm1, %xmm4
> paddq %xmm4, %xmm4
> movdqa %xmm0, %xmm5
> paddq %xmm5, %xm...
2016 May 26
0
enabling interleaved access loop vectorization
On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for
2016 Jun 15
8
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hello,
Currently the loop vectorizer will, by default, not consider vectorization factors that would make it generate types that do not fit into the target platform's vector registers. That is, if the widest scalar type in the scalar loop is i64, and the platform's largest vector register is 256-bit wide, we will not consider a VF above 4. We have a command-line option (-mllvm
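The default limit described above is plain arithmetic: the widest scalar type must fit VF times into the widest vector register, so 256 / 64 = 4. A minimal C sketch of that bound (the name max_legal_vf is hypothetical, not an LLVM API):

```c
#include <assert.h>

/* Largest vectorization factor that keeps every vector type legal:
   the widest scalar type in the loop must still fit VF times into one register. */
unsigned max_legal_vf(unsigned vector_reg_bits, unsigned widest_scalar_bits)
{
    assert(widest_scalar_bits > 0 && widest_scalar_bits <= vector_reg_bits);
    return vector_reg_bits / widest_scalar_bits;
}
```

For example, max_legal_vf(256, 64) is 4. With VF = 8 the same i64 loop would need <8 x i64>, i.e. 512 bits, which is an illegal type on such a target; that is the kind of factor the RFC proposes to let the vectorizer consider.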
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...andq $-8, %rdx
  pxor %xmm0, %xmm0
  pxor %xmm1, %xmm1
  .align 16, 0x90
.LBB0_3: # %vector.body
         # =>This Inner Loop Header: Depth=1
  movdqa %xmm1, %xmm2
  movdqa %xmm0, %xmm3
  movdqu -16(%rdi), %xmm0
  movdqu (%rdi), %xmm1
  paddd %xmm3, %xmm0
  paddd %xmm2, %xmm1
  addq $32, %rdi
  addq $-8, %rdx
  jne .LBB0_3
# BB#4:
  movq %r8, %rdi
  movq %rax, %rdx
  jmp .LBB0_5
.LBB0_1:
  pxor %xmm1, %xmm1
.LBB0_5:...
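The inner loop above keeps two 4 x i32 accumulators (%xmm0, %xmm1), consumes 32 bytes per iteration (addq $32, %rdi) and counts down by 8 (addq $-8, %rdx). The original C source is not in the snippet; assuming it was a plain int sum reduction, a hypothetical SSE2-intrinsics equivalent with the same shape (sum_ints_sse2 is an illustrative name) looks like this:

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>

/* Sum of n ints with two 4-lane accumulators, 8 elements per iteration,
   unaligned loads (movdqu), and a scalar loop for the remainder. */
int sum_ints_sse2(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    __m128i acc0 = _mm_setzero_si128(); /* like pxor %xmm0, %xmm0 */
    __m128i acc1 = _mm_setzero_si128(); /* like pxor %xmm1, %xmm1 */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc0 = _mm_add_epi32(acc0, _mm_loadu_si128((const __m128i *)(a + i)));
        acc1 = _mm_add_epi32(acc1, _mm_loadu_si128((const __m128i *)(a + i + 4)));
    }
    /* Combine the two accumulators, then fold the four lanes. */
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, _mm_add_epi32(acc0, acc1));
    int sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) /* remainder, n % 8 elements */
        sum += a[i];
    return sum;
}
```

Two independent accumulators let consecutive paddd instructions overlap instead of waiting on one another, which is a common reason compilers unroll reductions this way.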
2008 Sep 03
0
[LLVMdev] Instruction MVT::ValueTypes
> ...hat contains 8-bit data.
> It's just bits, after all, but there is a "preference" to what should be in
> the register for performance reasons. It's not good to mix-and-match MOVAPD
> and MOVAPS on the same data.
For the case of MOVAPS vs. MOVAPD vs. MOVDQU (assuming you have a micro-architecture where there's actually a difference), this can be achieved by having instruction selection select the right instructions. For example, find code like this in X86InstrSSE.td:
def : Pat<(alignedloadv2i64 addr:$src), (MOVAPSrm addr:$src)>,...
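The X86InstrSSE.td pattern picks an aligned form only when the load is known to be aligned (alignedloadv2i64); otherwise an unaligned form such as MOVDQU is needed, because the aligned forms fault on misaligned addresses. LLVM makes that choice statically during instruction selection; the C sketch below is only a run-time analogue using SSE2 intrinsics, assuming x86-64, and load16 is a hypothetical helper.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Load 16 bytes, using the aligned form (movdqa) only when the address
   allows it; otherwise fall back to the unaligned form (movdqu). */
__m128i load16(const void *p)
{
    assert(p != NULL);
    if (((uintptr_t)p & 15) == 0)
        return _mm_load_si128((const __m128i *)p);  /* requires 16-byte alignment */
    return _mm_loadu_si128((const __m128i *)p);     /* any alignment */
}
```

Both paths return the same bytes; the difference is only which instruction is legal for the address.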
2013 Feb 19
2
[LLVMdev] Is it a bug or am I missing something ?
...e_test: # @sample_test
# BB#0: # %L.entry
  movl 4(%esp), %eax
  movss 304(%eax), %xmm0
  xorps %xmm1, %xmm1
  movl 8(%esp), %eax
  movups %xmm1, 624(%eax)
  pshufd $65, %xmm0, %xmm0 # xmm0 = xmm0[1,0,0,1]
  movdqu %xmm0, 608(%eax)
  ret
.Ltmp0:
  .size sample_test, .Ltmp0-sample_test
  .section ".note.GNU-stack","",@progbits
It seems to me that this sequence of instructions is building the vector: <float 'elem_1_of_source', float 'elem_0_of_source...
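The assembler comment "xmm0 = xmm0[1,0,0,1]" is the pshufd immediate decoded: $65 is 0b01000001, and each 2-bit field, starting from the low bits, selects the source lane for one destination lane. A small C sketch (the helper names pshufd_lanes and shuffle_65 are hypothetical) that decodes the immediate and applies the same shuffle with _mm_shuffle_epi32:

```c
#include <assert.h>
#include <emmintrin.h>

/* Decode a pshufd immediate: destination lane k takes source lane
   (imm >> (2*k)) & 3. For 65 (0b01000001) this gives [1,0,0,1]. */
void pshufd_lanes(unsigned imm, int lanes[4])
{
    assert(imm <= 255);
    for (int k = 0; k < 4; k++)
        lanes[k] = (int)((imm >> (2 * k)) & 3u);
}

/* The same shuffle on real data; the immediate must be a compile-time constant. */
__m128i shuffle_65(__m128i v)
{
    return _mm_shuffle_epi32(v, 65);
}
```

Applied to lanes {10, 11, 12, 13}, the shuffle yields {11, 10, 10, 11}, i.e. element 1 first and element 0 second, matching the vector described in the message.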
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
.../x86/crypto/aesni-intel_asm.S
index 16627fec80b2..5f73201dff32 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -325,7 +325,8 @@ _get_AAD_rest0\num_initial_blocks\operation:
         vpshufb and an array of shuffle masks */
         movq    %r12, %r11
         salq    $4, %r11
-        movdqu  aad_shift_arr(%r11), \TMP1
+        leaq    aad_shift_arr(%rip), %rax
+        movdqu  (%rax,%r11,), \TMP1
         PSHUFB_XMM \TMP1, %xmm\i
 _get_AAD_rest_final\num_initial_blocks\operation:
         PSHUFB_XMM %xmm14, %xmm\i # byte-reflect the AAD data
@@ -584,7 +585,8 @@ _get_AAD_rest0\num_initial_blocks\operation:...
2017 Jan 04
3
dovecot-pigeonhole running external script ends with signal 11
...e info available.
Cannot access memory at address 0x7fffffffd848
Then I installed debuginfo for glibc via debuginfo-install glibc-2.17-157.el7_3.1.x86_64 and get
Program received signal SIGSEGV, Segmentation fault.
[Switching to process 18099]
__strlen_sse2 () at ../sysdeps/x86_64/strlen.S:31
31      movdqu (%rdi), %xmm1
So this is an issue for glibc developers? I just wonder why this error does not occur if I call the script directly on the CLI or if I use the sieve-test program. It seems to occur only if the script is called from dovecot.
To compare I tried gdb as well as user vmail and get more detailed infor...
2013 Feb 19
0
[LLVMdev] Is it a bug or am I missing something ?
...e_test: # @sample_test
# BB#0: # %L.entry
  movl 4(%esp), %eax
  movss 304(%eax), %xmm0
  xorps %xmm1, %xmm1
  movl 8(%esp), %eax
  movups %xmm1, 624(%eax)
  pshufd $65, %xmm0, %xmm0 # xmm0 = xmm0[1,0,0,1]
  movdqu %xmm0, 608(%eax)
  ret
.Ltmp0:
  .size sample_test, .Ltmp0-sample_test
  .section ".note.GNU-stack","",@progbits
It seems to me that this sequence of instructions is building the vector: <float 'elem_1_of_source', float 'elem_0_of_source...
2008 Sep 03
3
[LLVMdev] Instruction MVT::ValueTypes
On Tuesday 02 September 2008 16:47, Evan Cheng wrote:
> On Sep 2, 2008, at 10:42 AM, David Greene wrote:
> > Is there an easy way to get the MVT::ValueType of a MachineInstruction
> > MachineOperand? For example, the register operand of an x86 MOVAPD
> > should have an MVT::ValueType of v2f64. A MOVAPS register operand should
> > have an
2017 Jan 04
4
dovecot-pigeonhole running external script ends with signal 11
...
>> Then I installed debuginfo for glibc via debuginfo-install
>> glibc-2.17-157.el7_3.1.x86_64 and get
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to process 18099]
>> __strlen_sse2 () at ../sysdeps/x86_64/strlen.S:31
>> 31      movdqu (%rdi), %xmm1
>>
>> So this is an issue for glibc developers? I just wonder why this error
>> does not occur if I call the script directly on the CLI or if I use the
>> sieve-test program. It seems to occur only if the script is called from dovecot.
>>
>> To compare I tried g...
2011 Dec 28
1
[LLVMdev] Codegen for vector float->double cast fails on x86 above SSE3
...%xmm1, %xmm1
  unpcklpd %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0]
  movupd %xmm1, (%rsi)
  ret
Load both, cast float to double (cvtss2sd), pack vectors, and store. But with llc -mcpu=penryn or greater, it yields nonsense:
  movq (%rdi), %xmm0
  pshufd $16, %xmm0, %xmm0 ## xmm0 = xmm0[0,0,1,0]
  movdqu %xmm0, (%rsi)
  ret
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vec_cast.ll
Type: application/octet-stream
Size: 406 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111228/3a33b948/attachment.obj>
-------------- nex...
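The attachment vec_cast.ll was scrubbed, so the exact IR is not shown; assuming it extends a <2 x float> to <2 x double>, SSE2 already provides a single instruction for that (cvtps2pd). The pshufd/movdqu sequence only moves 32-bit lanes and never converts anything, which is why its result is meaningless as doubles. A minimal C sketch of the expected behavior (cast2_float_to_double is a hypothetical name):

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>

/* Widen two floats to two doubles with one vector conversion (cvtps2pd),
   the same result the pre-SSE3 code computes with cvtss2sd + unpcklpd. */
void cast2_float_to_double(const float *in, double *out)
{
    assert(in != NULL && out != NULL);
    __m128 f = _mm_setr_ps(in[0], in[1], 0.0f, 0.0f); /* in[0] in the low lane */
    _mm_storeu_pd(out, _mm_cvtps_pd(f));              /* converts the low two lanes */
}
```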
2017 Jan 04
0
dovecot-pigeonhole running external script ends with signal 11
...at address 0x7fffffffd848
>
> Then I installed debuginfo for glibc via debuginfo-install
> glibc-2.17-157.el7_3.1.x86_64 and get
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to process 18099]
> __strlen_sse2 () at ../sysdeps/x86_64/strlen.S:31
> 31      movdqu (%rdi), %xmm1
>
> So this is an issue for glibc developers? I just wonder why this error
> does not occur if I call the script directly on the CLI or if I use the
> sieve-test program. It seems to occur only if the script is called from dovecot.
>
> To compare I tried gdb as well as user vmail...
2017 May 29
8
[Bug 1152] New: iptables-xml crashed on -D rules
...sr/bin/iptables-xml < /etc/iptables.post
<iptables-rules version="1.0">
<!-- # Managed by puppet -->
  <table name="filter" >
Program received signal SIGSEGV, Segmentation fault.
__strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:165
165     movdqu (%rsi), %xmm2
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:165
#1  0x00000000004041f8 in needChain (chain=0x0) at iptables-xml.c:276
#2  iptables_xml_main (argc=<optimized out>, argv=<optimized out>) at iptables-xml.c:848
#3  0x00007ffff711eb35 in __li...
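The backtrace shows needChain called with chain=0x0 and the crash inside strcmp, which suggests a NULL pointer reached strcmp; passing NULL to strcmp is undefined behavior, and here it faulted on the movdqu (%rsi) load. A hypothetical guard (name_equals is not part of iptables, and this is not the project's actual fix) that treats a missing name as a non-match:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* strcmp() must not be given NULL; check both arguments first and
   report "not equal" when either name is missing. */
bool name_equals(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return false;
    return strcmp(a, b) == 0;
}
```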
2017 Oct 04
28
x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. This makes it possible to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to
2017 Oct 04
28
x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. This makes it possible to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to
2017 Jan 03
3
dovecot-pigeonhole running external script ends with signal 11
Hi, I'm running dovecot 2.2.26 (self-compiled) on CentOS 7. I have a sieve script which should run an external script (in filter mode) that encrypts the mail using the user's public key. I configured 90-plugin.conf as follows:
plugin {
  sieve_plugins = sieve_extprograms
  sieve_extensions = +vnd.dovecot.filter
  sieve_filter_bin_dir = /etc/dovecot/sieve-filters
2017 Oct 11
32
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc -mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends