Displaying 20 results from an estimated 59 matches for "movslq".

2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...ps %xmm0,%xmm1 400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18> 400511: vcvttps2dq %xmm1,%xmm1 400515: vpmullw 0x183(%rip),%xmm1,%xmm1 # 4006a0 <__dso_handle+0x28> 40051d: vpsubd %xmm1,%xmm0,%xmm0 400521: vmovq %xmm0,%rax 400526: movslq %eax,%rcx 400529: sar $0x20,%rax 40052d: vpextrq $0x1,%xmm0,%rdx 400533: movslq %edx,%rsi 400536: sar $0x20,%rdx 40053a: vmovss 0x4006c0(,%rcx,4),%xmm0 400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0 40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0 400...
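The `vmovq` / `movslq` / `sar $0x20` sequence in the snippet above appears to split one 64-bit value into two sign-extended 32-bit lanes. A minimal C sketch of that idea follows. The function names `low_lane` and `high_lane` are hypothetical, not from the thread, and the code assumes a compiler such as GCC or Clang, where narrowing to `int32_t` wraps and `>>` on a negative signed value is an arithmetic shift.

```c
#include <stdint.h>

/* Low lane: keep bits 31..0 and sign-extend them to 64 bits,
 * which is what `movslq %eax,%rcx` does.
 * Assumption: the uint32_t -> int32_t conversion wraps (GCC/Clang). */
int64_t low_lane(uint64_t q) {
    return (int64_t)(int32_t)(uint32_t)q;
}

/* High lane: shift right by 32 while replicating the sign bit,
 * which is what `sar $0x20,%rax` does.
 * Assumption: >> on a negative int64_t is an arithmetic shift (GCC/Clang). */
int64_t high_lane(uint64_t q) {
    return (int64_t)q >> 32;
}
```

Both results can then be used directly as 64-bit array indices, which seems to be the purpose of the scaled `0x4006c0(,%rcx,4)` loads that follow in the snippet.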
2011 Jan 12
2
[LLVMdev] Wrong assembly is written for x86_64 target in JIT without optimization?
...00000800989c00: mov $0x1,%edi 0x0000000800989c05: xor %eax,%eax 0x0000000800989c07: mov $0x800a09060,%rcx 0x0000000800989c11: mov %eax,%esi 0x0000000800989c13: mov %eax,%edx 0x0000000800989c15: callq *%ecx 0x0000000800989c17: movslq %eax,%rcx 0x0000000800989c1a: mov 0xfffffffffffffff8(%rbp),%r8 0x0000000800989c1e: mov (%r8,%rcx,1),%eax 0x0000000800989c22: mov $0x1,%edi 0x0000000800989c27: mov $0x4,%esi 0x0000000800989c2c: xor %edx,%edx 0x0000000800989c2e: mov...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...imm = 0x2AAAAAAAAAAAAAAB imulq %rbx movq %rdx, %rcx movq %rcx, %rax shrq $63, %rax shrq $2, %rcx addl %eax, %ecx vpextrq $1, %xmm5, %rax imulq %rbx movq %rdx, %rax shrq $63, %rax shrq $2, %rdx addl %eax, %edx movslq %edx, %rax vmovq %rax, %xmm5 movslq %ecx, %rax vmovq %rax, %xmm6 vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0] -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150723/4c853c43...
2017 Feb 13
4
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
...s_preempted;" > >> +".type __raw_callee_save___kvm_vcpu_is_preempted, @function;" > >> +"__raw_callee_save___kvm_vcpu_is_preempted:" > >> +FRAME_BEGIN > >> +"push %rdi;" > >> +"push %rdx;" > >> +"movslq %edi, %rdi;" > >> +"movq $steal_time+16, %rax;" > >> +"movq __per_cpu_offset(,%rdi,8), %rdx;" > >> +"cmpb $0, (%rdx,%rax);" Could we not put the $steal_time+16 displacement as an immediate in the cmpb and save a whole register...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...> shrq $63, %rax > shrq $2, %rdx > addl %eax, %edx > movslq %edx, %rax > vmovq %rax, %xmm5 > movslq %ecx, %rax > vmovq %rax, %xmm6 > vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0] AVX2 doesn't have integer vector division ins...
2011 Jan 12
0
[LLVMdev] Wrong assembly is written for x86_64 target in JIT without optimization?
...$0x1,%edi > 0x0000000800989c05: xor %eax,%eax > 0x0000000800989c07: mov $0x800a09060,%rcx > 0x0000000800989c11: mov %eax,%esi > 0x0000000800989c13: mov %eax,%edx > 0x0000000800989c15: callq *%ecx > 0x0000000800989c17: movslq %eax,%rcx > 0x0000000800989c1a: mov 0xfffffffffffffff8(%rbp),%r8 > 0x0000000800989c1e: mov (%r8,%rcx,1),%eax > 0x0000000800989c22: mov $0x1,%edi > 0x0000000800989c27: mov $0x4,%esi > 0x0000000800989c2c: xor %edx,%edx > 0x0...
2017 Feb 13
5
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
...wrote: > > On 02/13/2017 05:53 AM, Peter Zijlstra wrote: > >> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote: > >>> That way we'd end up with something like: > >>> > >>> asm(" > >>> push %rdi; > >>> movslq %edi, %rdi; > >>> movq __per_cpu_offset(,%rdi,8), %rax; > >>> cmpb $0, %[offset](%rax); > >>> setne %al; > >>> pop %rdi; > >>> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct steal_time, preempted)));...
2017 Feb 13
2
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On 02/13/2017 05:53 AM, Peter Zijlstra wrote: > On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote: >> That way we'd end up with something like: >> >> asm(" >> push %rdi; >> movslq %edi, %rdi; >> movq __per_cpu_offset(,%rdi,8), %rax; >> cmpb $0, %[offset](%rax); >> setne %al; >> pop %rdi; >> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct steal_time, preempted))); >> >> And if we could get rid of...
2017 Feb 10
2
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
... > +".global __raw_callee_save___kvm_vcpu_is_preempted;" > +".type __raw_callee_save___kvm_vcpu_is_preempted, @function;" > +"__raw_callee_save___kvm_vcpu_is_preempted:" > +FRAME_BEGIN > +"push %rdi;" > +"push %rdx;" > +"movslq %edi, %rdi;" > +"movq $steal_time+16, %rax;" > +"movq __per_cpu_offset(,%rdi,8), %rdx;" > +"cmpb $0, (%rdx,%rax);" > +"setne %al;" > +"pop %rdx;" > +"pop %rdi;" > +FRAME_END > +"ret;" >...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...> shrq $63, %rax >> shrq $2, %rcx >> addl %eax, %ecx >> vpextrq $1, %xmm5, %rax >> imulq %rbx >> movq %rdx, %rax >> shrq $63, %rax >> shrq $2, %rdx >> addl %eax, %edx >> movslq %edx, %rax >> vmovq %rax, %xmm5 >> movslq %ecx, %rax >> vmovq %rax, %xmm6 >> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0] > AVX2 doesn't have integer vector division instructions and LLVM lowers divides by constants into (128 bit)...
2016 Jul 29
2
PIC preferred too strongly, even at CodeModel::Large?
...xpected this to be taken care of. Isn't >> that what gcc would do given a Large CodeModel? > > > This sounds like a bug, but I can't reproduce it. Testcase? I've attached an example with a standard switch instruction, compiled with `llc -code-model=large`. It produces: movslq (%rax,%rdi,4), %rsi addq %rax, %rsi jmpq *%rsi >> Second, is it okay to >> silently fold into Reloc::PIC_ in this case and leave the user with >> sporadic crashes? > > > Large code model and PIC should be compatible. Technically, yes. My understanding is that, instead o...
2016 Oct 12
4
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote: > I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences. > The following tests pass at "-O3" and
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...> shrq $2, %rcx >>> addl %eax, %ecx >>> vpextrq $1, %xmm5, %rax >>> imulq %rbx >>> movq %rdx, %rax >>> shrq $63, %rax >>> shrq $2, %rdx >>> addl %eax, %edx >>> movslq %edx, %rax >>> vmovq %rax, %xmm5 >>> movslq %ecx, %rax >>> vmovq %rax, %xmm6 >>> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0] >>> >> AVX2 doesn't have integer vector division instructions and LLVM lowers ...
2017 Feb 13
3
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On February 13, 2017 2:53:43 AM PST, Peter Zijlstra <peterz at infradead.org> wrote: >On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote: >> That way we'd end up with something like: >> >> asm(" >> push %rdi; >> movslq %edi, %rdi; >> movq __per_cpu_offset(,%rdi,8), %rax; >> cmpb $0, %[offset](%rax); >> setne %al; >> pop %rdi; >> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct >steal_time, preempted))); >> >> And if we could get ri...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello, Here is my first speedup patch. Like 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from lot of