Displaying 20 results from an estimated 59 matches for "movslq".
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...ps %xmm0,%xmm1
400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18>
400511: vcvttps2dq %xmm1,%xmm1
400515: vpmullw 0x183(%rip),%xmm1,%xmm1 # 4006a0 <__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536: sar $0x20,%rdx
40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400...
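For readers following the listing above: the vmovq / movslq / sar $0x20 sequence appears to take two signed 32-bit lanes out of one 64-bit register and widen each to 64 bits so it can serve as an index. A minimal C sketch of that arithmetic follows. It assumes a compiler where right-shifting a negative value is an arithmetic shift and out-of-range integer conversions wrap (both true of GCC and Clang). The names `lanes64`, `split_lanes` and `split_lanes_selfcheck` are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the two widened lanes. */
struct lanes64 {
    int64_t lo;  /* lane 0, sign-extended to 64 bits */
    int64_t hi;  /* lane 1, sign-extended to 64 bits */
};

/* Split a 64-bit value holding two packed signed 32-bit lanes. */
static inline struct lanes64 split_lanes(uint64_t packed)
{
    struct lanes64 out;
    out.lo = (int64_t)(int32_t)(uint32_t)packed;  /* movslq %eax, %rcx */
    out.hi = (int64_t)packed >> 32;               /* sar $0x20, %rax   */
    return out;
}

/* Usage example: low lane is 3, high lane is -2 (0xFFFFFFFE). */
static inline void split_lanes_selfcheck(void)
{
    struct lanes64 l = split_lanes(UINT64_C(0xFFFFFFFE00000003));
    assert(l.lo == 3);
    assert(l.hi == -2);
}
```

The sign extension matters because the lanes are used in scaled addressing such as 0x4006c0(,%rcx,4); a zero-extended negative lane would index far past the table.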
2011 Jan 12
2
[LLVMdev] Wrong assembly is written for x86_64 target in JIT without optimization?
...00000800989c00: mov $0x1,%edi
0x0000000800989c05: xor %eax,%eax
0x0000000800989c07: mov $0x800a09060,%rcx
0x0000000800989c11: mov %eax,%esi
0x0000000800989c13: mov %eax,%edx
0x0000000800989c15: callq *%ecx
0x0000000800989c17: movslq %eax,%rcx
0x0000000800989c1a: mov 0xfffffffffffffff8(%rbp),%r8
0x0000000800989c1e: mov (%r8,%rcx,1),%eax
0x0000000800989c22: mov $0x1,%edi
0x0000000800989c27: mov $0x4,%esi
0x0000000800989c2c: xor %edx,%edx
0x0000000800989c2e: mov...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...imm = 0x2AAAAAAAAAAAAAAB
imulq %rbx
movq %rdx, %rcx
movq %rcx, %rax
shrq $63, %rax
shrq $2, %rcx
addl %eax, %ecx
vpextrq $1, %xmm5, %rax
imulq %rbx
movq %rdx, %rax
shrq $63, %rax
shrq $2, %rdx
addl %eax, %edx
movslq %edx, %rax
vmovq %rax, %xmm5
movslq %ecx, %rax
vmovq %rax, %xmm6
vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150723/4c853c43...
2017 Feb 13
4
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
...s_preempted;"
> >> +".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
> >> +"__raw_callee_save___kvm_vcpu_is_preempted:"
> >> +FRAME_BEGIN
> >> +"push %rdi;"
> >> +"push %rdx;"
> >> +"movslq %edi, %rdi;"
> >> +"movq $steal_time+16, %rax;"
> >> +"movq __per_cpu_offset(,%rdi,8), %rdx;"
> >> +"cmpb $0, (%rdx,%rax);"
Could we not put the $steal_time+16 displacement as an immediate in the
cmpb and save a whole register...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...> shrq $63, %rax
> shrq $2, %rdx
> addl %eax, %edx
> movslq %edx, %rax
> vmovq %rax, %xmm5
> movslq %ecx, %rax
> vmovq %rax, %xmm6
> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
AVX2 doesn't have integer vector division ins...
2011 Jan 12
0
[LLVMdev] Wrong assembly is written for x86_64 target in JIT without optimization?
...$0x1,%edi
> 0x0000000800989c05: xor %eax,%eax
> 0x0000000800989c07: mov $0x800a09060,%rcx
> 0x0000000800989c11: mov %eax,%esi
> 0x0000000800989c13: mov %eax,%edx
> 0x0000000800989c15: callq *%ecx
> 0x0000000800989c17: movslq %eax,%rcx
> 0x0000000800989c1a: mov 0xfffffffffffffff8(%rbp),%r8
> 0x0000000800989c1e: mov (%r8,%rcx,1),%eax
> 0x0000000800989c22: mov $0x1,%edi
> 0x0000000800989c27: mov $0x4,%esi
> 0x0000000800989c2c: xor %edx,%edx
> 0x0...
2017 Feb 13
5
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
...wrote:
> > On 02/13/2017 05:53 AM, Peter Zijlstra wrote:
> >> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote:
> >>> That way we'd end up with something like:
> >>>
> >>> asm("
> >>> push %rdi;
> >>> movslq %edi, %rdi;
> >>> movq __per_cpu_offset(,%rdi,8), %rax;
> >>> cmpb $0, %[offset](%rax);
> >>> setne %al;
> >>> pop %rdi;
> >>> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct steal_time, preempted)));...
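As a reading aid for the asm quoted above, here is a user-space C sketch of what the sequence seems to compute: sign-extend the CPU number, look up that CPU's per-CPU offset, and test one byte at a fixed displacement from it. Everything with a `sketch` suffix, the `cpu_copy` array, and the struct layout are assumptions made for illustration; the layout is chosen only so that `preempted` lands at offset 16, as in the `$steal_time+16` of the earlier patch. It is not the kernel's implementation, and the integer-to-pointer arithmetic relies on a flat 64-bit address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* hypothetical CPU count */

/* Hypothetical layout; only the offset of `preempted` matters here. */
struct steal_time_sketch {
    uint64_t steal;
    uint32_t version;
    uint32_t flags;
    uint8_t  preempted;
};
_Static_assert(offsetof(struct steal_time_sketch, preempted) == 16,
               "preempted is expected at offset 16");

static struct steal_time_sketch steal_time;                /* stands in for the per-CPU symbol */
static struct steal_time_sketch cpu_copy[SKETCH_NR_CPUS];  /* one instance per CPU */
static uintptr_t per_cpu_offset[SKETCH_NR_CPUS];           /* stands in for __per_cpu_offset */

/* Record the distance from the symbol to each CPU's private copy. */
static inline void sketch_setup(void)
{
    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        per_cpu_offset[i] = (uintptr_t)&cpu_copy[i] - (uintptr_t)&steal_time;
}

static inline bool vcpu_is_preempted_sketch(int32_t cpu)
{
    int64_t idx = (int64_t)cpu;            /* movslq %edi, %rdi */
    uintptr_t off = per_cpu_offset[idx];   /* movq __per_cpu_offset(,%rdi,8), %rax */
    const uint8_t *p = (const uint8_t *)(off + (uintptr_t)&steal_time
                       + offsetof(struct steal_time_sketch, preempted));
    return *p != 0;                        /* cmpb $0, %[offset](%rax); setne %al */
}

/* Usage example: mark CPU 2 as preempted and query two CPUs. */
static inline void sketch_selfcheck(void)
{
    sketch_setup();
    cpu_copy[2].preempted = 1;
    assert(vcpu_is_preempted_sketch(2));
    assert(!vcpu_is_preempted_sketch(1));
}
```

Folding the symbol address and member offset into one constant, as the "i" operand in the quoted asm does, is what lets the sequence use a single scratch register.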
2017 Feb 13
2
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On 02/13/2017 05:53 AM, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote:
>> That way we'd end up with something like:
>>
>> asm("
>> push %rdi;
>> movslq %edi, %rdi;
>> movq __per_cpu_offset(,%rdi,8), %rax;
>> cmpb $0, %[offset](%rax);
>> setne %al;
>> pop %rdi;
>> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct steal_time, preempted)));
>>
>> And if we could get rid of...
2017 Feb 10
2
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
..."
> +".global __raw_callee_save___kvm_vcpu_is_preempted;"
> +".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
> +"__raw_callee_save___kvm_vcpu_is_preempted:"
> +FRAME_BEGIN
> +"push %rdi;"
> +"push %rdx;"
> +"movslq %edi, %rdi;"
> +"movq $steal_time+16, %rax;"
> +"movq __per_cpu_offset(,%rdi,8), %rdx;"
> +"cmpb $0, (%rdx,%rax);"
> +"setne %al;"
> +"pop %rdx;"
> +"pop %rdi;"
> +FRAME_END
> +"ret;"
>...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...> shrq $63, %rax
>> shrq $2, %rcx
>> addl %eax, %ecx
>> vpextrq $1, %xmm5, %rax
>> imulq %rbx
>> movq %rdx, %rax
>> shrq $63, %rax
>> shrq $2, %rdx
>> addl %eax, %edx
>> movslq %edx, %rax
>> vmovq %rax, %xmm5
>> movslq %ecx, %rax
>> vmovq %rax, %xmm6
>> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
> AVX2 doesn't have integer vector division instructions and LLVM lowers divides by constants into (128 bit)...
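The scalar sequence quoted in this thread (imulq by 0x2AAAAAAAAAAAAAAB, then shifts by 63 and 2, then an add) is the usual multiply-high way of dividing by a constant. The divisor itself is not visible in the excerpt; the sketch below assumes it is 24, since the constant equals ceil(2^64 / 6) and the extra shift of 2 divides by a further 4. It also works in full 64-bit arithmetic, whereas the quoted listing finishes with 32-bit adds and movslq. `mulhi_s64`, `sdiv24` and `sdiv24_selfcheck` are hypothetical names, and `__int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Magic constant seen in the quoted listing: ceil(2^64 / 6). */
#define MAGIC_DIV6 INT64_C(0x2AAAAAAAAAAAAAAB)

/* High 64 bits of the signed 128-bit product, i.e. what one-operand
   imulq leaves in %rdx. Assumes arithmetic right shift of negatives. */
static inline int64_t mulhi_s64(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}

/* Truncating signed division by 24 without a divide instruction. */
static inline int64_t sdiv24(int64_t n)
{
    int64_t hi = mulhi_s64(n, MAGIC_DIV6);        /* imulq %rbx          */
    int64_t sign = (int64_t)((uint64_t)hi >> 63); /* shrq $63: 1 if n<0  */
    return (hi >> 2) + sign;                      /* shift by 2, then add */
}

/* Usage example: results match C's truncating `/` operator. */
static inline void sdiv24_selfcheck(void)
{
    assert(sdiv24(48) == 2);
    assert(sdiv24(-48) == -2);
    assert(sdiv24(-1) == 0);
}
```

Adding the sign bit corrects the floor produced by the shift so that negative quotients round toward zero, as sdiv requires.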
2016 Jul 29
2
PIC preferred too strongly, even at CodeModel::Large?
...xpected this to be taken care of. Isn't
>> that what gcc would do given a Large CodeModel?
>
>
> This sounds like a bug, but I can't reproduce it. Testcase?
I've attached an example with a standard switch instruction, compiled
with `llc -code-model=large`. It produces:
movslq (%rax,%rdi,4), %rsi
addq %rax, %rsi
jmpq *%rsi
>> Second, is it okay to
>> silently fold into Reloc::PIC_ in this case and leave the user with
>> sporadic crashes?
>
>
> Large code model and PIC should be compatible.
Technically, yes. My understanding is that, instead o...
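The three instructions quoted above read a signed 32-bit entry from a table, widen it, and add it to the table's own address before the indirect jump. A small C sketch of that address computation, using plain integers instead of real code addresses; `jump_target` and `jump_target_selfcheck` are hypothetical names, and the table address is passed separately here only to keep the example deterministic (in the listing both come from %rax):

```c
#include <assert.h>
#include <stdint.h>

/* Compute the branch target from a table of 32-bit relative entries. */
static inline uint64_t jump_target(uint64_t table_addr,
                                   const int32_t *entries,
                                   uint64_t index)
{
    int64_t rel = (int64_t)entries[index];  /* movslq (%rax,%rdi,4), %rsi */
    return table_addr + (uint64_t)rel;      /* addq %rax, %rsi            */
}

/* Usage example: one forward and one backward entry. */
static inline void jump_target_selfcheck(void)
{
    const int32_t entries[] = { 0x40, -0x20 };
    assert(jump_target(0x1000, entries, 0) == 0x1040);
    assert(jump_target(0x1000, entries, 1) == 0x0FE0);
}
```

Because each entry is only 32 bits wide, this form can reach targets within about 2 GiB of the table, which is relevant to the code-model question raised in the thread.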
2016 Oct 12
4
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
>
The following tests pass at "-O3" and
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...> shrq $2, %rcx
>>> addl %eax, %ecx
>>> vpextrq $1, %xmm5, %rax
>>> imulq %rbx
>>> movq %rdx, %rax
>>> shrq $63, %rax
>>> shrq $2, %rdx
>>> addl %eax, %edx
>>> movslq %edx, %rax
>>> vmovq %rax, %xmm5
>>> movslq %ecx, %rax
>>> vmovq %rax, %xmm6
>>> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
>>>
>> AVX2 doesn't have integer vector division instructions and LLVM lowers
>...
2017 Feb 13
3
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On February 13, 2017 2:53:43 AM PST, Peter Zijlstra <peterz at infradead.org> wrote:
>On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote:
>> That way we'd end up with something like:
>>
>> asm("
>> push %rdi;
>> movslq %edi, %rdi;
>> movq __per_cpu_offset(,%rdi,8), %rax;
>> cmpb $0, %[offset](%rax);
>> setne %al;
>> pop %rdi;
>> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct steal_time, preempted)));
>>
>> And if we could get ri...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello,
Here is my first speedup patch, worth roughly 10-11%. No IDCT yet.
Please feel free to comment on my code or, even better, think about
improvements. :) I believe my routines are not so bad; maybe
one day they will be even faster.
What needs to be optimized is the loop filter function. I have
no ideas now how to do it. It does not leave much space for parallel
stuff, copying memory from lot of