Displaying 20 results from an estimated 36 matches for "cmove".
2008 May 27
3
[LLVMdev] Float compare-for-equality and select optimization opportunity
...ucomiss instruction (unordered compare and set
flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the
conditional, for no clear reason. I think it could be rewritten as follows:
movss xmm0,dword ptr [ecx+4]
comiss xmm0,dword ptr [ecx+8]
mov edx,edi
cmove edx,ecx
cmove ecx,esi
cmove esi,edi
Compared to the original C syntax code this looks pretty straightforward.
Curiously, when I replace the compare-for-equality with something like a
less-than, it does generate such compact code (using comiss and cmova). And
the not-equal...
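As a rough illustration of the pattern under discussion, the sketch below shows an ordered-equal float compare feeding a select. The function name and parameters are hypothetical and not taken from the thread. One point worth keeping in mind: `comiss`/`ucomiss` set ZF, PF and CF all to 1 for an unordered result (a NaN operand), so a single `cmove` keyed on ZF alone would treat NaN as "equal". That may be part of why the generated code for an ordered-equal compare is longer than the hand-written version, though the snippet does not confirm this.

```c
#include <math.h>

/* Hypothetical helper: returns if_equal when a and b compare ordered-equal,
   otherwise if_not. In C with IEEE 754 floats, a == b is false whenever
   either operand is NaN, which matches an ordered-equal (OEQ) compare. */
int select_if_equal(float a, float b, int if_equal, int if_not) {
    return (a == b) ? if_equal : if_not;
}
```

Calling `select_if_equal(NAN, NAN, 10, 20)` takes the "not equal" arm, which is the case a ZF-only conditional move would get wrong.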
2008 May 27
1
[LLVMdev] Float compare-for-equality and select optimization opportunity
...ucomiss instruction (unordered compare and set
flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the
conditional, for no clear reason. I think it could be rewritten as follows:
movss xmm0,dword ptr [ecx+4]
comiss xmm0,dword ptr [ecx+8]
mov edx,edi
cmove edx,ecx
cmove ecx,esi
cmove esi,edi
Compared to the original C syntax code this looks pretty straightforward.
Curiously, when I replace the compare-for-equality with something like a
less-than, it does generate such compact code (using comiss and cmova). And
the not-equal...
2008 May 27
0
[LLVMdev] Float compare-for-equality and select optimization opportunity
...ucomiss instruction (unordered compare and set
flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the
conditional, for no clear reason. I think it could be rewritten as follows:
movss xmm0,dword ptr [ecx+4]
comiss xmm0,dword ptr [ecx+8]
mov edx,edi
cmove edx,ecx
cmove ecx,esi
cmove esi,edi
Compared to the original C syntax code this looks pretty straightforward.
Curiously, when I replace the compare-for-equality with something like a
less-than, it does generate such compact code (using comiss and cmova). And
the not-equal...
2008 May 27
1
[LLVMdev] Float compare-for-equality and select optimization opportunity
...ucomiss instruction (unordered compare and set
flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the
conditional, for no clear reason. I think it could be rewritten as follows:
movss xmm0,dword ptr [ecx+4]
comiss xmm0,dword ptr [ecx+8]
mov edx,edi
cmove edx,ecx
cmove ecx,esi
cmove esi,edi
Compared to the original C syntax code this looks pretty straightforward.
Curiously, when I replace the compare-for-equality with something like a
less-than, it does generate such compact code (using comiss and cmova). And
the not-equal...
2013 Sep 12
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...en by LEA.
332: mov eax,0x7
337: mov rsi,rbx
33a: cpuid
33c: xchg rsi,rbx
33f: and esi,0x20
342: shr esi,0x5
345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc>
34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3>
353: cmove rbp,r12
357: lea rdi,[rsp+0x188]
35f: lea rsi,[rip+0x0] # 366 <llvm::sys::getHostCPUName()+0xd6>
In both other cases (2) & (3) SI is saved into stack region.
> I promise I'll do the review of your code after that.
Thanks.
Regards,
--
Adam
2013 Sep 13
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...; 33a: cpuid
> 33c: xchg rsi,rbx
> 33f: and esi,0x20
> 342: shr esi,0x5
> 345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc>
> 34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3>
> 353: cmove rbp,r12
> 357: lea rdi,[rsp+0x188]
> 35f: lea rsi,[rip+0x0] # 366 <llvm::sys::getHostCPUName()+0xd6>
>
> In both other cases (2) & (3) SI is saved into stack region.
>
> > I promise I'll do the review of your code after that.
>
> Tha...
2018 May 09
3
Ignored branch predictor hints
...";
else return "f";
}
GCC correctly prefers the first case:
b(int):
mov eax, OFFSET FLAT:.LC0
test edi, edi
jne .L7
ret
But Clang seems to ignore __builtin_expect hints in this case.
b(int): # @b(int)
cmp edi, 1
mov eax, offset .L.str.1
mov ecx, offset .L.str.2
cmove rcx, rax
test edi, edi
mov eax, offset .L.str
cmovne rax, rcx
ret
https://godbolt.org/g/tuAVT7
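The source function is truncated in this snippet, so the following is only a hypothetical reconstruction of its shape: a three-way choice between string literals where the first case is marked as likely. The function name, the `likely` macro and the string contents are placeholders, not the poster's code. `__builtin_expect` is a GCC/Clang builtin, so this sketch assumes one of those compilers.

```c
/* likely() is a common convention, assumed here for illustration. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical stand-in for the truncated function: the a == 0 case is
   hinted as the hot path, the other two cases are expected to be rare. */
const char *classify(int a) {
    if (likely(a == 0))
        return "zero";
    else if (a == 1)
        return "one";
    else
        return "other";
}
```

With a branch on the hinted case, the hot path can return immediately; the two-`cmov` sequence shown above always evaluates both selections before returning.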
2013 Sep 13
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Actually there is no miscompile there, as esi isn't needed afterwards. The flags
are what the cmove is using.
342: shr esi,0x5
345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc>
34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3>
353: cmove rbp,r12 <- this is dependent on the flags from the shift.
I think your real prob...
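To make the flag dependency concrete, here is a small C sketch of what the `and esi,0x20` / `shr esi,0x5` / `cmove` sequence appears to compute: it isolates bit 5 of the EBX value returned by CPUID leaf 7 (the AVX2 feature bit) and selects one of two pointers depending on whether the result is zero. The function and parameter names are hypothetical, and the two strings being selected are not visible in the snippet, so they are passed in as arguments.

```c
#include <stdint.h>

/* Bit 5 of EBX from CPUID leaf 7, subleaf 0, is the AVX2 feature flag.
   This mirrors "and esi,0x20; shr esi,0x5": the result is 0 or 1, and the
   shift leaves ZF set exactly when the result is 0. */
static inline int ebx_has_avx2(uint32_t leaf7_ebx) {
    return (int)((leaf7_ebx & 0x20u) >> 5);
}

/* Hypothetical equivalent of the cmove: keep with_avx2 unless the bit was
   zero, in which case take without_avx2. */
const char *pick_cpu_name(uint32_t leaf7_ebx,
                          const char *with_avx2,
                          const char *without_avx2) {
    return ebx_has_avx2(leaf7_ebx) ? with_avx2 : without_avx2;
}
```

Only the zero/non-zero outcome matters to the select, which is consistent with the remark that the value left in esi itself is not needed.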
2004 Dec 03
5
xen 2.0/2.0.1 reboots silently on via C3-cpu
On my lex mainboard with a Via C3 cpu, xen 2.0(.1) reboots (nearly)
immediately without any outputs.
The same setup/version boots without problems on a VIA EPIA M board with
a VIA C3-2 CPU (Nehemiah), which has a rather different chipset.
Starting with option noreboot, nobiostables doesn't change anything
visible.
Is this a known problem? Or just a knowingly unsupported chipset?
Have you
2009 Mar 03
3
[LLVMdev] Tight overlapping loops and performance
...m via llc:
.text
.align 4,0x90
.globl _main
_main:
subl $12, %esp
movl $1999, %eax
xorl %ecx, %ecx
movl $1999, %edx
.align 4,0x90
LBB1_1: ## loopto
cmpl $1, %eax
leal -1(%eax), %eax
cmove %edx, %eax
incl %ecx
cmpl $999999999, %ecx
jne LBB1_1 ## loopto
LBB1_2: ## bb1
movl %eax, 4(%esp)
movl $LC, (%esp)
call _printf
xorl %eax, %eax
addl $12, %esp
ret
.section __TEXT,__cstring,cs...
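Read back into C, the loop in this listing looks roughly like the sketch below: a counter starts at 1999, is decremented on each pass, and is reset to 1999 whenever it was 1 (the `cmpl $1` / `leal -1` / `cmove` triple). The function name is hypothetical and the iteration count is a parameter here, whereas the listing hard-codes 999999999; the original source program is not shown in the snippet, so this is a reading of the assembly rather than the poster's code.

```c
/* Hypothetical C rendering of the loop body at LBB1_1.
   value plays the role of %eax, and 1999 is the reset value held in %edx. */
int countdown_with_reset(long iterations) {
    int value = 1999;
    for (long i = 0; i < iterations; ++i) {
        /* cmpl $1, %eax ; leal -1(%eax), %eax ; cmove %edx, %eax */
        value = (value == 1) ? 1999 : value - 1;
    }
    return value;
}
```

The select is branch-free, but each iteration depends on the previous value, so the loop is bound by that dependency chain rather than by branch prediction.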
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...ce.
>
> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.
>
>
> $ more llvm/lib/Target/X86//README-X86-64.txt
> …
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
>
> _conv:
> ucomiss LC0(%rip), %xmm0
> cvttss2siq %xmm0, %rdx
> jb L3
> subss LC0(%rip), %xmm0
> movabsq $-9223372036854775808, %rax
> cvttss2siq %xmm0, %rdx
>...
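For readers unfamiliar with the sequence quoted from the README, the branchy form of float to unsigned 64-bit conversion can be sketched in C as follows. Values below 2^63 are converted directly through a signed conversion; larger values have 2^63 subtracted first and the top bit restored afterwards. The function name is hypothetical, and this is a sketch of the general technique rather than the exact code LLVM emits.

```c
#include <stdint.h>

/* Hypothetical sketch: convert a non-negative float to uint64_t using only
   signed conversions, as hardware with just cvttss2si would have to. */
uint64_t float_to_u64_branchy(float x) {
    const float two63 = 9223372036854775808.0f; /* 2^63, exact in float */
    if (x < two63) {
        /* In range for a signed conversion: convert directly. */
        return (uint64_t)(int64_t)x;
    }
    /* Otherwise shift into range, convert, then restore bit 63. */
    return (uint64_t)(int64_t)(x - two63) ^ 0x8000000000000000ULL;
}
```

A branch-free variant would perform both conversions unconditionally and select the result afterwards, so the conversion on the unused path can still raise floating-point exception flags; that appears to be the concern raised in this thread.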
2009 Mar 02
0
[LLVMdev] Tight overlapping loops and performance
On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote:
> For which version of gcc? I should mention I'm on OS X and using the LLVM
> SVN.
gcc 4.3. It's also possible this is processor-sensitive.
>> First, try looking at the generated code... the code LLVM generates is
>> probably not what you're expecting. I'm getting the
2015 Nov 21
2
Recent -Os code size regressions
...d of stores and branches.
>
> I know a backend change I made to ARM isn't behaving as well as it could,
> and I have patches to fix that. Speculatively reverting midend patches
> isn't the best way to approach this, in my opinion! :)
>
For i586, the effect of r252152 seems to cause cmoves instead of branches.
Code size increase is +35% for i586.
Unfortunately the object files are wildly different in a way that does not
seem to occur in other workloads. I tried to clip a concise before and
after case.
Before:
As a reference point, I found OR $0x408 and OR $0x810 in close pro...
2018 May 09
0
Ignored branch predictor hints
...x, OFFSET FLAT:.LC0
>> test edi, edi
>> jne .L7
>> ret
>>
>> But Clang seems to ignore __builtin_expect hints in this case.
>>
>> b(int): # @b(int)
>> cmp edi, 1
>> mov eax, offset .L.str.1
>> mov ecx, offset .L.str.2
>> cmove rcx, rax
>> test edi, edi
>> mov eax, offset .L.str
>> cmovne rax, rcx
>> ret
>>
>> https://godbolt.org/g/tuAVT7
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>...
2020 Aug 23
3
clang performing worse than gcc for this loop
...branches but executes more instructions. `perf` reports 32.76% front-end cycles idle with the clang code compared to 24.20% for gcc generated code. Clang generated code seems to perform worse in branch-miss and icache events (as reported by `perf`). But it is not clear why. Are the two back-to-back cmove instructions the reason? Any comments on this?
2018 May 09
2
Ignored branch predictor hints
...mov eax, OFFSET FLAT:.LC0
>> test edi, edi
>> jne .L7
>> ret
>>
>> But Clang seems to ignore __builtin_expect hints in this case.
>> b(int): # @b(int)
>> cmp edi, 1
>> mov eax, offset .L.str.1
>> mov ecx, offset .L.str.2
>> cmove rcx, rax
>> test edi, edi
>> mov eax, offset .L.str
>> cmovne rax, rcx
>> ret
>> https://godbolt.org/g/tuAVT7
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.ll...
2017 Apr 20
4
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...nt: Wednesday, April 19, 2017 10:14 AM
To: Michael Clark <michaeljclark at mac.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
This seems to have been done for performance reasons (avoiding mispredicts). A conditional-to-cmov transformation should not introduce additional observable side effects, and it's clear that whatever did this did not account for floating-point exceptions.
On Wed, Apr 19, 20...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...shr eax, 1
mov ecx, edx | # CF is set from the LSB of EAX
xor ecx, 79764919 | sbb edx, edx
test al, 1 | # EDX is 0xFFFFFFFF if CF set, else 0
mov eax, ecx | and edx, 79764919
cmove eax, edx | xor eax, edx
add r8d, 1
jne .LBB1_5
add rdi, 1
test esi, esi
jne .LBB1_4
not eax
ret
.LBB1_1:
xor eax, eax
ret
JFTR: with -O2, the inner loop gets unrolled, using the same n...
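The two columns in this listing appear to be two ways of computing the same per-bit update: shift right by one and XOR in the constant 79764919 when the bit shifted out was set. A C sketch of both forms is given below. The function names are hypothetical, and the surrounding loop from the original post is not reproduced because it is truncated in the snippet.

```c
#include <stdint.h>

#define POLY 79764919u /* constant from the listing; equals 0x04C11DB7 */

/* Select form, resembling the left column (test al,1 / cmove). */
uint32_t step_select(uint32_t crc) {
    uint32_t shifted = crc >> 1;
    return (crc & 1u) ? (shifted ^ POLY) : shifted;
}

/* Mask form, resembling the right column (shr / sbb edx,edx / and / xor):
   the mask is all ones when the low bit was set, otherwise zero. */
uint32_t step_mask(uint32_t crc) {
    uint32_t mask = 0u - (crc & 1u);
    return (crc >> 1) ^ (mask & POLY);
}
```

Both functions return the same value for every input; the difference is only in how many instructions a compiler needs to express the conditional XOR.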
2009 Mar 02
3
[LLVMdev] Tight overlapping loops and performance
> Date: Mon, 2 Mar 2009 13:41:45 -0800
> From: eli.friedman at gmail.com
> To: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Tight overlapping loops and performance
>
> Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and
> llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing
> differently; I wouldn't be surprised if it's
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...| # CF is set from the LSB of EAX
>> xor ecx, 79764919 | sbb edx, edx
>> test al, 1 | # EDX is 0xFFFFFFFF if CF set, else 0
>> mov eax, ecx | and edx, 79764919
>> cmove eax, edx | xor eax, edx
>> add r8d, 1
>> jne .LBB1_5
>> add rdi, 1
>> test esi, esi
>> jne .LBB1_4
>> not eax
>> ret
>> .LBB1_1:
>> xor eax...