thr3ads.net - search: "cmove"

[LLVMdev] Float compare-for-equality and select optimization opportunity

2008 May 27

3

[LLVMdev] Float compare-for-equality and select optimization opportunity

...ucomiss instruction (unordered compare and set flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the conditional, for no clear reason. I think it could be rewritten as follows: movss xmm0,dword ptr [ecx+4] comiss xmm0,dword ptr [ecx+8] mov edx,edi cmove edx,ecx cmove ecx,esi cmove esi,edi Compared to the original C syntax code this looks pretty straightforward. Curiously, when I replace the compare-for-equality with something like a less-than, it does generate such compact code (using comiss and cmova). And the not-equal...
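A possible explanation for the extra code, offered as a sketch rather than a confirmed diagnosis: `CreateFCmpOEQ` builds an ordered-equal compare, which must be false when either operand is NaN. Both `ucomiss` and `comiss` set ZF, PF and CF together for an unordered result, so a lone `cmove` (which tests only ZF) would treat NaN as equal. The C below shows the semantics the select has to preserve; the function name `select_oeq` is hypothetical.

```c
#include <math.h>

/* Ordered-equal select: returns a only when x == y and neither operand is NaN.
 * In C, x == y is already false if either side is NaN, which matches the
 * "ordered equal" predicate. The name select_oeq is made up for this sketch. */
int select_oeq(float x, float y, int a, int b)
{
    return (x == y) ? a : b;
}
```

With NaN inputs the function must return `b`, which is why equality needs a parity check in addition to the zero flag, while a less-than compare can use a single `cmova`.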

[LLVMdev] Float compare-for-equality and select optimization opportunity

2008 May 27

1

[LLVMdev] Float compare-for-equality and select optimization opportunity

...ucomiss instruction (unordered compare and set flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the conditional, for no clear reason. I think it could be rewritten as follows: movss xmm0,dword ptr [ecx+4] comiss xmm0,dword ptr [ecx+8] mov edx,edi cmove edx,ecx cmove ecx,esi cmove esi,edi Compared to the original C syntax code this looks pretty straightforward. Curiously, when I replace the compare-for-equality with something like a less-than, it does generate such compact code (using comiss and cmova). And the not-equal...

[LLVMdev] Float compare-for-equality and select optimization opportunity

2008 May 27

0

[LLVMdev] Float compare-for-equality and select optimization opportunity

...ucomiss instruction (unordered compare and set flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the conditional, for no clear reason. I think it could be rewritten as follows: movss xmm0,dword ptr [ecx+4] comiss xmm0,dword ptr [ecx+8] mov edx,edi cmove edx,ecx cmove ecx,esi cmove esi,edi Compared to the original C syntax code this looks pretty straightforward. Curiously, when I replace the compare-for-equality with something like a less-than, it does generate such compact code (using comiss and cmova). And the not-equal...

[LLVMdev] Float compare-for-equality and select optimization opportunity

2008 May 27

1

[LLVMdev] Float compare-for-equality and select optimization opportunity

...ucomiss instruction (unordered compare and set flags). I only used IRBuilder::CreateFCmpOEQ. It also appears to invert the conditional, for no clear reason. I think it could be rewritten as follows: movss xmm0,dword ptr [ecx+4] comiss xmm0,dword ptr [ecx+8] mov edx,edi cmove edx,ecx cmove ecx,esi cmove esi,edi Compared to the original C syntax code this looks pretty straightforward. Curiously, when I replace the compare-for-equality with something like a less-than, it does generate such compact code (using comiss and cmova). And the not-equal...

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

2013 Sep 12

2

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

...en by LEA. 332: mov eax,0x7 337: mov rsi,rbx 33a: cpuid 33c: xchg rsi,rbx 33f: and esi,0x20 342: shr esi,0x5 345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc> 34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3> 353: cmove rbp,r12 357: lea rdi,[rsp+0x188] 35f: lea rsi,[rip+0x0] # 366 <llvm::sys::getHostCPUName()+0xd6> In both other cases (2) & (3) SI is saved into stack region. > I promise I'll do the review of your code after that. Thanks. Regards, -- Adam

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

2013 Sep 13

0

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

...; 33a: cpuid > 33c: xchg rsi,rbx > 33f: and esi,0x20 > 342: shr esi,0x5 > 345: lea rbp,[rip+0x0] # 34c > <llvm::sys::getHostCPUName()+0xbc> > 34c: lea r12,[rip+0x0] # 353 > <llvm::sys::getHostCPUName()+0xc3> > 353: cmove rbp,r12 > 357: lea rdi,[rsp+0x188] > 35f: lea rsi,[rip+0x0] # 366 > <llvm::sys::getHostCPUName()+0xd6> > > In both other cases (2) & (3) SI is saved into stack region. > > > I promise I'll do the review of your code after that. > > Tha...

Ignored branch predictor hints

2018 May 09

3

Ignored branch predictor hints

..."; else return "f"; } GCC correctly prefers the first case: b(int): mov eax, OFFSET FLAT:.LC0 test edi, edi jne .L7 ret But Clang seems to ignore __builtin_expect hints in this case. b(int): # @b(int) cmp edi, 1 mov eax, offset .L.str.1 mov ecx, offset .L.str.2 cmove rcx, rax test edi, edi mov eax, offset .L.str cmovne rax, rcx ret https://godbolt.org/g/tuAVT7

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

2013 Sep 13

2

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

Actually there is no miscompile there, as esi isn't needed. It is the flags that the cmove is using. 342: shr esi,0x5 345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc> 34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3> 353: cmove rbp,r12 <- this is dependent on the flags from the shift. I think your real prob...

xen 2.0/2.0.1 reboots silently on via C3-cpu

2004 Dec 03

5

xen 2.0/2.0.1 reboots silently on via C3-cpu

On my lex mainboard with a Via C3 cpu, xen 2.0(.1) reboots (nearly) immediately without any outputs. The same setup/version boots without problems on a via epia M board with a Via C3-2 cpu (Nehemia), having a rather different chipset. Starting with option noreboot, nobiostables doesn't change anything visible. Is this a known problem? Or just a knowingly unsupported chipset? Have you

[LLVMdev] Tight overlapping loops and performance

2009 Mar 03

3

[LLVMdev] Tight overlapping loops and performance

...m via llc: .text .align 4,0x90 .globl _main _main: subl $12, %esp movl $1999, %eax xorl %ecx, %ecx movl $1999, %edx .align 4,0x90 LBB1_1: ## loopto cmpl $1, %eax leal -1(%eax), %eax cmove %edx, %eax incl %ecx cmpl $999999999, %ecx jne LBB1_1 ## loopto LBB1_2: ## bb1 movl %eax, 4(%esp) movl $LC, (%esp) call _printf xorl %eax, %eax addl $12, %esp ret .section __TEXT,__cstring,cs...
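Read back into C, the inner loop in this listing appears to be a countdown that wraps from 1 to 1999, with the wrap implemented as a `cmove` instead of a branch. The sketch below is a reconstruction from the assembly only, not the poster's original source; `countdown_wrap` and `run` are hypothetical names.

```c
/* One loop iteration: cmpl $1,%eax / leal -1(%eax),%eax / cmove %edx,%eax.
 * The compare happens before the decrement, so the select tests the old value. */
int countdown_wrap(int x)
{
    return (x == 1) ? 1999 : x - 1;
}

/* Apply the step n times, starting from 1999 as the listing does. */
int run(unsigned n)
{
    int x = 1999;
    for (unsigned i = 0; i != n; ++i) {
        x = countdown_wrap(x);
    }
    return x;
}
```

Because each iteration's result feeds the next compare, the `cmove` sits on the loop-carried dependency chain, which may be relevant to the timing differences discussed in the thread.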

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 19

3

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

...ce. > > I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted. > > > $ more llvm/lib/Target/X86//README-X86-64.txt > … > Are we better off using branches instead of cmove to implement FP to > unsigned i64? > > _conv: > ucomiss LC0(%rip), %xmm0 > cvttss2siq %xmm0, %rdx > jb L3 > subss LC0(%rip), %xmm0 > movabsq $-9223372036854775808, %rax > cvttss2siq %xmm0, %rdx >...
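The branchy sequence quoted from the README can be sketched in C as follows. This is an illustration under stated assumptions, not the code LLVM emits: the name `float_to_u64` is hypothetical, and the input is assumed to lie in [0, 2^64). The point is that a branch executes only the conversion it needs, whereas a select-based version evaluates both conversions and may raise floating-point exception flags from the one it discards.

```c
#include <stdint.h>

/* Convert float to unsigned 64-bit using only signed conversions,
 * mirroring the ucomiss / jb / subss / cvttss2siq pattern.
 * Assumes 0 <= x < 2^64; float_to_u64 is a made-up name. */
uint64_t float_to_u64(float x)
{
    const float two63 = 9223372036854775808.0f; /* 2^63, exactly representable */

    if (x < two63) {
        /* Value fits in a signed 64-bit integer: convert directly. */
        return (uint64_t)(int64_t)x;
    }
    /* Otherwise subtract 2^63, convert, then restore the top bit. */
    return (uint64_t)(int64_t)(x - two63) ^ 0x8000000000000000ULL;
}
```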

[LLVMdev] Tight overlapping loops and performance

2009 Mar 02

0

[LLVMdev] Tight overlapping loops and performance

On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote: > For which version of gcc? I should mention I'm on OS X and using the LLVM > SVN. gcc 4.3. It's also possible this is processor-sensitive. >> First, try looking at the generated code... the code LLVM generates is >> probably not what you're expecting. I'm getting the

Recent -Os code size regressions

2015 Nov 21

2

Recent -Os code size regressions

...d of stores and branches. > > I know a backend change I made to ARM isn't behaving as well as it could, and I have patches to fix that. Speculatively reverting midend patches isn't the best way to approach this, in my opinion! :) > For i586, the effect of r252152 seems to cause cmoves instead of branches. Code size increase is +35% for i586. Unfortunately the object files are wildly different in a way that does not seem to occur in other workloads. I tried to clip a concise before and after case. Before : As a reference point, I found OR $0x408 and OR $0x810 in close pro...

Ignored branch predictor hints

2018 May 09

0

Ignored branch predictor hints

...x, OFFSET FLAT:.LC0 >> test edi, edi >> jne .L7 >> ret >> >> But Clang seems to ignore __builtin_expect hints in this case. >> >> b(int): # @b(int) >> cmp edi, 1 >> mov eax, offset .L.str.1 >> mov ecx, offset .L.str.2 >> cmove rcx, rax >> test edi, edi >> mov eax, offset .L.str >> cmovne rax, rcx >> ret >> >> https://godbolt.org/g/tuAVT7 ...

clang performing worse than gcc for this loop

2020 Aug 23

3

clang performing worse than gcc for this loop

...branches but executes more instructions. `perf` reports 32.76% front-end cycles idle with the clang code compared to 24.20% for gcc generated code. Clang generated code seems to perform worse in branch-miss and icache events (as reported by `perf`). But it is not clear why. Are the two back-to-back cmove instructions the reason? Any comments on this?

Ignored branch predictor hints

2018 May 09

2

Ignored branch predictor hints

...mov eax, OFFSET FLAT:.LC0 >> test edi, edi >> jne .L7 >> ret >> >> But Clang seems to ignore __builtin_expect hints in this case. >> b(int): # @b(int) >> cmp edi, 1 >> mov eax, offset .L.str.1 >> mov ecx, offset .L.str.2 >> cmove rcx, rax >> test edi, edi >> mov eax, offset .L.str >> cmovne rax, rcx >> ret >> https://godbolt.org/g/tuAVT7 ...

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 20

4

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

...nt: Wednesday, April 19, 2017 10:14 AM To: Michael Clark <michaeljclark at mac.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long > Are we better off using branches instead of cmove to implement FP to unsigned i64? This seems like it was done for perf reasons (mispredict). A conditional-to-cmov transformation should avoid introducing additional observable side effects, and it's clear that whatever did this did not account for floating-point exceptions. On Wed, Apr 19, 20...

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 06

4

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

...shr eax, 1 mov ecx, edx | # CF is set from the LSB of EAX xor ecx, 79764919 | sbb edx, edx test al, 1 | # EDX is 0xFFFFFFFF if CF set, else 0 mov eax, ecx | and edx, 79764919 cmove eax, edx | xor eax, edx add r8d, 1 jne .LBB1_5 add rdi, 1 test esi, esi jne .LBB1_4 not eax ret .LBB1_1: xor eax, eax ret JFTR: with -O2, the inner loop gets unrolled, using the same n...
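The right-hand column of this listing (shift, `sbb`, `and`, `xor`) corresponds to a mask-based update, while the left-hand column selects between two values with `cmove`. A minimal C sketch of both forms, written from the visible instructions only: the function names are hypothetical, and `POLY` is the constant 79764919 (0x04C11DB7) that appears in the listing.

```c
#include <stdint.h>

#define POLY 79764919u /* 0x04C11DB7, taken from the listing */

/* Select form: compute both candidates, then pick one (maps to test + cmov). */
uint32_t step_select(uint32_t crc)
{
    uint32_t shifted = crc >> 1;
    return (crc & 1u) ? (shifted ^ POLY) : shifted;
}

/* Mask form: build 0xFFFFFFFF or 0 from the low bit, then AND and XOR
 * (maps to shr + sbb + and + xor). */
uint32_t step_mask(uint32_t crc)
{
    uint32_t mask = 0u - (crc & 1u); /* all ones if the low bit was set */
    return (crc >> 1) ^ (mask & POLY);
}
```

Both functions compute the same value for every input; they differ only in which instruction sequence a compiler is likely to choose for them.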

[LLVMdev] Tight overlapping loops and performance

2009 Mar 02

3

[LLVMdev] Tight overlapping loops and performance

> Date: Mon, 2 Mar 2009 13:41:45 -0800 > From: eli.friedman at gmail.com > To: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Tight overlapping loops and performance > > Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and > llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing > differently; I wouldn't be surprised if it's

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 27

2

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

...| # CF is set from the LSB of EAX >> xor ecx, 79764919 | sbb edx, edx >> test al, 1 | # EDX is 0xFFFFFFFF if CF set, >> else 0 >> mov eax, ecx | and edx, 79764919 >> cmove eax, edx | xor eax, edx >> add r8d, 1 >> jne .LBB1_5 >> add rdi, 1 >> test esi, esi >> jne .LBB1_4 >> not eax >> ret >> .LBB1_1: >> xor eax...
