thr3ads.net - search: "r8d"

[GE users] Apple Leopard has dtrace -- anyone used the SGE probes/scripts yet?

2007 Nov 14

10

Hi, Chris (cc) and I are trying to get the SGE master monitor to work with Apple Leopard's dtrace. Unfortunately, we are stuck on the error message below. Does anyone have an idea what could cause it? One cause I can rule out is function inlining, for the reasons explained below. Background information on the SGE master monitor implementation is at http://wiki.gridengine.info/wiki/index.php/Dtrace

[RFC] New pass: LoopExitValues

2015 Sep 01

2

...for (int Outer = 0; Outer < Size; ++Outer)
    for (int Inner = 0; Inner < Size; ++Inner)
      Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}

With LoopExitValues
-------------------------------
matrix_mul:
    testl %edi, %edi
    je .LBB0_5
    xorl %r9d, %r9d
    xorl %r8d, %r8d
.LBB0_2:
    xorl %r11d, %r11d
.LBB0_3:
    movl %r9d, %r10d
    movl (%rdx,%r10,4), %eax
    imull %ecx, %eax
    movl %eax, (%rsi,%r10,4)
    incl %r11d
    incl %r9d
    cmpl %r11d, %edi
    jne .LBB0_3
    incl %r8d
    cmpl %edi, %r8d
    jne .LBB0_2
.LBB0_5:
    retq

Without LoopExit...
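In the "With LoopExitValues" listing, a single register (%r9d) serves as the array index for both loops and is never reset between rows. The C sketch below contrasts the snippet's loop with a hand-written equivalent of that shape. The function signature is elided in the snippet, so the parameter types (and the use of unsigned loop counters instead of the snippet's int) are our assumption, and matrix_mul_flat is a hypothetical name for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* The loop from the RFC snippet; parameter types are assumed. */
void matrix_mul(unsigned Size, unsigned *Dst, const unsigned *Src, unsigned Val) {
  assert(Size == 0 || (Dst != NULL && Src != NULL));
  for (unsigned Outer = 0; Outer < Size; ++Outer)
    for (unsigned Inner = 0; Inner < Size; ++Inner)
      Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}

/* Hand-written counterpart of the optimized listing: one index runs across
   both loops. The inner loop's exit value Outer * Size + Size is exactly the
   next row's start (Outer + 1) * Size, so nothing needs recomputing. */
void matrix_mul_flat(unsigned Size, unsigned *Dst, const unsigned *Src, unsigned Val) {
  assert(Size == 0 || (Dst != NULL && Src != NULL));
  unsigned Idx = 0;
  for (unsigned Outer = 0; Outer < Size; ++Outer)
    for (unsigned Inner = 0; Inner < Size; ++Inner, ++Idx)
      Dst[Idx] = Src[Idx] * Val;
}
```

Both functions write the same values; the second simply makes the reuse of the inner loop's exit value explicit in the source.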

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 06

4

...
.LBB0_2:                      # =>This Loop Header: Depth=1
    add   esi, -1
    movzx edx, byte ptr [rdi]
    shl   edx, 24
    xor   edx, eax
    mov   ecx, -8
    mov   eax, edx
.LBB0_3:                      # Parent Loop BB0_2 Depth=1 | # 4 instructions instead of 6, r8 not clobbered!
    lea   r8d, [rax + rax]    | add eax, eax
    mov   edx, r8d            | # CF is set from the MSB of EAX
    xor   edx, -306674912     | sbb edx, edx
    test  eax, eax            | # EDX is 0xFFFFFFFF if CF set, else 0
    mov   eax, edx            | a...
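The right-hand column replaces a conditional with a mask: `add eax, eax` shifts the CRC left and leaves the bit shifted out in CF, and `sbb edx, edx` turns CF into 0 or 0xFFFFFFFF. Note that -306674912 is 0xEDB88320 written as a signed 32-bit immediate. The C sketch below shows the same idea, assuming the MSB-first bit step the outer loop suggests (byte shifted to bits 24..31 and XORed in, then eight steps, matching `mov ecx, -8`). The function names are hypothetical, the snippet's initial value and final XOR are not shown so none are applied, and whether a given clang version emits add/sbb for `step_mask` is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY 0xEDB88320u /* -306674912 as a signed 32-bit immediate */

/* Branchy form of one bit step: XOR the polynomial if the MSB was set. */
static uint32_t step_branchy(uint32_t crc) {
  return (crc & 0x80000000u) ? (crc << 1) ^ POLY : (crc << 1);
}

/* Mask form mirroring add/sbb/and/xor: the mask is all ones exactly
   when the bit shifted out of the top (CF after "add eax, eax") was 1. */
static uint32_t step_mask(uint32_t crc) {
  uint32_t mask = 0u - (crc >> 31); /* 0xFFFFFFFF or 0 */
  return (crc << 1) ^ (mask & POLY);
}

/* Byte loop shaped like .LBB0_2: shift the byte into the top, XOR it in,
   then run eight bit steps. */
uint32_t crc_update(uint32_t crc, const unsigned char *p, size_t n) {
  assert(n == 0 || p != NULL);
  while (n--) {
    crc ^= (uint32_t)*p++ << 24;
    for (int i = 0; i < 8; ++i)
      crc = step_mask(crc);
  }
  return crc;
}
```

The mask form needs only one scratch register, which is the point of the "r8 not clobbered" remark.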

A thought to improve IPRA

2016 Jul 29

2

...this working: > 1) While generating CFI for such a function it requires mapping Dwarf > register to LLVM register and even if we force LLVM to use Dwarf > register number for CFI it will still be wrong for some register > for which currently we don't have such a mapping, for example R8D > register on X86 (when dealing with actual register usage info we may > have such a case where R8D is being used). > To fix this I tried to filter the functions which will be optimized > by putting a constraint that it should have attribute NoUnwind but > that does not help. Is it pos...

A thought to improve IPRA

2016 Jul 29

0

...> 1) While generating CFI for such a function it requires mapping Dwarf > > register to LLVM register and even if we force LLVM to use Dwarf > > register number for CFI it will still be wrong for some register > > for which currently we don't have such a mapping, for example R8D > > register on X86 (when dealing with actual register usage info we may > > have such a case where R8D is being used). > > To fix this I tried to filter the functions which will be optimized > > by putting a constraint that it should have attribute NoUnwind but > > that...

[RFC] New pass: LoopExitValues

2015 Aug 31

2

Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the
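In its simplest form, the pattern the description targets can be shown on a single loop: a value the loop already holds at exit is recomputed afterwards. The sketch below uses a pointer-bumping copy as our own example; the function names are hypothetical, and it says nothing about how the proposed pass actually detects such sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the end pointer is reevaluated after the loop. */
char *copy_before(char *dst, const char *src, size_t n) {
  assert(dst != NULL && (n == 0 || src != NULL));
  char *p = dst;
  for (size_t i = 0; i < n; ++i)
    *p++ = src[i];
  return dst + n; /* recomputed, yet equal to p's exit value */
}

/* After: the loop's exit value of p is reused directly. */
char *copy_after(char *dst, const char *src, size_t n) {
  assert(dst != NULL && (n == 0 || src != NULL));
  char *p = dst;
  for (size_t i = 0; i < n; ++i)
    *p++ = src[i];
  return p; /* exit value, no recomputation */
}
```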

A thought to improve IPRA

2016 Jul 28

0

...few problems to get this working: 1) While generating CFI for such a function it requires mapping Dwarf register to LLVM register and even if we force LLVM to use Dwarf register number for CFI it will still be wrong for some register for which currently we don't have such a mapping, for example R8D register on X86 (when dealing with actual register usage info we may have such a case where R8D is being used). To fix this I tried to filter the functions which will be optimized by putting a constraint that it should have attribute NoUnwind but that does not help. Is it possible to disable CFI gene...
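Background for the R8D problem: CFI names registers by DWARF number, and the System V x86-64 psABI assigns numbers only to the full 64-bit general-purpose registers (RAX = 0, RDX = 1, RCX = 2, RBX = 3, RSI = 4, RDI = 5, RBP = 6, RSP = 7, R8 to R15 = 8 to 15). A 32-bit view such as R8D has no number of its own. One way to picture a workaround is to resolve a sub-register to its 64-bit super-register first. The sketch below is a standalone illustration with hypothetical tables and function names; it is not LLVM's `MCRegisterInfo` API and does not claim this is the right fix for IPRA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* psABI DWARF numbers for the 64-bit GPRs: index == DWARF number. */
static const char *const gpr64[16] = {
  "rax", "rdx", "rcx", "rbx", "rsi", "rdi", "rbp", "rsp",
  "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15"
};

/* Hypothetical table: 32-bit sub-register -> 64-bit super-register. */
static const struct { const char *sub, *super; } sub32[] = {
  {"eax", "rax"}, {"edx", "rdx"}, {"ecx", "rcx"}, {"ebx", "rbx"},
  {"esi", "rsi"}, {"edi", "rdi"}, {"ebp", "rbp"}, {"esp", "rsp"},
  {"r8d", "r8"}, {"r9d", "r9"}, {"r10d", "r10"}, {"r11d", "r11"},
  {"r12d", "r12"}, {"r13d", "r13"}, {"r14d", "r14"}, {"r15d", "r15"}
};

/* DWARF number of a full 64-bit GPR, or -1 if the name is not one. */
static int dwarf_num_direct(const char *name) {
  for (int i = 0; i < 16; ++i)
    if (strcmp(gpr64[i], name) == 0) return i;
  return -1;
}

/* Falls back to the super-register when the name has no number itself. */
int dwarf_num(const char *name) {
  assert(name != NULL);
  int n = dwarf_num_direct(name);
  if (n >= 0) return n;
  for (size_t i = 0; i < sizeof sub32 / sizeof sub32[0]; ++i)
    if (strcmp(sub32[i].sub, name) == 0)
      return dwarf_num_direct(sub32[i].super);
  return -1;
}
```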

[LLVMdev] Strange behaviour with x86-64 windows, bad call instruction address

2012 Feb 14

0

...ne that is failing has a 64-bit address, as indicated in the snippet below:
000007FFFFC511D7 pop rbp
000007FFFFC511D8 ret
000007FFFFC511D9 sub rsp,20h
000007FFFFC511DD mov rcx,qword ptr [rbp-70h]
000007FFFFC511E1 mov edx,0FFFFFFFEh
000007FFFFC511E6 xor r8d,r8d
000007FFFFC511E9 call rsi
000007FFFFC511EB add rsp,20h
000007FFFFC511EF test al,1
000007FFFFC511F2 je 000007FFFFC511C3
000007FFFFC511F8 sub rsp,20h
000007FFFFC511FC mov rax,7FFFFC30030h
000007FFFFC51206 mov rcx,rdi
000007FFFFC51209...
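Some background that may help when reading this listing: under the Microsoft x64 calling convention the first integer arguments go in RCX, RDX and R8 (so `xor r8d,r8d` passes 0; writing a 32-bit register zero-extends into the full 64-bit register), and `sub rsp,20h` reserves the 32-byte shadow space. The full 64-bit address loaded by `mov rax,7FFFFC30030h` relates to the reach of a direct `call rel32` (opcode E8, 5 bytes), whose target is a signed 32-bit offset from the next instruction, so only targets within about 2 GiB are reachable. The helper below is a minimal sketch of that reachability check; its name is hypothetical and it does not diagnose the failure discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a 5-byte direct "call rel32" at call_site can reach target. */
bool fits_rel32_call(uint64_t call_site, uint64_t target) {
  assert(call_site <= UINT64_MAX - 5);
  /* Displacement is relative to the end of the 5-byte instruction; the
     unsigned difference is reinterpreted as signed (two's complement). */
  int64_t disp = (int64_t)(target - (call_site + 5));
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```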

A thought to improve IPRA

2016 Jul 29

2

...map > > > Dwarf > > > > register to LLVM register and even if we force LLVM to use Dwarf > > > > register number for CFI it will still be wrong for some > > > register > > > > for which currently we don't have such a mapping, for example R8D > > > > register on X86 (when dealing with actual register usage info we > > > may > > > > have such a case where R8D is being used). > > > > To fix this I tried to filter the functions which will be > > > optimized > > > > by putti...

A thought to improve IPRA

2016 Jul 28

1

...roblems to get this working: > 1) While generating CFI for such a function it requires mapping Dwarf register to LLVM register and even if we force LLVM to use Dwarf register number for CFI it will still be wrong for some register for which currently we don't have such a mapping, for example R8D register on X86 (when dealing with actual register usage info we may have such a case where R8D is being used). > To fix this I tried to filter the functions which will be optimized by putting a constraint that it should have attribute NoUnwind but that does not help. Is it possible to disable CFI...

A thought to improve IPRA

2016 Jul 08

3

On Sat, Jul 9, 2016 at 12:18 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Jul 8, 2016, at 11:41 AM, vivek pandya <vivekvpandya at gmail.com> wrote: > > > > On Fri, Jul 8, 2016 at 11:46 PM, Mehdi Amini <mehdi.amini at apple.com> > wrote: > >> >> On Jul 8, 2016, at 11:12 AM, vivek pandya <vivekvpandya at gmail.com> wrote:

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 27

2

... movzx edx, byte ptr [rdi]
>> shl edx, 24
>> xor edx, eax
>> mov ecx, -8
>> mov eax, edx
>> .LBB0_3: # Parent Loop BB0_2 Depth=1 | # 4 instructions instead of 6, r8
>> not clobbered!
>> lea r8d, [rax + rax] | add eax, eax
>> mov edx, r8d | # CF is set from the MSB of EAX
>> xor edx, -306674912 | sbb edx, edx
>> test eax, eax | # EDX is 0xFFFFFFFF if CF set,
>> else 0
>>...

A thought to improve IPRA

2016 Aug 05

2

...> 1) While generating CFI for such a function it requires mapping Dwarf > > register to LLVM register and even if we force LLVM to use Dwarf > > register number for CFI it will still be wrong for some register > > for which currently we don't have such a mapping, for example R8D > > register on X86 (when dealing with actual register usage info we may > > have such a case where R8D is being used). > > To fix this I tried to filter the functions which will be optimized > > by putting a constraint that it should have attribute NoUnwind but > > that...

A thought to improve IPRA

2016 Aug 16

2

...ch function it requires mapping Dwarf >>>> > register to LLVM register and even if we force LLVM to use Dwarf >>>> > register number for CFI it will still be wrong for some register >>>> > for which currently we don't have such a mapping, for example R8D >>>> > register on X86 (when dealing with actual register usage info we may >>>> > have such a case where R8D is being used). >>>> > To fix this I tried to filter the functions which will be optimized >>>> > by putting a constraint that it s...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

2

..._100000D30
; ---------------------------------------------------------------------------
loc_100000DFA:                          ; CODE XREF: _main+5E j
    mov     ecx, [rax+r8*4]
    lea     r9d, [rcx+1]
    mov     [rax+r8*4], r9d
    cmp     ecx, r8d
    jge     loc_100000F0E
    lea     r12, [rax+4]
    xor     r14d, r14d
    db 2Eh
    nop     word ptr [rax+rax+00000000h]
loc_100000E20:                          ; CODE XREF: _main+216 j
    test    r15d, r15d...
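For readers less used to IDA syntax, the first basic block at loc_100000DFA can be read as the C below. This is our own hypothetical decompilation, not the benchmark's source: the 32-bit int element type (from the `*4` scale and 32-bit registers), the signed comparison (from `jge`) and all names are inferred from the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C reading of loc_100000DFA:
     mov ecx, [rax+r8*4]   ->  old = a[i]
     lea r9d, [rcx+1]      ->  old + 1
     mov [rax+r8*4], r9d   ->  a[i] = old + 1
     cmp ecx, r8d ; jge    ->  branch taken if old >= i (signed)
   Returns true when the jge would be taken. Unlike lea, which wraps,
   old + 1 overflows (undefined behavior) in C if old == INT_MAX. */
bool bump_and_test(int *a, int i) {
  assert(a != NULL && i >= 0);
  int old = a[i];
  a[i] = old + 1;
  return old >= i;
}
```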

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

...-----------------------------------
>>
>> loc_100000DFA: ; CODE XREF: _main+5E j
>> mov ecx, [rax+r8*4]
>> lea r9d, [rcx+1]
>> mov [rax+r8*4], r9d
>> cmp ecx, r8d
>> jge loc_100000F0E
>> lea r12, [rax+4]
>> xor r14d, r14d
>> db 2Eh
>> nop word ptr [rax+rax+00000000h]
>>
>> loc_100000E20: ; COD...

[PATCH] promised MMX patches rc1

2005 Mar 23

3

Hello, Here is my first speedup patch. It gives roughly 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from a lot of

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

...
>>>> loc_100000DFA: ; CODE XREF: _main+5E j
>>>> mov ecx, [rax+r8*4]
>>>> lea r9d, [rcx+1]
>>>> mov [rax+r8*4], r9d
>>>> cmp ecx, r8d
>>>> jge loc_100000F0E
>>>> lea r12, [rax+4]
>>>> xor r14d, r14d
>>>> db 2Eh
>>>> nop word ptr [rax+rax+00000000h]
>>>>
...

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 28

2

...dx, 24
>> >> xor edx, eax
>> >> mov ecx, -8
>> >> mov eax, edx
>> >> .LBB0_3: # Parent Loop BB0_2 Depth=1 | # 4 instructions instead of
>> 6, r8
>> >> not clobbered!
>> >> lea r8d, [rax + rax] | add eax, eax
>> >> mov edx, r8d | # CF is set from the MSB of
>> EAX
>> >> xor edx, -306674912 | sbb edx, edx
>> >> test eax, eax | # EDX is 0xFFFFFFFF i...

jbd/kjournald oops on 2.6.30.1

2009 Sep 23

0

...e_journal_head+6>: push %rbx
0xffffffff8037b767 <__journal_remove_journal_head+7>: callq 0xffffffff8020bcc0 <mcount>
0xffffffff8037b76c <__journal_remove_journal_head+12>: mov 0x40(%rdi),%rbx
0xffffffff8037b770 <__journal_remove_journal_head+16>: mov 0x8(%rbx),%r8d <====== Oops
0xffffffff8037b774 <__journal_remove_journal_head+20>: test %r8d,%r8d
0xffffffff8037b777 <__journal_remove_journal_head+23>: js 0xffffffff8037b86d <__journal_remove_journal_head+269>
0xffffffff8037b77d <__journal_remove_journal_head+29>: lock incl 0x...
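The faulting instruction is easier to see in C. The sketch below mirrors the three instructions around the oops: load a pointer from offset 0x40 of the first argument (`%rdi`), load a 32-bit int from offset 8 of that pointer, and branch if it is negative. The struct and field names are placeholders chosen only so the offsets match on an LP64 target; they are not the kernel's actual jbd definitions. If the pointer loaded from offset 0x40 is invalid, the second load is the one that faults, which is where the listing places the `<====== Oops` marker.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder layouts: only the offsets matter. On an LP64 target
   'count' sits at offset 8 and 'priv' at offset 0x40. */
struct inner { void *first; int count; };
struct outer { void *pad[8]; struct inner *priv; };

/* Mirrors the listing; returns 1 when the "js" branch would be taken. */
int count_is_negative(const struct outer *o) {
  assert(o != NULL);
  const struct inner *in = o->priv; /* mov 0x40(%rdi),%rbx */
  int count = in->count;            /* mov 0x8(%rbx),%r8d  <- faulting load */
  return count < 0;                 /* test %r8d,%r8d ; js ... */
}
```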