thr3ads.net - search: "r11d"

[PATCH] Optimized assembler version of md5_process() for x86-64

2020 May 22

2

...' + # D is 'edx' + + cmp %rdi, %rsi # cmp end with ptr + je 1f # jmp if ptr == end + + # BEGIN of loop over 16-word blocks +2: # save old values of A, B, C, D + mov %eax, %r8d + mov %ebx, %r9d + mov %ecx, %r14d + mov %edx, %r15d + mov 0*4(%rsi), %r10d /* (NEXT STEP) X[0] */ + mov %edx, %r11d /* (NEXT STEP) z' = %edx */ + xor %ecx, %r11d /* y ^ ... */ + lea -680876936(%eax,%r10d),%eax /* Const + dst + ... */ + and %ebx, %r11d /* x & ... */ + xor %edx, %r11d /* z ^ ... */ + mov 1*4(%rsi),%r10d /* (NEXT STEP) X[1] */ + add %r11d, %eax /* dst += ... */ + rol $7, %eax /* dst <<...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

2

...edi, 4 ; size_t call _calloc lea edx, [r15-1] movsxd r8, edx mov ecx, r15d add ecx, 0FFFFFFFEh js loc_100000DFA test r15d, r15d mov r11d, [rax+r8*4] jle loc_100000EAE mov ecx, r15d add ecx, 0FFFFFFFEh mov [rsp+48h+var_34], ecx movsxd rcx, ecx lea rcx, [rax+rcx*4] mov [rsp+48h+var_40], rcx...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

...>> lea edx, [r15-1] >> movsxd r8, edx >> mov ecx, r15d >> add ecx, 0FFFFFFFEh >> js loc_100000DFA >> test r15d, r15d >> mov r11d, [rax+r8*4] >> jle loc_100000EAE >> mov ecx, r15d >> add ecx, 0FFFFFFFEh >> mov [rsp+48h+var_34], ecx >> movsxd rcx, ecx >> lea rcx, [rax+rcx*4] ...

[RFC] New pass: LoopExitValues

2015 Sep 01

2

...< Size; ++Outer) for (int Inner = 0; Inner < Size; ++Inner) Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val; } With LoopExitValues ------------------------------- matrix_mul: testl %edi, %edi je .LBB0_5 xorl %r9d, %r9d xorl %r8d, %r8d .LBB0_2: xorl %r11d, %r11d .LBB0_3: movl %r9d, %r10d movl (%rdx,%r10,4), %eax imull %ecx, %eax movl %eax, (%rsi,%r10,4) incl %r11d incl %r9d cmpl %r11d, %edi jne .LBB0_3 incl %r8d cmpl %edi, %r8d jne .LBB0_2 .LBB0_5: retq Without LoopExitValues: ----------------------...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

...>>>> movsxd r8, edx >>>> mov ecx, r15d >>>> add ecx, 0FFFFFFFEh >>>> js loc_100000DFA >>>> test r15d, r15d >>>> mov r11d, [rax+r8*4] >>>> jle loc_100000EAE >>>> mov ecx, r15d >>>> add ecx, 0FFFFFFFEh >>>> mov [rsp+48h+var_34], ecx >>>> movsxd rcx, ecx >>>>...

[RFC] New pass: LoopExitValues

2015 Aug 31

2

Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the...

Register Dataflow Analysis on X86

2019 Nov 08

2

Do you know whether it has been fixed in the 8.0.1 release? Scott On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote: The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86. --

Register Dataflow Analysis on X86

2019 Dec 23

2

Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com AI tools development From: Scott Douglas Constable <sdconsta at syr.edu> Sent: Monda...

[PATCH] promised MMX patches rc1

2005 Mar 23

3

Hello, Here is my first speedup patch. Roughly a 10-11% gain. No IDCT yet. Please feel free to comment on my code, or even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from a lot of...

Register Dataflow Analysis on X86

2020 Jan 10

2

...ints? Thanks, Scott On Mon, Dec 23, 2019 at 12:46 PM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote: Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com AI tools development From: Scott Douglas Constable <sdconsta at syr.edu>...

[LLVMdev] bug in X86 disasm code?

2013 Sep 12

1

...IT \ ENTRY(EAX) \ ENTRY(ECX) \ ENTRY(EDX) \ ENTRY(EBX) \ ENTRY(sib) \ ENTRY(EBP) \ ENTRY(ESI) \ ENTRY(EDI) \ ENTRY(R8D) \ ENTRY(R9D) \ ENTRY(R10D) \ ENTRY(R11D) \ ENTRY(R12D) \ ENTRY(R13D) \ ENTRY(R14D) \ ENTRY(R15D) The ENTRY(sib) looks suspicious. That should be ENTRY(ESP), no? Thanks. J

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 25

3

...as per regmasks collected by the RegUsageInfoCollector pass. Function Name : bitrv2 Clobbered Registers: AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W R12W R13W R14W R15W However, makewt, the caller of bitrv2, has callee-saved registers as per the CC, but this code results in a segmentation fault when compiled with O1 because makewt has the value of *ip in the R14 register, and that is stored and restored by makewt at the beginning...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 25

0

...oCollector pass. > > Function Name : bitrv2 > Clobbered Registers: > AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI > ESP RAX > RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B > R9B R10B > R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W > R10W R11W > R12W R13W R14W R15W > > However caller of bitrv2, makewt has callee saved registers as per CC, > but this > code results in segmentation fault when compiled with O1 because makewt > has value > of *ip in R14 register and that is st...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 26

3

...>> Function Name : bitrv2 >> Clobbered Registers: >> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI >> ESP RAX >> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B >> R9B R10B >> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W >> R10W R11W >> R12W R13W R14W R15W >> >> However caller of bitrv2, makewt has callee saved registers as per CC, >> but this >> code results in segmentation fault when compiled with O1 because makewt >> has value >> of *i...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 28

2

...gUsageInfoCollector pass. > > Function Name : bitrv2 > Clobbered Registers: > AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX > RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B > R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W > R12W R13W R14W R15W > > However caller of bitrv2, makewt has callee saved registers as per CC, but this > code results in segmentation fault when compiled with O1 because makewt has value > of *ip in R14 register and that is stored and resto...

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

...# Child Loop BB0_2 Depth 2 >>>>> # Child Loop BB0_3 Depth >>>>> 3 >>>>> # Child Loop BB0_5 Depth >>>>> 3 >>>>> xor r11d, r11d >>>>> .p2align 4, 0x90 >>>>> .LBB0_2: # %.preheader >>>>> # Parent Loop BB0_1 Depth=1 >>>>> # => This Loop Header: D...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 27

0

...2 >>> Clobbered Registers: >>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI >>> ESP RAX >>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B >>> R9B R10B >>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W >>> R10W R11W >>> R12W R13W R14W R15W >>> >>> However caller of bitrv2, makewt has callee saved registers as per CC, >>> but this >>> code results in segmentation fault when compiled with O1 because makewt >>>...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 28

0

...ered Registers: >>>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI >>>> ESP RAX >>>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B >>>> R9B R10B >>>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W >>>> R10W R11W >>>> R12W R13W R14W R15W >>>> >>>> However caller of bitrv2, makewt has callee saved registers as per CC, >>>> but this >>>> code results in segmentation fault when compiled with O1...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 28

2

...> Clobbered Registers: >>>>>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX >>>>>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B >>>>>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W >>>>>> R12W R13W R14W R15W >>>>>> >>>>>> However caller of bitrv2, makewt has callee saved registers as per CC, but this >>>>>> code results in segmentation fault when compiled with O1 b...

[Bridge] bridge at start up

2007 Apr 18

1

...x0000000000000004 <invoke+4>: callq *%esi > 0x0000000000000006 <invoke+6>: add $0x8,%rsp > 0x000000000000000a <invoke+10>: retq > > gcc-3.4.4: > 0x0000000000000000 <invoke+0>: mov %rsi,%r11 > 0x0000000000000003 <invoke+3>: jmpq *%r11d > > Regards > Patrick > > > ------------------------------ > > Message: 3 > Date: Thu, 27 Jan 2005 15:24:50 -0800 > From: "David S. Miller" <davem@davemloft.net> > Subject: [Bridge] Re: [PATCH/RFC] Reduce call chain length in > ne...