Displaying 20 results from an estimated 25 matches for "r11d".
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
...'
+ # D is 'edx'
+
+ cmp %rdi, %rsi # cmp end with ptr
+ je 1f # jmp if ptr == end
+
+ # BEGIN of loop over 16-word blocks
+2: # save old values of A, B, C, D
+ mov %eax, %r8d
+ mov %ebx, %r9d
+ mov %ecx, %r14d
+ mov %edx, %r15d
+ mov 0*4(%rsi), %r10d /* (NEXT STEP) X[0] */
+ mov %edx, %r11d /* (NEXT STEP) z' = %edx */
+ xor %ecx, %r11d /* y ^ ... */
+ lea -680876936(%eax,%r10d),%eax /* Const + dst + ... */
+ and %ebx, %r11d /* x & ... */
+ xor %edx, %r11d /* z ^ ... */
+ mov 1*4(%rsi),%r10d /* (NEXT STEP) X[1] */
+ add %r11d, %eax /* dst += ... */
+ rol $7, %eax /* dst <<...
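For readers following the quoted assembly, here is a minimal C sketch of what one such step appears to compute, assuming it is a round-1 MD5 step as defined in RFC 1321. The helper names (`rol32`, `md5_f`, `md5_step_f`) are illustrative and do not come from the patch. The excerpt is cut off after the rotate, so the final addition of `x` is taken from the RFC, not from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits; valid for 0 < s < 32. */
static uint32_t rol32(uint32_t v, unsigned s) {
    assert(s > 0 && s < 32);
    return (v << s) | (v >> (32 - s));
}

/* MD5 round-1 function F(x, y, z) = (x & y) | (~x & z), written in the
 * xor/and/xor form that the quoted assembly uses: z ^ (x & (y ^ z)). */
static uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z) {
    return z ^ (x & (y ^ z));
}

/* One round-1 step per RFC 1321:
 * dst = x + rol(dst + F(x, y, z) + word + konst, shift).
 * Arithmetic wraps modulo 2^32, as unsigned 32-bit math does in C. */
static uint32_t md5_step_f(uint32_t dst, uint32_t x, uint32_t y, uint32_t z,
                           uint32_t word, uint32_t konst, unsigned shift) {
    dst += md5_f(x, y, z) + word + konst;
    return x + rol32(dst, shift);
}
```

The displacement `-680876936` in the quoted `lea` is the 32-bit two's-complement spelling of `0xd76aa478`, the first round constant listed in RFC 1321, which is consistent with this reading.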
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...edi, 4 ; size_t
call _calloc
lea edx, [r15-1]
movsxd r8, edx
mov ecx, r15d
add ecx, 0FFFFFFFEh
js loc_100000DFA
test r15d, r15d
mov r11d, [rax+r8*4]
jle loc_100000EAE
mov ecx, r15d
add ecx, 0FFFFFFFEh
mov [rsp+48h+var_34], ecx
movsxd rcx, ecx
lea rcx, [rax+rcx*4]
mov [rsp+48h+var_40], rcx...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...>> lea edx, [r15-1]
>> movsxd r8, edx
>> mov ecx, r15d
>> add ecx, 0FFFFFFFEh
>> js loc_100000DFA
>> test r15d, r15d
>> mov r11d, [rax+r8*4]
>> jle loc_100000EAE
>> mov ecx, r15d
>> add ecx, 0FFFFFFFEh
>> mov [rsp+48h+var_34], ecx
>> movsxd rcx, ecx
>> lea rcx, [rax+rcx*4]
>...
2015 Sep 01
2
[RFC] New pass: LoopExitValues
...< Size; ++Outer)
for (int Inner = 0; Inner < Size; ++Inner)
Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}
With LoopExitValues
-------------------------------
matrix_mul:
testl %edi, %edi
je .LBB0_5
xorl %r9d, %r9d
xorl %r8d, %r8d
.LBB0_2:
xorl %r11d, %r11d
.LBB0_3:
movl %r9d, %r10d
movl (%rdx,%r10,4), %eax
imull %ecx, %eax
movl %eax, (%rsi,%r10,4)
incl %r11d
incl %r9d
cmpl %r11d, %edi
jne .LBB0_3
incl %r8d
cmpl %edi, %r8d
jne .LBB0_2
.LBB0_5:
retq
Without LoopExitValues:
----------------------...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...gt;>> movsxd r8, edx
>>>> mov ecx, r15d
>>>> add ecx, 0FFFFFFFEh
>>>> js loc_100000DFA
>>>> test r15d, r15d
>>>> mov r11d, [rax+r8*4]
>>>> jle loc_100000EAE
>>>> mov ecx, r15d
>>>> add ecx, 0FFFFFFFEh
>>>> mov [rsp+48h+var_34], ecx
>>>> movsxd rcx, ecx
>>>...
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
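As a rough illustration of the idea described in this excerpt, the sketch below contrasts a loop nest that recomputes `Outer * Size + Inner` with a hand-written variant that keeps one running index across both loops. The function signature is an assumption (the argument order is inferred from the register usage in the quoted `matrix_mul` assembly), and `matrix_mul_running` is a hypothetical manual rewrite, not the output of the proposed pass.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: the flattened index is recomputed from
 * Outer and Inner on every iteration. Signature is assumed. */
void matrix_mul(unsigned Size, unsigned *Dst, const unsigned *Src,
                unsigned Val) {
    assert(Size == 0 || (Dst != NULL && Src != NULL));
    for (unsigned Outer = 0; Outer < Size; ++Outer)
        for (unsigned Inner = 0; Inner < Size; ++Inner)
            Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}

/* Hand-written equivalent: when the inner loop exits, Idx already equals
 * (Outer + 1) * Size, so the next outer iteration can reuse that exit
 * value instead of recomputing Outer * Size. */
void matrix_mul_running(unsigned Size, unsigned *Dst, const unsigned *Src,
                        unsigned Val) {
    assert(Size == 0 || (Dst != NULL && Src != NULL));
    unsigned Idx = 0;
    for (unsigned Outer = 0; Outer < Size; ++Outer)
        for (unsigned Inner = 0; Inner < Size; ++Inner, ++Idx)
            Dst[Idx] = Src[Idx] * Val;
}
```

Both functions write the same values; the second form resembles the quoted "With LoopExitValues" listing, where a single counter is incremented in the inner loop and never reset.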
2019 Nov 08
2
Register Dataflow Analysis on X86
Do you know whether it has been fixed on the 8.0.1 release?
Scott
On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote:
The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86.
--
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott,
That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both.
These separate defs are there because they reach disjoint registers.
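The "set of registers" view in this reply can be sketched as a small bitset. This is only an illustration under assumed names: the `enum reg` numbering, `reg_set`, and the helper functions are hypothetical and are not LLVM's RDF data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register numbering, for illustration only. */
enum reg { REG_R8D, REG_R9D, REG_R10D, REG_R11D, REG_COUNT };

/* An aggregate register modeled as a set: one bit per entry in enum reg. */
typedef uint32_t reg_set;

/* Return s with register r added. */
static reg_set reg_set_add(reg_set s, enum reg r) {
    assert(r < REG_COUNT);
    return s | (UINT32_C(1) << r);
}

/* An aggregate aliases r if r is a member of the set. */
static bool reg_set_aliases(reg_set s, enum reg r) {
    assert(r < REG_COUNT);
    return ((s >> r) & 1u) != 0;
}

/* Two sets reach disjoint registers if they share no member. */
static bool reg_sets_disjoint(reg_set a, reg_set b) {
    return (a & b) == 0;
}
```

Under this model, a mask containing both R9D and R11D aliases each of them, while the singleton sets {R9D} and {R11D} are disjoint from each other, which matches the reply's explanation for the separate defs.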
--
Krzysztof Parzyszek kparzysz at quicinc.com AI tools development
From: Scott Douglas Constable <sdconsta at syr.edu>
Sent: Monda...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello,
Here is my first speedup patch. Like 10-11%. No IDCT yet.
Please feel free to comment on my code or, even better, think about
improvements. :) I believe my routines are not so bad; maybe
one day they will be even faster.
What needs to be optimized is the loop filter function. I have
no idea right now how to do it. It does not leave much space for parallel
stuff, copying memory from lot of
2020 Jan 10
2
Register Dataflow Analysis on X86
...ints?
Thanks,
Scott
On Mon, Dec 23, 2019 at 12:46 PM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote:
Hi Scott,
That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both.
These separate defs are there because they reach disjoint registers.
--
Krzysztof Parzyszek kparzysz at quicinc.com AI tools development
From: Scott Douglas Constable <sdconsta at syr.edu...
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
...IT \
ENTRY(EAX) \
ENTRY(ECX) \
ENTRY(EDX) \
ENTRY(EBX) \
ENTRY(sib) \
ENTRY(EBP) \
ENTRY(ESI) \
ENTRY(EDI) \
ENTRY(R8D) \
ENTRY(R9D) \
ENTRY(R10D) \
ENTRY(R11D) \
ENTRY(R12D) \
ENTRY(R13D) \
ENTRY(R14D) \
ENTRY(R15D)
The ENTRY(sib) looks suspicious. That should be ENTRY(ESP), no?
Thanks.
J
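For context on the list quoted in this message, here is a small C sketch of the X-macro pattern it uses. The table is a shortened, hypothetical copy (`EA_BASES`, `ea_base`, and `ea_base_name` are illustrative names, not the LLVM definitions). One plausible reading of the `sib` slot, offered as an assumption and not as the thread's conclusion: in x86 ModRM encoding with mod != 11b, an r/m field of 100b means that a SIB byte follows, so ESP is not selected directly, and index 4 of such a table would stand for "consult the SIB byte".

```c
#include <assert.h>

/* Hypothetical, shortened table modeled on the quoted list. */
#define EA_BASES \
    ENTRY(EAX)   \
    ENTRY(ECX)   \
    ENTRY(EDX)   \
    ENTRY(EBX)   \
    ENTRY(sib)   \
    ENTRY(EBP)   \
    ENTRY(ESI)   \
    ENTRY(EDI)

/* First expansion: one enumerator per entry, numbered from 0. */
#define ENTRY(x) EA_BASE_##x,
enum ea_base { EA_BASES EA_BASE_COUNT };
#undef ENTRY

/* Second expansion: a parallel table of printable names. */
#define ENTRY(x) #x,
static const char *const ea_base_names[] = { EA_BASES };
#undef ENTRY

/* Map a 3-bit r/m value to the name stored at that table position. */
static const char *ea_base_name(unsigned rm) {
    assert(rm < EA_BASE_COUNT);
    return ea_base_names[rm];
}
```

Because the enumerators are numbered by position, `EA_BASE_sib` lands on the value 4, the same r/m value that would otherwise name ESP.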
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
...as
per regmasks collected by RegUsageInfoCollector pass.
Function Name : bitrv2
Clobbered Registers:
AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
R12W R13W R14W R15W
However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
code results in a segmentation fault when compiled with O1 because makewt has the value
of *ip in the R14 register, and that is stored and restored by makewt at the beginning...
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
...oCollector pass.
>
> Function Name : bitrv2
> Clobbered Registers:
> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
> R12W R13W R14W R15W
>
> However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
> code results in a segmentation fault when compiled with O1 because makewt has the value
> of *ip in the R14 register, and that is st...
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
...> Function Name : bitrv2
>> Clobbered Registers:
>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
>> R12W R13W R14W R15W
>>
>> However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
>> code results in a segmentation fault when compiled with O1 because makewt has the value
>> of *i...
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
...gUsageInfoCollector pass.
>
> Function Name : bitrv2
> Clobbered Registers:
> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
> R12W R13W R14W R15W
>
> However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
> code results in a segmentation fault when compiled with O1 because makewt has the value
> of *ip in the R14 register, and that is stored and resto...
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...# Child Loop BB0_2 Depth 2
>>>>> # Child Loop BB0_3 Depth
>>>>> 3
>>>>> # Child Loop BB0_5 Depth
>>>>> 3
>>>>> xor r11d, r11d
>>>>> .p2align 4, 0x90
>>>>> .LBB0_2: # %.preheader
>>>>> # Parent Loop BB0_1 Depth=1
>>>>> # => This Loop Header: D...
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
...2
>>> Clobbered Registers:
>>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
>>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
>>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
>>> R12W R13W R14W R15W
>>>
>>> However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
>>> code results in a segmentation fault when compiled with O1 because makewt
>>>...
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
...ered Registers:
>>>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
>>>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
>>>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
>>>> R12W R13W R14W R15W
>>>>
>>>> However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
>>>> code results in a segmentation fault when compiled with O1...
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
...> Clobbered Registers:
>>>>>> AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX
>>>>>> RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B
>>>>>> R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W
>>>>>> R12W R13W R14W R15W
>>>>>>
>>>>>> However, the caller of bitrv2, makewt, has callee-saved registers as per the CC, but this
>>>>>> code results in a segmentation fault when compiled with O1 b...
2007 Apr 18
1
[Bridge] bridge at start up
...x0000000000000004 <invoke+4>: callq *%esi
> 0x0000000000000006 <invoke+6>: add $0x8,%rsp
> 0x000000000000000a <invoke+10>: retq
>
> gcc-3.4.4:
> 0x0000000000000000 <invoke+0>: mov %rsi,%r11
> 0x0000000000000003 <invoke+3>: jmpq *%r11d
>
> Regards
> Patrick
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 27 Jan 2005 15:24:50 -0800
> From: "David S. Miller" <davem@davemloft.net>
> Subject: [Bridge] Re: [PATCH/RFC] Reduce call chain length in
> ne...