thr3ads.net - search: "r15d"

Help required regarding IPRA and Local Function optimization

2016 Jun 30

4

Help required regarding IPRA and Local Function optimization

...rge function with recursion and object oriented code so I am not able to find a pattern which is causing failure. So I tried following simple case to understand expected behavior from this optimization. Consider following code : define void @bar() #0 { call void asm sideeffect "movl %ecx, %r15d", "~{r15}"() #0 call void @foo() call void asm sideeffect "movl %r15d, %ebx", "~{rbx}"() #0 ret void } define internal void @foo() #0 { call void asm sideeffect "movl %r14d, %r15d", "~{r15}"() #0 ret void } and its generated assembl...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

2

[LLVMdev] trunk's optimizer generates slower code than 3.5

...ov rsi, offset __mh_execute_header add rsi, rax sar rsi, 20h ; size_t mov edi, 4 ; size_t call _calloc lea edx, [r15-1] movsxd r8, edx mov ecx, r15d add ecx, 0FFFFFFFEh js loc_100000DFA test r15d, r15d mov r11d, [rax+r8*4] jle loc_100000EAE mov ecx, r15d add ecx, 0FFFFFFFEh mov [rsp+48h+...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

[LLVMdev] trunk's optimizer generates slower code than 3.5

...add rsi, rax >> sar rsi, 20h ; size_t >> mov edi, 4 ; size_t >> call _calloc >> lea edx, [r15-1] >> movsxd r8, edx >> mov ecx, r15d >> add ecx, 0FFFFFFFEh >> js loc_100000DFA >> test r15d, r15d >> mov r11d, [rax+r8*4] >> jle loc_100000EAE >> mov ecx, r15d >>...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

[LLVMdev] trunk's optimizer generates slower code than 3.5

...sar rsi, 20h ; size_t >>>> mov edi, 4 ; size_t >>>> call _calloc >>>> lea edx, [r15-1] >>>> movsxd r8, edx >>>> mov ecx, r15d >>>> add ecx, 0FFFFFFFEh >>>> js loc_100000DFA >>>> test r15d, r15d >>>> mov r11d, [rax+r8*4] >>>> jle loc_100000EAE >>>>...

Help required regarding IPRA and Local Function optimization

2016 Jun 30

0

Help required regarding IPRA and Local Function optimization

...19508 <+8>: pushq %r13 0x10001950a <+10>: pushq %r12 0x10001950c <+12>: pushq %rbx 0x10001950d <+13>: pushq %rax 0x10001950e <+14>: movl %ecx, %r12d 0x100019511 <+17>: movl %edx, %r13d 0x100019514 <+20>: movl %esi, %r15d 0x100019517 <+23>: movq %rdi, %rbx -> 0x10001951a <+26>: movl 0x18(%rbx), %r14d Please correct me if any thing is wrong and also please provide some help. -Vivek 2016-06-30 14:21 GMT+05:30 vivek pandya <vivekvpandya at gmail.com>: > Hello Mentors, > > I...

IPRA, interprocedural register allocation, question

2016 Jul 06

3

IPRA, interprocedural register allocation, question

...orrect register mask (and that also means that skipping clobbers list while IPRA enabled may broke executable) For example in following code: int gcd( int a, int b ) { int result ; /* Compute Greatest Common Divisor using Euclid's Algorithm */ __asm__ __volatile__ ( "movl %1, %%r15d;" "movl %2, %%ecx;" "CONTD: cmpl $0, %%ecx;" "je DONE;" "xorl %%r13d, %%r13d;" "idivl %%ecx;"...

IPRA, interprocedural register allocation, question

2016 Jul 08

2

IPRA, interprocedural register allocation, question

...clobbers list while IPRA enabled may broke > executable) > > For example in following code: > > int gcd( int a, int b ) { > > int result ; > > /* Compute Greatest Common Divisor using Euclid's Algorithm */ > > __asm__ __volatile__ ( "movl %1, %%r15d;" > > "movl %2, %%ecx;" > > "CONTD: cmpl $0, %%ecx;" > > "je DONE;" > > "xorl %%r13d, %%r13d;" > >...

What does a dead register mean?

2018 Feb 06

3

What does a dead register mean?

...see the following sequence: ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14 %r15 %r12b %r13b %r14b %r15b %r12d %r13d %r14d %r15d %r12w %r13w %r14w %r15w>, *implicit %rsp*, implicit %ssp, implicit-def %rsp, implicit-def %ssp ADJCALLSTACKUP64 0, 0, implicit-def dead %rsp, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp RET 0 The ADJCALLSTACKDOWN64 has implicit-def dead %rsp. However the nex...

IPRA, interprocedural register allocation, question

2016 Jul 09

3

IPRA, interprocedural register allocation, question

...orrect register mask (and that also means that skipping clobbers list while IPRA enabled may broke executable) For example in following code: int gcd( int a, int b ) { int result ; /* Compute Greatest Common Divisor using Euclid's Algorithm */ __asm__ __volatile__ ( "movl %1, %%r15d;" "movl %2, %%ecx;" "CONTD: cmpl $0, %%ecx;" "je DONE;" "xorl %%r13d, %%r13d;" "idivl %%ecx;"...

[LLVMdev] bug in X86 disasm code?

2013 Sep 12

1

[LLVMdev] bug in X86 disasm code?

...\ ENTRY(sib) \ ENTRY(EBP) \ ENTRY(ESI) \ ENTRY(EDI) \ ENTRY(R8D) \ ENTRY(R9D) \ ENTRY(R10D) \ ENTRY(R11D) \ ENTRY(R12D) \ ENTRY(R13D) \ ENTRY(R14D) \ ENTRY(R15D) the ENTRY(sib) looks suspicious. that should be ENTRY(ESP), no? thanks. J -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130912/a2bd068c/attachment.html>

[RFC] Loading Bitfields with Smallest Needed Types

2020 May 26

6

[RFC] Loading Bitfields with Smallest Needed Types

...; ... } Right before the "same_flow = ... ->same_flow;" statement is executed, a store is made to the bitfield at the end of a called function: NAPI_GRO_CB(skb)->same_flow = 1; The store is a byte: orb $0x1,0x4a(%rbx) while the read is a word: movzwl 0x4a(%r12),%r15d The problem is that between the store and the load the value hasn't been retired / placed in the cache. One would expect store-to-load forwarding to kick in, but on x86 that doesn't happen because x86 requires the store to be of equal or greater size than the load. So instead the load take...

llvm-stress crash

2017 Mar 14

3

llvm-stress crash

...fi#10>, 0, %noreg; mem:ST16[FixedStack10](align=8) GR128Bit:%vreg265 -> regalloc %R5L<def> = LLCRMux %R6L, %R4Q<imp-def> ST128 %R4Q<kill>, <fi#10>, 0, %noreg; mem:ST16[FixedStack10](align=8) -> pseudo expansion %R5L<def> = LLCR %R6L STG %R4D<kill>, %R15D, 200, %noreg; mem:ST16[FixedStack7](align=8) STG %R5D<kill>, %R15D, 208, %noreg; mem:ST16[FixedStack7](align=8) *** Bad machine code: Using an undefined physical register *** - function: autogen_SD29355 - basic block: BB#19 CF257 (0x4cb6b00) - instruction: STG - operand 0: %R4D<kill&...

IPRA, interprocedural register allocation, question

2016 Jul 12

2

IPRA, interprocedural register allocation, question

...orrect register mask (and that also means that skipping clobbers list while IPRA enabled may broke executable) For example in following code: int gcd( int a, int b ) { int result ; /* Compute Greatest Common Divisor using Euclid's Algorithm */ __asm__ __volatile__ ( "movl %1, %%r15d;" "movl %2, %%ecx;" "CONTD: cmpl $0, %%ecx;" "je DONE;" "xorl %%r13d, %%r13d;" "idivl %%ecx;"...

[cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types

2020 May 27

2

[cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types

...McCall via cfe-dev < cfe-dev at lists.llvm.org> wrote: > On 26 May 2020, at 18:28, Bill Wendling via llvm-dev wrote: > > [...] The store is a byte: > > > > orb $0x1,0x4a(%rbx) > > > > while the read is a word: > > > > movzwl 0x4a(%r12),%r15d > > > > The problem is that between the store and the load the value hasn't > > been retired / placed in the cache. One would expect store-to-load > > forwarding to kick in, but on x86 that doesn't happen because x86 > > requires the store to be of equal or gre...

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 25

3

Tail call optimization is getting affected due to local function related optimization with IPRA

...cted by RegUsageInfoCollector pass. Function Name : bitrv2 Clobbered Registers: AH AL AX BH BL BP BPL BX CH CL CX DI DIL EAX EBP EBX ECX EDI EFLAGS ESI ESP RAX RBP RBX RCX RDI RSI RSP SI SIL SP SPL R8 R9 R10 R11 R12 R13 R14 R15 R8B R9B R10B R11B R12B R13B R14B R15B R8D R9D R10D R11D R12D R13D R14D R15D R8W R9W R10W R11W R12W R13W R14W R15W How ever caller of bitrv2, makewt has callee saved registers as per CC, but this code results in segmentation fault when compliled with O1 because makewt has value of *ip in R14 register and that is stored and restore by makewt at begining of call but due to t...

What does a dead register mean?

2018 Feb 06

0

What does a dead register mean?

...: > > ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead > %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp > CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx > %r12 %r13 %r14 %r15 %r12b %r13b %r14b %r15b %r12d %r13d %r14d %r15d > %r12w %r13w %r14w %r15w>, *implicit %rsp*, implicit %ssp, implicit-def > %rsp, implicit-def %ssp > ADJCALLSTACKUP64 0, 0, implicit-def dead %rsp, implicit-def dead > %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp > RET 0 > > > The ADJCALLSTACKDOWN64...

[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a

2016 Jan 02

13

[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a

https://bugs.freedesktop.org/show_bug.cgi?id=93557 Bug ID: 93557 Summary: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a Product: xorg Version: unspecified Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: blocker

IPRA, interprocedural register allocation, question

2016 Jul 12

3

IPRA, interprocedural register allocation, question

...orrect register mask (and that also means that skipping clobbers list while IPRA enabled may broke executable) For example in following code: int gcd( int a, int b ) { int result ; /* Compute Greatest Common Divisor using Euclid's Algorithm */ __asm__ __volatile__ ( "movl %1, %%r15d;" "movl %2, %%ecx;" "CONTD: cmpl $0, %%ecx;" "je DONE;" "xorl %%r13d, %%r13d;" "idivl %%ecx;"...

Finding caller-saved registers at a function call site

2016 Jun 27

3

Finding caller-saved registers at a function call site

...storage location rbp - 0x8) is used in the addition to calculate the returned value. However, when I print the RegMask operand for the call machine instruction, I get the following: <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W> I don't see xmm1 as being preserved across this call. Am I missing something? Thanks for your help! On Wed, Jun 22, 2016 at 5:01 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > Hi Rob, > > Rob Lyerly via llvm-dev wrote: > > I'...

[LLVMdev] Apparent optimizer bug on X86_64

2011 Mar 19

2

[LLVMdev] Apparent optimizer bug on X86_64

...code: 1300 /*-----------------------------. 1301 | yyreduce -- Do a reduction. | 1302 `-----------------------------*/ 1303 yyreduce: 1304 /* yyn is the number of a rule to reduce with. */ 1305 yylen = yyr2[yyn]; 0x0000000000400c14 <rpcalc_parse+628>: mov r15d,r14d 0x0000000000400c17 <rpcalc_parse+631>: movzx r12d,BYTE PTR [r15+0x4015e2] 0x0000000000400c1f <rpcalc_parse+639>: mov eax,0x1 0x0000000000400c24 <rpcalc_parse+644>: mov r13,rax 0x0000000000400c27 <rpcalc_parse+647>: sub r13,r...

search for: r15d