thr3ads.net - llvm dev - [llvm-dev] Help required regarding IPRA and Local Function optimization [Jun 2016]


vivek pandya via llvm-dev

2016-Jun-30 08:51 UTC

[llvm-dev] Help required regarding IPRA and Local Function optimization

Hello Mentors,

I am currently tracking down a bug in the local function related optimization
that causes runtime failures in some test cases. Those test cases contain very
large functions with recursion and object-oriented code, so I have not been
able to find the pattern that causes the failure. So I tried the following
simple case to understand the expected behavior of this optimization.

Consider the following code:

define void @bar() #0 {
  call void asm sideeffect "movl %ecx, %r15d", "~{r15}"() #0
  call void @foo()
  call void asm sideeffect "movl %r15d, %ebx", "~{rbx}"() #0
  ret void
}

define internal void @foo() #0 {
  call void asm sideeffect "movl %r14d, %r15d", "~{r15}"() #0
  ret void
}

and the assembly generated for it with IPRA enabled:

	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 12
	.p2align	4, 0x90
_foo:                                   ## @foo
	.cfi_startproc
## BB#0:
	## InlineAsm Start
	movl	%r14d, %r15d
	## InlineAsm End
	retq
	.cfi_endproc

	.globl	_bar
	.p2align	4, 0x90
_bar:                                   ## @bar
	.cfi_startproc
## BB#0:
	pushq	%r15
Ltmp0:
	.cfi_def_cfa_offset 16
	pushq	%rbx
Ltmp1:
	.cfi_def_cfa_offset 24
	pushq	%rax
Ltmp2:
	.cfi_def_cfa_offset 32
Ltmp3:
	.cfi_offset %rbx, -24
Ltmp4:
	.cfi_offset %r15, -16
	## InlineAsm Start
	movl	%ecx, %r15d
	## InlineAsm End
	callq	_foo
	## InlineAsm Start
	movl	%r15d, %ebx
	## InlineAsm End
	addq	$8, %rsp
	popq	%rbx
	popq	%r15
	retq
	.cfi_endproc


.subsections_via_symbols

Now, foo clobbers R15 (which is callee-saved), but because foo is a local
function, IPRA marks R15 as clobbered and foo does not save/restore R15 in its
prologue/epilogue. For the code above to work correctly, I would expect R15 to
be saved and restored around the call site of foo in bar, but I am not able to
find a pass in LLVM that does that. In fact, if I am not wrong, the RegMasks of
call sites are used by the register allocators through
LiveIntervals::checkRegMaskInterference, so if R15 is marked as clobbered by
the call to _foo, R15 will not be used for any live range that spans the call
to _foo. (That itself is another concern: it may result in virtual register
spills due to a lack of available registers, because setting a function's
callee-saved registers to none is propagated through the regmasks.)
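To make the regmask reasoning above concrete, here is a minimal, self-contained C++ sketch of how a call-site register mask is interpreted. It assumes LLVM's convention that a set bit means the register is preserved across the call and a clear bit means it is clobbered (the idea behind MachineOperand::clobbersPhysReg). The register numbering and the names makeMask, clobbers and usableAcrossCall are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical register numbering for this sketch only (not LLVM's X86 enum).
enum Reg : unsigned { RAX = 0, RBX, RCX, R14, R15, NumRegs };

// One bit per physical register: set = preserved across the call,
// clear = clobbered by the call.
using RegMask = std::vector<uint32_t>;

RegMask makeMask(std::initializer_list<unsigned> Preserved) {
  RegMask M((NumRegs + 31) / 32, 0);
  for (unsigned R : Preserved)
    M[R / 32] |= 1u << (R % 32);
  return M;
}

// Same test as the idea behind MachineOperand::clobbersPhysReg:
// a clear bit means the call clobbers R.
bool clobbers(const RegMask &M, unsigned R) {
  assert(R < NumRegs && "register out of range");
  return !(M[R / 32] & (1u << (R % 32)));
}

// A live range that spans the call may only be assigned a register the call
// preserves (roughly what checkRegMaskInterference enforces for the allocator).
bool usableAcrossCall(const RegMask &CallMask, unsigned R) {
  return !clobbers(CallMask, R);
}
```

For example, with a default mask that preserves RBX, R14 and R15, a value live across the call may stay in R15. With an IPRA mask for foo that preserves everything foo does not touch but leaves the R15 bit clear, usableAcrossCall(FooMask, R15) is false, so the allocator has to pick another register or spill.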

Here are my questions related to this example:
1) Is there any pass or code in LLVM that is responsible for caller-saving
physical registers? From looking at InlineSpiller.cpp, it is responsible only
for spilling virtual registers.
2) If such a pass exists, why is R15 not saved around the call to _foo?
3) Why does _bar save %rax in the code above?

Please help!

Sincerely,
Vivek
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160630/64436cd3/attachment.html>

vivek pandya via llvm-dev

2016-Jun-30 16:32 UTC


[llvm-dev] Help required regarding IPRA and Local Function optimization

I have noticed one more interesting thing:

In the sqlite3 code, consider three functions: sqlite3Update, sqlite3Select
and sqlite3WhereBegin. sqlite3WhereBegin is called by both sqlite3Update and
sqlite3Select, but according to the CallGraphSCC order, sqlite3Update is
code-generated before sqlite3WhereBegin. In that case, the default regmask is
used for the call site of sqlite3WhereBegin during the RegMask propagation
phase, and later sqlite3WhereBegin is optimized not to save callee-saved
registers. This obviously should not happen.
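The ordering problem can be modelled with a small, hypothetical C++ sketch: functions are code-generated in a given order, and a call site can only use the callee's IPRA-computed mask if the callee was compiled first; otherwise it falls back to the default (ABI) mask, which assumes the callee preserves every callee-saved register. The names CallEdge and unsafeForNoCSR are illustrative only and do not correspond to LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// A direct call from Caller to Callee (hypothetical toy representation).
struct CallEdge {
  std::string Caller;
  std::string Callee;
};

// Given the order in which functions are code-generated, return every callee
// that has at least one caller compiled no later than itself. Such a call
// site only saw the default mask (callee preserves all callee-saved
// registers), so the callee must keep saving its callee-saved registers.
std::set<std::string> unsafeForNoCSR(const std::vector<std::string> &Order,
                                     const std::vector<CallEdge> &Edges) {
  std::map<std::string, std::size_t> Pos;
  for (std::size_t I = 0; I < Order.size(); ++I)
    Pos[Order[I]] = I;

  std::set<std::string> Unsafe;
  for (const CallEdge &E : Edges)
    // ">=" also catches self-recursion: the recursive call site is compiled
    // before the function's own mask is known.
    if (Pos.at(E.Callee) >= Pos.at(E.Caller))
      Unsafe.insert(E.Callee);
  return Unsafe;
}
```

With the order sqlite3Update, sqlite3WhereBegin, sqlite3Select, the sketch reports sqlite3WhereBegin as unsafe, matching the failure described above. A self-call such as create calling create is always reported, since its call site is compiled before the function's own mask exists.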

Here is the assembly printed by the lldb dis command at the runtime failure.
After careful observation, I have identified one bug:
...
    0x10002d8ff <+1855>: movl   -0x74(%rbp), %r13d
    0x10002d903 <+1859>: movq   -0x30(%rbp), %r12  ; this contains the address of a structure
    0x10002d907 <+1863>: movq   -0x38(%rbp), %r14
    0x10002d90b <+1867>: movq   -0x58(%rbp), %r15
    0x10002d90f <+1871>: leaq   -0x150(%rbp), %rdi
    0x10002d916 <+1878>: movq   -0x50(%rbp), %rsi
    0x10002d91a <+1882>: callq  0x10001a940               ; sqlite3ExprResolveNames at sqlite3.c:47419; this function preserves callee-saved regs
    0x10002d91f <+1887>: testl  %eax, %eax
    0x10002d921 <+1889>: je     0x10002d92c               ; <+1900> at sqlite3.c:66485
    0x10002d923 <+1891>: movq   -0x70(%rbp), %rdx
    0x10002d927 <+1895>: jmp    0x10002d2b1               ; <+241> at sqlite3.c:66299
    0x10002d92c <+1900>: xorl   %eax, %eax
    0x10002d92e <+1902>: movq   %rax, -0xe0(%rbp)
    0x10002d935 <+1909>: xorl   %ecx, %ecx
    0x10002d937 <+1911>: xorl   %r8d, %r8d
    0x10002d93a <+1914>: movq   %r14, %rdi
    0x10002d93d <+1917>: movq   -0xd8(%rbp), %rsi
    0x10002d944 <+1924>: movq   -0x50(%rbp), %rdx
    0x10002d948 <+1928>: callq  0x100030600               ; sqlite3WhereBegin at sqlite3.c:69859; this function does not save any callee-saved regs, and its code uses R12
    0x10002d94d <+1933>: movq   %rax, %r14
    0x10002d950 <+1936>: testq  %r14, %r14
    0x10002d953 <+1939>: je     0x10002e0d3               ; <+3859> at sqlite3.c:66699
    0x10002d959 <+1945>: movq   -0x48(%rbp), %rax
    0x10002d95d <+1949>: cmpb   $0x0, 0x69(%rax)
    0x10002d961 <+1953>: movl   $0xa, %eax
    0x10002d966 <+1958>: movl   $0x26, %esi
    0x10002d96b <+1963>: cmovnel %eax, %esi
    0x10002d96e <+1966>: movq   %r12, %rdi  ; here R12 has been clobbered, so a wrong address is passed as the argument and a bad memory access error is raised while executing sqlite3VdbeAddOp2
    0x10002d971 <+1969>: movq   -0x68(%rbp), %rdx
    0x10002d975 <+1973>: movl   %r13d, %ecx
    0x10002d978 <+1976>: callq  0x100019720               ; sqlite3VdbeAddOp2 at sqlite3.c:37297
...

Here is the lldb dis output for sqlite3VdbeAddOp3 (the arrow marks where execution stopped):
    0x100019500 <+0>:   pushq  %rbp
    0x100019501 <+1>:   movq   %rsp, %rbp
    0x100019504 <+4>:   pushq  %r15
    0x100019506 <+6>:   pushq  %r14
    0x100019508 <+8>:   pushq  %r13
    0x10001950a <+10>:  pushq  %r12
    0x10001950c <+12>:  pushq  %rbx
    0x10001950d <+13>:  pushq  %rax
    0x10001950e <+14>:  movl   %ecx, %r12d
    0x100019511 <+17>:  movl   %edx, %r13d
    0x100019514 <+20>:  movl   %esi, %r15d
    0x100019517 <+23>:  movq   %rdi, %rbx
->  0x10001951a <+26>:  movl   0x18(%rbx), %r14d

Please correct me if anything is wrong, and please provide some help.

-Vivek


Quentin Colombet via llvm-dev

2016-Jul-01 23:37 UTC


[llvm-dev] Help required regarding IPRA and Local Function optimization

Hi Vivek,

I believe your reduced test case is broken.
> On Jun 30, 2016, at 1:51 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Here are my questions related to this example:
> 1) Is there any pass or code in LLVM which is responsible for caller saved
> register for Physical Registers? By looking at InlineSpiller.cpp it is
> responsible for VReg spilling.
If you caller-save registers "by hand" (as with inline assembly), you are
supposed to control their live ranges yourself.
What I am saying is that if you want support from the compiler, you need to give
it that freedom, and your test case does not provide it.
That is, if you want the compiler to help, you would need to save r15 in a virtual
register and use that virtual register in the next inline asm statement.
E.g. (a sketch to illustrate the idea):

define void @bar() #0 {
  %tmpVal = call i32 asm sideeffect "movl %ecx, %r15d; movl %r15d, $0", "=r,~{r15}"() #0
  call void @foo()
  call void asm sideeffect "movl $0, %r15d; movl %r15d, %ebx", "r,~{r15},~{rbx}"(i32 %tmpVal) #0
  ret void
}
> 2) If such pass exists then why R15 is not saved around call __foo?
R15 is not live in your example. I mean, inline asm statements are opaque to
the compiler, and it cannot track liveness from the strings :). The only
thing it knows is what you tell it: you clobber r15 in one instruction and rbx
in another. It does not know that the second one uses r15 from the first one.
> 3) Why _bar is saving %rax in above code?
That's an optimization :). We actually need to do a sub $8 (probably to realign
the stack), but since sub and push are equally expensive, we do a push.

Cheers,
-Quentin
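The alignment reasoning behind the pushq %rax in _bar can be checked with a little arithmetic. The x86-64 ABIs used on Linux and macOS require %rsp to be 16-byte aligned at a call instruction, so on entry to _bar (after the caller's call pushed the 8-byte return address) %rsp % 16 == 8. The sketch below only tracks %rsp modulo 16; the names kEntryRspMod16 and rspMod16AfterPushes are made up for illustration.

```cpp
#include <cassert>

// On entry to a function, the caller's `call` has just pushed an 8-byte
// return address onto a 16-byte-aligned stack, so %rsp % 16 == 8.
constexpr int kEntryRspMod16 = 8;

// %rsp modulo 16 after NumPushes 8-byte pushq instructions
// (each push moves %rsp down by 8 bytes).
constexpr int rspMod16AfterPushes(int NumPushes) {
  return ((kEntryRspMod16 - 8 * NumPushes) % 16 + 16) % 16;
}

// _bar pushes %r15 and %rbx: the stack is then misaligned for `callq _foo`.
static_assert(rspMod16AfterPushes(2) == 8, "two pushes leave %rsp misaligned");
// The extra `pushq %rax` (or an equivalent `subq $8, %rsp`) realigns it.
static_assert(rspMod16AfterPushes(3) == 0, "a third 8-byte slot realigns %rsp");
```

So the third push only reserves an 8-byte slot; the epilogue discards it with addq $8, %rsp instead of popping it, which is why the value of %rax does not matter.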

vivek pandya via llvm-dev

2016-Jul-02 11:27 UTC


[llvm-dev] Help required regarding IPRA and Local Function optimization

On Sat, Jul 2, 2016 at 5:07 AM, Quentin Colombet <qcolombet at apple.com>
wrote:
Thanks Quentin, I got your point. I will update the test case accordingly.

Sincerely,
Vivek


vivek pandya via llvm-dev

2016-Jul-02 11:45 UTC


[llvm-dev] Help required regarding IPRA and Local Function optimization

On Thu, Jun 30, 2016 at 10:02 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
The sqlite3 bug I explained earlier is caused by not excluding recursive
functions from the local function related optimization. I have also confirmed
the same bug in another failing test case, sphereflake.cpp. Here is the
explanation:

static node_t *create(node_t*n,const int lvl,int dist,v_t c,v_t d,double r)
{
    n = 1 + new (n) node_t(sphere_t(c,2.*r),sphere_t(c,r), lvl > 1 ? dist : 1);
    if (lvl <= 1)
        return n; /*if not at the bottom, recurse a bit more*/
    dist=std::max((dist-childs)/childs,1); const basis_t b(d);
    const double nr=r*1/3.,daL=2.*M_PI/6.,daU=2.*M_PI/3.; double a=0;
    for(int i=0;i<6;++i){ /*lower ring*/
        const v_t ndir((d*-.2+b.b1*LLVMsin(a)+b.b2*LLVMcos(a)).norm()); /*transcendentals?!*/
        n=create(n,lvl-1,dist,c+ndir*(r+nr),ndir,nr);
        a+=daL;
    }
    a-=daL/3.;/*tweak*/
    for(int i=0;i<3;++i){ /*upper ring*/
        const v_t ndir((d*+.6+b.b1*LLVMsin(a)+b.b2*LLVMcos(a)).norm());
        n=create(n,lvl-1,dist,c+ndir*(r+nr),ndir,nr); a+=daU;
    }
    return n;
}

The above function is recursive, but it is optimized not to save callee-saved
registers. Because of the recursion, the recursive call site cannot have an
updated regmask while IPRA runs (or it would not be correct to have one), so
the register allocator uses callee-saved registers across the recursive calls.
But the function is then optimized not to save those registers. As a result,
in the case above the address of a v_t object is loaded into the EBX register,
and it is clobbered when the recursive call returns.

The sqlite3 failure is also due to this bug: as I mentioned previously, the
RegMask is not propagated to some call sites, yet the optimization still
disables saving callee-saved registers in the callee.

To address this issue, I have changed the check that decides whether a
function may be considered for the CSR optimization, as follows:

// Returns true if F must NOT be considered for the no-CSR (local function)
// optimization: F may be recursive, or some call site tail-calls it.
bool RegUsageInfoCollector::isEligibleForTailCallOptimization(Function *F) {
  if (!F->hasFnAttribute(Attribute::NoRecurse)) {
    DEBUG(dbgs() << F->getName() << " Function is recursive\n");
    return true;
  }

  // Scan every call site in the module for a tail call to F.
  const Module *M = F->getParent();
  for (const Function &Fu : *M)
    for (const BasicBlock &BB : Fu)
      for (const Instruction &II : BB)
        if (auto CS = ImmutableCallSite(&II))
          if (CS.getCalledFunction() == F && CS.isTailCall()) {
            DEBUG(dbgs() << F->getName() << " Function is tailCall\n");
            return true;
          }
  return false;
}
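The intent of the check above can be illustrated on a toy call graph. This is a hypothetical, self-contained C++ sketch, not LLVM code: Call, CallGraph, mayRecurse and excludeFromNoCSROpt are invented for illustration, and mayRecurse computes recursion directly by graph search as a stand-in for the norecurse attribute that the real check relies on.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: each call records its callee and whether it is a tail call.
struct Call {
  std::string Callee;
  bool IsTail;
};
using CallGraph = std::map<std::string, std::vector<Call>>;

// True if F can reach itself through calls (direct or mutual recursion).
// Stand-in for the absence of the `norecurse` attribute.
bool mayRecurse(const CallGraph &G, const std::string &F) {
  std::vector<std::string> Work{F};
  std::set<std::string> Seen;
  while (!Work.empty()) {
    std::string Cur = Work.back();
    Work.pop_back();
    auto It = G.find(Cur);
    if (It == G.end())
      continue;
    for (const Call &C : It->second) {
      if (C.Callee == F)
        return true;
      if (Seen.insert(C.Callee).second)
        Work.push_back(C.Callee);
    }
  }
  return false;
}

// Mirrors the intent of the check above: true means F must be EXCLUDED
// from the no-CSR optimization.
bool excludeFromNoCSROpt(const CallGraph &G, const std::string &F) {
  if (mayRecurse(G, F))
    return true;
  for (const auto &Entry : G)
    for (const Call &C : Entry.second)
      if (C.Callee == F && C.IsTail)
        return true;
  return false;
}
```

In a toy graph where bar calls foo normally and tail-calls another function, foo stays eligible, the tail-called function is excluded, and a self-recursive create (as in sphereflake.cpp) is excluded as well.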

With this change, the test-suite now passes with zero failures. I have also
added a simple statistic to count the number of functions that get optimized:
according to it, 32 functions are optimized in the sqlite3 application and 104
in SPASS.

If you find anything incorrect, please let me know; otherwise, just reply to
this mail so that I can update the review request.

Sincerely,
Vivek

