thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation [Jul 2016]

If this information is useful, please help other people find it:
Share via:

vivek pandya via llvm-dev

2016-Jun-26 11:48 UTC

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Hello LLVM Developers,

Please follow summary of work done during this week.

Implementation:
===========
During this week patch for bug fix 28144 is updated after finding more
refinement in remarks calculation. As per suggestion from Matthias Braun
and Hal Finkel regmask calculation code is same as
MachineRegisterInfo::isPhysRegModified() except no check of isNoReturnDef()
is required. So we proposed to add a bool argument SkipNoReturnDef with
default value false to isPhysRegModified method so that with out breaking
current use of isPhysRegModified we can reuse that code for the purpose of
IPRA. The patch can be found here : http://reviews.llvm.org/D21395

With IPRA to improve code quality, call site with local functions are
forced to have caller saved registers ( more improved heuristics will be
implemented ) I have been experimenting this on my local machine and I
discovered that tail call optimization is getting affected due to this
optimization and some test case in test-suite fails with segmentation fault
or infinite recursion due to counter value gets overwritten. Please find
more details and example bug at
https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ

I have also tried a very simple method to handle indirect function in IPRA
but at higher optimization level, indirect function calls are getting
converted to direct function calls so I request interested community member
to guide me. We can have discussion about this on Monday morning (PDT).
More discussion on this can be found at here :
https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ

Testing:

=====
During this week I think that IPRA optimization is more stabilized after
having bug fix so have run test-suite with that and also as per suggestion
form Quentin Colombet I tested test-suite with only codegen order changed
to bottom up on call graph.  Overall this codegen order improves runtime
and compile time. I have shared results here:

https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing

https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing

Plan for next week:
=============1) Rebase pending patches and get the review process completed.
2) Solve tail call related bug.
3) Discuss some ideas and heuristics for improving IPRA.
4) Discuss how to handle indirect function call with in IPRA.
5) More testing with llvm test-suite

Sincerely,
Vivek

On Tue, Jun 21, 2016 at 9:26 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
>
>
> On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <matze at braunis.de>
wrote:
>
>>
>> > On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > Hi Vivek,
>> >
>> > vivek pandya via llvm-dev wrote:
>> > >     int foo() {
>> > >     return 12;
>> > >     }
>> > >
>> > >     int bar(int a) {
>> > >     return foo() + a;
>> > >     }
>> > >
>> > >     int (*fp)() = 0;
>> > >     int (*fp1)(int) = 0;
>> > >
>> > >     int main() {
>> > >     fp = foo;
>> > >     fp();
>> > >     fp1 = bar;
>> > >     fp1(15);
>> > >     return 0;
>> > >     }
>> >
>> > IMO it is waste of time trying to do a better job at the IPRA
level on
>> > IR like the above ^.  LLVM should be folding the indirect calls to
>> > direct calls at the IR level, and if it isn't that's a bug
in the IR
>> > level optimizer.
>> +1 from me.
>>
>> Yes at -O3 level simple indirect calls including virtual functions are
> getting optimized to direct call.
>
>
>> The interesting cases are the non-obvious ones (assumeing foo/bar have
>> the same parameters). Things gets interesting once you have uncertainty
in
>> the mix. The minimally interesting case would look like this:
>>
>> int main() {
>>     int (*fp)();
>>     if (rand()) {
>>         fp = foo;
>>     } else {
>>         fp = bar;
>>     }
>>     fp(42);
>> }
>>
>
> I tried this case and my simple hack fails to optimize it :-) . This
> requires discussion on IRC.
>
> Sincerely,
> -Vivek
>
>
>> However predicting the possible targets of a call is IMO a question of
>> computing a call graph datastructure and improving upon that. We should
be
>> sure that we discuss and implement this independently of the register
>> allocation work!
>>
>> - Matthias
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160626/ff5f596f/attachment.html>

vivek pandya via llvm-dev

2016-Jul-03 12:13 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Hello LLVM Developers.

This week much of my time is consumed in debugging IPRA's effect on higher
level optimization specifically due to not having callee saved registers. I
think it was hard but I learned a lot and LLDB helped me a lot.

Here is summary for this week:

Implementation:
===========
Implemented a very simple check to prevent no callee saved registers
optimization to functions which are recursive or may be optimized as tail
call. A simple statistic added to count number of functions optimized for
not having callee saved registers.

Testing:

=====
Debugged failing test cases due to no callee saved registers optimization.
More details with examples can be found here
https://groups.google.com/d/topic/llvm-dev/TSoYxeMMzxM/discussion . Now all
test in llvm test-suite pass.

Study:

====
To find some ideas to improve current IPRA I read 2 papers namely
“Minimizing Register Usage Penalty at Procedure Calls” by Fred C. Chow and
“Register Allocation Across Procedure and Module Boundaries” by Santhanam
and Odnert.

1) From the first paper I like the idea of shrink wrap analysis and LLVM
currently have this optimization but the approach is completely different.
I have initiated a discussion for that, it can be found here
https://groups.google.com/d/topic/llvm-dev/_mZoGUQDMGo/discussion I would
like to talk to Quentin Colombet more about this.

2) From the second paper I like the idea of spill code motion, in this
optimization spill due to callee saved register is pushed to less
frequently called caller, but the approach mentioned in that paper requiems
call frequency details and also it differs register allocation to very
late, the optimization it self requires register usage details but it
operates on register usage estimation done in earlier stage. This
optimization also requires help from intra-procedural register allocators.
I would like to have more discussion on this over IRC this Monday with my
mentors.

Plan for next week:
=============1) Rebase pending patches and get the review process completed.
2) Discuss how can identified ideas can be implemented with in current
infrastructure.
3) Discuss how to handle indirect function call with in IPRA.

Sincerely,
Vivek

On Sun, Jun 26, 2016 at 5:18 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Hello LLVM Developers,
>
> Please follow summary of work done during this week.
>
> Implementation:
> ===========>
> During this week patch for bug fix 28144 is updated after finding more
> refinement in remarks calculation. As per suggestion from Matthias Braun
> and Hal Finkel regmask calculation code is same as
> MachineRegisterInfo::isPhysRegModified() except no check of isNoReturnDef()
> is required. So we proposed to add a bool argument SkipNoReturnDef with
> default value false to isPhysRegModified method so that with out breaking
> current use of isPhysRegModified we can reuse that code for the purpose of
> IPRA. The patch can be found here : http://reviews.llvm.org/D21395
>
> With IPRA to improve code quality, call site with local functions are
> forced to have caller saved registers ( more improved heuristics will be
> implemented ) I have been experimenting this on my local machine and I
> discovered that tail call optimization is getting affected due to this
> optimization and some test case in test-suite fails with segmentation fault
> or infinite recursion due to counter value gets overwritten. Please find
> more details and example bug at
> https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ
>
> I have also tried a very simple method to handle indirect function in IPRA
> but at higher optimization level, indirect function calls are getting
> converted to direct function calls so I request interested community member
> to guide me. We can have discussion about this on Monday morning (PDT).
> More discussion on this can be found at here :
> https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ
>
>
> Testing:
>
> =====>
> During this week I think that IPRA optimization is more stabilized after
> having bug fix so have run test-suite with that and also as per suggestion
> form Quentin Colombet I tested test-suite with only codegen order changed
> to bottom up on call graph.  Overall this codegen order improves runtime
> and compile time. I have shared results here:
>
>
>
https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing
>
>
>
>
https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing
>
> Plan for next week:
> =============> 1) Rebase pending patches and get the review process
completed.
> 2) Solve tail call related bug.
> 3) Discuss some ideas and heuristics for improving IPRA.
> 4) Discuss how to handle indirect function call with in IPRA.
> 5) More testing with llvm test-suite
>
> Sincerely,
> Vivek
>
>
> On Tue, Jun 21, 2016 at 9:26 AM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>>
>>
>> On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <matze at
braunis.de> wrote:
>>
>>>
>>> > On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> >
>>> > Hi Vivek,
>>> >
>>> > vivek pandya via llvm-dev wrote:
>>> > >     int foo() {
>>> > >     return 12;
>>> > >     }
>>> > >
>>> > >     int bar(int a) {
>>> > >     return foo() + a;
>>> > >     }
>>> > >
>>> > >     int (*fp)() = 0;
>>> > >     int (*fp1)(int) = 0;
>>> > >
>>> > >     int main() {
>>> > >     fp = foo;
>>> > >     fp();
>>> > >     fp1 = bar;
>>> > >     fp1(15);
>>> > >     return 0;
>>> > >     }
>>> >
>>> > IMO it is waste of time trying to do a better job at the IPRA
level on
>>> > IR like the above ^.  LLVM should be folding the indirect
calls to
>>> > direct calls at the IR level, and if it isn't that's a
bug in the IR
>>> > level optimizer.
>>> +1 from me.
>>>
>>> Yes at -O3 level simple indirect calls including virtual functions
are
>> getting optimized to direct call.
>>
>>
>>> The interesting cases are the non-obvious ones (assumeing foo/bar
have
>>> the same parameters). Things gets interesting once you have
uncertainty in
>>> the mix. The minimally interesting case would look like this:
>>>
>>> int main() {
>>>     int (*fp)();
>>>     if (rand()) {
>>>         fp = foo;
>>>     } else {
>>>         fp = bar;
>>>     }
>>>     fp(42);
>>> }
>>>
>>
>> I tried this case and my simple hack fails to optimize it :-) . This
>> requires discussion on IRC.
>>
>> Sincerely,
>> -Vivek
>>
>>
>>> However predicting the possible targets of a call is IMO a question
of
>>> computing a call graph datastructure and improving upon that. We
should be
>>> sure that we discuss and implement this independently of the
register
>>> allocation work!
>>>
>>> - Matthias
>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160703/8aa93869/attachment-0001.html>

vivek pandya via llvm-dev

2016-Jul-10 04:42 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Hello LLVM Developers,

Please feel free to send any ideas that you can think to improve current
IPRA. I will work on it and if possible I will implement that.

Please consider summary of work done during this week.

Implementation:
===========
The reviews requests has been updated to reflect the reviews.


Testing:

====
To get more benefit from IPRA I experimented it with LTO and results were
positive. For the SPASS application (one of the multi source benchmark in
test suite) execution time is reduce by 0.02s when LTO+IPRA compare to LTO
only. The current IPRA works at compile time and its optimization scope is
limited to a module so LTO produces a large module from small modules then
generates machine code for that. So it will help IPRA by providing a huge
module with very less external functions. to use IPRA with LTO one can pass
following arguments to clang : -flto -Wl,-mllvm,-enable-ipra . A more
detailed discussion can be found here
https://groups.google.com/d/topic/llvm-dev/Vkd-NOytdcA/discussion


Study:

====
Majority of time I have spent on finding new ideas to improve current IPRA.
As current approach can only see information with in current module it is
very hard to improve it further. Most of the approaches described in
literatures requirers help from a program analyzer and intra-procedural
register allocation and register allocation for the whole module is
differed until IPRA is completed. Also these approaches requires a heavy
data flow analysis so that is totally orthogonal to current approach. But
still we am able to identified two possible improvements and many thanks to
Peter Lawrence for his suggestions and questions.


   1. First improvement is to help IPRA by using __attribute__. This can be
   particularly use full when working with a library or external code which is
   written completely in assembly and a user is able to provide accurate
   register usage information. So idea is to supply regmask details for such
   external function with __attribute__ in function declaration and let IPRA
   propagate it to improve register allocation.  I will be working on this
   next week.
   2. Second improvement is to make less frequently executed function save
   every register it clobbers thus making it preserving all registers and
   propagating this information to more frequently executed callers to improve
   its register allocation. This leads us to PGO driven IPRA. More details can
   be found here.
   https://groups.google.com/d/topic/llvm-dev/jhC7L50el8k/discussion

Plan for Next Week:
==============1) Start implementing above two improvements.
2) Run test-suite with --benchmark option so that more precisely
 improvement can be calculated.

Sincerely,
Vivek

On Sun, Jul 3, 2016 at 5:43 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Hello LLVM Developers.
>
> This week much of my time is consumed in debugging IPRA's effect on
higher
> level optimization specifically due to not having callee saved registers. I
> think it was hard but I learned a lot and LLDB helped me a lot.
>
> Here is summary for this week:
>
> Implementation:
> ===========>
> Implemented a very simple check to prevent no callee saved registers
> optimization to functions which are recursive or may be optimized as tail
> call. A simple statistic added to count number of functions optimized for
> not having callee saved registers.
>
>
> Testing:
>
> =====>
> Debugged failing test cases due to no callee saved registers optimization.
> More details with examples can be found here
> https://groups.google.com/d/topic/llvm-dev/TSoYxeMMzxM/discussion . Now
> all test in llvm test-suite pass.
>
>
> Study:
>
> ====>
> To find some ideas to improve current IPRA I read 2 papers namely
> “Minimizing Register Usage Penalty at Procedure Calls” by Fred C. Chow and
> “Register Allocation Across Procedure and Module Boundaries” by Santhanam
> and Odnert.
>
> 1) From the first paper I like the idea of shrink wrap analysis and LLVM
> currently have this optimization but the approach is completely different.
> I have initiated a discussion for that, it can be found here
> https://groups.google.com/d/topic/llvm-dev/_mZoGUQDMGo/discussion I would
> like to talk to Quentin Colombet more about this.
>
> 2) From the second paper I like the idea of spill code motion, in this
> optimization spill due to callee saved register is pushed to less
> frequently called caller, but the approach mentioned in that paper requiems
> call frequency details and also it differs register allocation to very
> late, the optimization it self requires register usage details but it
> operates on register usage estimation done in earlier stage. This
> optimization also requires help from intra-procedural register allocators.
> I would like to have more discussion on this over IRC this Monday with my
> mentors.
>
> Plan for next week:
> =============> 1) Rebase pending patches and get the review process
completed.
> 2) Discuss how can identified ideas can be implemented with in current
> infrastructure.
> 3) Discuss how to handle indirect function call with in IPRA.
>
> Sincerely,
> Vivek
>
>
> On Sun, Jun 26, 2016 at 5:18 PM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>> Hello LLVM Developers,
>>
>> Please follow summary of work done during this week.
>>
>> Implementation:
>> ===========>>
>> During this week patch for bug fix 28144 is updated after finding more
>> refinement in remarks calculation. As per suggestion from Matthias
Braun
>> and Hal Finkel regmask calculation code is same as
>> MachineRegisterInfo::isPhysRegModified() except no check of
isNoReturnDef()
>> is required. So we proposed to add a bool argument SkipNoReturnDef with
>> default value false to isPhysRegModified method so that with out
breaking
>> current use of isPhysRegModified we can reuse that code for the purpose
of
>> IPRA. The patch can be found here : http://reviews.llvm.org/D21395
>>
>> With IPRA to improve code quality, call site with local functions are
>> forced to have caller saved registers ( more improved heuristics will
be
>> implemented ) I have been experimenting this on my local machine and I
>> discovered that tail call optimization is getting affected due to this
>> optimization and some test case in test-suite fails with segmentation
fault
>> or infinite recursion due to counter value gets overwritten. Please
find
>> more details and example bug at
>> https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ
>>
>> I have also tried a very simple method to handle indirect function in
>> IPRA but at higher optimization level, indirect function calls are
getting
>> converted to direct function calls so I request interested community
member
>> to guide me. We can have discussion about this on Monday morning (PDT).
>> More discussion on this can be found at here :
>> https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ
>>
>>
>> Testing:
>>
>> =====>>
>> During this week I think that IPRA optimization is more stabilized
after
>> having bug fix so have run test-suite with that and also as per
suggestion
>> form Quentin Colombet I tested test-suite with only codegen order
changed
>> to bottom up on call graph.  Overall this codegen order improves
runtime
>> and compile time. I have shared results here:
>>
>>
>>
https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing
>>
>>
>>
>>
https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing
>>
>> Plan for next week:
>> =============>> 1) Rebase pending patches and get the review
process completed.
>> 2) Solve tail call related bug.
>> 3) Discuss some ideas and heuristics for improving IPRA.
>> 4) Discuss how to handle indirect function call with in IPRA.
>> 5) More testing with llvm test-suite
>>
>> Sincerely,
>> Vivek
>>
>>
>> On Tue, Jun 21, 2016 at 9:26 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <matze at
braunis.de>
>>> wrote:
>>>
>>>>
>>>> > On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>> >
>>>> > Hi Vivek,
>>>> >
>>>> > vivek pandya via llvm-dev wrote:
>>>> > >     int foo() {
>>>> > >     return 12;
>>>> > >     }
>>>> > >
>>>> > >     int bar(int a) {
>>>> > >     return foo() + a;
>>>> > >     }
>>>> > >
>>>> > >     int (*fp)() = 0;
>>>> > >     int (*fp1)(int) = 0;
>>>> > >
>>>> > >     int main() {
>>>> > >     fp = foo;
>>>> > >     fp();
>>>> > >     fp1 = bar;
>>>> > >     fp1(15);
>>>> > >     return 0;
>>>> > >     }
>>>> >
>>>> > IMO it is waste of time trying to do a better job at the
IPRA level on
>>>> > IR like the above ^.  LLVM should be folding the indirect
calls to
>>>> > direct calls at the IR level, and if it isn't
that's a bug in the IR
>>>> > level optimizer.
>>>> +1 from me.
>>>>
>>>> Yes at -O3 level simple indirect calls including virtual
functions are
>>> getting optimized to direct call.
>>>
>>>
>>>> The interesting cases are the non-obvious ones (assumeing
foo/bar have
>>>> the same parameters). Things gets interesting once you have
uncertainty in
>>>> the mix. The minimally interesting case would look like this:
>>>>
>>>> int main() {
>>>>     int (*fp)();
>>>>     if (rand()) {
>>>>         fp = foo;
>>>>     } else {
>>>>         fp = bar;
>>>>     }
>>>>     fp(42);
>>>> }
>>>>
>>>
>>> I tried this case and my simple hack fails to optimize it :-) .
This
>>> requires discussion on IRC.
>>>
>>> Sincerely,
>>> -Vivek
>>>
>>>
>>>> However predicting the possible targets of a call is IMO a
question of
>>>> computing a call graph datastructure and improving upon that.
We should be
>>>> sure that we discuss and implement this independently of the
register
>>>> allocation work!
>>>>
>>>> - Matthias
>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160710/738f8c54/attachment.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Jul 2016 - [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Possibly Parallel Threads