thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation [Jul 2016]

If this information is useful, please help other people find it:
Share via:

vivek pandya via llvm-dev

2016-Jul-10 04:42 UTC

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Hello LLVM Developers,

Please feel free to send any ideas that you can think to improve current
IPRA. I will work on it and if possible I will implement that.

Please consider summary of work done during this week.

Implementation:
===========
The reviews requests has been updated to reflect the reviews.


Testing:

====
To get more benefit from IPRA I experimented it with LTO and results were
positive. For the SPASS application (one of the multi source benchmark in
test suite) execution time is reduce by 0.02s when LTO+IPRA compare to LTO
only. The current IPRA works at compile time and its optimization scope is
limited to a module so LTO produces a large module from small modules then
generates machine code for that. So it will help IPRA by providing a huge
module with very less external functions. to use IPRA with LTO one can pass
following arguments to clang : -flto -Wl,-mllvm,-enable-ipra . A more
detailed discussion can be found here
https://groups.google.com/d/topic/llvm-dev/Vkd-NOytdcA/discussion


Study:

====
Majority of time I have spent on finding new ideas to improve current IPRA.
As current approach can only see information with in current module it is
very hard to improve it further. Most of the approaches described in
literatures requirers help from a program analyzer and intra-procedural
register allocation and register allocation for the whole module is
differed until IPRA is completed. Also these approaches requires a heavy
data flow analysis so that is totally orthogonal to current approach. But
still we am able to identified two possible improvements and many thanks to
Peter Lawrence for his suggestions and questions.


   1. First improvement is to help IPRA by using __attribute__. This can be
   particularly use full when working with a library or external code which is
   written completely in assembly and a user is able to provide accurate
   register usage information. So idea is to supply regmask details for such
   external function with __attribute__ in function declaration and let IPRA
   propagate it to improve register allocation.  I will be working on this
   next week.
   2. Second improvement is to make less frequently executed function save
   every register it clobbers thus making it preserving all registers and
   propagating this information to more frequently executed callers to improve
   its register allocation. This leads us to PGO driven IPRA. More details can
   be found here.
   https://groups.google.com/d/topic/llvm-dev/jhC7L50el8k/discussion

Plan for Next Week:
==============1) Start implementing above two improvements.
2) Run test-suite with --benchmark option so that more precisely
 improvement can be calculated.

Sincerely,
Vivek

On Sun, Jul 3, 2016 at 5:43 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Hello LLVM Developers.
>
> This week much of my time is consumed in debugging IPRA's effect on
higher
> level optimization specifically due to not having callee saved registers. I
> think it was hard but I learned a lot and LLDB helped me a lot.
>
> Here is summary for this week:
>
> Implementation:
> ===========>
> Implemented a very simple check to prevent no callee saved registers
> optimization to functions which are recursive or may be optimized as tail
> call. A simple statistic added to count number of functions optimized for
> not having callee saved registers.
>
>
> Testing:
>
> =====>
> Debugged failing test cases due to no callee saved registers optimization.
> More details with examples can be found here
> https://groups.google.com/d/topic/llvm-dev/TSoYxeMMzxM/discussion . Now
> all test in llvm test-suite pass.
>
>
> Study:
>
> ====>
> To find some ideas to improve current IPRA I read 2 papers namely
> “Minimizing Register Usage Penalty at Procedure Calls” by Fred C. Chow and
> “Register Allocation Across Procedure and Module Boundaries” by Santhanam
> and Odnert.
>
> 1) From the first paper I like the idea of shrink wrap analysis and LLVM
> currently have this optimization but the approach is completely different.
> I have initiated a discussion for that, it can be found here
> https://groups.google.com/d/topic/llvm-dev/_mZoGUQDMGo/discussion I would
> like to talk to Quentin Colombet more about this.
>
> 2) From the second paper I like the idea of spill code motion, in this
> optimization spill due to callee saved register is pushed to less
> frequently called caller, but the approach mentioned in that paper requiems
> call frequency details and also it differs register allocation to very
> late, the optimization it self requires register usage details but it
> operates on register usage estimation done in earlier stage. This
> optimization also requires help from intra-procedural register allocators.
> I would like to have more discussion on this over IRC this Monday with my
> mentors.
>
> Plan for next week:
> =============> 1) Rebase pending patches and get the review process
completed.
> 2) Discuss how can identified ideas can be implemented with in current
> infrastructure.
> 3) Discuss how to handle indirect function call with in IPRA.
>
> Sincerely,
> Vivek
>
>
> On Sun, Jun 26, 2016 at 5:18 PM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>> Hello LLVM Developers,
>>
>> Please follow summary of work done during this week.
>>
>> Implementation:
>> ===========>>
>> During this week patch for bug fix 28144 is updated after finding more
>> refinement in remarks calculation. As per suggestion from Matthias
Braun
>> and Hal Finkel regmask calculation code is same as
>> MachineRegisterInfo::isPhysRegModified() except no check of
isNoReturnDef()
>> is required. So we proposed to add a bool argument SkipNoReturnDef with
>> default value false to isPhysRegModified method so that with out
breaking
>> current use of isPhysRegModified we can reuse that code for the purpose
of
>> IPRA. The patch can be found here : http://reviews.llvm.org/D21395
>>
>> With IPRA to improve code quality, call site with local functions are
>> forced to have caller saved registers ( more improved heuristics will
be
>> implemented ) I have been experimenting this on my local machine and I
>> discovered that tail call optimization is getting affected due to this
>> optimization and some test case in test-suite fails with segmentation
fault
>> or infinite recursion due to counter value gets overwritten. Please
find
>> more details and example bug at
>> https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ
>>
>> I have also tried a very simple method to handle indirect function in
>> IPRA but at higher optimization level, indirect function calls are
getting
>> converted to direct function calls so I request interested community
member
>> to guide me. We can have discussion about this on Monday morning (PDT).
>> More discussion on this can be found at here :
>> https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ
>>
>>
>> Testing:
>>
>> =====>>
>> During this week I think that IPRA optimization is more stabilized
after
>> having bug fix so have run test-suite with that and also as per
suggestion
>> form Quentin Colombet I tested test-suite with only codegen order
changed
>> to bottom up on call graph.  Overall this codegen order improves
runtime
>> and compile time. I have shared results here:
>>
>>
>>
https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing
>>
>>
>>
>>
https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing
>>
>> Plan for next week:
>> =============>> 1) Rebase pending patches and get the review
process completed.
>> 2) Solve tail call related bug.
>> 3) Discuss some ideas and heuristics for improving IPRA.
>> 4) Discuss how to handle indirect function call with in IPRA.
>> 5) More testing with llvm test-suite
>>
>> Sincerely,
>> Vivek
>>
>>
>> On Tue, Jun 21, 2016 at 9:26 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <matze at
braunis.de>
>>> wrote:
>>>
>>>>
>>>> > On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>> >
>>>> > Hi Vivek,
>>>> >
>>>> > vivek pandya via llvm-dev wrote:
>>>> > >     int foo() {
>>>> > >     return 12;
>>>> > >     }
>>>> > >
>>>> > >     int bar(int a) {
>>>> > >     return foo() + a;
>>>> > >     }
>>>> > >
>>>> > >     int (*fp)() = 0;
>>>> > >     int (*fp1)(int) = 0;
>>>> > >
>>>> > >     int main() {
>>>> > >     fp = foo;
>>>> > >     fp();
>>>> > >     fp1 = bar;
>>>> > >     fp1(15);
>>>> > >     return 0;
>>>> > >     }
>>>> >
>>>> > IMO it is waste of time trying to do a better job at the
IPRA level on
>>>> > IR like the above ^.  LLVM should be folding the indirect
calls to
>>>> > direct calls at the IR level, and if it isn't
that's a bug in the IR
>>>> > level optimizer.
>>>> +1 from me.
>>>>
>>>> Yes at -O3 level simple indirect calls including virtual
functions are
>>> getting optimized to direct call.
>>>
>>>
>>>> The interesting cases are the non-obvious ones (assumeing
foo/bar have
>>>> the same parameters). Things gets interesting once you have
uncertainty in
>>>> the mix. The minimally interesting case would look like this:
>>>>
>>>> int main() {
>>>>     int (*fp)();
>>>>     if (rand()) {
>>>>         fp = foo;
>>>>     } else {
>>>>         fp = bar;
>>>>     }
>>>>     fp(42);
>>>> }
>>>>
>>>
>>> I tried this case and my simple hack fails to optimize it :-) .
This
>>> requires discussion on IRC.
>>>
>>> Sincerely,
>>> -Vivek
>>>
>>>
>>>> However predicting the possible targets of a call is IMO a
question of
>>>> computing a call graph datastructure and improving upon that.
We should be
>>>> sure that we discuss and implement this independently of the
register
>>>> allocation work!
>>>>
>>>> - Matthias
>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160710/738f8c54/attachment.html>

vivek pandya via llvm-dev

2016-Jul-17 15:03 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Dear Community,

Sorry for being late for weekly status.

Please find summary for this week as below:

Implementation
===========
This week I have implemented support for __attribute__((regmask(“clobbered
list here”))). This currently applicable to function declaration only and
it provides user a chance to help IPRA by specifying actual register usage
by a function which is currently not declared in the module. One such case
is when functions written in pure assembly is used inside current module
because in such a case if this attribute is not present IPRA will use CC so
it will limit performance benefits from IPRA. Alternatively in this
particular case one can use preserve_all or preserve_most attribute
specified with clang to help IPRA but I believe in some case user may not
be able to describe register usage with such CC then attribute “regmask”
can help.

For this support I needed to hack clang and LLVM both. How ever it seems
that applicability of this kind of attribute is not limited to IPRA so we
have initiated discussion on mailing list
https://groups.google.com/d/topic/llvm-dev/w70_WljNCHE/discussion.

I have also implemented a patch which fixes a very subtle bug in regmask
calculation. Thanks to zan jyu Wong <zyfwong at gmail.com> for bringing
this
to notice.

For example if CL is only clobbered than CH should not be marked clobbered
but CX, RCX and ECX should be mark clobbered. Previously for each modified
register all of its aliases are marked clobbered by markRegClobbred() in
RegUsageInfoCollector.cpp but that is wrong because when CL is clobbered
then MRI::isPhysRegModified() will return true for CL, CX, ECX, RCX which
is correct behavior but then for CX, EXC, RCX we mark CH also clobbered as
CH is aliased to CX,ECX,RCX so markRegClobbred() is not required because
isPhysRegModified already take cares of proper aliasing register. A very
simple test case has been added to verify this change.


Testing

====
This week I run test-suite with —benchmark-only flag and on average 4%
improvement is noted in execution time how ever on average 5% increment is
noted in compile time.


Study

===
For PGO driven IPRA I am able to access profile summary info in LLVM pass
to decide if function is hot or cold but to implement it target
independently I also need to determine if function is hot or cold inside a
TargetFrameLowringImpl but my attempts failed I will look for other ways to
solve this problem.


Plan for next week

=============
1) Work to improve attribute regmask support as per suggestion on RFC.

2) Implement experiment PGO driven IPRA


Sincerely,

Vivek

On Sun, Jul 10, 2016 at 10:12 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Hello LLVM Developers,
>
> Please feel free to send any ideas that you can think to improve current
> IPRA. I will work on it and if possible I will implement that.
>
> Please consider summary of work done during this week.
>
> Implementation:
> ===========>
> The reviews requests has been updated to reflect the reviews.
>
>
> Testing:
>
> ====>
> To get more benefit from IPRA I experimented it with LTO and results were
> positive. For the SPASS application (one of the multi source benchmark in
> test suite) execution time is reduce by 0.02s when LTO+IPRA compare to LTO
> only. The current IPRA works at compile time and its optimization scope is
> limited to a module so LTO produces a large module from small modules then
> generates machine code for that. So it will help IPRA by providing a huge
> module with very less external functions. to use IPRA with LTO one can pass
> following arguments to clang : -flto -Wl,-mllvm,-enable-ipra . A more
> detailed discussion can be found here
> https://groups.google.com/d/topic/llvm-dev/Vkd-NOytdcA/discussion
>
>
> Study:
>
> ====>
> Majority of time I have spent on finding new ideas to improve current
> IPRA. As current approach can only see information with in current module
> it is very hard to improve it further. Most of the approaches described in
> literatures requirers help from a program analyzer and intra-procedural
> register allocation and register allocation for the whole module is
> differed until IPRA is completed. Also these approaches requires a heavy
> data flow analysis so that is totally orthogonal to current approach. But
> still we am able to identified two possible improvements and many thanks to
> Peter Lawrence for his suggestions and questions.
>
>
>    1. First improvement is to help IPRA by using __attribute__. This can
>    be particularly use full when working with a library or external code
which
>    is written completely in assembly and a user is able to provide accurate
>    register usage information. So idea is to supply regmask details for
such
>    external function with __attribute__ in function declaration and let
IPRA
>    propagate it to improve register allocation.  I will be working on this
>    next week.
>    2. Second improvement is to make less frequently executed function
>    save every register it clobbers thus making it preserving all registers
and
>    propagating this information to more frequently executed callers to
improve
>    its register allocation. This leads us to PGO driven IPRA. More details
can
>    be found here.
>    https://groups.google.com/d/topic/llvm-dev/jhC7L50el8k/discussion
>
> Plan for Next Week:
> ==============> 1) Start implementing above two improvements.
> 2) Run test-suite with --benchmark option so that more precisely
>  improvement can be calculated.
>
> Sincerely,
> Vivek
>
> On Sun, Jul 3, 2016 at 5:43 PM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>> Hello LLVM Developers.
>>
>> This week much of my time is consumed in debugging IPRA's effect on
>> higher level optimization specifically due to not having callee saved
>> registers. I think it was hard but I learned a lot and LLDB helped me a
>> lot.
>>
>> Here is summary for this week:
>>
>> Implementation:
>> ===========>>
>> Implemented a very simple check to prevent no callee saved registers
>> optimization to functions which are recursive or may be optimized as
tail
>> call. A simple statistic added to count number of functions optimized
for
>> not having callee saved registers.
>>
>>
>> Testing:
>>
>> =====>>
>> Debugged failing test cases due to no callee saved registers
>> optimization. More details with examples can be found here
>> https://groups.google.com/d/topic/llvm-dev/TSoYxeMMzxM/discussion . Now
>> all test in llvm test-suite pass.
>>
>>
>> Study:
>>
>> ====>>
>> To find some ideas to improve current IPRA I read 2 papers namely
>> “Minimizing Register Usage Penalty at Procedure Calls” by Fred C. Chow
and
>> “Register Allocation Across Procedure and Module Boundaries” by
Santhanam
>> and Odnert.
>>
>> 1) From the first paper I like the idea of shrink wrap analysis and
LLVM
>> currently have this optimization but the approach is completely
different.
>> I have initiated a discussion for that, it can be found here
>> https://groups.google.com/d/topic/llvm-dev/_mZoGUQDMGo/discussion I
>> would like to talk to Quentin Colombet more about this.
>>
>> 2) From the second paper I like the idea of spill code motion, in this
>> optimization spill due to callee saved register is pushed to less
>> frequently called caller, but the approach mentioned in that paper
requiems
>> call frequency details and also it differs register allocation to very
>> late, the optimization it self requires register usage details but it
>> operates on register usage estimation done in earlier stage. This
>> optimization also requires help from intra-procedural register
allocators.
>> I would like to have more discussion on this over IRC this Monday with
my
>> mentors.
>>
>> Plan for next week:
>> =============>> 1) Rebase pending patches and get the review
process completed.
>> 2) Discuss how can identified ideas can be implemented with in current
>> infrastructure.
>> 3) Discuss how to handle indirect function call with in IPRA.
>>
>> Sincerely,
>> Vivek
>>
>>
>> On Sun, Jun 26, 2016 at 5:18 PM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>> Hello LLVM Developers,
>>>
>>> Please follow summary of work done during this week.
>>>
>>> Implementation:
>>> ===========>>>
>>> During this week patch for bug fix 28144 is updated after finding
more
>>> refinement in remarks calculation. As per suggestion from Matthias
Braun
>>> and Hal Finkel regmask calculation code is same as
>>> MachineRegisterInfo::isPhysRegModified() except no check of
isNoReturnDef()
>>> is required. So we proposed to add a bool argument SkipNoReturnDef
with
>>> default value false to isPhysRegModified method so that with out
breaking
>>> current use of isPhysRegModified we can reuse that code for the
purpose of
>>> IPRA. The patch can be found here : http://reviews.llvm.org/D21395
>>>
>>> With IPRA to improve code quality, call site with local functions
are
>>> forced to have caller saved registers ( more improved heuristics
will be
>>> implemented ) I have been experimenting this on my local machine
and I
>>> discovered that tail call optimization is getting affected due to
this
>>> optimization and some test case in test-suite fails with
segmentation fault
>>> or infinite recursion due to counter value gets overwritten. Please
find
>>> more details and example bug at
>>> https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ
>>>
>>> I have also tried a very simple method to handle indirect function
in
>>> IPRA but at higher optimization level, indirect function calls are
getting
>>> converted to direct function calls so I request interested
community member
>>> to guide me. We can have discussion about this on Monday morning
(PDT).
>>> More discussion on this can be found at here :
>>> https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ
>>>
>>>
>>> Testing:
>>>
>>> =====>>>
>>> During this week I think that IPRA optimization is more stabilized
after
>>> having bug fix so have run test-suite with that and also as per
suggestion
>>> form Quentin Colombet I tested test-suite with only codegen order
changed
>>> to bottom up on call graph.  Overall this codegen order improves
runtime
>>> and compile time. I have shared results here:
>>>
>>>
>>>
https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing
>>>
>>>
>>>
>>>
https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing
>>>
>>> Plan for next week:
>>> =============>>> 1) Rebase pending patches and get the
review process completed.
>>> 2) Solve tail call related bug.
>>> 3) Discuss some ideas and heuristics for improving IPRA.
>>> 4) Discuss how to handle indirect function call with in IPRA.
>>> 5) More testing with llvm test-suite
>>>
>>> Sincerely,
>>> Vivek
>>>
>>>
>>> On Tue, Jun 21, 2016 at 9:26 AM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <matze at
braunis.de>
>>>> wrote:
>>>>
>>>>>
>>>>> > On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev
<
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>> >
>>>>> > Hi Vivek,
>>>>> >
>>>>> > vivek pandya via llvm-dev wrote:
>>>>> > >     int foo() {
>>>>> > >     return 12;
>>>>> > >     }
>>>>> > >
>>>>> > >     int bar(int a) {
>>>>> > >     return foo() + a;
>>>>> > >     }
>>>>> > >
>>>>> > >     int (*fp)() = 0;
>>>>> > >     int (*fp1)(int) = 0;
>>>>> > >
>>>>> > >     int main() {
>>>>> > >     fp = foo;
>>>>> > >     fp();
>>>>> > >     fp1 = bar;
>>>>> > >     fp1(15);
>>>>> > >     return 0;
>>>>> > >     }
>>>>> >
>>>>> > IMO it is waste of time trying to do a better job at
the IPRA level
>>>>> on
>>>>> > IR like the above ^.  LLVM should be folding the
indirect calls to
>>>>> > direct calls at the IR level, and if it isn't
that's a bug in the IR
>>>>> > level optimizer.
>>>>> +1 from me.
>>>>>
>>>>> Yes at -O3 level simple indirect calls including virtual
functions are
>>>> getting optimized to direct call.
>>>>
>>>>
>>>>> The interesting cases are the non-obvious ones (assumeing
foo/bar have
>>>>> the same parameters). Things gets interesting once you have
uncertainty in
>>>>> the mix. The minimally interesting case would look like
this:
>>>>>
>>>>> int main() {
>>>>>     int (*fp)();
>>>>>     if (rand()) {
>>>>>         fp = foo;
>>>>>     } else {
>>>>>         fp = bar;
>>>>>     }
>>>>>     fp(42);
>>>>> }
>>>>>
>>>>
>>>> I tried this case and my simple hack fails to optimize it :-) .
This
>>>> requires discussion on IRC.
>>>>
>>>> Sincerely,
>>>> -Vivek
>>>>
>>>>
>>>>> However predicting the possible targets of a call is IMO a
question of
>>>>> computing a call graph datastructure and improving upon
that. We should be
>>>>> sure that we discuss and implement this independently of
the register
>>>>> allocation work!
>>>>>
>>>>> - Matthias
>>>>>
>>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160717/8ce2cefc/attachment-0001.html>

vivek pandya via llvm-dev

2016-Jul-25 16:42 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Dear Community,

Sorry for being late for weekly status report but I was traveling back to
my college.

Please consider this week's summary as follows:

Implementation:
===========
This week I tried to get experimental PGO driven IPRA work. The idea is to
save all register in prolog and restore it in epilog for cold function so
that IPRA can propagate some free register to upper region of call graph.
For this I changed  spill callee saved regs related functions in
PrologEpilogInserter pass to pass ProfileSummaryInfo object as parameter so
that ultimately I can access it at TargetFrameLowringImpl and override
callee saved register details of default regmask. But this also required to
change function signature to in target specific code of TargetFrameLowring.

This is not sufficient when running with -g flag or function is a error
handler function because for such function at time of object code emission
debug register to LLVM register mapping is required. This mapping is not
done for each register. For example RCX is not defined.


Testing:

=====
This week I also did benchmark for some selected test cases from LLVM test
suite. I have compiler and run each program 10 times and measure the
average for compile time and execution time impact also code size
reduction. I have compile each time with O1 level. The results are good but
it is also clear that IPRA may not beneficial for every program but run
time increment is not much significant for such cases.  Please find details
at
https://docs.google.com/spreadsheets/d/1d34UcGuUK36B3AY8HN8fvgZZnF4k5KPI1snfYqIIZxU/edit?usp=sharing


Other:

===
A small patch to fix regmask calculation so that alias register are
considered properly it committed. Thanks to Matthias Braun for committing
it. The details can be found here https://reviews.llvm.org/D22400


Sincerely,

Vivek

On Sun, Jul 17, 2016 at 8:33 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Dear Community,
>
> Sorry for being late for weekly status.
>
> Please find summary for this week as below:
>
> Implementation
> ===========>
> This week I have implemented support for __attribute__((regmask(“clobbered
> list here”))). This currently applicable to function declaration only and
> it provides user a chance to help IPRA by specifying actual register usage
> by a function which is currently not declared in the module. One such case
> is when functions written in pure assembly is used inside current module
> because in such a case if this attribute is not present IPRA will use CC so
> it will limit performance benefits from IPRA. Alternatively in this
> particular case one can use preserve_all or preserve_most attribute
> specified with clang to help IPRA but I believe in some case user may not
> be able to describe register usage with such CC then attribute “regmask”
> can help.
>
> For this support I needed to hack clang and LLVM both. How ever it seems
> that applicability of this kind of attribute is not limited to IPRA so we
> have initiated discussion on mailing list
> https://groups.google.com/d/topic/llvm-dev/w70_WljNCHE/discussion.
>
> I have also implemented a patch which fixes a very subtle bug in regmask
> calculation. Thanks to zan jyu Wong <zyfwong at gmail.com> for
bringing this
> to notice.
>
> For example if CL is only clobbered than CH should not be marked clobbered
> but CX, RCX and ECX should be mark clobbered. Previously for each modified
> register all of its aliases are marked clobbered by markRegClobbred() in
> RegUsageInfoCollector.cpp but that is wrong because when CL is clobbered
> then MRI::isPhysRegModified() will return true for CL, CX, ECX, RCX which
> is correct behavior but then for CX, EXC, RCX we mark CH also clobbered as
> CH is aliased to CX,ECX,RCX so markRegClobbred() is not required because
> isPhysRegModified already take cares of proper aliasing register. A very
> simple test case has been added to verify this change.
>
>
> Testing
>
> ====>
> This week I run test-suite with —benchmark-only flag and on average 4%
> improvement is noted in execution time how ever on average 5% increment is
> noted in compile time.
>
>
> Study
>
> ===>
> For PGO driven IPRA I am able to access profile summary info in LLVM pass
> to decide if function is hot or cold but to implement it target
> independently I also need to determine if function is hot or cold inside a
> TargetFrameLowringImpl but my attempts failed I will look for other ways to
> solve this problem.
>
>
> Plan for next week
>
> =============>
> 1) Work to improve attribute regmask support as per suggestion on RFC.
>
> 2) Implement experiment PGO driven IPRA
>
>
> Sincerely,
>
> Vivek
>
> On Sun, Jul 10, 2016 at 10:12 AM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>> Hello LLVM Developers,
>>
>> Please feel free to send any ideas that you can think to improve
current
>> IPRA. I will work on it and if possible I will implement that.
>>
>> Please consider summary of work done during this week.
>>
>> Implementation:
>> ===========>>
>> The reviews requests has been updated to reflect the reviews.
>>
>>
>> Testing:
>>
>> ====>>
>> To get more benefit from IPRA I experimented it with LTO and results
were
>> positive. For the SPASS application (one of the multi source benchmark
in
>> test suite) execution time is reduce by 0.02s when LTO+IPRA compare to
LTO
>> only. The current IPRA works at compile time and its optimization scope
is
>> limited to a module so LTO produces a large module from small modules
then
>> generates machine code for that. So it will help IPRA by providing a
huge
>> module with very less external functions. to use IPRA with LTO one can
pass
>> following arguments to clang : -flto -Wl,-mllvm,-enable-ipra . A more
>> detailed discussion can be found here
>> https://groups.google.com/d/topic/llvm-dev/Vkd-NOytdcA/discussion
>>
>>
>> Study:
>>
>> ====>>
>> Majority of time I have spent on finding new ideas to improve current
>> IPRA. As current approach can only see information with in current
module
>> it is very hard to improve it further. Most of the approaches described
in
>> literatures requirers help from a program analyzer and intra-procedural
>> register allocation and register allocation for the whole module is
>> differed until IPRA is completed. Also these approaches requires a
heavy
>> data flow analysis so that is totally orthogonal to current approach.
But
>> still we am able to identified two possible improvements and many
thanks to
>> Peter Lawrence for his suggestions and questions.
>>
>>
>>    1. First improvement is to help IPRA by using __attribute__. This
can
>>    be particularly use full when working with a library or external
code which
>>    is written completely in assembly and a user is able to provide
accurate
>>    register usage information. So idea is to supply regmask details for
such
>>    external function with __attribute__ in function declaration and let
IPRA
>>    propagate it to improve register allocation.  I will be working on
this
>>    next week.
>>    2. Second improvement is to make less frequently executed function
>>    save every register it clobbers thus making it preserving all
registers and
>>    propagating this information to more frequently executed callers to
improve
>>    its register allocation. This leads us to PGO driven IPRA. More
details can
>>    be found here.
>>    https://groups.google.com/d/topic/llvm-dev/jhC7L50el8k/discussion
>>
>> Plan for Next Week:
>> ==============>> 1) Start implementing above two improvements.
>> 2) Run test-suite with --benchmark option so that more precisely
>>  improvement can be calculated.
>>
>> Sincerely,
>> Vivek
>>
>> On Sun, Jul 3, 2016 at 5:43 PM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>> Hello LLVM Developers.
>>>
>>> This week much of my time is consumed in debugging IPRA's
effect on
>>> higher level optimization specifically due to not having callee
saved
>>> registers. I think it was hard but I learned a lot and LLDB helped
me a
>>> lot.
>>>
>>> Here is summary for this week:
>>>
>>> Implementation:
>>> ===========>>>
>>> Implemented a very simple check to prevent no callee saved
registers
>>> optimization to functions which are recursive or may be optimized
as tail
>>> call. A simple statistic added to count number of functions
optimized for
>>> not having callee saved registers.
>>>
>>>
>>> Testing:
>>>
>>> =====>>>
>>> Debugged failing test cases due to no callee saved registers
>>> optimization. More details with examples can be found here
>>> https://groups.google.com/d/topic/llvm-dev/TSoYxeMMzxM/discussion .
Now
>>> all test in llvm test-suite pass.
>>>
>>>
>>> Study:
>>>
>>> ====>>>
>>> To find some ideas to improve current IPRA I read 2 papers namely
>>> “Minimizing Register Usage Penalty at Procedure Calls” by Fred C.
Chow and
>>> “Register Allocation Across Procedure and Module Boundaries” by
Santhanam
>>> and Odnert.
>>>
>>> 1) From the first paper I like the idea of shrink wrap analysis and
LLVM
>>> currently have this optimization but the approach is completely
different.
>>> I have initiated a discussion for that, it can be found here
>>> https://groups.google.com/d/topic/llvm-dev/_mZoGUQDMGo/discussion I
>>> would like to talk to Quentin Colombet more about this.
>>>
>>> 2) From the second paper I like the idea of spill code motion, in
this
>>> optimization spill due to callee saved register is pushed to less
>>> frequently called caller, but the approach mentioned in that paper
requiems
>>> call frequency details and also it differs register allocation to
very
>>> late, the optimization it self requires register usage details but
it
>>> operates on register usage estimation done in earlier stage. This
>>> optimization also requires help from intra-procedural register
allocators.
>>> I would like to have more discussion on this over IRC this Monday
with my
>>> mentors.
>>>
>>> Plan for next week:
>>> =============>>> 1) Rebase pending patches and get the
review process completed.
>>> 2) Discuss how can identified ideas can be implemented with in
current
>>> infrastructure.
>>> 3) Discuss how to handle indirect function call with in IPRA.
>>>
>>> Sincerely,
>>> Vivek
>>>
>>>
>>> On Sun, Jun 26, 2016 at 5:18 PM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>> Hello LLVM Developers,
>>>>
>>>> Please follow summary of work done during this week.
>>>>
>>>> Implementation:
>>>> ===========>>>>
>>>> During this week patch for bug fix 28144 is updated after
finding more
>>>> refinement in remarks calculation. As per suggestion from
Matthias Braun
>>>> and Hal Finkel regmask calculation code is same as
>>>> MachineRegisterInfo::isPhysRegModified() except no check of
isNoReturnDef()
>>>> is required. So we proposed to add a bool argument
SkipNoReturnDef with
>>>> default value false to isPhysRegModified method so that with
out breaking
>>>> current use of isPhysRegModified we can reuse that code for the
purpose of
>>>> IPRA. The patch can be found here :
http://reviews.llvm.org/D21395
>>>>
>>>> With IPRA to improve code quality, call site with local
functions are
>>>> forced to have caller saved registers ( more improved
heuristics will be
>>>> implemented ) I have been experimenting this on my local
machine and I
>>>> discovered that tail call optimization is getting affected due
to this
>>>> optimization and some test case in test-suite fails with
segmentation fault
>>>> or infinite recursion due to counter value gets overwritten.
Please find
>>>> more details and example bug at
>>>>
https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ
>>>>
>>>> I have also tried a very simple method to handle indirect
function in
>>>> IPRA but at higher optimization level, indirect function calls
are getting
>>>> converted to direct function calls so I request interested
community member
>>>> to guide me. We can have discussion about this on Monday
morning (PDT).
>>>> More discussion on this can be found at here :
>>>>
https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ
>>>>
>>>>
>>>> Testing:
>>>>
>>>> =====>>>>
>>>> During this week I think that IPRA optimization is more
stabilized
>>>> after having bug fix so have run test-suite with that and also
as per
>>>> suggestion form Quentin Colombet I tested test-suite with only
codegen
>>>> order changed to bottom up on call graph.  Overall this codegen
order
>>>> improves runtime and compile time. I have shared results here:
>>>>
>>>>
>>>>
https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing
>>>>
>>>>
>>>>
>>>>
https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing
>>>>
>>>> Plan for next week:
>>>> =============>>>> 1) Rebase pending patches and get
the review process completed.
>>>> 2) Solve tail call related bug.
>>>> 3) Discuss some ideas and heuristics for improving IPRA.
>>>> 4) Discuss how to handle indirect function call with in IPRA.
>>>> 5) More testing with llvm test-suite
>>>>
>>>> Sincerely,
>>>> Vivek
>>>>
>>>>
>>>> On Tue, Jun 21, 2016 at 9:26 AM, vivek pandya <vivekvpandya
at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <matze
at braunis.de>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> > On Jun 20, 2016, at 12:53 PM, Sanjoy Das via
llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>> >
>>>>>> > Hi Vivek,
>>>>>> >
>>>>>> > vivek pandya via llvm-dev wrote:
>>>>>> > >     int foo() {
>>>>>> > >     return 12;
>>>>>> > >     }
>>>>>> > >
>>>>>> > >     int bar(int a) {
>>>>>> > >     return foo() + a;
>>>>>> > >     }
>>>>>> > >
>>>>>> > >     int (*fp)() = 0;
>>>>>> > >     int (*fp1)(int) = 0;
>>>>>> > >
>>>>>> > >     int main() {
>>>>>> > >     fp = foo;
>>>>>> > >     fp();
>>>>>> > >     fp1 = bar;
>>>>>> > >     fp1(15);
>>>>>> > >     return 0;
>>>>>> > >     }
>>>>>> >
>>>>>> > IMO it is waste of time trying to do a better job
at the IPRA level
>>>>>> on
>>>>>> > IR like the above ^.  LLVM should be folding the
indirect calls to
>>>>>> > direct calls at the IR level, and if it isn't
that's a bug in the IR
>>>>>> > level optimizer.
>>>>>> +1 from me.
>>>>>>
>>>>>> Yes at -O3 level simple indirect calls including
virtual functions
>>>>> are getting optimized to direct call.
>>>>>
>>>>>
>>>>>> The interesting cases are the non-obvious ones
(assumeing foo/bar
>>>>>> have the same parameters). Things gets interesting once
you have
>>>>>> uncertainty in the mix. The minimally interesting case
would look like this:
>>>>>>
>>>>>> int main() {
>>>>>>     int (*fp)();
>>>>>>     if (rand()) {
>>>>>>         fp = foo;
>>>>>>     } else {
>>>>>>         fp = bar;
>>>>>>     }
>>>>>>     fp(42);
>>>>>> }
>>>>>>
>>>>>
>>>>> I tried this case and my simple hack fails to optimize it
:-) . This
>>>>> requires discussion on IRC.
>>>>>
>>>>> Sincerely,
>>>>> -Vivek
>>>>>
>>>>>
>>>>>> However predicting the possible targets of a call is
IMO a question
>>>>>> of computing a call graph datastructure and improving
upon that. We should
>>>>>> be sure that we discuss and implement this
independently of the register
>>>>>> allocation work!
>>>>>>
>>>>>> - Matthias
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160725/cea77d45/attachment-0001.html>

llvm dev - Jul 2016 - [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation