thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation [Jun 2016]

If this information is useful, please help other people find it:
Share via:

vivek pandya via llvm-dev

2016-Jun-19 11:29 UTC

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Dear Community,

Please find summary of work done during this week as follow:

Implementation:
===========
During this week we have identified a bug in IPRA due to not considering
RegMask of function calls in given machine function. The same bug on
AArch64 has been reported by Chad Rosier and more detailed description can
be found at https://llvm.org/bugs/show_bug.cgi?id=28144 . To fix this bug
RegMask calculation have been modified to consider RegMask of function call
in a Machine Function. The patch is here http://reviews.llvm.org/D21395.

AsmPrinter.cpp is modified to print call preserved registers in comments at
call site in generated assembly file. This suggestion was by Quentin
Colombet to improve readability of asm files while experimenting RegMask
and calling convention etc. This simple patch can be found here
http://reviews.llvm.org/D21490.

We have also experimented a simple improvement to IPRA by setting callee
saved registers to none for local function and we have found some
performance improvement.

Testing:
=====
After bug 28144 fix there is no runtime failures in test suite. Also due to
bug 28144 there was about 60 run time failures and total time taken for
test suite compilation was 30% more compare to with out IPRA. After bug fix
with IPRA total compile time improvement compare to without IPRA is about 4
to 8 minutes.


Study:

====
This week I study code responsible for adding spill and restore for callee
saved registers. Also studied how calling convention is defined in target
specific .td files. I studied AsmPrinter.cpp and specifically
emitComments() method which is responsible for adding comments in llvm
generated assembly files. I also studied about some linkage type in LLMV IR
like ‘internal’ which represent local function in module.


Plan for next week:

1) Submit patch related to local function optimization for review

2) Find more possible improvements

3) Get active patches committed

4) Compile large software with IPRA enabled


Sincerely,

Vivek

On Wed, Jun 15, 2016 at 8:45 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
>
>
> On Wed, Jun 15, 2016 at 8:40 AM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>>
>>
>> On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcolombet at
apple.com>
>> wrote:
>>
>>> Hi Vivek,
>>>
>>> How much of the slow down on runtime comes from the different
layout of
>>> the function in the asm file? (I.e., because of the dummy scc
pass.)
>>>
>>> Hello Quentin,
>>
>> Please do not consider previous results as there was a major bug in
>> RegMask calculation due to not considering RegMasks of callee in MF
body
>> while calculating register usage information, that has been fixed now (
as
>> discussed with Matthias Braun and Mehdi Amini ) and after this bugfix I
>> have run test-suite with and without IPRA.  Yes there is runtime slow
down
>> for some test cases ranging from 1% to 64% similarly compile time slow
down
>> is ranging from 1% to 48%. The runtime performance improvement is
ranging
>> from 1% to 35% and surprisingly there is also compile time improvement
in a
>> range from 1% to 60% . I would request you to go through complete
results
>> at
>>
https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing
>>
>> In above result baseline is IPRA and current is without IPRA. So
actually
> data with background red is actual improvement and green is regression.
> -Vivek
>
>> Also there is not extra failure due to IPRA now so in the result above
I
>> have removed failures.
>>
>> Sincerely,
>> Vivek
>>
>>
>>> Cheers,
>>> Q
>>>
>>> Le 11 juin 2016 à 21:49, vivek pandya via llvm-dev <
>>> llvm-dev at lists.llvm.org> a écrit :
>>>
>>> Dear Community,
>>>
>>> The patch for Interprocedural Register Allocation has been
committed now
>>> , thanks to Mehdi Amini for that. We would like you to play with it
and let
>>> us know your views and more importantly ideas to improve it.
>>>
>>> The test-suite run has indicated some non trivial issue that
results in
>>> run time failure of the programs, we will be investigating it more.
Here
>>> are some stats :
>>>
>>> test-suite has been tested with IPRA enabled and overall results
are not
>>> much encouraging. On average 30% increase in compile time. Many
programs
>>> have also increase in execution time ( average 20%) that is really
serious
>>> concern for us. About 60 tests have failed on run time this
indicates error
>>> in compilation. how ever 3 tests have improvement in their runtime
and that
>>> is 7% average.
>>>
>>>
>>> This week I think good thing for me to learn is to setup llvm
>>> development environment properly other wise one can end up wasting
too much
>>> time building the llvm it self.
>>>
>>> So here is brief summary:
>>> Implementation:
>>> ===========>>>
>>> The patch has been split into analysis and transformation passes.
The
>>> pass responsible for register usage propagation has been made
target
>>> independent.  A print method and command line option
-print-regusage has
>>> been added so that RegMaks details can be printed in Release builds
also,
>>> this enables lit test case to be testable in Release build too.
Other minor
>>> changes to adhere coding and naming conventions.
>>>
>>>
>>> Testing:
>>>
>>> =====>>>
>>> test-suite has been tested with IPRA enabled.
>>>
>>>
>>> Study and other:
>>>
>>> ============>>>
>>> Learned about LNT, test-suite for LLVM, Inline assembly in LLVM IR,
>>> fastcc, local functions, MCStream class. In C++ I leaned about
emplace
>>> family of methods in STL and perfect forwarding introduced in
C++11.
>>>
>>>
>>> Plan for next week:
>>>
>>> 1) Investigate issue related to functional correctness that leads
to run
>>> time failures
>>>
>>> 2) profile the compilation process to verify increase in time due
to IPRA
>>>
>>> 3) Improve IPRA by instructing codegen to not save register for
local
>>> function.
>>>
>>> 4) Make the pass emit asm comments to indicate register clobbered
by
>>> function call at call site in generated ASM file.
>>>
>>>
>>> Sincerely,
>>>
>>> Vivek
>>>
>>> On Sun, Jun 5, 2016 at 8:48 AM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>> Dear Community,
>>>>
>>>> This week I got my patch reviewed by mentors and based on that
I have
>>>> done changes. Also we have identified a problem with
callee-saved registers
>>>> being marked as clobbered registers so we fixed that bug. I
have described
>>>> other minor changes in following section.
>>>>
>>>> It was expected to get the patch committed by end of this week
but due
>>>> to unexpected mistake I was not able to complete writing test
cases. Sorry
>>>> for that.
>>>> I had build llvm with ipra enable by default and that build
files were
>>>> on my path ! Due to that next time I tried to build llvm it was
terribly
>>>> slow  (almost 1 hour for 10% build ). I spend to much time on
fixing this
>>>> by playing around with environment variables, cmake options
etc.
>>>> But I think this is a serious concern, we need to think verify
this
>>>> time complexity other wise building a large software with IPRA
enable would
>>>> be very time consuming.
>>>>
>>>> The toughest part for this week was to get lit and FileCheck
work as
>>>> you expect them to work, specially when analysis pass prints
info on stdio
>>>> and there is also a output file generated by llc or opt
command.
>>>>
>>>> So here is brief summary :
>>>>
>>>> Implementation:
>>>> ===========>>>>
>>>> RegUsageInfoCollector is now Calling Convention aware so that
RegMask
>>>> does not mark callee saved register as clobbered register. Due
to this
>>>> register allocator can use callee saved register for caller.
>>>> PhysicalRegisterUsageInfo.cpp renamed to RegisterUsageInfo.cpp.
>>>> StringMap used in RegisterUsageInfo.cpp is replaced by DenseMap
of
>>>> GlobalVariable * to RegMask.
>>>> DummyCGSCCPass moved from TargetPassConfig.cpp to
CallGraphSCCPass.h.
>>>> Minor correction in comments, changes to adhere coding
standards of
>>>> LLVM.
>>>>
>>>> Testing:
>>>> ====>>>>
>>>> The above mentioned changes has been tested with SNU-Realtime
>>>> benchmarks.
>>>>
>>>> Studied lit and FileCheck tool and written simple test to
verify
>>>> functionality of coding.
>>>>
>>>>
>>>> Study and other:
>>>>
>>>> ===========>>>>
>>>> Studied some examples of lit compatible llvm IR with comments
to RUN
>>>> test cases, FileCheck tool syntax and how to use it with in lit
>>>> infrastructure.
>>>>
>>>> I also understand X86 calling convention in more details.
>>>>
>>>> I also studied basic concepts in llvm IR language while reading
.ll
>>>> files written for lit.
>>>>
>>>> I learned about rvalue references and move semantics introduced
in
>>>> C++11.
>>>>
>>>>
>>>> Plan for next week:
>>>>
>>>> 1) Get the patch committed along with proper tets cases.
>>>>
>>>> 2) Analyse time complexity of the approach.
>>>>
>>>> 3) Make target specific pass to CodeGen as it seems it is not
required
>>>> to be target specific.
>>>>
>>>> 4) If possible build a large application with ipra enable and
analyze
>>>> the impact.
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>> Vivek
>>>>
>>>>
>>>> On Sat, May 28, 2016 at 7:31 PM, vivek pandya <vivekvpandya
at gmail.com>
>>>> wrote:
>>>>
>>>>> Dear community,
>>>>>
>>>>> This is to brief you the progress of Interprocedural
Register
>>>>> Allocation, for those who are interested to see the
progress in terms of
>>>>> code please consider http://reviews.llvm.org/D20769
>>>>> This patch contains simple infrastructure to propagate
register usage
>>>>> information of callee to caller in call graph. The code
generation order is
>>>>> changed to follow bottom up order on call graph , Thanks to
Mehdi Amini for
>>>>> the patch for the same !  I will write a blog on this very
soon.
>>>>>
>>>>> So during this week as per the schedule proposed it should
be study
>>>>> related infrastructure in LLVM and finalizing an approach
for IPRA, but
>>>>> instead I am able to implement a working (may not be fully
correct)
>>>>> prototype because I have used community bonding period to
discuss and learn
>>>>> related stuffs from the mentors and also due to patch for
CodeGen
>>>>> reordering was provided by dear mentor Mehdi Amini.
>>>>>
>>>>> So I conclude the work done during this week as follows:
>>>>> Implementation :
>>>>> ===========>>>>> Following passes have been
implemented during this week: An immutable
>>>>> pass to store competed RegMask, a  machine function pass
that iterates
>>>>> through each registers and check if it is used or not and
based on that
>>>>> details create a RegMask and a target specific machine
function pass that
>>>>> uses the RegMask created by second pass and propagates
information by
>>>>> updating call instructions RegMask. To update the RegMask
of MI ,
>>>>> setRegMask() function has been added to MachineOperand, a
command line
>>>>> option -enable-ipra and debug type -debug-only=“ipra"
has been added to
>>>>> control the optimization through llc.
>>>>>
>>>>> Testing:
>>>>> ====>>>>> The above mentioned implementation
has been tested over SNU-Real-Time
>>>>> benchmark suit
(http://www.cprover.org/goto-cc/examples/snu.html) and
>>>>> some simple programs that uses library function ( for a
library function
>>>>> register allocation is not done by LLVM so this
optimization will simply
>>>>> skip them)
>>>>>
>>>>> Study and Other:
>>>>> ============>>>>> I have learned following
things in LLVM, how it stores reg clobbering
>>>>> information? how it is used by Reg allocators through
LivePhysRegs,
>>>>> LiveRegMatrix and other related passes? How to schedule a
pass using
>>>>> TargetPassConfig and TargetMachine? What are called callee
saved registers?
>>>>> What is an Immutable Pass? Apart from that I have also
learned how to use
>>>>> phabricator to send review request. I have also read some
related
>>>>> literatures.
>>>>>
>>>>> During this week though task was to schedule the passes in
proper
>>>>> order so that dependencies of related passes are satisfied.
>>>>>
>>>>> Plan for next week:
>>>>> 1) Perform more testing and debug any known issue
>>>>> 2) Fine ture the implementation so as to eliminate any
unnecessary work
>>>>> 3) During the testing from the stats I have observed that
IPRA does
>>>>> not always improve the work of IntraProcedural register
allocators and it
>>>>> is also observer that the amount of benefit (in terms of
spilled live
>>>>> ranges ) is not deterministic. So I would like to find
reasons for this
>>>>> behavior.
>>>>> 4) Start implementing target specific pass for other
targets if review
>>>>> passes properly with no major bugs.
>>>>>
>>>>> Please provide any feedback/suggestion including for format
of this
>>>>> email.
>>>>>
>>>>> I would also like to thanks my mentors Mehdi Amini , Hal
Finkel, Quentin
>>>>> Colombet, Matthias Braun and other community members for
providing
>>>>> quick help every time when I asked ( I have got replies
even after 8 PM (
>>>>> PDT) ! ) .
>>>>>
>>>>> Sincerely,
>>>>> Vivek
>>>>>
>>>>
>>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160619/178770c2/attachment.html>

vivek pandya via llvm-dev

2016-Jun-19 19:24 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Dear Professor,

Thanks to bring this to notice, I tried out a simple test case with
indirect function call:

int foo() {
return 12;
}

int bar(int a) {
return foo() + a;
}

int (*fp)() = 0;
int (*fp1)(int) = 0;

int main() {
fp = foo;
fp();
fp1 = bar;
fp1(15);
return 0;
}

and currently IPRA skips optimization for indirect call. But I think this
can be handled and I will inform you if I update implementation to cover
this. Currently IPRA uses Function * to hold register usage information
across the passes, so my hunch is that if  from the call instruction for
the indirect call, Function * can be derived then it should be able to
handle indirection function calls for procedures defined in a current
module.

Sincerely,
Vivek

On Sun, Jun 19, 2016 at 4:59 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Dear Community,
>
> Please find summary of work done during this week as follow:
>
> Implementation:
> ===========>
> During this week we have identified a bug in IPRA due to not considering
> RegMask of function calls in given machine function. The same bug on
> AArch64 has been reported by Chad Rosier and more detailed description can
> be found at https://llvm.org/bugs/show_bug.cgi?id=28144 . To fix this bug
> RegMask calculation have been modified to consider RegMask of function call
> in a Machine Function. The patch is here http://reviews.llvm.org/D21395.
>
> AsmPrinter.cpp is modified to print call preserved registers in comments
> at call site in generated assembly file. This suggestion was by Quentin
> Colombet to improve readability of asm files while experimenting RegMask
> and calling convention etc. This simple patch can be found here
> http://reviews.llvm.org/D21490.
>
> We have also experimented a simple improvement to IPRA by setting callee
> saved registers to none for local function and we have found some
> performance improvement.
>
> Testing:
> =====>
> After bug 28144 fix there is no runtime failures in test suite. Also due
> to bug 28144 there was about 60 run time failures and total time taken for
> test suite compilation was 30% more compare to with out IPRA. After bug fix
> with IPRA total compile time improvement compare to without IPRA is about 4
> to 8 minutes.
>
>
> Study:
>
> ====>
> This week I study code responsible for adding spill and restore for callee
> saved registers. Also studied how calling convention is defined in target
> specific .td files. I studied AsmPrinter.cpp and specifically
> emitComments() method which is responsible for adding comments in llvm
> generated assembly files. I also studied about some linkage type in LLMV IR
> like ‘internal’ which represent local function in module.
>
>
> Plan for next week:
>
> 1) Submit patch related to local function optimization for review
>
> 2) Find more possible improvements
>
> 3) Get active patches committed
>
> 4) Compile large software with IPRA enabled
>
>
> Sincerely,
>
> Vivek
>
> On Wed, Jun 15, 2016 at 8:45 AM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>>
>>
>> On Wed, Jun 15, 2016 at 8:40 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcolombet at
apple.com>
>>> wrote:
>>>
>>>> Hi Vivek,
>>>>
>>>> How much of the slow down on runtime comes from the different
layout of
>>>> the function in the asm file? (I.e., because of the dummy scc
pass.)
>>>>
>>>> Hello Quentin,
>>>
>>> Please do not consider previous results as there was a major bug in
>>> RegMask calculation due to not considering RegMasks of callee in MF
body
>>> while calculating register usage information, that has been fixed
now ( as
>>> discussed with Matthias Braun and Mehdi Amini ) and after this
bugfix I
>>> have run test-suite with and without IPRA.  Yes there is runtime
slow down
>>> for some test cases ranging from 1% to 64% similarly compile time
slow down
>>> is ranging from 1% to 48%. The runtime performance improvement is
ranging
>>> from 1% to 35% and surprisingly there is also compile time
improvement in a
>>> range from 1% to 60% . I would request you to go through complete
results
>>> at
>>>
https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing
>>>
>>> In above result baseline is IPRA and current is without IPRA. So
>> actually data with background red is actual improvement and green is
>> regression.
>> -Vivek
>>
>>> Also there is not extra failure due to IPRA now so in the result
above I
>>> have removed failures.
>>>
>>> Sincerely,
>>> Vivek
>>>
>>>
>>>> Cheers,
>>>> Q
>>>>
>>>> Le 11 juin 2016 à 21:49, vivek pandya via llvm-dev <
>>>> llvm-dev at lists.llvm.org> a écrit :
>>>>
>>>> Dear Community,
>>>>
>>>> The patch for Interprocedural Register Allocation has been
committed
>>>> now , thanks to Mehdi Amini for that. We would like you to play
with it and
>>>> let us know your views and more importantly ideas to improve
it.
>>>>
>>>> The test-suite run has indicated some non trivial issue that
results in
>>>> run time failure of the programs, we will be investigating it
more. Here
>>>> are some stats :
>>>>
>>>> test-suite has been tested with IPRA enabled and overall
results are
>>>> not much encouraging. On average 30% increase in compile time.
Many
>>>> programs have also increase in execution time ( average 20%)
that is really
>>>> serious concern for us. About 60 tests have failed on run time
this
>>>> indicates error in compilation. how ever 3 tests have
improvement in their
>>>> runtime and that is 7% average.
>>>>
>>>>
>>>> This week I think good thing for me to learn is to setup llvm
>>>> development environment properly other wise one can end up
wasting too much
>>>> time building the llvm it self.
>>>>
>>>> So here is brief summary:
>>>> Implementation:
>>>> ===========>>>>
>>>> The patch has been split into analysis and transformation
passes. The
>>>> pass responsible for register usage propagation has been made
target
>>>> independent.  A print method and command line option
-print-regusage has
>>>> been added so that RegMaks details can be printed in Release
builds also,
>>>> this enables lit test case to be testable in Release build too.
Other minor
>>>> changes to adhere coding and naming conventions.
>>>>
>>>>
>>>> Testing:
>>>>
>>>> =====>>>>
>>>> test-suite has been tested with IPRA enabled.
>>>>
>>>>
>>>> Study and other:
>>>>
>>>> ============>>>>
>>>> Learned about LNT, test-suite for LLVM, Inline assembly in LLVM
IR,
>>>> fastcc, local functions, MCStream class. In C++ I leaned about
emplace
>>>> family of methods in STL and perfect forwarding introduced in
C++11.
>>>>
>>>>
>>>> Plan for next week:
>>>>
>>>> 1) Investigate issue related to functional correctness that
leads to
>>>> run time failures
>>>>
>>>> 2) profile the compilation process to verify increase in time
due to
>>>> IPRA
>>>>
>>>> 3) Improve IPRA by instructing codegen to not save register for
local
>>>> function.
>>>>
>>>> 4) Make the pass emit asm comments to indicate register
clobbered by
>>>> function call at call site in generated ASM file.
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>> Vivek
>>>>
>>>> On Sun, Jun 5, 2016 at 8:48 AM, vivek pandya <vivekvpandya
at gmail.com>
>>>> wrote:
>>>>
>>>>> Dear Community,
>>>>>
>>>>> This week I got my patch reviewed by mentors and based on
that I have
>>>>> done changes. Also we have identified a problem with
callee-saved registers
>>>>> being marked as clobbered registers so we fixed that bug. I
have described
>>>>> other minor changes in following section.
>>>>>
>>>>> It was expected to get the patch committed by end of this
week but due
>>>>> to unexpected mistake I was not able to complete writing
test cases. Sorry
>>>>> for that.
>>>>> I had build llvm with ipra enable by default and that build
files were
>>>>> on my path ! Due to that next time I tried to build llvm it
was terribly
>>>>> slow  (almost 1 hour for 10% build ). I spend to much time
on fixing this
>>>>> by playing around with environment variables, cmake options
etc.
>>>>> But I think this is a serious concern, we need to think
verify this
>>>>> time complexity other wise building a large software with
IPRA enable would
>>>>> be very time consuming.
>>>>>
>>>>> The toughest part for this week was to get lit and
FileCheck work as
>>>>> you expect them to work, specially when analysis pass
prints info on stdio
>>>>> and there is also a output file generated by llc or opt
command.
>>>>>
>>>>> So here is brief summary :
>>>>>
>>>>> Implementation:
>>>>> ===========>>>>>
>>>>> RegUsageInfoCollector is now Calling Convention aware so
that RegMask
>>>>> does not mark callee saved register as clobbered register.
Due to this
>>>>> register allocator can use callee saved register for
caller.
>>>>> PhysicalRegisterUsageInfo.cpp renamed to
RegisterUsageInfo.cpp.
>>>>> StringMap used in RegisterUsageInfo.cpp is replaced by
DenseMap of
>>>>> GlobalVariable * to RegMask.
>>>>> DummyCGSCCPass moved from TargetPassConfig.cpp to
CallGraphSCCPass.h.
>>>>> Minor correction in comments, changes to adhere coding
standards of
>>>>> LLVM.
>>>>>
>>>>> Testing:
>>>>> ====>>>>>
>>>>> The above mentioned changes has been tested with
SNU-Realtime
>>>>> benchmarks.
>>>>>
>>>>> Studied lit and FileCheck tool and written simple test to
verify
>>>>> functionality of coding.
>>>>>
>>>>>
>>>>> Study and other:
>>>>>
>>>>> ===========>>>>>
>>>>> Studied some examples of lit compatible llvm IR with
comments to RUN
>>>>> test cases, FileCheck tool syntax and how to use it with in
lit
>>>>> infrastructure.
>>>>>
>>>>> I also understand X86 calling convention in more details.
>>>>>
>>>>> I also studied basic concepts in llvm IR language while
reading .ll
>>>>> files written for lit.
>>>>>
>>>>> I learned about rvalue references and move semantics
introduced in
>>>>> C++11.
>>>>>
>>>>>
>>>>> Plan for next week:
>>>>>
>>>>> 1) Get the patch committed along with proper tets cases.
>>>>>
>>>>> 2) Analyse time complexity of the approach.
>>>>>
>>>>> 3) Make target specific pass to CodeGen as it seems it is
not required
>>>>> to be target specific.
>>>>>
>>>>> 4) If possible build a large application with ipra enable
and analyze
>>>>> the impact.
>>>>>
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Vivek
>>>>>
>>>>>
>>>>> On Sat, May 28, 2016 at 7:31 PM, vivek pandya
<vivekvpandya at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear community,
>>>>>>
>>>>>> This is to brief you the progress of Interprocedural
Register
>>>>>> Allocation, for those who are interested to see the
progress in terms of
>>>>>> code please consider http://reviews.llvm.org/D20769
>>>>>> This patch contains simple infrastructure to propagate
register usage
>>>>>> information of callee to caller in call graph. The code
generation order is
>>>>>> changed to follow bottom up order on call graph ,
Thanks to Mehdi Amini for
>>>>>> the patch for the same !  I will write a blog on this
very soon.
>>>>>>
>>>>>> So during this week as per the schedule proposed it
should be study
>>>>>> related infrastructure in LLVM and finalizing an
approach for IPRA, but
>>>>>> instead I am able to implement a working (may not be
fully correct)
>>>>>> prototype because I have used community bonding period
to discuss and learn
>>>>>> related stuffs from the mentors and also due to patch
for CodeGen
>>>>>> reordering was provided by dear mentor Mehdi Amini.
>>>>>>
>>>>>> So I conclude the work done during this week as
follows:
>>>>>> Implementation :
>>>>>> ===========>>>>>> Following passes
have been implemented during this week: An immutable
>>>>>> pass to store competed RegMask, a  machine function
pass that iterates
>>>>>> through each registers and check if it is used or not
and based on that
>>>>>> details create a RegMask and a target specific machine
function pass that
>>>>>> uses the RegMask created by second pass and propagates
information by
>>>>>> updating call instructions RegMask. To update the
RegMask of MI ,
>>>>>> setRegMask() function has been added to MachineOperand,
a command line
>>>>>> option -enable-ipra and debug type
-debug-only=“ipra" has been added to
>>>>>> control the optimization through llc.
>>>>>>
>>>>>> Testing:
>>>>>> ====>>>>>> The above mentioned
implementation has been tested over SNU-Real-Time
>>>>>> benchmark suit
(http://www.cprover.org/goto-cc/examples/snu.html)
>>>>>> and some simple programs that uses library function (
for a library
>>>>>> function register allocation is not done by LLVM so
this optimization will
>>>>>> simply skip them)
>>>>>>
>>>>>> Study and Other:
>>>>>> ============>>>>>> I have learned
following things in LLVM, how it stores reg clobbering
>>>>>> information? how it is used by Reg allocators through
LivePhysRegs,
>>>>>> LiveRegMatrix and other related passes? How to schedule
a pass using
>>>>>> TargetPassConfig and TargetMachine? What are called
callee saved registers?
>>>>>> What is an Immutable Pass? Apart from that I have also
learned how to use
>>>>>> phabricator to send review request. I have also read
some related
>>>>>> literatures.
>>>>>>
>>>>>> During this week though task was to schedule the passes
in proper
>>>>>> order so that dependencies of related passes are
satisfied.
>>>>>>
>>>>>> Plan for next week:
>>>>>> 1) Perform more testing and debug any known issue
>>>>>> 2) Fine ture the implementation so as to eliminate any
unnecessary
>>>>>> work
>>>>>> 3) During the testing from the stats I have observed
that IPRA does
>>>>>> not always improve the work of IntraProcedural register
allocators and it
>>>>>> is also observer that the amount of benefit (in terms
of spilled live
>>>>>> ranges ) is not deterministic. So I would like to find
reasons for this
>>>>>> behavior.
>>>>>> 4) Start implementing target specific pass for other
targets if
>>>>>> review passes properly with no major bugs.
>>>>>>
>>>>>> Please provide any feedback/suggestion including for
format of this
>>>>>> email.
>>>>>>
>>>>>> I would also like to thanks my mentors Mehdi Amini ,
Hal Finkel, Quentin
>>>>>> Colombet, Matthias Braun and other community members
for providing
>>>>>> quick help every time when I asked ( I have got replies
even after 8 PM (
>>>>>> PDT) ! ) .
>>>>>>
>>>>>> Sincerely,
>>>>>> Vivek
>>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160620/8393fc3c/attachment.html>

vivek pandya via llvm-dev

2016-Jun-20 18:59 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

On Mon, Jun 20, 2016 at 12:54 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Dear Professor,
>
> Thanks to bring this to notice, I tried out a simple test case with
> indirect function call:
>
> int foo() {
> return 12;
> }
>
> int bar(int a) {
> return foo() + a;
> }
>
> int (*fp)() = 0;
> int (*fp1)(int) = 0;
>
> int main() {
> fp = foo;
> fp();
> fp1 = bar;
> fp1(15);
> return 0;
> }
>
>I have experimented with indirect call specially which are due to use of
function pointers as shown in above example:

Following code in RegUsageInfoPropagate.cpp handles this kind of indirect
calls :

for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      if (!MI.isCall())
        continue;
      DEBUG(dbgs()
            << "Call Instruction Before Register Usage Info
Propagation :
\n");
      DEBUG(dbgs() << MI << "\n");

      auto UpdateRegMask = [&](const Function *F) {
        const auto *RegMask = PRUI->getRegUsageInfo(F);
        if (!RegMask)
          return;
        setRegMask(MI, &(*RegMask)[0]);
        Changed = true;
      };

      MachineOperand &Operand = MI.getOperand(0);
      if (Operand.isGlobal())
        UpdateRegMask(cast<Function>(Operand.getGlobal()));
      else if (Operand.isSymbol())
        UpdateRegMask(M->getFunction(Operand.getSymbolName()));
      else if(Operand.isReg()){
// changes starts here
        unsigned VReg = Operand.getReg();
        MachineBasicBlock::iterator CallInstIterator(&MI);
        MachineBasicBlock *MBB = MI.getParent();
        while(CallInstIterator != MBB->begin() &&
!CallInstIterator->definesRegister(VReg))
          --CallInstIterator;
        DEBUG(dbgs() << "Candidate for indirect call \n");
        if (CallInstIterator != MBB->begin()) {
          for (MachineOperand &MO : (*CallInstIterator).operands()) {
            if (MO.isGlobal()){
              UpdateRegMask(cast<Function>(MO.getGlobal()));
              break;
            }
            else if (Operand.isSymbol()) {
              UpdateRegMask(M->getFunction(MO.getSymbolName()));
              break;
            }
          }
          DEBUG(dbgs() << *CallInstIterator);
        }
      }


      DEBUG(dbgs()
            << "Call Instruction After Register Usage Info
Propagation :
\n");
      DEBUG(dbgs() << MI << "\n");
    }
  }

So I would like to have mentors' review/suggestions on this
For virtual function kind of case we have to think differently,  Is this a
valid approach to deal with indirect calls ?
Please let me know your thoughts.

-Vivek
> and currently IPRA skips optimization for indirect call. But I think this
> can be handled and I will inform you if I update implementation to cover
> this. Currently IPRA uses Function * to hold register usage information
> across the passes, so my hunch is that if  from the call instruction for
> the indirect call, Function * can be derived then it should be able to
> handle indirection function calls for procedures defined in a current
> module.
>
> Sincerely,
> Vivek
>
> On Sun, Jun 19, 2016 at 4:59 PM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>> Dear Community,
>>
>> Please find summary of work done during this week as follow:
>>
>> Implementation:
>> ===========>>
>> During this week we have identified a bug in IPRA due to not
considering
>> RegMask of function calls in given machine function. The same bug on
>> AArch64 has been reported by Chad Rosier and more detailed description
can
>> be found at https://llvm.org/bugs/show_bug.cgi?id=28144 . To fix this
>> bug RegMask calculation have been modified to consider RegMask of
function
>> call in a Machine Function. The patch is here
>> http://reviews.llvm.org/D21395.
>>
>> AsmPrinter.cpp is modified to print call preserved registers in
comments
>> at call site in generated assembly file. This suggestion was by Quentin
>> Colombet to improve readability of asm files while experimenting
RegMask
>> and calling convention etc. This simple patch can be found here
>> http://reviews.llvm.org/D21490.
>>
>> We have also experimented a simple improvement to IPRA by setting
callee
>> saved registers to none for local function and we have found some
>> performance improvement.
>>
>> Testing:
>> =====>>
>> After bug 28144 fix there is no runtime failures in test suite. Also
due
>> to bug 28144 there was about 60 run time failures and total time taken
for
>> test suite compilation was 30% more compare to with out IPRA. After bug
fix
>> with IPRA total compile time improvement compare to without IPRA is
about 4
>> to 8 minutes.
>>
>>
>> Study:
>>
>> ====>>
>> This week I study code responsible for adding spill and restore for
>> callee saved registers. Also studied how calling convention is defined
in
>> target specific .td files. I studied AsmPrinter.cpp and specifically
>> emitComments() method which is responsible for adding comments in llvm
>> generated assembly files. I also studied about some linkage type in
LLMV IR
>> like ‘internal’ which represent local function in module.
>>
>>
>> Plan for next week:
>>
>> 1) Submit patch related to local function optimization for review
>>
>> 2) Find more possible improvements
>>
>> 3) Get active patches committed
>>
>> 4) Compile large software with IPRA enabled
>>
>>
>> Sincerely,
>>
>> Vivek
>>
>> On Wed, Jun 15, 2016 at 8:45 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 15, 2016 at 8:40 AM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcolombet
at apple.com>
>>>> wrote:
>>>>
>>>>> Hi Vivek,
>>>>>
>>>>> How much of the slow down on runtime comes from the
different layout
>>>>> of the function in the asm file? (I.e., because of the
dummy scc pass.)
>>>>>
>>>>> Hello Quentin,
>>>>
>>>> Please do not consider previous results as there was a major
bug in
>>>> RegMask calculation due to not considering RegMasks of callee
in MF body
>>>> while calculating register usage information, that has been
fixed now ( as
>>>> discussed with Matthias Braun and Mehdi Amini ) and after this
bugfix I
>>>> have run test-suite with and without IPRA.  Yes there is
runtime slow down
>>>> for some test cases ranging from 1% to 64% similarly compile
time slow down
>>>> is ranging from 1% to 48%. The runtime performance improvement
is ranging
>>>> from 1% to 35% and surprisingly there is also compile time
improvement in a
>>>> range from 1% to 60% . I would request you to go through
complete results
>>>> at
>>>>
https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing
>>>>
>>>> In above result baseline is IPRA and current is without IPRA.
So
>>> actually data with background red is actual improvement and green
is
>>> regression.
>>> -Vivek
>>>
>>>> Also there is not extra failure due to IPRA now so in the
result above
>>>> I have removed failures.
>>>>
>>>> Sincerely,
>>>> Vivek
>>>>
>>>>
>>>>> Cheers,
>>>>> Q
>>>>>
>>>>> Le 11 juin 2016 à 21:49, vivek pandya via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> a écrit :
>>>>>
>>>>> Dear Community,
>>>>>
>>>>> The patch for Interprocedural Register Allocation has been
committed
>>>>> now , thanks to Mehdi Amini for that. We would like you to
play with it and
>>>>> let us know your views and more importantly ideas to
improve it.
>>>>>
>>>>> The test-suite run has indicated some non trivial issue
that results
>>>>> in run time failure of the programs, we will be
investigating it more. Here
>>>>> are some stats :
>>>>>
>>>>> test-suite has been tested with IPRA enabled and overall
results are
>>>>> not much encouraging. On average 30% increase in compile
time. Many
>>>>> programs have also increase in execution time ( average
20%) that is really
>>>>> serious concern for us. About 60 tests have failed on run
time this
>>>>> indicates error in compilation. how ever 3 tests have
improvement in their
>>>>> runtime and that is 7% average.
>>>>>
>>>>>
>>>>> This week I think good thing for me to learn is to setup
llvm
>>>>> development environment properly other wise one can end up
wasting too much
>>>>> time building the llvm it self.
>>>>>
>>>>> So here is brief summary:
>>>>> Implementation:
>>>>> ===========>>>>>
>>>>> The patch has been split into analysis and transformation
passes. The
>>>>> pass responsible for register usage propagation has been
made target
>>>>> independent.  A print method and command line option
-print-regusage has
>>>>> been added so that RegMaks details can be printed in
Release builds also,
>>>>> this enables lit test case to be testable in Release build
too. Other minor
>>>>> changes to adhere coding and naming conventions.
>>>>>
>>>>>
>>>>> Testing:
>>>>>
>>>>> =====>>>>>
>>>>> test-suite has been tested with IPRA enabled.
>>>>>
>>>>>
>>>>> Study and other:
>>>>>
>>>>> ============>>>>>
>>>>> Learned about LNT, test-suite for LLVM, Inline assembly in
LLVM IR,
>>>>> fastcc, local functions, MCStream class. In C++ I leaned
about emplace
>>>>> family of methods in STL and perfect forwarding introduced
in C++11.
>>>>>
>>>>>
>>>>> Plan for next week:
>>>>>
>>>>> 1) Investigate issue related to functional correctness that
leads to
>>>>> run time failures
>>>>>
>>>>> 2) profile the compilation process to verify increase in
time due to
>>>>> IPRA
>>>>>
>>>>> 3) Improve IPRA by instructing codegen to not save register
for local
>>>>> function.
>>>>>
>>>>> 4) Make the pass emit asm comments to indicate register
clobbered by
>>>>> function call at call site in generated ASM file.
>>>>>
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Vivek
>>>>>
>>>>> On Sun, Jun 5, 2016 at 8:48 AM, vivek pandya
<vivekvpandya at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear Community,
>>>>>>
>>>>>> This week I got my patch reviewed by mentors and based
on that I have
>>>>>> done changes. Also we have identified a problem with
callee-saved registers
>>>>>> being marked as clobbered registers so we fixed that
bug. I have described
>>>>>> other minor changes in following section.
>>>>>>
>>>>>> It was expected to get the patch committed by end of
this week but
>>>>>> due to unexpected mistake I was not able to complete
writing test cases.
>>>>>> Sorry for that.
>>>>>> I had build llvm with ipra enable by default and that
build files
>>>>>> were on my path ! Due to that next time I tried to
build llvm it was
>>>>>> terribly slow  (almost 1 hour for 10% build ). I spend
to much time on
>>>>>> fixing this by playing around with environment
variables, cmake options etc.
>>>>>> But I think this is a serious concern, we need to think
verify this
>>>>>> time complexity other wise building a large software
with IPRA enable would
>>>>>> be very time consuming.
>>>>>>
>>>>>> The toughest part for this week was to get lit and
FileCheck work as
>>>>>> you expect them to work, specially when analysis pass
prints info on stdio
>>>>>> and there is also a output file generated by llc or opt
command.
>>>>>>
>>>>>> So here is brief summary :
>>>>>>
>>>>>> Implementation:
>>>>>> ===========>>>>>>
>>>>>> RegUsageInfoCollector is now Calling Convention aware
so that RegMask
>>>>>> does not mark callee saved register as clobbered
register. Due to this
>>>>>> register allocator can use callee saved register for
caller.
>>>>>> PhysicalRegisterUsageInfo.cpp renamed to
RegisterUsageInfo.cpp.
>>>>>> StringMap used in RegisterUsageInfo.cpp is replaced by
DenseMap of
>>>>>> GlobalVariable * to RegMask.
>>>>>> DummyCGSCCPass moved from TargetPassConfig.cpp to
CallGraphSCCPass.h.
>>>>>> Minor correction in comments, changes to adhere coding
standards of
>>>>>> LLVM.
>>>>>>
>>>>>> Testing:
>>>>>> ====>>>>>>
>>>>>> The above mentioned changes has been tested with
SNU-Realtime
>>>>>> benchmarks.
>>>>>>
>>>>>> Studied lit and FileCheck tool and written simple test
to verify
>>>>>> functionality of coding.
>>>>>>
>>>>>>
>>>>>> Study and other:
>>>>>>
>>>>>> ===========>>>>>>
>>>>>> Studied some examples of lit compatible llvm IR with
comments to RUN
>>>>>> test cases, FileCheck tool syntax and how to use it
with in lit
>>>>>> infrastructure.
>>>>>>
>>>>>> I also understand X86 calling convention in more
details.
>>>>>>
>>>>>> I also studied basic concepts in llvm IR language while
reading .ll
>>>>>> files written for lit.
>>>>>>
>>>>>> I learned about rvalue references and move semantics
introduced in
>>>>>> C++11.
>>>>>>
>>>>>>
>>>>>> Plan for next week:
>>>>>>
>>>>>> 1) Get the patch committed along with proper tets
cases.
>>>>>>
>>>>>> 2) Analyse time complexity of the approach.
>>>>>>
>>>>>> 3) Make target specific pass to CodeGen as it seems it
is not
>>>>>> required to be target specific.
>>>>>>
>>>>>> 4) If possible build a large application with ipra
enable and analyze
>>>>>> the impact.
>>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>>
>>>>>> On Sat, May 28, 2016 at 7:31 PM, vivek pandya
<vivekvpandya at gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Dear community,
>>>>>>>
>>>>>>> This is to brief you the progress of
Interprocedural Register
>>>>>>> Allocation, for those who are interested to see the
progress in terms of
>>>>>>> code please consider http://reviews.llvm.org/D20769
>>>>>>> This patch contains simple infrastructure to
propagate register
>>>>>>> usage information of callee to caller in call
graph. The code generation
>>>>>>> order is changed to follow bottom up order on call
graph , Thanks to Mehdi
>>>>>>> Amini for the patch for the same !  I will write a
blog on this very soon.
>>>>>>>
>>>>>>> So during this week as per the schedule proposed it
should be study
>>>>>>> related infrastructure in LLVM and finalizing an
approach for IPRA, but
>>>>>>> instead I am able to implement a working (may not
be fully correct)
>>>>>>> prototype because I have used community bonding
period to discuss and learn
>>>>>>> related stuffs from the mentors and also due to
patch for CodeGen
>>>>>>> reordering was provided by dear mentor Mehdi Amini.
>>>>>>>
>>>>>>> So I conclude the work done during this week as
follows:
>>>>>>> Implementation :
>>>>>>> ===========>>>>>>> Following
passes have been implemented during this week: An
>>>>>>> immutable pass to store competed RegMask, a 
machine function pass that
>>>>>>> iterates through each registers and check if it is
used or not and based on
>>>>>>> that details create a RegMask and a target specific
machine function pass
>>>>>>> that uses the RegMask created by second pass and
propagates information by
>>>>>>> updating call instructions RegMask. To update the
RegMask of MI ,
>>>>>>> setRegMask() function has been added to
MachineOperand, a command line
>>>>>>> option -enable-ipra and debug type
-debug-only=“ipra" has been added to
>>>>>>> control the optimization through llc.
>>>>>>>
>>>>>>> Testing:
>>>>>>> ====>>>>>>> The above
mentioned implementation has been tested over
>>>>>>> SNU-Real-Time benchmark suit (
>>>>>>> http://www.cprover.org/goto-cc/examples/snu.html)
and some simple
>>>>>>> programs that uses library function ( for a library
function register
>>>>>>> allocation is not done by LLVM so this optimization
will simply skip them)
>>>>>>>
>>>>>>> Study and Other:
>>>>>>> ============>>>>>>> I have
learned following things in LLVM, how it stores reg
>>>>>>> clobbering information? how it is used by Reg
allocators through
>>>>>>> LivePhysRegs, LiveRegMatrix and other related
passes? How to schedule a
>>>>>>> pass using TargetPassConfig and TargetMachine? What
are called callee saved
>>>>>>> registers? What is an Immutable Pass? Apart from
that I have also learned
>>>>>>> how to use phabricator to send review request. I
have also read some
>>>>>>> related literatures.
>>>>>>>
>>>>>>> During this week though task was to schedule the
passes in proper
>>>>>>> order so that dependencies of related passes are
satisfied.
>>>>>>>
>>>>>>> Plan for next week:
>>>>>>> 1) Perform more testing and debug any known issue
>>>>>>> 2) Fine ture the implementation so as to eliminate
any unnecessary
>>>>>>> work
>>>>>>> 3) During the testing from the stats I have
observed that IPRA does
>>>>>>> not always improve the work of IntraProcedural
register allocators and it
>>>>>>> is also observer that the amount of benefit (in
terms of spilled live
>>>>>>> ranges ) is not deterministic. So I would like to
find reasons for this
>>>>>>> behavior.
>>>>>>> 4) Start implementing target specific pass for
other targets if
>>>>>>> review passes properly with no major bugs.
>>>>>>>
>>>>>>> Please provide any feedback/suggestion including
for format of this
>>>>>>> email.
>>>>>>>
>>>>>>> I would also like to thanks my mentors Mehdi Amini
, Hal Finkel, Quentin
>>>>>>> Colombet, Matthias Braun and other community
members for providing
>>>>>>> quick help every time when I asked ( I have got
replies even after 8 PM (
>>>>>>> PDT) ! ) .
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Vivek
>>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160621/40b7e952/attachment-0001.html>

llvm dev - Jun 2016 - [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation