thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation [Jun 2016]

If this information is useful, please help other people find it:
Share via:

vivek pandya via llvm-dev

2016-Jun-20 18:59 UTC

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

On Mon, Jun 20, 2016 at 12:54 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Dear Professor,
>
> Thanks to bring this to notice, I tried out a simple test case with
> indirect function call:
>
> int foo() {
> return 12;
> }
>
> int bar(int a) {
> return foo() + a;
> }
>
> int (*fp)() = 0;
> int (*fp1)(int) = 0;
>
> int main() {
> fp = foo;
> fp();
> fp1 = bar;
> fp1(15);
> return 0;
> }
>
>I have experimented with indirect call specially which are due to use of
function pointers as shown in above example:

Following code in RegUsageInfoPropagate.cpp handles this kind of indirect
calls :

for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      if (!MI.isCall())
        continue;
      DEBUG(dbgs()
            << "Call Instruction Before Register Usage Info
Propagation :
\n");
      DEBUG(dbgs() << MI << "\n");

      auto UpdateRegMask = [&](const Function *F) {
        const auto *RegMask = PRUI->getRegUsageInfo(F);
        if (!RegMask)
          return;
        setRegMask(MI, &(*RegMask)[0]);
        Changed = true;
      };

      MachineOperand &Operand = MI.getOperand(0);
      if (Operand.isGlobal())
        UpdateRegMask(cast<Function>(Operand.getGlobal()));
      else if (Operand.isSymbol())
        UpdateRegMask(M->getFunction(Operand.getSymbolName()));
      else if(Operand.isReg()){
// changes starts here
        unsigned VReg = Operand.getReg();
        MachineBasicBlock::iterator CallInstIterator(&MI);
        MachineBasicBlock *MBB = MI.getParent();
        while(CallInstIterator != MBB->begin() &&
!CallInstIterator->definesRegister(VReg))
          --CallInstIterator;
        DEBUG(dbgs() << "Candidate for indirect call \n");
        if (CallInstIterator != MBB->begin()) {
          for (MachineOperand &MO : (*CallInstIterator).operands()) {
            if (MO.isGlobal()){
              UpdateRegMask(cast<Function>(MO.getGlobal()));
              break;
            }
            else if (Operand.isSymbol()) {
              UpdateRegMask(M->getFunction(MO.getSymbolName()));
              break;
            }
          }
          DEBUG(dbgs() << *CallInstIterator);
        }
      }


      DEBUG(dbgs()
            << "Call Instruction After Register Usage Info
Propagation :
\n");
      DEBUG(dbgs() << MI << "\n");
    }
  }

So I would like to have mentors' review/suggestions on this
For virtual function kind of case we have to think differently,  Is this a
valid approach to deal with indirect calls ?
Please let me know your thoughts.

-Vivek
> and currently IPRA skips optimization for indirect call. But I think this
> can be handled and I will inform you if I update implementation to cover
> this. Currently IPRA uses Function * to hold register usage information
> across the passes, so my hunch is that if  from the call instruction for
> the indirect call, Function * can be derived then it should be able to
> handle indirection function calls for procedures defined in a current
> module.
>
> Sincerely,
> Vivek
>
> On Sun, Jun 19, 2016 at 4:59 PM, vivek pandya <vivekvpandya at
gmail.com>
> wrote:
>
>> Dear Community,
>>
>> Please find summary of work done during this week as follow:
>>
>> Implementation:
>> ===========>>
>> During this week we have identified a bug in IPRA due to not
considering
>> RegMask of function calls in given machine function. The same bug on
>> AArch64 has been reported by Chad Rosier and more detailed description
can
>> be found at https://llvm.org/bugs/show_bug.cgi?id=28144 . To fix this
>> bug RegMask calculation have been modified to consider RegMask of
function
>> call in a Machine Function. The patch is here
>> http://reviews.llvm.org/D21395.
>>
>> AsmPrinter.cpp is modified to print call preserved registers in
comments
>> at call site in generated assembly file. This suggestion was by Quentin
>> Colombet to improve readability of asm files while experimenting
RegMask
>> and calling convention etc. This simple patch can be found here
>> http://reviews.llvm.org/D21490.
>>
>> We have also experimented a simple improvement to IPRA by setting
callee
>> saved registers to none for local function and we have found some
>> performance improvement.
>>
>> Testing:
>> =====>>
>> After bug 28144 fix there is no runtime failures in test suite. Also
due
>> to bug 28144 there was about 60 run time failures and total time taken
for
>> test suite compilation was 30% more compare to with out IPRA. After bug
fix
>> with IPRA total compile time improvement compare to without IPRA is
about 4
>> to 8 minutes.
>>
>>
>> Study:
>>
>> ====>>
>> This week I study code responsible for adding spill and restore for
>> callee saved registers. Also studied how calling convention is defined
in
>> target specific .td files. I studied AsmPrinter.cpp and specifically
>> emitComments() method which is responsible for adding comments in llvm
>> generated assembly files. I also studied about some linkage type in
LLMV IR
>> like ‘internal’ which represent local function in module.
>>
>>
>> Plan for next week:
>>
>> 1) Submit patch related to local function optimization for review
>>
>> 2) Find more possible improvements
>>
>> 3) Get active patches committed
>>
>> 4) Compile large software with IPRA enabled
>>
>>
>> Sincerely,
>>
>> Vivek
>>
>> On Wed, Jun 15, 2016 at 8:45 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 15, 2016 at 8:40 AM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcolombet
at apple.com>
>>>> wrote:
>>>>
>>>>> Hi Vivek,
>>>>>
>>>>> How much of the slow down on runtime comes from the
different layout
>>>>> of the function in the asm file? (I.e., because of the
dummy scc pass.)
>>>>>
>>>>> Hello Quentin,
>>>>
>>>> Please do not consider previous results as there was a major
bug in
>>>> RegMask calculation due to not considering RegMasks of callee
in MF body
>>>> while calculating register usage information, that has been
fixed now ( as
>>>> discussed with Matthias Braun and Mehdi Amini ) and after this
bugfix I
>>>> have run test-suite with and without IPRA.  Yes there is
runtime slow down
>>>> for some test cases ranging from 1% to 64% similarly compile
time slow down
>>>> is ranging from 1% to 48%. The runtime performance improvement
is ranging
>>>> from 1% to 35% and surprisingly there is also compile time
improvement in a
>>>> range from 1% to 60% . I would request you to go through
complete results
>>>> at
>>>>
https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing
>>>>
>>>> In above result baseline is IPRA and current is without IPRA.
So
>>> actually data with background red is actual improvement and green
is
>>> regression.
>>> -Vivek
>>>
>>>> Also there is not extra failure due to IPRA now so in the
result above
>>>> I have removed failures.
>>>>
>>>> Sincerely,
>>>> Vivek
>>>>
>>>>
>>>>> Cheers,
>>>>> Q
>>>>>
>>>>> Le 11 juin 2016 à 21:49, vivek pandya via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> a écrit :
>>>>>
>>>>> Dear Community,
>>>>>
>>>>> The patch for Interprocedural Register Allocation has been
committed
>>>>> now , thanks to Mehdi Amini for that. We would like you to
play with it and
>>>>> let us know your views and more importantly ideas to
improve it.
>>>>>
>>>>> The test-suite run has indicated some non trivial issue
that results
>>>>> in run time failure of the programs, we will be
investigating it more. Here
>>>>> are some stats :
>>>>>
>>>>> test-suite has been tested with IPRA enabled and overall
results are
>>>>> not much encouraging. On average 30% increase in compile
time. Many
>>>>> programs have also increase in execution time ( average
20%) that is really
>>>>> serious concern for us. About 60 tests have failed on run
time this
>>>>> indicates error in compilation. how ever 3 tests have
improvement in their
>>>>> runtime and that is 7% average.
>>>>>
>>>>>
>>>>> This week I think good thing for me to learn is to setup
llvm
>>>>> development environment properly other wise one can end up
wasting too much
>>>>> time building the llvm it self.
>>>>>
>>>>> So here is brief summary:
>>>>> Implementation:
>>>>> ===========>>>>>
>>>>> The patch has been split into analysis and transformation
passes. The
>>>>> pass responsible for register usage propagation has been
made target
>>>>> independent.  A print method and command line option
-print-regusage has
>>>>> been added so that RegMaks details can be printed in
Release builds also,
>>>>> this enables lit test case to be testable in Release build
too. Other minor
>>>>> changes to adhere coding and naming conventions.
>>>>>
>>>>>
>>>>> Testing:
>>>>>
>>>>> =====>>>>>
>>>>> test-suite has been tested with IPRA enabled.
>>>>>
>>>>>
>>>>> Study and other:
>>>>>
>>>>> ============>>>>>
>>>>> Learned about LNT, test-suite for LLVM, Inline assembly in
LLVM IR,
>>>>> fastcc, local functions, MCStream class. In C++ I leaned
about emplace
>>>>> family of methods in STL and perfect forwarding introduced
in C++11.
>>>>>
>>>>>
>>>>> Plan for next week:
>>>>>
>>>>> 1) Investigate issue related to functional correctness that
leads to
>>>>> run time failures
>>>>>
>>>>> 2) profile the compilation process to verify increase in
time due to
>>>>> IPRA
>>>>>
>>>>> 3) Improve IPRA by instructing codegen to not save register
for local
>>>>> function.
>>>>>
>>>>> 4) Make the pass emit asm comments to indicate register
clobbered by
>>>>> function call at call site in generated ASM file.
>>>>>
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Vivek
>>>>>
>>>>> On Sun, Jun 5, 2016 at 8:48 AM, vivek pandya
<vivekvpandya at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear Community,
>>>>>>
>>>>>> This week I got my patch reviewed by mentors and based
on that I have
>>>>>> done changes. Also we have identified a problem with
callee-saved registers
>>>>>> being marked as clobbered registers so we fixed that
bug. I have described
>>>>>> other minor changes in following section.
>>>>>>
>>>>>> It was expected to get the patch committed by end of
this week but
>>>>>> due to unexpected mistake I was not able to complete
writing test cases.
>>>>>> Sorry for that.
>>>>>> I had build llvm with ipra enable by default and that
build files
>>>>>> were on my path ! Due to that next time I tried to
build llvm it was
>>>>>> terribly slow  (almost 1 hour for 10% build ). I spend
to much time on
>>>>>> fixing this by playing around with environment
variables, cmake options etc.
>>>>>> But I think this is a serious concern, we need to think
verify this
>>>>>> time complexity other wise building a large software
with IPRA enable would
>>>>>> be very time consuming.
>>>>>>
>>>>>> The toughest part for this week was to get lit and
FileCheck work as
>>>>>> you expect them to work, specially when analysis pass
prints info on stdio
>>>>>> and there is also a output file generated by llc or opt
command.
>>>>>>
>>>>>> So here is brief summary :
>>>>>>
>>>>>> Implementation:
>>>>>> ===========>>>>>>
>>>>>> RegUsageInfoCollector is now Calling Convention aware
so that RegMask
>>>>>> does not mark callee saved register as clobbered
register. Due to this
>>>>>> register allocator can use callee saved register for
caller.
>>>>>> PhysicalRegisterUsageInfo.cpp renamed to
RegisterUsageInfo.cpp.
>>>>>> StringMap used in RegisterUsageInfo.cpp is replaced by
DenseMap of
>>>>>> GlobalVariable * to RegMask.
>>>>>> DummyCGSCCPass moved from TargetPassConfig.cpp to
CallGraphSCCPass.h.
>>>>>> Minor correction in comments, changes to adhere coding
standards of
>>>>>> LLVM.
>>>>>>
>>>>>> Testing:
>>>>>> ====>>>>>>
>>>>>> The above mentioned changes has been tested with
SNU-Realtime
>>>>>> benchmarks.
>>>>>>
>>>>>> Studied lit and FileCheck tool and written simple test
to verify
>>>>>> functionality of coding.
>>>>>>
>>>>>>
>>>>>> Study and other:
>>>>>>
>>>>>> ===========>>>>>>
>>>>>> Studied some examples of lit compatible llvm IR with
comments to RUN
>>>>>> test cases, FileCheck tool syntax and how to use it
with in lit
>>>>>> infrastructure.
>>>>>>
>>>>>> I also understand X86 calling convention in more
details.
>>>>>>
>>>>>> I also studied basic concepts in llvm IR language while
reading .ll
>>>>>> files written for lit.
>>>>>>
>>>>>> I learned about rvalue references and move semantics
introduced in
>>>>>> C++11.
>>>>>>
>>>>>>
>>>>>> Plan for next week:
>>>>>>
>>>>>> 1) Get the patch committed along with proper tets
cases.
>>>>>>
>>>>>> 2) Analyse time complexity of the approach.
>>>>>>
>>>>>> 3) Make target specific pass to CodeGen as it seems it
is not
>>>>>> required to be target specific.
>>>>>>
>>>>>> 4) If possible build a large application with ipra
enable and analyze
>>>>>> the impact.
>>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>>
>>>>>> On Sat, May 28, 2016 at 7:31 PM, vivek pandya
<vivekvpandya at gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Dear community,
>>>>>>>
>>>>>>> This is to brief you the progress of
Interprocedural Register
>>>>>>> Allocation, for those who are interested to see the
progress in terms of
>>>>>>> code please consider http://reviews.llvm.org/D20769
>>>>>>> This patch contains simple infrastructure to
propagate register
>>>>>>> usage information of callee to caller in call
graph. The code generation
>>>>>>> order is changed to follow bottom up order on call
graph , Thanks to Mehdi
>>>>>>> Amini for the patch for the same !  I will write a
blog on this very soon.
>>>>>>>
>>>>>>> So during this week as per the schedule proposed it
should be study
>>>>>>> related infrastructure in LLVM and finalizing an
approach for IPRA, but
>>>>>>> instead I am able to implement a working (may not
be fully correct)
>>>>>>> prototype because I have used community bonding
period to discuss and learn
>>>>>>> related stuffs from the mentors and also due to
patch for CodeGen
>>>>>>> reordering was provided by dear mentor Mehdi Amini.
>>>>>>>
>>>>>>> So I conclude the work done during this week as
follows:
>>>>>>> Implementation :
>>>>>>> ===========>>>>>>> Following
passes have been implemented during this week: An
>>>>>>> immutable pass to store competed RegMask, a 
machine function pass that
>>>>>>> iterates through each registers and check if it is
used or not and based on
>>>>>>> that details create a RegMask and a target specific
machine function pass
>>>>>>> that uses the RegMask created by second pass and
propagates information by
>>>>>>> updating call instructions RegMask. To update the
RegMask of MI ,
>>>>>>> setRegMask() function has been added to
MachineOperand, a command line
>>>>>>> option -enable-ipra and debug type
-debug-only=“ipra" has been added to
>>>>>>> control the optimization through llc.
>>>>>>>
>>>>>>> Testing:
>>>>>>> ====>>>>>>> The above
mentioned implementation has been tested over
>>>>>>> SNU-Real-Time benchmark suit (
>>>>>>> http://www.cprover.org/goto-cc/examples/snu.html)
and some simple
>>>>>>> programs that uses library function ( for a library
function register
>>>>>>> allocation is not done by LLVM so this optimization
will simply skip them)
>>>>>>>
>>>>>>> Study and Other:
>>>>>>> ============>>>>>>> I have
learned following things in LLVM, how it stores reg
>>>>>>> clobbering information? how it is used by Reg
allocators through
>>>>>>> LivePhysRegs, LiveRegMatrix and other related
passes? How to schedule a
>>>>>>> pass using TargetPassConfig and TargetMachine? What
are called callee saved
>>>>>>> registers? What is an Immutable Pass? Apart from
that I have also learned
>>>>>>> how to use phabricator to send review request. I
have also read some
>>>>>>> related literatures.
>>>>>>>
>>>>>>> During this week though task was to schedule the
passes in proper
>>>>>>> order so that dependencies of related passes are
satisfied.
>>>>>>>
>>>>>>> Plan for next week:
>>>>>>> 1) Perform more testing and debug any known issue
>>>>>>> 2) Fine ture the implementation so as to eliminate
any unnecessary
>>>>>>> work
>>>>>>> 3) During the testing from the stats I have
observed that IPRA does
>>>>>>> not always improve the work of IntraProcedural
register allocators and it
>>>>>>> is also observer that the amount of benefit (in
terms of spilled live
>>>>>>> ranges ) is not deterministic. So I would like to
find reasons for this
>>>>>>> behavior.
>>>>>>> 4) Start implementing target specific pass for
other targets if
>>>>>>> review passes properly with no major bugs.
>>>>>>>
>>>>>>> Please provide any feedback/suggestion including
for format of this
>>>>>>> email.
>>>>>>>
>>>>>>> I would also like to thanks my mentors Mehdi Amini
, Hal Finkel, Quentin
>>>>>>> Colombet, Matthias Braun and other community
members for providing
>>>>>>> quick help every time when I asked ( I have got
replies even after 8 PM (
>>>>>>> PDT) ! ) .
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Vivek
>>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160621/40b7e952/attachment-0001.html>

Sanjoy Das via llvm-dev

2016-Jun-20 19:53 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Hi Vivek,

vivek pandya via llvm-dev wrote:
 >     int foo() {
 >     return 12;
 >     }
 >
 >     int bar(int a) {
 >     return foo() + a;
 >     }
 >
 >     int (*fp)() = 0;
 >     int (*fp1)(int) = 0;
 >
 >     int main() {
 >     fp = foo;
 >     fp();
 >     fp1 = bar;
 >     fp1(15);
 >     return 0;
 >     }

IMO it is waste of time trying to do a better job at the IPRA level on
IR like the above ^.  LLVM should be folding the indirect calls to
direct calls at the IR level, and if it isn't that's a bug in the IR
level optimizer.

The interesting cases are when you have a call like:

   fnptr target = object->callback;
   target(foo, bar);

and the IR level optimizer has failed to optimize the indirect call to
`target` to a direct call.

-- Sanjoy

Matthias Braun via llvm-dev

2016-Jun-20 20:15 UTC

head link

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

> On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi Vivek,
> 
> vivek pandya via llvm-dev wrote:
> >     int foo() {
> >     return 12;
> >     }
> >
> >     int bar(int a) {
> >     return foo() + a;
> >     }
> >
> >     int (*fp)() = 0;
> >     int (*fp1)(int) = 0;
> >
> >     int main() {
> >     fp = foo;
> >     fp();
> >     fp1 = bar;
> >     fp1(15);
> >     return 0;
> >     }
> 
> IMO it is waste of time trying to do a better job at the IPRA level on
> IR like the above ^.  LLVM should be folding the indirect calls to
> direct calls at the IR level, and if it isn't that's a bug in the
IR
> level optimizer.+1 from me.

The interesting cases are the non-obvious ones (assumeing foo/bar have the same
parameters). Things gets interesting once you have uncertainty in the mix. The
minimally interesting case would look like this:

int main() {
    int (*fp)();
    if (rand()) {
        fp = foo;
    } else {
        fp = bar;
    }
    fp(42);
}

However predicting the possible targets of a call is IMO a question of computing
a call graph datastructure and improving upon that. We should be sure that we
discuss and implement this independently of the register allocation work!

- Matthias

llvm dev - Jun 2016 - [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation