vivek pandya via llvm-dev
2016-Jun-20 18:59 UTC
[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation
On Mon, Jun 20, 2016 at 12:54 AM, vivek pandya <vivekvpandya at gmail.com> wrote:> Dear Professor, > > Thanks to bring this to notice, I tried out a simple test case with > indirect function call: > > int foo() { > return 12; > } > > int bar(int a) { > return foo() + a; > } > > int (*fp)() = 0; > int (*fp1)(int) = 0; > > int main() { > fp = foo; > fp(); > fp1 = bar; > fp1(15); > return 0; > } > >I have experimented with indirect call specially which are due to use of function pointers as shown in above example: Following code in RegUsageInfoPropagate.cpp handles this kind of indirect calls : for (MachineBasicBlock &MBB : MF) { for (MachineInstr &MI : MBB) { if (!MI.isCall()) continue; DEBUG(dbgs() << "Call Instruction Before Register Usage Info Propagation : \n"); DEBUG(dbgs() << MI << "\n"); auto UpdateRegMask = [&](const Function *F) { const auto *RegMask = PRUI->getRegUsageInfo(F); if (!RegMask) return; setRegMask(MI, &(*RegMask)[0]); Changed = true; }; MachineOperand &Operand = MI.getOperand(0); if (Operand.isGlobal()) UpdateRegMask(cast<Function>(Operand.getGlobal())); else if (Operand.isSymbol()) UpdateRegMask(M->getFunction(Operand.getSymbolName())); else if(Operand.isReg()){ // changes starts here unsigned VReg = Operand.getReg(); MachineBasicBlock::iterator CallInstIterator(&MI); MachineBasicBlock *MBB = MI.getParent(); while(CallInstIterator != MBB->begin() && !CallInstIterator->definesRegister(VReg)) --CallInstIterator; DEBUG(dbgs() << "Candidate for indirect call \n"); if (CallInstIterator != MBB->begin()) { for (MachineOperand &MO : (*CallInstIterator).operands()) { if (MO.isGlobal()){ UpdateRegMask(cast<Function>(MO.getGlobal())); break; } else if (Operand.isSymbol()) { UpdateRegMask(M->getFunction(MO.getSymbolName())); break; } } DEBUG(dbgs() << *CallInstIterator); } } DEBUG(dbgs() << "Call Instruction After Register Usage Info Propagation : \n"); DEBUG(dbgs() << MI << "\n"); } } So I would like to have mentors' review/suggestions on this For virtual function kind of case we have to think differently, Is this a valid approach to deal with indirect calls ? Please let me know your thoughts. -Vivek> and currently IPRA skips optimization for indirect call. But I think this > can be handled and I will inform you if I update implementation to cover > this. Currently IPRA uses Function * to hold register usage information > across the passes, so my hunch is that if from the call instruction for > the indirect call, Function * can be derived then it should be able to > handle indirection function calls for procedures defined in a current > module. > > Sincerely, > Vivek > > On Sun, Jun 19, 2016 at 4:59 PM, vivek pandya <vivekvpandya at gmail.com> > wrote: > >> Dear Community, >> >> Please find summary of work done during this week as follow: >> >> Implementation: >> ===========>> >> During this week we have identified a bug in IPRA due to not considering >> RegMask of function calls in given machine function. The same bug on >> AArch64 has been reported by Chad Rosier and more detailed description can >> be found at https://llvm.org/bugs/show_bug.cgi?id=28144 . To fix this >> bug RegMask calculation have been modified to consider RegMask of function >> call in a Machine Function. The patch is here >> http://reviews.llvm.org/D21395. >> >> AsmPrinter.cpp is modified to print call preserved registers in comments >> at call site in generated assembly file. This suggestion was by Quentin >> Colombet to improve readability of asm files while experimenting RegMask >> and calling convention etc. This simple patch can be found here >> http://reviews.llvm.org/D21490. >> >> We have also experimented a simple improvement to IPRA by setting callee >> saved registers to none for local function and we have found some >> performance improvement. >> >> Testing: >> =====>> >> After bug 28144 fix there is no runtime failures in test suite. Also due >> to bug 28144 there was about 60 run time failures and total time taken for >> test suite compilation was 30% more compare to with out IPRA. After bug fix >> with IPRA total compile time improvement compare to without IPRA is about 4 >> to 8 minutes. >> >> >> Study: >> >> ====>> >> This week I study code responsible for adding spill and restore for >> callee saved registers. Also studied how calling convention is defined in >> target specific .td files. I studied AsmPrinter.cpp and specifically >> emitComments() method which is responsible for adding comments in llvm >> generated assembly files. I also studied about some linkage type in LLMV IR >> like ‘internal’ which represent local function in module. >> >> >> Plan for next week: >> >> 1) Submit patch related to local function optimization for review >> >> 2) Find more possible improvements >> >> 3) Get active patches committed >> >> 4) Compile large software with IPRA enabled >> >> >> Sincerely, >> >> Vivek >> >> On Wed, Jun 15, 2016 at 8:45 AM, vivek pandya <vivekvpandya at gmail.com> >> wrote: >> >>> >>> >>> On Wed, Jun 15, 2016 at 8:40 AM, vivek pandya <vivekvpandya at gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcolombet at apple.com> >>>> wrote: >>>> >>>>> Hi Vivek, >>>>> >>>>> How much of the slow down on runtime comes from the different layout >>>>> of the function in the asm file? (I.e., because of the dummy scc pass.) >>>>> >>>>> Hello Quentin, >>>> >>>> Please do not consider previous results as there was a major bug in >>>> RegMask calculation due to not considering RegMasks of callee in MF body >>>> while calculating register usage information, that has been fixed now ( as >>>> discussed with Matthias Braun and Mehdi Amini ) and after this bugfix I >>>> have run test-suite with and without IPRA. Yes there is runtime slow down >>>> for some test cases ranging from 1% to 64% similarly compile time slow down >>>> is ranging from 1% to 48%. The runtime performance improvement is ranging >>>> from 1% to 35% and surprisingly there is also compile time improvement in a >>>> range from 1% to 60% . I would request you to go through complete results >>>> at >>>> https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing >>>> >>>> In above result baseline is IPRA and current is without IPRA. So >>> actually data with background red is actual improvement and green is >>> regression. >>> -Vivek >>> >>>> Also there is not extra failure due to IPRA now so in the result above >>>> I have removed failures. >>>> >>>> Sincerely, >>>> Vivek >>>> >>>> >>>>> Cheers, >>>>> Q >>>>> >>>>> Le 11 juin 2016 à 21:49, vivek pandya via llvm-dev < >>>>> llvm-dev at lists.llvm.org> a écrit : >>>>> >>>>> Dear Community, >>>>> >>>>> The patch for Interprocedural Register Allocation has been committed >>>>> now , thanks to Mehdi Amini for that. We would like you to play with it and >>>>> let us know your views and more importantly ideas to improve it. >>>>> >>>>> The test-suite run has indicated some non trivial issue that results >>>>> in run time failure of the programs, we will be investigating it more. Here >>>>> are some stats : >>>>> >>>>> test-suite has been tested with IPRA enabled and overall results are >>>>> not much encouraging. On average 30% increase in compile time. Many >>>>> programs have also increase in execution time ( average 20%) that is really >>>>> serious concern for us. About 60 tests have failed on run time this >>>>> indicates error in compilation. how ever 3 tests have improvement in their >>>>> runtime and that is 7% average. >>>>> >>>>> >>>>> This week I think good thing for me to learn is to setup llvm >>>>> development environment properly other wise one can end up wasting too much >>>>> time building the llvm it self. >>>>> >>>>> So here is brief summary: >>>>> Implementation: >>>>> ===========>>>>> >>>>> The patch has been split into analysis and transformation passes. The >>>>> pass responsible for register usage propagation has been made target >>>>> independent. A print method and command line option -print-regusage has >>>>> been added so that RegMaks details can be printed in Release builds also, >>>>> this enables lit test case to be testable in Release build too. Other minor >>>>> changes to adhere coding and naming conventions. >>>>> >>>>> >>>>> Testing: >>>>> >>>>> =====>>>>> >>>>> test-suite has been tested with IPRA enabled. >>>>> >>>>> >>>>> Study and other: >>>>> >>>>> ============>>>>> >>>>> Learned about LNT, test-suite for LLVM, Inline assembly in LLVM IR, >>>>> fastcc, local functions, MCStream class. In C++ I leaned about emplace >>>>> family of methods in STL and perfect forwarding introduced in C++11. >>>>> >>>>> >>>>> Plan for next week: >>>>> >>>>> 1) Investigate issue related to functional correctness that leads to >>>>> run time failures >>>>> >>>>> 2) profile the compilation process to verify increase in time due to >>>>> IPRA >>>>> >>>>> 3) Improve IPRA by instructing codegen to not save register for local >>>>> function. >>>>> >>>>> 4) Make the pass emit asm comments to indicate register clobbered by >>>>> function call at call site in generated ASM file. >>>>> >>>>> >>>>> Sincerely, >>>>> >>>>> Vivek >>>>> >>>>> On Sun, Jun 5, 2016 at 8:48 AM, vivek pandya <vivekvpandya at gmail.com> >>>>> wrote: >>>>> >>>>>> Dear Community, >>>>>> >>>>>> This week I got my patch reviewed by mentors and based on that I have >>>>>> done changes. Also we have identified a problem with callee-saved registers >>>>>> being marked as clobbered registers so we fixed that bug. I have described >>>>>> other minor changes in following section. >>>>>> >>>>>> It was expected to get the patch committed by end of this week but >>>>>> due to unexpected mistake I was not able to complete writing test cases. >>>>>> Sorry for that. >>>>>> I had build llvm with ipra enable by default and that build files >>>>>> were on my path ! Due to that next time I tried to build llvm it was >>>>>> terribly slow (almost 1 hour for 10% build ). I spend to much time on >>>>>> fixing this by playing around with environment variables, cmake options etc. >>>>>> But I think this is a serious concern, we need to think verify this >>>>>> time complexity other wise building a large software with IPRA enable would >>>>>> be very time consuming. >>>>>> >>>>>> The toughest part for this week was to get lit and FileCheck work as >>>>>> you expect them to work, specially when analysis pass prints info on stdio >>>>>> and there is also a output file generated by llc or opt command. >>>>>> >>>>>> So here is brief summary : >>>>>> >>>>>> Implementation: >>>>>> ===========>>>>>> >>>>>> RegUsageInfoCollector is now Calling Convention aware so that RegMask >>>>>> does not mark callee saved register as clobbered register. Due to this >>>>>> register allocator can use callee saved register for caller. >>>>>> PhysicalRegisterUsageInfo.cpp renamed to RegisterUsageInfo.cpp. >>>>>> StringMap used in RegisterUsageInfo.cpp is replaced by DenseMap of >>>>>> GlobalVariable * to RegMask. >>>>>> DummyCGSCCPass moved from TargetPassConfig.cpp to CallGraphSCCPass.h. >>>>>> Minor correction in comments, changes to adhere coding standards of >>>>>> LLVM. >>>>>> >>>>>> Testing: >>>>>> ====>>>>>> >>>>>> The above mentioned changes has been tested with SNU-Realtime >>>>>> benchmarks. >>>>>> >>>>>> Studied lit and FileCheck tool and written simple test to verify >>>>>> functionality of coding. >>>>>> >>>>>> >>>>>> Study and other: >>>>>> >>>>>> ===========>>>>>> >>>>>> Studied some examples of lit compatible llvm IR with comments to RUN >>>>>> test cases, FileCheck tool syntax and how to use it with in lit >>>>>> infrastructure. >>>>>> >>>>>> I also understand X86 calling convention in more details. >>>>>> >>>>>> I also studied basic concepts in llvm IR language while reading .ll >>>>>> files written for lit. >>>>>> >>>>>> I learned about rvalue references and move semantics introduced in >>>>>> C++11. >>>>>> >>>>>> >>>>>> Plan for next week: >>>>>> >>>>>> 1) Get the patch committed along with proper tets cases. >>>>>> >>>>>> 2) Analyse time complexity of the approach. >>>>>> >>>>>> 3) Make target specific pass to CodeGen as it seems it is not >>>>>> required to be target specific. >>>>>> >>>>>> 4) If possible build a large application with ipra enable and analyze >>>>>> the impact. >>>>>> >>>>>> >>>>>> Sincerely, >>>>>> >>>>>> Vivek >>>>>> >>>>>> >>>>>> On Sat, May 28, 2016 at 7:31 PM, vivek pandya <vivekvpandya at gmail.com >>>>>> > wrote: >>>>>> >>>>>>> Dear community, >>>>>>> >>>>>>> This is to brief you the progress of Interprocedural Register >>>>>>> Allocation, for those who are interested to see the progress in terms of >>>>>>> code please consider http://reviews.llvm.org/D20769 >>>>>>> This patch contains simple infrastructure to propagate register >>>>>>> usage information of callee to caller in call graph. The code generation >>>>>>> order is changed to follow bottom up order on call graph , Thanks to Mehdi >>>>>>> Amini for the patch for the same ! I will write a blog on this very soon. >>>>>>> >>>>>>> So during this week as per the schedule proposed it should be study >>>>>>> related infrastructure in LLVM and finalizing an approach for IPRA, but >>>>>>> instead I am able to implement a working (may not be fully correct) >>>>>>> prototype because I have used community bonding period to discuss and learn >>>>>>> related stuffs from the mentors and also due to patch for CodeGen >>>>>>> reordering was provided by dear mentor Mehdi Amini. >>>>>>> >>>>>>> So I conclude the work done during this week as follows: >>>>>>> Implementation : >>>>>>> ===========>>>>>>> Following passes have been implemented during this week: An >>>>>>> immutable pass to store competed RegMask, a machine function pass that >>>>>>> iterates through each registers and check if it is used or not and based on >>>>>>> that details create a RegMask and a target specific machine function pass >>>>>>> that uses the RegMask created by second pass and propagates information by >>>>>>> updating call instructions RegMask. To update the RegMask of MI , >>>>>>> setRegMask() function has been added to MachineOperand, a command line >>>>>>> option -enable-ipra and debug type -debug-only=“ipra" has been added to >>>>>>> control the optimization through llc. >>>>>>> >>>>>>> Testing: >>>>>>> ====>>>>>>> The above mentioned implementation has been tested over >>>>>>> SNU-Real-Time benchmark suit ( >>>>>>> http://www.cprover.org/goto-cc/examples/snu.html) and some simple >>>>>>> programs that uses library function ( for a library function register >>>>>>> allocation is not done by LLVM so this optimization will simply skip them) >>>>>>> >>>>>>> Study and Other: >>>>>>> ============>>>>>>> I have learned following things in LLVM, how it stores reg >>>>>>> clobbering information? how it is used by Reg allocators through >>>>>>> LivePhysRegs, LiveRegMatrix and other related passes? How to schedule a >>>>>>> pass using TargetPassConfig and TargetMachine? What are called callee saved >>>>>>> registers? What is an Immutable Pass? Apart from that I have also learned >>>>>>> how to use phabricator to send review request. I have also read some >>>>>>> related literatures. >>>>>>> >>>>>>> During this week though task was to schedule the passes in proper >>>>>>> order so that dependencies of related passes are satisfied. >>>>>>> >>>>>>> Plan for next week: >>>>>>> 1) Perform more testing and debug any known issue >>>>>>> 2) Fine ture the implementation so as to eliminate any unnecessary >>>>>>> work >>>>>>> 3) During the testing from the stats I have observed that IPRA does >>>>>>> not always improve the work of IntraProcedural register allocators and it >>>>>>> is also observer that the amount of benefit (in terms of spilled live >>>>>>> ranges ) is not deterministic. So I would like to find reasons for this >>>>>>> behavior. >>>>>>> 4) Start implementing target specific pass for other targets if >>>>>>> review passes properly with no major bugs. >>>>>>> >>>>>>> Please provide any feedback/suggestion including for format of this >>>>>>> email. >>>>>>> >>>>>>> I would also like to thanks my mentors Mehdi Amini , Hal Finkel, Quentin >>>>>>> Colombet, Matthias Braun and other community members for providing >>>>>>> quick help every time when I asked ( I have got replies even after 8 PM ( >>>>>>> PDT) ! ) . >>>>>>> >>>>>>> Sincerely, >>>>>>> Vivek >>>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160621/40b7e952/attachment-0001.html>
Sanjoy Das via llvm-dev
2016-Jun-20 19:53 UTC
[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation
Hi Vivek, vivek pandya via llvm-dev wrote: > int foo() { > return 12; > } > > int bar(int a) { > return foo() + a; > } > > int (*fp)() = 0; > int (*fp1)(int) = 0; > > int main() { > fp = foo; > fp(); > fp1 = bar; > fp1(15); > return 0; > } IMO it is waste of time trying to do a better job at the IPRA level on IR like the above ^. LLVM should be folding the indirect calls to direct calls at the IR level, and if it isn't that's a bug in the IR level optimizer. The interesting cases are when you have a call like: fnptr target = object->callback; target(foo, bar); and the IR level optimizer has failed to optimize the indirect call to `target` to a direct call. -- Sanjoy
Matthias Braun via llvm-dev
2016-Jun-20 20:15 UTC
[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation
> On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi Vivek, > > vivek pandya via llvm-dev wrote: > > int foo() { > > return 12; > > } > > > > int bar(int a) { > > return foo() + a; > > } > > > > int (*fp)() = 0; > > int (*fp1)(int) = 0; > > > > int main() { > > fp = foo; > > fp(); > > fp1 = bar; > > fp1(15); > > return 0; > > } > > IMO it is waste of time trying to do a better job at the IPRA level on > IR like the above ^. LLVM should be folding the indirect calls to > direct calls at the IR level, and if it isn't that's a bug in the IR > level optimizer.+1 from me. The interesting cases are the non-obvious ones (assumeing foo/bar have the same parameters). Things gets interesting once you have uncertainty in the mix. The minimally interesting case would look like this: int main() { int (*fp)(); if (rand()) { fp = foo; } else { fp = bar; } fp(42); } However predicting the possible targets of a call is IMO a question of computing a call graph datastructure and improving upon that. We should be sure that we discuss and implement this independently of the register allocation work! - Matthias