vivek pandya via llvm-dev
2016-May-25 02:34 UTC
[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > ------------------------------ > > *From: *"vivek pandya" <vivekvpandya at gmail.com> > *To: *"Quentin Colombet" <qcolombet at apple.com> > *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>, > "Matthias Braun" <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com> > *Sent: *Tuesday, May 24, 2016 1:00:58 PM > *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation - > Introduction and Feedback > > Hello, > > I have written following code to check each register if it is used by > machineFunction or not : > > MachineRegisterInfo *MRI = &MF.getRegInfo(); > TargetRegisterInfo *TRI = (TargetRegisterInfo > *)MF.getSubtarget().getRegisterInfo(); > > Some reason you can't use a const pointer here? >MCRegisterInfo is just used to get conventional name of register for given target like AX, BX on X86.> > const TargetMachine &TM = MF.getTarget(); > const MCRegisterInfo *MCRI = TM.getMCRegisterInfo(); > DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n"); > > for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e > (*TRI).regclass_end(); i != e; i++ ) { > for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege > (*i)->end(); pregi != prege; pregi++ ) { > DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " is > modified "<< MRI->isPhysRegModified(*pregi) << " \n"); > > Try isPhysRegUsed. >ok> > } > } > DEBUG(dbgs() << "\n"); > > The pass which is executing this code is schedule POST-RA stage but this > gives me true for all registers i.e in each function all registers are > being used except EBP and some other similar, Is this a correct way to get > register usage information ? I think I have made some mistake please help. > > > You might look at the implementation of these functions in > lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning > true because UsedPhysRegMask.test(PhysReg) is true or because > reg_nodbg_empty(*AliasReg) is true. >Yes that helped now I am getting actual register which have been used by given function, but a little problem The updated code is as shown below : for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e (*TRI).regclass_end(); i != e; i++ ) { for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege (*i)->end(); pregi != prege; pregi++ ) { for (MCRegAliasIterator AliasReg(*pregi, TRI, true); AliasReg.isValid(); ++AliasReg) { if (!MRI->reg_nodbg_empty(*AliasReg)) { DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " is used "<< MRI->isPhysRegUsed(*pregi) << " \n"); break; // no need to process more alias } } } } But here some registers are getting processed with in different classes (unnecessary processing) Is this only way to iterate through all used register (using RegClass iterator) ? Is there any way to avoid duplicate regs? Of course currently I am just printing but next I am thinking to use a map to track usage info , in that only distinct register info will be stored but still due to loop structure I need to iterate through a single register 3 - 4 times making it time consuming. -Vivek> > > -Hal > > > > Vivek > > On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at apple.com> > wrote: > >> >> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at gmail.com> >> wrote: >> >> >> >> *Vivek Pandya* >> >> >> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <qcolombet at apple.com> >> wrote: >> >>> >>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at gmail.com> >>> wrote: >>> >>> >>> >>> *Vivek Pandya* >>> >>> >>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov> wrote: >>> >>>> >>>> ------------------------------ >>>> >>>> *From: *"vivek pandya" <vivekvpandya at gmail.com> >>>> *To: *"Mehdi Amini" <mehdi.amini at apple.com> >>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" < >>>> qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias >>>> Braun" <matze at braunis.de> >>>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM >>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation - >>>> Introduction and Feedback >>>> >>>> >>>> >>>> >>>> *Vivek Pandya* >>>> >>>> >>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at gmail.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> *Vivek Pandya* >>>>> >>>>> >>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com> >>>>> wrote: >>>>> >>>>>> >>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote: >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------ >>>>>> >>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com> >>>>>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" < >>>>>> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov> >>>>>> *Cc: *"Quentin Colombet" <qcolombet at apple.com> >>>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM >>>>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation - >>>>>> Introduction and Feedback >>>>>> >>>>>> Hello LLVM Community, >>>>>> >>>>>> Sorry for delay as I was busy in final exams. >>>>>> >>>>>> I am Vivek from India. Thanks for choosing my proposal for >>>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal >>>>>> Finkel will be mentoring me for this project. >>>>>> >>>>>> IPRA can reduce code size and runtime of programs by allocating >>>>>> register across the module and procedure boundaries. >>>>>> >>>>>> I have identified some old but effective research work on this area. >>>>>> I want community's feedback for feasibility of these approach and I >>>>>> am targeting to implement two of them during this project. >>>>>> >>>>>> Here is list of the papers, I have read first two papers and I would >>>>>> like to discuss those approach first, I will read other two paper then >>>>>> initiate discussion for them as well. All I want is to find out a concrete >>>>>> implementation plan before 23 May, 2016 and for that I need community's >>>>>> help. >>>>>> >>>>>> 1) Compile time ----- Minimizing register usage penalty at procedure >>>>>> calls - http://dl.acm.org/citation.cfm?id=53999 >>>>>> ====================================================================In >>>>>> this approach intra-procedural register allocation is used as base but >>>>>> machine code generation order is bottom up traversal of call graph and >>>>>> inter-procedural effect is achieved by propagating register usage >>>>>> information of callee function to caller (i.e child to parent in CallGraph) >>>>>> so that caller can use different registers than callee and can save load >>>>>> store cost at procedure call, this is not trivial as it seems due to >>>>>> recursive calls, library function usage etc. Also for upper region of the >>>>>> graph in this technique available number of registers might become zero in >>>>>> that case it should fall back to normal load store at procedure call. Apart >>>>>> from these difficulties other difficulties have been identified please >>>>>> follow this mail-chain >>>>>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion >>>>>> My mentor has already provided me a patch that alters code generation >>>>>> order as per bottom up call graph traversal, I am working from that point >>>>>> now. Any other help/suggestion is always welcomed. >>>>>> >>>>>> 2) Link time ----- Global register allocation at link time - >>>>>> http://dl.acm.org/citation.cfm?id=989415 >>>>>> ====================================================================In >>>>>> this particular approach (sort of true IPRA) registers will be reallocated >>>>>> (this optimization will be optional if turned off still code will be >>>>>> compiled as per intra-procedural allocation) at link time. Here modules are >>>>>> first complied as per normal compilation but the object code is annotated >>>>>> with details so that linker can build call graph and also calculate usage >>>>>> information at link time. Compiler also write hints in object code that if >>>>>> particular variable is allocated in some other register ( due to new >>>>>> allocation) then how the code should be changed? Thus linker can use these >>>>>> information to decide which variables (global) need to be in same register >>>>>> through out the program execution and also according to register usage >>>>>> information in call graph which procedure will not be active simultaneously >>>>>> so that locals for that procedures can be in same registers with out load >>>>>> store at procedure calls. >>>>>> For these particular method help me to analyze feasibility: >>>>>> 1) Can llvm collects following information at module level in >>>>>> MachineIR? list of procedures in module, list of locals in procedures, list >>>>>> of procedures that a particular procedure can call, and a list of the >>>>>> variables this procedure references. Each entry in the last two lists >>>>>> includes an estimate of the number of times the procedure is called or the >>>>>> variable is referenced in each execution of this procedure >>>>>> 2) Can llvm write informative commands to object files? >>>>>> 3) Can LTO is capable of leveraging those commands? >>>>>> >>>>>> In terms of scoping the project for the summer, I definitely >>>>>> recommend that you focus on (1) first. If you finish that, we can certainly >>>>>> move on to other things. >>>>>> >>>>>> >>>>>> I'll add +1 here, but I already wrote the same thing on IRC when >>>>>> discussing with Vivek. True IPRA without a proper MachineModule >>>>>> infrastructure won't be doable in my opinion (even with such >>>>>> infrastructure, it may not be trivial in LLVM in general). >>>>>> >>>>>> Regarding link time, note that any such a design would likely look >>>>>> much different than in David Wall's paper however, because our LTO >>>>>> re-codegens everything anyway. The paper says, "Finally, it keeps us honest >>>>>> as designers of the system; once we postpone anything until link time, the >>>>>> temptation is great to postpone everything, ..." - Well, we've long-since >>>>>> succumb to that temptation when we LTO. C'est la vie. >>>>>> >>>>>> >>>>>> +1 as well, our LTO will benefit naturally from the leaf-to-root >>>>>> information propagation. ThinLTO will be more challenging/interesting >>>>>> though! >>>>>> >>>>>> For the first part a mechanism similar to MachineModulePass would be >>>>>> desirable but that may not be possible during this project, but if we can >>>>>> make some sort of smaller version of that to suit our purpose. >>>>>> >>>>>> I don't think we need to make any kind of MachineModulePass to make >>>>>> this work. Once we alter the visitation order based on the CGSCC iteration >>>>>> scheme, we can keep state in-between functions in the pre-existing hacky >>>>>> way (using static members of the relevant function passes). >>>>>> >>>>>> Sorry my mistake here by first part I mean 1) requirement in the >>>>> link time approach. >>>>> >>>>>> >>>>> >>>>>> I also don't see where/why we need a MachineModule(Pass) for the >>>>>> CGSCC scheme, that said I'd rather avoid using a function pass with static >>>>>> members, if we can have a ModuleAnalysis that is bookkeeping the results >>>>>> for functions in the module and queries by the register allocator somehow. >>>>>> Matthias/Quentin may have other inputs on this aspect. >>>>>> >>>>> >>>> @Hal do you mean to add a simple MachineFunction pass that will just >>>> operate on register allocated function and prepare a BitVector to indicate >>>> which register is being used by MachineFunction, and then use this pass as >>>> analysis pass (i.e just simply return static BitVector for clobbered >>>> register when register allocation for next function begins. This part is >>>> not much clear to me) this thing can be done by scheduling a pass post >>>> register allocation in lib/CodeGen/Passes.cpp >>>> >>>> void TargetPassConfig::addMachinePasses() { >>>> . >>>> . >>>> . >>>> // Run pre-ra passes. >>>> addPreRegAlloc(); >>>> >>>> // Run register allocation and passes that are tightly coupled with >>>> it, >>>> // including phi elimination and scheduling. >>>> if (getOptimizeRegAlloc()) >>>> addOptimizedRegAlloc(createRegAllocPass(true)); >>>> else >>>> addFastRegAlloc(createRegAllocPass(false)); >>>> >>>> // Run post-ra passes. >>>> addPostRegAlloc(); >>>> // Adding a new pass here which keeps register mask information across >>>> function calls. >>>> . >>>> . >>>> . >>>> } >>>> >>>> But this also requires current register allocators to use this >>>> information in someway because RegMaskBits in LiveIntervalAnalysis.cpp is >>>> not static across calls. I mean I am not clear for how to propagate static >>>> info to Intra-procedural Register allocators (if possible without >>>> disturbing their code ) >>>> >>>> First, my hope is that we won't need to change the register allocators, >>>> as such, in order to make use of this information. Instead, we'll simply be >>>> able to alter the register masks generated for the call instructions. These >>>> masks will indicate fewer clobbers than might otherwise be present based on >>>> the ABI because of information gathered during the codegen of the callee. >>>> These masks are generally constructed by target based on the calling >>>> convention. The PowerPC backend, for example, looks like this: >>>> >>>> // Add a register mask operand representing the call-preserved >>>> registers. >>>> const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo(); >>>> const uint32_t *Mask >>>> TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv); >>>> assert(Mask && "Missing call preserved mask for calling convention"); >>>> Ops.push_back(DAG.getRegisterMask(Mask)); >>>> >>>> but it can be more complicated. If you look for uses of >>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll see what I mean. >>>> Regardless, the code ends up calling some method is the targets >>>> TargetRegisterInfo subclass. These methods generally look something like >>>> this: >>>> >>>> const uint32_t * >>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF, >>>> CallingConv::ID CC) const { >>>> const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>(); >>>> ... >>>> return TM.isPPC64() ? (Subtarget.hasAltivec() ? >>>> CSR_SVR464_Altivec_RegMask >>>> : CSR_SVR464_RegMask) >>>> : (Subtarget.hasAltivec() ? >>>> CSR_SVR432_Altivec_RegMask >>>> : CSR_SVR432_RegMask); >>>> } >>>> >>>> In any case, the fundamental idea here is that, when someone calls >>>> getCallPreservedMask in order to set the regmask on a call, we might not >>>> have to use the CC at all. Instead, if we've already codegened the >>>> function, we might use a cache of 'exact' register masks computed during >>>> codegen of the potential callees instead. >>>> >>> I am thinking to add a simple Immutable pass MachineRegisterUsageInfo >>> similar to MachineBranchProbabilityInfo that can maintain >>> RegisterUsageInformation per function. Can it be simply done by using >>> UsedPhysRegMask from MachineRegisterInfo ?? >>> >>> >>> No, like the comment said, UsedPhysRegMask gives only the registers >>> clobbered by calls: >>> // This bit vector represents all the registers clobbered by function >>> calls. >>> >>> You want to build this information yourself on top of >>> MachineRegisterInfo::isPhysRegModified >>> >> Ok but then the time complexity will be O(n) n = number of physical >> register on the target. Am I going correct? >> >> >> Yes, this is correct. >> >> >>> Here getCallPreservedMask will call API provided by >>> MachineRegisterUsageInfo to avail the exact register mask but how it can >>> know that the function is already codegen or it will query each time when >>> getCallPreservedMask is called and of available MachineRegisterUsageInfo >>> will return the details otherwise simply return NULL. >>> So changes will be now in TargetRegisterInfo implementation for each >>> target right ?? >>> >>> >>>> In order to do this, I think we'll need to provide a function callable >>>> from the target's getCallPreservedMask implementation, which can return >>>> such an 'exact' regmask when available. I think we need to do it this way >>>> for two reasons: >>>> >>>> 1. Not all of the target code calls getCallPreservedMask, but >>>> sometimes calls other similar target-specific functions (e.g. >>>> getTLSCallPreservedMask). >>>> 2. The targets need to opt-in to this behavior because only the target >>>> can know that all register uses are really tagged correctly post "pre-emit". >>>> >>>> Because the target is free to introduce uses of registers at >>>> essentially any time, we need to do the scanning for used registers after >>>> the "pre-emit" passes run. This can be done by scheduling some simple >>>> register-use scanning pass after the call to addPreEmitPass in >>>> lib/CodeGen/Passes.cpp. >>>> >>>> >>>> I think this also applies in someway to Mehdi Amini's idea to keep a >>>> ModulePass for bookkeeping but then existing register allocators will be >>>> required to change so that the code can query the ModulePass for >>>> RegMaskBits for particular function. >>>> >>>> I think that the simplest way to do this is to create an immutable >>>> analysis pass (e.g. BasicAA) that keeps the cache of the computed register >>>> masks. This is somewhat similar in spirit to how the 'AssumptionCache' >>>> analysis works at the IR level. This analysis can then be created by >>>> lib/CodeGen/Passes.cpp early, and then queried and passed around later by >>>> the CodeGen/Target code. Because it is an immutable analysis, it won't get >>>> destroyed until the very end, which is also important because, I imagine, >>>> it will need to own the memory associated with the generated register masks. >>>> >>>> -Hal >>>> >>>> >>>> Vivek >>>> >>>> >>>>>> Yes for propagating register usage approach we don't need >>>>> MachineModulePass >>>>> >>>>> Vivek >>>>> >>>>>> -- >>>>>> Mehdi >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Hal Finkel >>>> Assistant Computational Scientist >>>> Leadership Computing Facility >>>> Argonne National Laboratory >>>> >>> >> > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160525/2e928ec0/attachment-0001.html>
Hal Finkel via llvm-dev
2016-May-25 03:14 UTC
[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
----- Original Message -----> From: "vivek pandya" <vivekvpandya at gmail.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" > <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>, "Quentin > Colombet" <qcolombet at apple.com> > Sent: Tuesday, May 24, 2016 9:34:29 PM > Subject: Re: [GSoC 2016] Interprocedural Register Allocation - > Introduction and Feedback> On Wed, May 25, 2016 at 3:53 AM, Hal Finkel < hfinkel at anl.gov > > wrote:> > > From: "vivek pandya" < vivekvpandya at gmail.com > > > > > > > To: "Quentin Colombet" < qcolombet at apple.com > > > > > > > Cc: "Hal Finkel" < hfinkel at anl.gov >, "llvm-dev" < > > > llvm-dev at lists.llvm.org >, "Matthias Braun" < matze at braunis.de >, > > > "Mehdi Amini" < mehdi.amini at apple.com > > > > > > > Sent: Tuesday, May 24, 2016 1:00:58 PM > > > > > > Subject: Re: [GSoC 2016] Interprocedural Register Allocation - > > > Introduction and Feedback > > >> > > Hello, > > >> > > I have written following code to check each register if it is > > > used > > > by > > > machineFunction or not : > > >> > > MachineRegisterInfo *MRI = &MF.getRegInfo(); > > > > > > TargetRegisterInfo *TRI = (TargetRegisterInfo > > > *)MF.getSubtarget().getRegisterInfo(); > > > > > Some reason you can't use a const pointer here? > > MCRegisterInfo is just used to get conventional name of register for > given target like AX, BX on X86. > > > const TargetMachine &TM = MF.getTarget(); > > > > > > const MCRegisterInfo *MCRI = TM.getMCRegisterInfo(); > > > > > > DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n"); > > >> > > for(TargetRegisterInfo::regclass_iterator i > > > (*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++ ) > > > { > > > > > > for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege > > > (*i)->end(); pregi != prege; pregi++ ) { > > > > > > DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) > > > << > > > " > > > is modified "<< MRI->isPhysRegModified(*pregi) << " \n"); > > > > > Try isPhysRegUsed. > > ok > > > } > > > > > > } > > > > > > DEBUG(dbgs() << "\n"); > > >> > > The pass which is executing this code is schedule POST-RA stage > > > but > > > this gives me true for all registers i.e in each function all > > > registers are being used except EBP and some other similar, Is > > > this > > > a correct way to get register usage information ? I think I have > > > made some mistake please help. > > > > > You might look at the implementation of these functions in > > lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're > > returning true because UsedPhysRegMask.test(PhysReg) is true or > > because reg_nodbg_empty(*AliasReg) is true. > > Yes that helped now I am getting actual register which have been used > by given function, but a little problem > The updated code is as shown below : > for(TargetRegisterInfo::regclass_iterator i > (*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++ ) { > for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege > (*i)->end(); pregi != prege; pregi++ ) { > for (MCRegAliasIterator AliasReg(*pregi, TRI, true); > AliasReg.isValid(); ++AliasReg) { > if (!MRI->reg_nodbg_empty(*AliasReg)) { > DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " > is used "<< MRI->isPhysRegUsed(*pregi) << " \n"); > break; // no need to process more alias > } > } > } > } > But here some registers are getting processed with in different > classes (unnecessary processing) Is this only way to iterate through > all used register (using RegClass iterator) ? Is there any way to > avoid duplicate regs? > Of course currently I am just printing but next I am thinking to use > a map to track usage info , in that only distinct register info will > be stored but still due to loop structure I need to iterate through > a single register 3 - 4 times making it time consuming.Yes, I believe you can just do: for (unsigned Reg = 0; Reg < TRI->getNumRegs(); ++Reg) { -Hal> -Vivek > > -Hal >> > > Vivek > > >> > > On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet < > > > qcolombet at apple.com > wrote: > > >> > > > > On May 18, 2016, at 11:00 AM, vivek pandya < > > > > > vivekvpandya at gmail.com > > > > > > > > > > > wrote: > > > > > > > > > >> > > > > Vivek Pandya > > > > > > > > > >> > > > > On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet < > > > > > qcolombet at apple.com > wrote: > > > > > > > > > >> > > > > > > On May 18, 2016, at 10:46 AM, vivek pandya < > > > > > > > vivekvpandya at gmail.com > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > >> > > > > > > Vivek Pandya > > > > > > > > > > > > > > > > > > > > >> > > > > > > On Wed, May 11, 2016 at 4:01 PM, Hal Finkel < > > > > > > > hfinkel at anl.gov > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > From: "vivek pandya" < vivekvpandya at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Mehdi Amini" < mehdi.amini at apple.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cc: "Hal Finkel" < hfinkel at anl.gov >, "Quentin > > > > > > > > > Colombet" > > > > > > > > > < > > > > > > > > > qcolombet at apple.com >, "llvm-dev" < > > > > > > > > > llvm-dev at lists.llvm.org > > > > > > > > > >, > > > > > > > > > "Matthias Braun" < matze at braunis.de > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Wednesday, May 11, 2016 3:15:03 AM > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Subject: Re: [GSoC 2016] Interprocedural Register > > > > > > > > > Allocation > > > > > > > > > - > > > > > > > > > Introduction and Feedback > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > Vivek Pandya > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > On Wed, May 11, 2016 at 10:02 AM, vivek pandya < > > > > > > > > > vivekvpandya at gmail.com > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Vivek Pandya > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini < > > > > > > > > > > mehdi.amini at apple.com > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > On May 10, 2016, at 6:06 PM, Hal Finkel < > > > > > > > > > > > > hfinkel at anl.gov > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > From: "vivek pandya" < vivekvpandya at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "llvm-dev" < llvm-dev at lists.llvm.org >, > > > > > > > > > > > > > "Tim > > > > > > > > > > > > > Amini > > > > > > > > > > > > > Golling" > > > > > > > > > > > > > < > > > > > > > > > > > > > mehdi.amini at apple.com >, "Hal Finkel" < > > > > > > > > > > > > > hfinkel at anl.gov > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cc: "Quentin Colombet" < qcolombet at apple.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Tuesday, May 10, 2016 2:59:16 PM > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Subject: [GSoC 2016] Interprocedural Register > > > > > > > > > > > > > Allocation > > > > > > > > > > > > > - > > > > > > > > > > > > > Introduction and Feedback > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > Hello LLVM Community, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > Sorry for delay as I was busy in final exams. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > I am Vivek from India. Thanks for choosing my > > > > > > > > > > > > > proposal > > > > > > > > > > > > > for > > > > > > > > > > > > > Interprocedural Register Allocation (IPRA) in > > > > > > > > > > > > > LLVM. > > > > > > > > > > > > > Mehdi > > > > > > > > > > > > > Amini > > > > > > > > > > > > > and > > > > > > > > > > > > > Hal Finkel will be mentoring me for this > > > > > > > > > > > > > project. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > IPRA can reduce code size and runtime of > > > > > > > > > > > > > programs > > > > > > > > > > > > > by > > > > > > > > > > > > > allocating > > > > > > > > > > > > > register across the module and procedure > > > > > > > > > > > > > boundaries. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > I have identified some old but effective > > > > > > > > > > > > > research > > > > > > > > > > > > > work > > > > > > > > > > > > > on > > > > > > > > > > > > > this > > > > > > > > > > > > > area. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I want community's feedback for feasibility > > > > > > > > > > > > > of > > > > > > > > > > > > > these > > > > > > > > > > > > > approach > > > > > > > > > > > > > and > > > > > > > > > > > > > I > > > > > > > > > > > > > am targeting to implement two of them during > > > > > > > > > > > > > this > > > > > > > > > > > > > project. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > Here is list of the papers, I have read first > > > > > > > > > > > > > two > > > > > > > > > > > > > papers > > > > > > > > > > > > > and > > > > > > > > > > > > > I > > > > > > > > > > > > > would > > > > > > > > > > > > > like to discuss those approach first, I will > > > > > > > > > > > > > read > > > > > > > > > > > > > other > > > > > > > > > > > > > two > > > > > > > > > > > > > paper > > > > > > > > > > > > > then initiate discussion for them as well. > > > > > > > > > > > > > All > > > > > > > > > > > > > I > > > > > > > > > > > > > want > > > > > > > > > > > > > is > > > > > > > > > > > > > to > > > > > > > > > > > > > find > > > > > > > > > > > > > out > > > > > > > > > > > > > a concrete implementation plan before 23 May, > > > > > > > > > > > > > 2016 > > > > > > > > > > > > > and > > > > > > > > > > > > > for > > > > > > > > > > > > > that > > > > > > > > > > > > > I > > > > > > > > > > > > > need community's help. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > 1) Compile time ----- Minimizing register > > > > > > > > > > > > > usage > > > > > > > > > > > > > penalty > > > > > > > > > > > > > at > > > > > > > > > > > > > procedure > > > > > > > > > > > > > calls - > > > > > > > > > > > > > http://dl.acm.org/citation.cfm?id=53999 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ====================================================================In > > > > > > > > > > > > > this approach intra-procedural register > > > > > > > > > > > > > allocation > > > > > > > > > > > > > is > > > > > > > > > > > > > used > > > > > > > > > > > > > as > > > > > > > > > > > > > base > > > > > > > > > > > > > but machine code generation order is bottom > > > > > > > > > > > > > up > > > > > > > > > > > > > traversal > > > > > > > > > > > > > of > > > > > > > > > > > > > call > > > > > > > > > > > > > graph and inter-procedural effect is achieved > > > > > > > > > > > > > by > > > > > > > > > > > > > propagating > > > > > > > > > > > > > register usage information of callee function > > > > > > > > > > > > > to > > > > > > > > > > > > > caller > > > > > > > > > > > > > (i.e > > > > > > > > > > > > > child > > > > > > > > > > > > > to parent in CallGraph) so that caller can > > > > > > > > > > > > > use > > > > > > > > > > > > > different > > > > > > > > > > > > > registers > > > > > > > > > > > > > than callee and can save load store cost at > > > > > > > > > > > > > procedure > > > > > > > > > > > > > call, > > > > > > > > > > > > > this > > > > > > > > > > > > > is > > > > > > > > > > > > > not trivial as it seems due to recursive > > > > > > > > > > > > > calls, > > > > > > > > > > > > > library > > > > > > > > > > > > > function > > > > > > > > > > > > > usage etc. Also for upper region of the graph > > > > > > > > > > > > > in > > > > > > > > > > > > > this > > > > > > > > > > > > > technique > > > > > > > > > > > > > available number of registers might become > > > > > > > > > > > > > zero > > > > > > > > > > > > > in > > > > > > > > > > > > > that > > > > > > > > > > > > > case > > > > > > > > > > > > > it > > > > > > > > > > > > > should fall back to normal load store at > > > > > > > > > > > > > procedure > > > > > > > > > > > > > call. > > > > > > > > > > > > > Apart > > > > > > > > > > > > > from > > > > > > > > > > > > > these difficulties other difficulties have > > > > > > > > > > > > > been > > > > > > > > > > > > > identified > > > > > > > > > > > > > please > > > > > > > > > > > > > follow this mail-chain > > > > > > > > > > > > > https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > My mentor has already provided me a patch > > > > > > > > > > > > > that > > > > > > > > > > > > > alters > > > > > > > > > > > > > code > > > > > > > > > > > > > generation > > > > > > > > > > > > > order as per bottom up call graph traversal, > > > > > > > > > > > > > I > > > > > > > > > > > > > am > > > > > > > > > > > > > working > > > > > > > > > > > > > from > > > > > > > > > > > > > that > > > > > > > > > > > > > point now. Any other help/suggestion is > > > > > > > > > > > > > always > > > > > > > > > > > > > welcomed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > 2) Link time ----- Global register allocation > > > > > > > > > > > > > at > > > > > > > > > > > > > link > > > > > > > > > > > > > time > > > > > > > > > > > > > - > > > > > > > > > > > > > http://dl.acm.org/citation.cfm?id=989415 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ====================================================================In > > > > > > > > > > > > > this particular approach (sort of true IPRA) > > > > > > > > > > > > > registers > > > > > > > > > > > > > will > > > > > > > > > > > > > be > > > > > > > > > > > > > reallocated (this optimization will be > > > > > > > > > > > > > optional > > > > > > > > > > > > > if > > > > > > > > > > > > > turned > > > > > > > > > > > > > off > > > > > > > > > > > > > still > > > > > > > > > > > > > code will be compiled as per intra-procedural > > > > > > > > > > > > > allocation) > > > > > > > > > > > > > at > > > > > > > > > > > > > link > > > > > > > > > > > > > time. Here modules are first complied as per > > > > > > > > > > > > > normal > > > > > > > > > > > > > compilation > > > > > > > > > > > > > but > > > > > > > > > > > > > the object code is annotated with details so > > > > > > > > > > > > > that > > > > > > > > > > > > > linker > > > > > > > > > > > > > can > > > > > > > > > > > > > build > > > > > > > > > > > > > call graph and also calculate usage > > > > > > > > > > > > > information > > > > > > > > > > > > > at > > > > > > > > > > > > > link > > > > > > > > > > > > > time. > > > > > > > > > > > > > Compiler also write hints in object code that > > > > > > > > > > > > > if > > > > > > > > > > > > > particular > > > > > > > > > > > > > variable > > > > > > > > > > > > > is allocated in some other register ( due to > > > > > > > > > > > > > new > > > > > > > > > > > > > allocation) > > > > > > > > > > > > > then > > > > > > > > > > > > > how the code should be changed? Thus linker > > > > > > > > > > > > > can > > > > > > > > > > > > > use > > > > > > > > > > > > > these > > > > > > > > > > > > > information to decide which variables > > > > > > > > > > > > > (global) > > > > > > > > > > > > > need > > > > > > > > > > > > > to > > > > > > > > > > > > > be > > > > > > > > > > > > > in > > > > > > > > > > > > > same > > > > > > > > > > > > > register through out the program execution > > > > > > > > > > > > > and > > > > > > > > > > > > > also > > > > > > > > > > > > > according > > > > > > > > > > > > > to > > > > > > > > > > > > > register usage information in call graph > > > > > > > > > > > > > which > > > > > > > > > > > > > procedure > > > > > > > > > > > > > will > > > > > > > > > > > > > not > > > > > > > > > > > > > be > > > > > > > > > > > > > active simultaneously so that locals for that > > > > > > > > > > > > > procedures > > > > > > > > > > > > > can > > > > > > > > > > > > > be > > > > > > > > > > > > > in > > > > > > > > > > > > > same registers with out load store at > > > > > > > > > > > > > procedure > > > > > > > > > > > > > calls. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For these particular method help me to > > > > > > > > > > > > > analyze > > > > > > > > > > > > > feasibility: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1) Can llvm collects following information at > > > > > > > > > > > > > module > > > > > > > > > > > > > level > > > > > > > > > > > > > in > > > > > > > > > > > > > MachineIR? list of procedures in module, list > > > > > > > > > > > > > of > > > > > > > > > > > > > locals > > > > > > > > > > > > > in > > > > > > > > > > > > > procedures, list of procedures that a > > > > > > > > > > > > > particular > > > > > > > > > > > > > procedure > > > > > > > > > > > > > can > > > > > > > > > > > > > call, > > > > > > > > > > > > > and a list of the variables this procedure > > > > > > > > > > > > > references. > > > > > > > > > > > > > Each > > > > > > > > > > > > > entry > > > > > > > > > > > > > in > > > > > > > > > > > > > the last two lists includes an estimate of > > > > > > > > > > > > > the > > > > > > > > > > > > > number > > > > > > > > > > > > > of > > > > > > > > > > > > > times > > > > > > > > > > > > > the > > > > > > > > > > > > > procedure is called or the variable is > > > > > > > > > > > > > referenced > > > > > > > > > > > > > in > > > > > > > > > > > > > each > > > > > > > > > > > > > execution > > > > > > > > > > > > > of this procedure > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2) Can llvm write informative commands to > > > > > > > > > > > > > object > > > > > > > > > > > > > files? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3) Can LTO is capable of leveraging those > > > > > > > > > > > > > commands? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > In terms of scoping the project for the summer, > > > > > > > > > > > > I > > > > > > > > > > > > definitely > > > > > > > > > > > > recommend that you focus on (1) first. If you > > > > > > > > > > > > finish > > > > > > > > > > > > that, > > > > > > > > > > > > we > > > > > > > > > > > > can > > > > > > > > > > > > certainly move on to other things. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll add +1 here, but I already wrote the same > > > > > > > > > > > thing > > > > > > > > > > > on > > > > > > > > > > > IRC > > > > > > > > > > > when > > > > > > > > > > > discussing with Vivek. True IPRA without a proper > > > > > > > > > > > MachineModule > > > > > > > > > > > infrastructure won't be doable in my opinion > > > > > > > > > > > (even > > > > > > > > > > > with > > > > > > > > > > > such > > > > > > > > > > > infrastructure, it may not be trivial in LLVM in > > > > > > > > > > > general). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > Regarding link time, note that any such a > > > > > > > > > > > > design > > > > > > > > > > > > would > > > > > > > > > > > > likely > > > > > > > > > > > > look > > > > > > > > > > > > much different than in David Wall's paper > > > > > > > > > > > > however, > > > > > > > > > > > > because > > > > > > > > > > > > our > > > > > > > > > > > > LTO > > > > > > > > > > > > re-codegens everything anyway. The paper says, > > > > > > > > > > > > "Finally, > > > > > > > > > > > > it > > > > > > > > > > > > keeps > > > > > > > > > > > > us > > > > > > > > > > > > honest as designers of the system; once we > > > > > > > > > > > > postpone > > > > > > > > > > > > anything > > > > > > > > > > > > until > > > > > > > > > > > > link time, the temptation is great to postpone > > > > > > > > > > > > everything, > > > > > > > > > > > > ..." > > > > > > > > > > > > - > > > > > > > > > > > > Well, we've long-since succumb to that > > > > > > > > > > > > temptation > > > > > > > > > > > > when > > > > > > > > > > > > we > > > > > > > > > > > > LTO. > > > > > > > > > > > > C'est > > > > > > > > > > > > la vie. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > +1 as well, our LTO will benefit naturally from > > > > > > > > > > > the > > > > > > > > > > > leaf-to-root > > > > > > > > > > > information propagation. ThinLTO will be more > > > > > > > > > > > challenging/interesting though! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For the first part a mechanism similar to > > > > > > > > > > > > > MachineModulePass > > > > > > > > > > > > > would > > > > > > > > > > > > > be > > > > > > > > > > > > > desirable but that may not be possible during > > > > > > > > > > > > > this > > > > > > > > > > > > > project, > > > > > > > > > > > > > but > > > > > > > > > > > > > if > > > > > > > > > > > > > we can make some sort of smaller version of > > > > > > > > > > > > > that > > > > > > > > > > > > > to > > > > > > > > > > > > > suit > > > > > > > > > > > > > our > > > > > > > > > > > > > purpose. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think we need to make any kind of > > > > > > > > > > > > MachineModulePass > > > > > > > > > > > > to > > > > > > > > > > > > make > > > > > > > > > > > > this work. Once we alter the visitation order > > > > > > > > > > > > based > > > > > > > > > > > > on > > > > > > > > > > > > the > > > > > > > > > > > > CGSCC > > > > > > > > > > > > iteration scheme, we can keep state in-between > > > > > > > > > > > > functions > > > > > > > > > > > > in > > > > > > > > > > > > the > > > > > > > > > > > > pre-existing hacky way (using static members of > > > > > > > > > > > > the > > > > > > > > > > > > relevant > > > > > > > > > > > > function passes). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Sorry my mistake here by first part I mean 1) > > > > > > > > > > requirement > > > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > link > > > > > > > > > > time approach. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I also don't see where/why we need a > > > > > > > > > > > MachineModule(Pass) > > > > > > > > > > > for > > > > > > > > > > > the > > > > > > > > > > > CGSCC scheme, that said I'd rather avoid using a > > > > > > > > > > > function > > > > > > > > > > > pass > > > > > > > > > > > with > > > > > > > > > > > static members, if we can have a ModuleAnalysis > > > > > > > > > > > that > > > > > > > > > > > is > > > > > > > > > > > bookkeeping > > > > > > > > > > > the results for functions in the module and > > > > > > > > > > > queries > > > > > > > > > > > by > > > > > > > > > > > the > > > > > > > > > > > register > > > > > > > > > > > allocator somehow. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Matthias/Quentin may have other inputs on this > > > > > > > > > > > aspect. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Hal do you mean to add a simple MachineFunction pass > > > > > > > > > that > > > > > > > > > will > > > > > > > > > just > > > > > > > > > operate on register allocated function and prepare a > > > > > > > > > BitVector > > > > > > > > > to > > > > > > > > > indicate which register is being used by > > > > > > > > > MachineFunction, > > > > > > > > > and > > > > > > > > > then > > > > > > > > > use this pass as analysis pass (i.e just simply > > > > > > > > > return > > > > > > > > > static > > > > > > > > > BitVector for clobbered register when register > > > > > > > > > allocation > > > > > > > > > for > > > > > > > > > next > > > > > > > > > function begins. This part is not much clear to me) > > > > > > > > > this > > > > > > > > > thing > > > > > > > > > can > > > > > > > > > be done by scheduling a pass post register allocation > > > > > > > > > in > > > > > > > > > lib/CodeGen/Passes.cpp > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > void TargetPassConfig::addMachinePasses() { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > // Run pre-ra passes. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > addPreRegAlloc(); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > // Run register allocation and passes that are > > > > > > > > > tightly > > > > > > > > > coupled > > > > > > > > > with > > > > > > > > > it, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > // including phi elimination and scheduling. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > if (getOptimizeRegAlloc()) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > addOptimizedRegAlloc(createRegAllocPass(true)); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > else > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > addFastRegAlloc(createRegAllocPass(false)); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > // Run post-ra passes. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > addPostRegAlloc(); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > // Adding a new pass here which keeps register mask > > > > > > > > > information > > > > > > > > > across function calls. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > But this also requires current register allocators to > > > > > > > > > use > > > > > > > > > this > > > > > > > > > information in someway because RegMaskBits in > > > > > > > > > LiveIntervalAnalysis.cpp is not static across calls. > > > > > > > > > I > > > > > > > > > mean > > > > > > > > > I > > > > > > > > > am > > > > > > > > > not > > > > > > > > > clear for how to propagate static info to > > > > > > > > > Intra-procedural > > > > > > > > > Register > > > > > > > > > allocators (if possible without disturbing their code > > > > > > > > > ) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > First, my hope is that we won't need to change the > > > > > > > > register > > > > > > > > allocators, as such, in order to make use of this > > > > > > > > information. > > > > > > > > Instead, we'll simply be able to alter the register > > > > > > > > masks > > > > > > > > generated > > > > > > > > for the call instructions. These masks will indicate > > > > > > > > fewer > > > > > > > > clobbers > > > > > > > > than might otherwise be present based on the ABI > > > > > > > > because > > > > > > > > of > > > > > > > > information gathered during the codegen of the callee. > > > > > > > > These > > > > > > > > masks > > > > > > > > are generally constructed by target based on the > > > > > > > > calling > > > > > > > > convention. > > > > > > > > The PowerPC backend, for example, looks like this: > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > // Add a register mask operand representing the > > > > > > > > call-preserved > > > > > > > > registers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > const TargetRegisterInfo *TRI > > > > > > > > Subtarget.getRegisterInfo(); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > const uint32_t *Mask > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > TRI->getCallPreservedMask(DAG.getMachineFunction(), > > > > > > > > CallConv); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > assert(Mask && "Missing call preserved mask for calling > > > > > > > > convention"); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ops.push_back(DAG.getRegisterMask(Mask)); > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > but it can be more complicated. If you look for uses of > > > > > > > > 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll > > > > > > > > see > > > > > > > > what > > > > > > > > I > > > > > > > > mean. Regardless, the code ends up calling some method > > > > > > > > is > > > > > > > > the > > > > > > > > targets TargetRegisterInfo subclass. These methods > > > > > > > > generally > > > > > > > > look > > > > > > > > something like this: > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > const uint32_t * > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > PPCRegisterInfo::getCallPreservedMask(const > > > > > > > > MachineFunction > > > > > > > > &MF, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > CallingConv::ID CC) const { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > const PPCSubtarget &Subtarget > > > > > > > > MF.getSubtarget<PPCSubtarget>(); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > return TM.isPPC64() ? (Subtarget.hasAltivec() ? > > > > > > > > CSR_SVR464_Altivec_RegMask > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > : CSR_SVR464_RegMask) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > : (Subtarget.hasAltivec() ? CSR_SVR432_Altivec_RegMask > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > : CSR_SVR432_RegMask); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > In any case, the fundamental idea here is that, when > > > > > > > > someone > > > > > > > > calls > > > > > > > > getCallPreservedMask in order to set the regmask on a > > > > > > > > call, > > > > > > > > we > > > > > > > > might > > > > > > > > not have to use the CC at all. Instead, if we've > > > > > > > > already > > > > > > > > codegened > > > > > > > > the function, we might use a cache of 'exact' register > > > > > > > > masks > > > > > > > > computed during codegen of the potential callees > > > > > > > > instead. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > I am thinking to add a simple Immutable pass > > > > > > > MachineRegisterUsageInfo > > > > > > > similar to MachineBranchProbabilityInfo that can maintain > > > > > > > RegisterUsageInformation per function. Can it be simply > > > > > > > done > > > > > > > by > > > > > > > using UsedPhysRegMask from MachineRegisterInfo ?? > > > > > > > > > > > > > > > > > > > > > > > > > > > No, like the comment said, UsedPhysRegMask gives only the > > > > > > registers > > > > > > clobbered by calls: > > > > > > > > > > > > > > > > > > > > > // This bit vector represents all the registers clobbered > > > > > > by > > > > > > function > > > > > > calls. > > > > > > > > > > > > > > >> > > > > > You want to build this information yourself on top of > > > > > > MachineRegisterInfo:: isPhysRegModified > > > > > > > > > > > > > > > > > > > > Ok but then the time complexity will be O(n) n = number of > > > > > physical > > > > > register on the target. Am I going correct? > > > > > > > > > > > > > > Yes, this is correct. > > > > > >> > > > > > > Here getCallPreservedMask will call API provided by > > > > > > > MachineRegisterUsageInfo to avail the exact register mask > > > > > > > but > > > > > > > how > > > > > > > it > > > > > > > can know that the function is already codegen or it will > > > > > > > query > > > > > > > each > > > > > > > time when getCallPreservedMask is called and of available > > > > > > > MachineRegisterUsageInfo will return the details > > > > > > > otherwise > > > > > > > simply > > > > > > > return NULL. > > > > > > > > > > > > > > > > > > > > > > > > > > > > So changes will be now in TargetRegisterInfo > > > > > > > implementation > > > > > > > for > > > > > > > each > > > > > > > target right ?? > > > > > > > > > > > > > > > > > > > > >> > > > > > > > In order to do this, I think we'll need to provide a > > > > > > > > function > > > > > > > > callable from the target's getCallPreservedMask > > > > > > > > implementation, > > > > > > > > which can return such an 'exact' regmask when > > > > > > > > available. > > > > > > > > I > > > > > > > > think > > > > > > > > we > > > > > > > > need to do it this way for two reasons: > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > 1. Not all of the target code calls > > > > > > > > getCallPreservedMask, > > > > > > > > but > > > > > > > > sometimes calls other similar target-specific functions > > > > > > > > (e.g. > > > > > > > > getTLSCallPreservedMask). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The targets need to opt-in to this behavior because > > > > > > > > only > > > > > > > > the > > > > > > > > target can know that all register uses are really > > > > > > > > tagged > > > > > > > > correctly > > > > > > > > post "pre-emit". > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > Because the target is free to introduce uses of > > > > > > > > registers > > > > > > > > at > > > > > > > > essentially any time, we need to do the scanning for > > > > > > > > used > > > > > > > > registers > > > > > > > > after the "pre-emit" passes run. This can be done by > > > > > > > > scheduling > > > > > > > > some > > > > > > > > simple register-use scanning pass after the call to > > > > > > > > addPreEmitPass > > > > > > > > in lib/CodeGen/Passes.cpp. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > I think this also applies in someway to Mehdi Amini's > > > > > > > > > idea > > > > > > > > > to > > > > > > > > > keep > > > > > > > > > a > > > > > > > > > ModulePass for bookkeeping but then existing register > > > > > > > > > allocators > > > > > > > > > will be required to change so that the code can query > > > > > > > > > the > > > > > > > > > ModulePass > > > > > > > > > for RegMaskBits for particular function. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think that the simplest way to do this is to create > > > > > > > > an > > > > > > > > immutable > > > > > > > > analysis pass (e.g. BasicAA) that keeps the cache of > > > > > > > > the > > > > > > > > computed > > > > > > > > register masks. This is somewhat similar in spirit to > > > > > > > > how > > > > > > > > the > > > > > > > > 'AssumptionCache' analysis works at the IR level. This > > > > > > > > analysis > > > > > > > > can > > > > > > > > then be created by lib/CodeGen/Passes.cpp early, and > > > > > > > > then > > > > > > > > queried > > > > > > > > and passed around later by the CodeGen/Target code. > > > > > > > > Because > > > > > > > > it > > > > > > > > is > > > > > > > > an > > > > > > > > immutable analysis, it won't get destroyed until the > > > > > > > > very > > > > > > > > end, > > > > > > > > which > > > > > > > > is also important because, I imagine, it will need to > > > > > > > > own > > > > > > > > the > > > > > > > > memory > > > > > > > > associated with the generated register masks. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > -Hal > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > Vivek > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Yes for propagating register usage approach we > > > > > > > > > > don't > > > > > > > > > > need > > > > > > > > > > MachineModulePass > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Vivek > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Mehdi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > Hal Finkel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Assistant Computational Scientist > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Leadership Computing Facility > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- >> > Hal Finkel > > > Assistant Computational Scientist > > > Leadership Computing Facility > > > Argonne National Laboratory >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160524/b389f47e/attachment.html>
vivek pandya via llvm-dev
2016-May-25 03:16 UTC
[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
On Wed, May 25, 2016 at 8:44 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > ------------------------------ > > *From: *"vivek pandya" <vivekvpandya at gmail.com> > *To: *"Hal Finkel" <hfinkel at anl.gov> > *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" < > matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>, "Quentin > Colombet" <qcolombet at apple.com> > *Sent: *Tuesday, May 24, 2016 9:34:29 PM > > *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation - > Introduction and Feedback > > > > On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote: > >> >> ------------------------------ >> >> *From: *"vivek pandya" <vivekvpandya at gmail.com> >> *To: *"Quentin Colombet" <qcolombet at apple.com> >> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>, >> "Matthias Braun" <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com >> > >> *Sent: *Tuesday, May 24, 2016 1:00:58 PM >> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation - >> Introduction and Feedback >> >> Hello, >> >> I have written following code to check each register if it is used by >> machineFunction or not : >> >> MachineRegisterInfo *MRI = &MF.getRegInfo(); >> TargetRegisterInfo *TRI = (TargetRegisterInfo >> *)MF.getSubtarget().getRegisterInfo(); >> >> Some reason you can't use a const pointer here? >> > MCRegisterInfo is just used to get conventional name of register for given > target like AX, BX on X86. > >> >> const TargetMachine &TM = MF.getTarget(); >> const MCRegisterInfo *MCRI = TM.getMCRegisterInfo(); >> DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n"); >> >> for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e >> = (*TRI).regclass_end(); i != e; i++ ) { >> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege >> (*i)->end(); pregi != prege; pregi++ ) { >> DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " is >> modified "<< MRI->isPhysRegModified(*pregi) << " \n"); >> >> Try isPhysRegUsed. >> > ok > >> >> } >> } >> DEBUG(dbgs() << "\n"); >> >> The pass which is executing this code is schedule POST-RA stage but this >> gives me true for all registers i.e in each function all registers are >> being used except EBP and some other similar, Is this a correct way to get >> register usage information ? I think I have made some mistake please help. >> >> >> You might look at the implementation of these functions in >> lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning >> true because UsedPhysRegMask.test(PhysReg) is true or because >> reg_nodbg_empty(*AliasReg) is true. >> > Yes that helped now I am getting actual register which have been used by > given function, but a little problem > The updated code is as shown below : > for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e > (*TRI).regclass_end(); i != e; i++ ) { > for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege > (*i)->end(); pregi != prege; pregi++ ) { > for (MCRegAliasIterator AliasReg(*pregi, TRI, true); AliasReg.isValid(); > ++AliasReg) { > if (!MRI->reg_nodbg_empty(*AliasReg)) { > DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " > is used "<< MRI->isPhysRegUsed(*pregi) << " \n"); > break; // no need to process more alias > } > } > } > } > But here some registers are getting processed with in different classes > (unnecessary processing) Is this only way to iterate through all used > register (using RegClass iterator) ? Is there any way to avoid duplicate > regs? > Of course currently I am just printing but next I am thinking to use a map > to track usage info , in that only distinct register info will be stored > but still due to loop structure I need to iterate through a single register > 3 - 4 times making it time consuming. > > Yes, I believe you can just do: > > for (unsigned Reg = 0; Reg < TRI->getNumRegs(); ++Reg) { >Oh yes thanks I just forgot that PhyReg starts at 0.> > > -Hal > > > -Vivek > >> >> >> -Hal >> >> >> >> Vivek >> >> On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at apple.com> >> wrote: >> >>> >>> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at gmail.com> >>> wrote: >>> >>> >>> >>> *Vivek Pandya* >>> >>> >>> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <qcolombet at apple.com> >>> wrote: >>> >>>> >>>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at gmail.com> >>>> wrote: >>>> >>>> >>>> >>>> *Vivek Pandya* >>>> >>>> >>>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov> wrote: >>>> >>>>> >>>>> ------------------------------ >>>>> >>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com> >>>>> *To: *"Mehdi Amini" <mehdi.amini at apple.com> >>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" < >>>>> qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias >>>>> Braun" <matze at braunis.de> >>>>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM >>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation - >>>>> Introduction and Feedback >>>>> >>>>> >>>>> >>>>> >>>>> *Vivek Pandya* >>>>> >>>>> >>>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at gmail.com >>>>> > wrote: >>>>> >>>>>> >>>>>> >>>>>> *Vivek Pandya* >>>>>> >>>>>> >>>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------ >>>>>>> >>>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com> >>>>>>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" < >>>>>>> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov> >>>>>>> *Cc: *"Quentin Colombet" <qcolombet at apple.com> >>>>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM >>>>>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation - >>>>>>> Introduction and Feedback >>>>>>> >>>>>>> Hello LLVM Community, >>>>>>> >>>>>>> Sorry for delay as I was busy in final exams. >>>>>>> >>>>>>> I am Vivek from India. Thanks for choosing my proposal for >>>>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal >>>>>>> Finkel will be mentoring me for this project. >>>>>>> >>>>>>> IPRA can reduce code size and runtime of programs by allocating >>>>>>> register across the module and procedure boundaries. >>>>>>> >>>>>>> I have identified some old but effective research work on this area. >>>>>>> I want community's feedback for feasibility of these approach and I >>>>>>> am targeting to implement two of them during this project. >>>>>>> >>>>>>> Here is list of the papers, I have read first two papers and I would >>>>>>> like to discuss those approach first, I will read other two paper then >>>>>>> initiate discussion for them as well. All I want is to find out a concrete >>>>>>> implementation plan before 23 May, 2016 and for that I need community's >>>>>>> help. >>>>>>> >>>>>>> 1) Compile time ----- Minimizing register usage penalty at procedure >>>>>>> calls - http://dl.acm.org/citation.cfm?id=53999 >>>>>>> ====================================================================In >>>>>>> this approach intra-procedural register allocation is used as base but >>>>>>> machine code generation order is bottom up traversal of call graph and >>>>>>> inter-procedural effect is achieved by propagating register usage >>>>>>> information of callee function to caller (i.e child to parent in CallGraph) >>>>>>> so that caller can use different registers than callee and can save load >>>>>>> store cost at procedure call, this is not trivial as it seems due to >>>>>>> recursive calls, library function usage etc. Also for upper region of the >>>>>>> graph in this technique available number of registers might become zero in >>>>>>> that case it should fall back to normal load store at procedure call. Apart >>>>>>> from these difficulties other difficulties have been identified please >>>>>>> follow this mail-chain >>>>>>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion >>>>>>> My mentor has already provided me a patch that alters code >>>>>>> generation order as per bottom up call graph traversal, I am working from >>>>>>> that point now. Any other help/suggestion is always welcomed. >>>>>>> >>>>>>> 2) Link time ----- Global register allocation at link time - >>>>>>> http://dl.acm.org/citation.cfm?id=989415 >>>>>>> ====================================================================In >>>>>>> this particular approach (sort of true IPRA) registers will be reallocated >>>>>>> (this optimization will be optional if turned off still code will be >>>>>>> compiled as per intra-procedural allocation) at link time. Here modules are >>>>>>> first complied as per normal compilation but the object code is annotated >>>>>>> with details so that linker can build call graph and also calculate usage >>>>>>> information at link time. Compiler also write hints in object code that if >>>>>>> particular variable is allocated in some other register ( due to new >>>>>>> allocation) then how the code should be changed? Thus linker can use these >>>>>>> information to decide which variables (global) need to be in same register >>>>>>> through out the program execution and also according to register usage >>>>>>> information in call graph which procedure will not be active simultaneously >>>>>>> so that locals for that procedures can be in same registers with out load >>>>>>> store at procedure calls. >>>>>>> For these particular method help me to analyze feasibility: >>>>>>> 1) Can llvm collects following information at module level in >>>>>>> MachineIR? list of procedures in module, list of locals in procedures, list >>>>>>> of procedures that a particular procedure can call, and a list of the >>>>>>> variables this procedure references. Each entry in the last two lists >>>>>>> includes an estimate of the number of times the procedure is called or the >>>>>>> variable is referenced in each execution of this procedure >>>>>>> 2) Can llvm write informative commands to object files? >>>>>>> 3) Can LTO is capable of leveraging those commands? >>>>>>> >>>>>>> In terms of scoping the project for the summer, I definitely >>>>>>> recommend that you focus on (1) first. If you finish that, we can certainly >>>>>>> move on to other things. >>>>>>> >>>>>>> >>>>>>> I'll add +1 here, but I already wrote the same thing on IRC when >>>>>>> discussing with Vivek. True IPRA without a proper MachineModule >>>>>>> infrastructure won't be doable in my opinion (even with such >>>>>>> infrastructure, it may not be trivial in LLVM in general). >>>>>>> >>>>>>> Regarding link time, note that any such a design would likely look >>>>>>> much different than in David Wall's paper however, because our LTO >>>>>>> re-codegens everything anyway. The paper says, "Finally, it keeps us honest >>>>>>> as designers of the system; once we postpone anything until link time, the >>>>>>> temptation is great to postpone everything, ..." - Well, we've long-since >>>>>>> succumb to that temptation when we LTO. C'est la vie. >>>>>>> >>>>>>> >>>>>>> +1 as well, our LTO will benefit naturally from the leaf-to-root >>>>>>> information propagation. ThinLTO will be more challenging/interesting >>>>>>> though! >>>>>>> >>>>>>> For the first part a mechanism similar to MachineModulePass would be >>>>>>> desirable but that may not be possible during this project, but if we can >>>>>>> make some sort of smaller version of that to suit our purpose. >>>>>>> >>>>>>> I don't think we need to make any kind of MachineModulePass to make >>>>>>> this work. Once we alter the visitation order based on the CGSCC iteration >>>>>>> scheme, we can keep state in-between functions in the pre-existing hacky >>>>>>> way (using static members of the relevant function passes). >>>>>>> >>>>>>> Sorry my mistake here by first part I mean 1) requirement in the >>>>>> link time approach. >>>>>> >>>>>>> >>>>>> >>>>>>> I also don't see where/why we need a MachineModule(Pass) for the >>>>>>> CGSCC scheme, that said I'd rather avoid using a function pass with static >>>>>>> members, if we can have a ModuleAnalysis that is bookkeeping the results >>>>>>> for functions in the module and queries by the register allocator somehow. >>>>>>> Matthias/Quentin may have other inputs on this aspect. >>>>>>> >>>>>> >>>>> @Hal do you mean to add a simple MachineFunction pass that will just >>>>> operate on register allocated function and prepare a BitVector to indicate >>>>> which register is being used by MachineFunction, and then use this pass as >>>>> analysis pass (i.e just simply return static BitVector for clobbered >>>>> register when register allocation for next function begins. This part is >>>>> not much clear to me) this thing can be done by scheduling a pass post >>>>> register allocation in lib/CodeGen/Passes.cpp >>>>> >>>>> void TargetPassConfig::addMachinePasses() { >>>>> . >>>>> . >>>>> . >>>>> // Run pre-ra passes. >>>>> addPreRegAlloc(); >>>>> >>>>> // Run register allocation and passes that are tightly coupled with >>>>> it, >>>>> // including phi elimination and scheduling. >>>>> if (getOptimizeRegAlloc()) >>>>> addOptimizedRegAlloc(createRegAllocPass(true)); >>>>> else >>>>> addFastRegAlloc(createRegAllocPass(false)); >>>>> >>>>> // Run post-ra passes. >>>>> addPostRegAlloc(); >>>>> // Adding a new pass here which keeps register mask information across >>>>> function calls. >>>>> . >>>>> . >>>>> . >>>>> } >>>>> >>>>> But this also requires current register allocators to use this >>>>> information in someway because RegMaskBits in LiveIntervalAnalysis.cpp is >>>>> not static across calls. I mean I am not clear for how to propagate static >>>>> info to Intra-procedural Register allocators (if possible without >>>>> disturbing their code ) >>>>> >>>>> First, my hope is that we won't need to change the register >>>>> allocators, as such, in order to make use of this information. Instead, >>>>> we'll simply be able to alter the register masks generated for the call >>>>> instructions. These masks will indicate fewer clobbers than might otherwise >>>>> be present based on the ABI because of information gathered during the >>>>> codegen of the callee. These masks are generally constructed by target >>>>> based on the calling convention. The PowerPC backend, for example, looks >>>>> like this: >>>>> >>>>> // Add a register mask operand representing the call-preserved >>>>> registers. >>>>> const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo(); >>>>> const uint32_t *Mask >>>>> TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv); >>>>> assert(Mask && "Missing call preserved mask for calling >>>>> convention"); >>>>> Ops.push_back(DAG.getRegisterMask(Mask)); >>>>> >>>>> but it can be more complicated. If you look for uses of >>>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll see what I mean. >>>>> Regardless, the code ends up calling some method is the targets >>>>> TargetRegisterInfo subclass. These methods generally look something like >>>>> this: >>>>> >>>>> const uint32_t * >>>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF, >>>>> CallingConv::ID CC) const { >>>>> const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>(); >>>>> ... >>>>> return TM.isPPC64() ? (Subtarget.hasAltivec() ? >>>>> CSR_SVR464_Altivec_RegMask >>>>> : CSR_SVR464_RegMask) >>>>> : (Subtarget.hasAltivec() ? >>>>> CSR_SVR432_Altivec_RegMask >>>>> : CSR_SVR432_RegMask); >>>>> } >>>>> >>>>> In any case, the fundamental idea here is that, when someone calls >>>>> getCallPreservedMask in order to set the regmask on a call, we might not >>>>> have to use the CC at all. Instead, if we've already codegened the >>>>> function, we might use a cache of 'exact' register masks computed during >>>>> codegen of the potential callees instead. >>>>> >>>> I am thinking to add a simple Immutable pass MachineRegisterUsageInfo >>>> similar to MachineBranchProbabilityInfo that can maintain >>>> RegisterUsageInformation per function. Can it be simply done by using >>>> UsedPhysRegMask from MachineRegisterInfo ?? >>>> >>>> >>>> No, like the comment said, UsedPhysRegMask gives only the registers >>>> clobbered by calls: >>>> // This bit vector represents all the registers clobbered by function >>>> calls. >>>> >>>> You want to build this information yourself on top of >>>> MachineRegisterInfo::isPhysRegModified >>>> >>> Ok but then the time complexity will be O(n) n = number of physical >>> register on the target. Am I going correct? >>> >>> >>> Yes, this is correct. >>> >>> >>>> Here getCallPreservedMask will call API provided by >>>> MachineRegisterUsageInfo to avail the exact register mask but how it can >>>> know that the function is already codegen or it will query each time when >>>> getCallPreservedMask is called and of available MachineRegisterUsageInfo >>>> will return the details otherwise simply return NULL. >>>> So changes will be now in TargetRegisterInfo implementation for each >>>> target right ?? >>>> >>>> >>>>> In order to do this, I think we'll need to provide a function callable >>>>> from the target's getCallPreservedMask implementation, which can return >>>>> such an 'exact' regmask when available. I think we need to do it this way >>>>> for two reasons: >>>>> >>>>> 1. Not all of the target code calls getCallPreservedMask, but >>>>> sometimes calls other similar target-specific functions (e.g. >>>>> getTLSCallPreservedMask). >>>>> 2. The targets need to opt-in to this behavior because only the >>>>> target can know that all register uses are really tagged correctly post >>>>> "pre-emit". >>>>> >>>>> Because the target is free to introduce uses of registers at >>>>> essentially any time, we need to do the scanning for used registers after >>>>> the "pre-emit" passes run. This can be done by scheduling some simple >>>>> register-use scanning pass after the call to addPreEmitPass in >>>>> lib/CodeGen/Passes.cpp. >>>>> >>>>> >>>>> I think this also applies in someway to Mehdi Amini's idea to keep a >>>>> ModulePass for bookkeeping but then existing register allocators will be >>>>> required to change so that the code can query the ModulePass for >>>>> RegMaskBits for particular function. >>>>> >>>>> I think that the simplest way to do this is to create an immutable >>>>> analysis pass (e.g. BasicAA) that keeps the cache of the computed register >>>>> masks. This is somewhat similar in spirit to how the 'AssumptionCache' >>>>> analysis works at the IR level. This analysis can then be created by >>>>> lib/CodeGen/Passes.cpp early, and then queried and passed around later by >>>>> the CodeGen/Target code. Because it is an immutable analysis, it won't get >>>>> destroyed until the very end, which is also important because, I imagine, >>>>> it will need to own the memory associated with the generated register masks. >>>>> >>>>> -Hal >>>>> >>>>> >>>>> Vivek >>>>> >>>>> >>>>>>> Yes for propagating register usage approach we don't need >>>>>> MachineModulePass >>>>>> >>>>>> Vivek >>>>>> >>>>>>> -- >>>>>>> Mehdi >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Hal Finkel >>>>> Assistant Computational Scientist >>>>> Leadership Computing Facility >>>>> Argonne National Laboratory >>>>> >>>> >>> >> >> >> >> -- >> Hal Finkel >> Assistant Computational Scientist >> Leadership Computing Facility >> Argonne National Laboratory >> > > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160525/0182e15d/attachment-0001.html>
Possibly Parallel Threads
- [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
- [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
- [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
- [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
- [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback