thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Interprocedural Register Allocation

Mehdi Amini via llvm-dev

2016-May-25 06:26 UTC

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Sent from my iPhone
> On May 24, 2016, at 11:04 PM, vivek pandya <vivekvpandya at
gmail.com> wrote:
> 
> 
> 
>> On Wed, May 25, 2016 at 10:46 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> 
>>> On May 24, 2016, at 10:08 PM, vivek pandya <vivekvpandya at
gmail.com> wrote:
>>> 
>>> 
>>> 
>>> On Wed, May 25, 2016 at 10:08 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>>>> 
>>>>> On May 24, 2016, at 9:17 PM, vivek pandya <vivekvpandya
at gmail.com> wrote:
>>>>> 
>>>>> Dear Mentors,
>>>>> 
>>>>> Please help me to understand our plan to implement
Interprocedural Register allocator by propagating register usage info. While
writing this mail I am considering all previous discussion over llvm-dev and
IRC.
>>>>> 
>>>>> 1) A MachineFunction pass to be executed POST-RA to collect
the information about the used Registers.
>>>>> 2) An Immutable pass which will store reg usage info
collected by previous pass and return it whenever queried.
>>>>> 3) A target-specific MachineFunction pass that will use the
register usage info available for each call instruction to achieve IPRA. This pass
should run at PRE-RA.
>>>>> 
>>>>> Relation among above passes:
>>>>> 
>>>>> 1) pass will store info to 2) pass as well use info for
call instruction found while processing.
>>>>> 
>>>>> 3) pass only requires to query information from 2) pass.
>>>>> 
>>>>> 
>>>>> Questions
>>>>> ========>>>>> 
>>>>>  Which pass is responsible for the load/store of callee-saved
registers at the beginning of each function call? And how does it use the RegMask of
the call instruction to generate the load/store? I think the intra-procedural register
allocator is not responsible for generating load/store around the call site.
>>>>> 
>>>>> 				 /- - -> (A) - - -> (D)
>>>>> 				/
>>>>> 	(K)- - ->(T)- - -> (B) - - -> (E)
>>>>> 				\
>>>>> 				 \- - -> (C) - - -> (F)
>>>>> 
>>>>> So as per our discussion we would require following passes:
>>>>> 
>>>>> Suppose in the given example call graph, register allocation
for D is completed, so that information is now available. While pass 3) is
processing A, would it collect reg usage info for all callees, OR them, and
then update A's regmask by going to the parent procedure that actually
calls A?
>>>> 
>>>> No, Pass 3) is only looking for every call MI in A and updating
the associated regmask by replacing it with the information stored in the
immutable pass.
>>> What is the use of that ? Please be more specific.
>> 
>> I'm not sure I get the question. You describe 3 different passes at
the beginning, and it seems fairly obvious what is the responsibility of each.
>> 
>> This seems quite straightforward to me, so I'm not sure what to
explain, here is the logical sequence on D and A:
>> 
>> 1) Codegen function D, MachineFunction post-RA: collect the register
usage for D. Store the new regmask in the immutable pass
>> 2) Move to function A, perform ISel,  then run the MachineFunction
pre-RA that finds the call to D. Query the immutable pass and get the regmask
for D. Update the regmask associated with the call.
>> 
>> Done.
> I understand the sequence we discussed till now, but as I am also reading
literature about IPRA, I am confused about how this would remove load/store of
registers (so that the callee can clobber them without worrying about the caller's
content in registers)
From the beginning I told you that it is not true IPRA on this aspect: callees
have to preserve callee-saved register according to the CC, unless internal.


> and also how caller will not use register used by callee.
Caller can use them, just like it does now: by saving restoring them
before/after the call if it does.
> So this boils down to the question: what magic is done by just updating the reg mask
at the call site, such that we do not even need to communicate this to the intra-procedural
register allocator?
Ask yourself how the RA knows how to deal with what to save around calls in
general (hint: think about what is the use of regmask).

> this also means that somewhere code is written that is responsible for
inserting store/load for registers so that the callee can preserve them, but in
the presence of IPRA how will it not add those store/load?
What we're doing here is purely an optimization of the calling convention.
This does not have to be more intrusive...
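
To make the hint about the regmask concrete, here is a minimal sketch of how a register mask on a call decides which live values the caller has to save around that call. It is a toy model with 32 registers, not LLVM code: `RegMask`, `clobbersReg` and `regsToSaveAroundCall` are hypothetical names introduced for illustration. The only convention taken from LLVM is that a set bit in a register mask means the register is preserved across the call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (assumption): 32 physical registers, one bit per register.
// As with LLVM register masks, a SET bit means "preserved across the call".
using RegMask = std::uint32_t;

// True if the call described by Mask may overwrite register Reg.
bool clobbersReg(RegMask Mask, unsigned Reg) {
  assert(Reg < 32 && "toy model only has 32 registers");
  return (Mask & (1u << Reg)) == 0;
}

// Given the registers that hold values live across a call, return the ones
// the caller must save before the call and restore afterwards.
std::vector<unsigned>
regsToSaveAroundCall(RegMask CallMask,
                     const std::vector<unsigned> &LiveAcrossCall) {
  std::vector<unsigned> ToSave;
  for (unsigned Reg : LiveAcrossCall)
    if (clobbersReg(CallMask, Reg))
      ToSave.push_back(Reg);
  return ToSave;
}
```

With a calling-convention mask that only preserves r8 to r15, values live in r1, r2 and r3 all need a save and restore. If the mask is replaced by one computed from a callee that only writes r0 and r1, only r1 still needs it. The allocator logic is unchanged; only the mask it reads is more precise.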


> 
> It seems that my questions have confused Mehdi Amini :P
> Other masters please help us !
>  
>> 
>> 
>> 
>>>> 
>>>> 
>>>>> How reg mask details of call D would be used by Register
allocator while allocating register for A and also not generating load/store for
register being used by A in body of D as we have callee saved convention.
>>>> 
>>>> I expect all of that to be handled automatically when updating
the regmask.
>>> Please elaborate on both of your comments, whatever you have in
mind.
>>>> 
>>>> 
>>>>> 
>>>>> How the pass responsible for generating load/store will
optimize for the child node of call graph where it does not require to
load/store because caller will not use register used by callee ? I mean how our
IPRA will take care of this?
>>>> 
>>>> I don't understand that.
>>> In the above example call graph, when pass 3) is executed for D it does
not do any work because there is no call to any other function in the body of D, but
actually in the presence of IPRA there is no need to preserve (store/load) any
register in function D (and also in other such leaf nodes of the call graph). The
reason is that A is now aware of register usage in D, so it will not use any
register which intersects with D's register usage; thus D need not worry
about anything.
>>> Does this make sense?
>> 
>> No. D has to obey the calling convention if it is not internal. So any
callee-saved register has to be ... saved.
> Why is that required when caller of D does not use any register used by D? 
You don't know about the callers when you are doing codegen for D.
Again: we're not trying to perform pure IPRA here.

-- 
Mehdi

>> 
>> 
>> -- 
>> Mehdi
>> 
>> 
>> 
>> 
>>>> 
>>>> 
>>>>> 
>>>>> In short, I am not clear about the method for using this
information to get the effect of IPRA without modifying the register allocators
themselves (i.e. by updating the regMask of call instructions).
>>>>> 
>>>>> Also, pass 1) and pass 3) seem to overlap in their
work. For example, while scanning register usage info for function T, the
final register usage info should be <all regs used by T> OR <reg usage
info A> OR <reg usage info B> OR <reg usage info C>, because K
should not use any register which is used by T, A, D, B, C, E, F without
load/store. The relevant paper also discusses this situation and suggests falling
back to the load/store approach. So as we move to the upper region of the call graph it
is very likely that there are not enough registers to allocate.
>>>> 
>>>> The calling convention (for anything else than internal
function) will always have some callee-saved registers.
>>>> If we have a deep call-graph of internal-only function, then we
may generate a lot of spill at the top. Somehow we may have to think about
driving some heuristic with PGO.
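
The growth of the clobber set toward the top of the call graph can be sketched as a bottom-up OR over callees. This is a toy model only: `FuncInfo` and `clobberSet` are hypothetical names, registers are bits of a 32-bit word, the graph is assumed to be acyclic (no recursion), and the sketch ignores the point made above that registers the calling convention marks as callee-saved stay preserved for non-internal functions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-function record for the toy model.
struct FuncInfo {
  std::uint32_t OwnRegs;             // registers written in the function body
  std::vector<std::string> Callees;  // direct callees
};

// Registers that a call to Fn may clobber: its own registers OR-ed with the
// clobber sets of everything it calls. Assumes an acyclic call graph.
std::uint32_t clobberSet(const std::string &Fn,
                         const std::map<std::string, FuncInfo> &Graph,
                         std::map<std::string, std::uint32_t> &Memo) {
  auto Cached = Memo.find(Fn);
  if (Cached != Memo.end())
    return Cached->second;
  assert(Graph.count(Fn) && "unknown function in toy call graph");
  const FuncInfo &Info = Graph.at(Fn);
  std::uint32_t Regs = Info.OwnRegs;
  for (const std::string &Callee : Info.Callees)
    Regs |= clobberSet(Callee, Graph, Memo);
  Memo[Fn] = Regs;
  return Regs;
}
```

On the example graph, if each of the eight functions writes one distinct register, a call to A clobbers two registers, a call to T clobbers seven, and a call to K clobbers all eight, which is why spills concentrate near the root.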
>>> 
>>> This kind of problem is addressed by this paper Register allocation
across procedure and module boundaries - http://dl.acm.org/citation.cfm?id=93551
>>> In this paper authors have tried to eliminate load/store only in
call intensive regions ( contrast to simple bottom up ) by analyzing call
frequency at compile time ( also has facility to plug in Profile info) and rest
of the call graph will follow normal register allocator. But yes we will think
about this later.
>>> 
>>>> I suspect we can think about that a bit later. Let's focus
on the simple for now.
>>>> 
>>>> -- 
>>>> Mehdi
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Please bear with my silly questions.
>>>>> 
>>>>> Sincerely,
>>>>> Vivek
>>>>> 
>>>>> 
>>>>>> On Wed, May 25, 2016 at 8:46 AM, vivek pandya
<vivekvpandya at gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Wed, May 25, 2016 at 8:44 AM, Hal Finkel
<hfinkel at anl.gov> wrote:
>>>>>>> 
>>>>>>> From: "vivek pandya" <vivekvpandya at
gmail.com>
>>>>>>> To: "Hal Finkel" <hfinkel at
anl.gov>
>>>>>>> Cc: "llvm-dev" <llvm-dev at
lists.llvm.org>, "Matthias Braun" <matze at braunis.de>,
"Mehdi Amini" <mehdi.amini at apple.com>, "Quentin
Colombet" <qcolombet at apple.com>
>>>>>>> Sent: Tuesday, May 24, 2016 9:34:29 PM
>>>>>>> 
>>>>>>> Subject: Re: [GSoC 2016] Interprocedural Register
Allocation - Introduction and Feedback
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Wed, May 25, 2016 at 3:53 AM, Hal Finkel
<hfinkel at anl.gov> wrote:
>>>>>>>> 
>>>>>>>> From: "vivek pandya" <vivekvpandya
at gmail.com>
>>>>>>>> To: "Quentin Colombet" <qcolombet
at apple.com>
>>>>>>>> Cc: "Hal Finkel" <hfinkel at
anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>,
"Matthias Braun" <matze at braunis.de>, "Mehdi Amini"
<mehdi.amini at apple.com>
>>>>>>>> Sent: Tuesday, May 24, 2016 1:00:58 PM
>>>>>>>> Subject: Re: [GSoC 2016] Interprocedural
Register Allocation - Introduction and Feedback
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> I have written following code to check each
register if it is used by machineFunction or not :
>>>>>>>> 
>>>>>>>> MachineRegisterInfo *MRI =
&MF.getRegInfo();
>>>>>>>> 	TargetRegisterInfo *TRI = (TargetRegisterInfo
*)MF.getSubtarget().getRegisterInfo();
>>>>>>>> Some reason you can't use a const pointer
here?
>>>>>>> MCRegisterInfo is just used to get conventional
name of register for given target like AX, BX on X86.
>>>>>>>> 
>>>>>>>> 	const TargetMachine &TM = MF.getTarget();
>>>>>>>> 	const MCRegisterInfo *MCRI =
TM.getMCRegisterInfo();
>>>>>>>> 	DEBUG(dbgs() << "Function Name :
" << MF.getName() << "\n");
>>>>>>>> 
>>>>>>>> 	for(TargetRegisterInfo::regclass_iterator i =
(*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++ ) {
>>>>>>>> 		for(TargetRegisterClass::iterator pregi =
(*i)->begin(), prege = (*i)->end(); pregi != prege; pregi++ ) {
>>>>>>>> 			DEBUG( dbgs() << "Physical
Register : " << MCRI->getName(*pregi) << " is modified
"<< MRI->isPhysRegModified(*pregi) << " \n");
>>>>>>>> Try isPhysRegUsed.
>>>>>>> ok 
>>>>>>>> 
>>>>>>>> 		}
>>>>>>>> 	}
>>>>>>>> 	DEBUG(dbgs() << "\n");
>>>>>>>> 
>>>>>>>> The pass which executes this code is
scheduled at the POST-RA stage, but it gives me true for all registers, i.e. in each
function all registers are being used except EBP and some other similar ones. Is this
a correct way to get register usage information? I think I have made some
mistake, please help.
>>>>>>>> 
>>>>>>>> You might look at the implementation of these
functions in lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're
returning true because UsedPhysRegMask.test(PhysReg) is true or because
reg_nodbg_empty(*AliasReg) is true.
>>>>>>> Yes that helped now I am getting actual register
which have been used by given function, but a little problem
>>>>>>> The updated code is as shown below : 
>>>>>>> for(TargetRegisterInfo::regclass_iterator i =
(*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++ ) {
>>>>>>> 		for(TargetRegisterClass::iterator pregi =
(*i)->begin(), prege = (*i)->end(); pregi != prege; pregi++ ) {
>>>>>>> 			for (MCRegAliasIterator AliasReg(*pregi, TRI,
true); AliasReg.isValid(); ++AliasReg) {
>>>>>>> 			    if (!MRI->reg_nodbg_empty(*AliasReg)) {
>>>>>>> 			    	DEBUG( dbgs() << "Physical
Register : " << MCRI->getName(*pregi) << " is used
"<< MRI->isPhysRegUsed(*pregi) << " \n");
>>>>>>> 			    	break; // no need to process more alias
>>>>>>> 			    }
>>>>>>>  			 }	
>>>>>>> 		}
>>>>>>> 	} 
>>>>>>> But here some registers are processed several times because they
belong to different classes (unnecessary processing). Is this the only way to iterate
through all used registers (using the RegClass iterator)? Is there any way to avoid
duplicate regs?
>>>>>>> Of course currently I am just printing but next I
am thinking to use a map to track usage info , in that only distinct register
info will be stored but still due to loop structure I need to iterate through a
single register 3 - 4 times making it time consuming.
>>>>>>> Yes, I believe you can just do:
>>>>>>> 
>>>>>>>   for (unsigned Reg = 0; Reg <
TRI->getNumRegs(); ++Reg) {
>>>>>> 
>>>>>> Oh yes thanks I just forgot that PhyReg starts at 0. 
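
As a rough illustration of the flat loop suggested above, the sketch below visits every register number exactly once and builds a preserved-register mask from a "used" predicate. It does not use LLVM headers: `buildPreservedMask` and the `IsUsed` callback are hypothetical stand-ins for a query such as `isPhysRegUsed`, and the 32-bits-per-word layout with a set bit meaning "preserved" mirrors how LLVM stores register masks.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Build a register mask for a function by scanning register numbers once.
// A set bit means the register is NOT used, so it is preserved across a call.
// IsUsed is a hypothetical stand-in for a per-register usage query.
std::vector<std::uint32_t>
buildPreservedMask(unsigned NumRegs,
                   const std::function<bool(unsigned)> &IsUsed) {
  assert(IsUsed && "need a usage predicate");
  std::vector<std::uint32_t> Mask((NumRegs + 31) / 32, 0);
  for (unsigned Reg = 0; Reg < NumRegs; ++Reg)
    if (!IsUsed(Reg))
      Mask[Reg / 32] |= 1u << (Reg % 32);
  return Mask;
}
```

Because the loop is indexed by register number rather than by register class, a register that belongs to several classes is still tested only once, so no map is needed to remove duplicates.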
>>>>>>> 
>>>>>>> 
>>>>>>>  -Hal
>>>>>>> 
>>>>>>> 
>>>>>>> -Vivek
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  -Hal
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Vivek
>>>>>>>> 
>>>>>>>>> On Wed, May 18, 2016 at 11:42 PM, Quentin
Colombet <qcolombet at apple.com> wrote:
>>>>>>>>> 
>>>>>>>>> On May 18, 2016, at 11:00 AM, vivek pandya
<vivekvpandya at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Vivek Pandya
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Wed, May 18, 2016 at 11:25 PM,
Quentin Colombet <qcolombet at apple.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On May 18, 2016, at 10:46 AM, vivek
pandya <vivekvpandya at gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Vivek Pandya
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 11, 2016 at 4:01 PM,
Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> From: "vivek pandya"
<vivekvpandya at gmail.com>
>>>>>>>>>>> To: "Mehdi Amini"
<mehdi.amini at apple.com>
>>>>>>>>>>> Cc: "Hal Finkel"
<hfinkel at anl.gov>, "Quentin Colombet" <qcolombet at
apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
"Matthias Braun" <matze at braunis.de>
>>>>>>>>>>> Sent: Wednesday, May 11, 2016
3:15:03 AM
>>>>>>>>>>> Subject: Re: [GSoC 2016]
Interprocedural Register Allocation - Introduction and Feedback
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Vivek Pandya
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 11, 2016 at 10:02
AM, vivek pandya <vivekvpandya at gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Vivek Pandya
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 11, 2016 at
9:43 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 10, 2016, at 6:06
PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> From: "vivek
pandya" <vivekvpandya at gmail.com>
>>>>>>>>>>>>> To: "llvm-dev"
<llvm-dev at lists.llvm.org>, "Tim Amini Golling"
<mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at
anl.gov>
>>>>>>>>>>>>> Cc: "Quentin
Colombet" <qcolombet at apple.com>
>>>>>>>>>>>>> Sent: Tuesday, May 10, 2016
2:59:16 PM
>>>>>>>>>>>>> Subject: [GSoC 2016]
Interprocedural Register Allocation - Introduction and Feedback
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello LLVM Community,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sorry for delay as I was
busy in final exams.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am Vivek from India.
Thanks for choosing my proposal for Interprocedural Register Allocation (IPRA)
in LLVM. Mehdi Amini and Hal Finkel will be mentoring me for this project.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> IPRA can reduce code size
and runtime of programs by allocating register across the module and procedure
boundaries.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have identified some old
but effective research work on this area.
>>>>>>>>>>>>> I want community's
feedback for feasibility of these approach and I am targeting to implement two
of them during this project.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here is list of the papers,
I have read first two papers and I would like to discuss those approach first, I
will read other two paper then initiate discussion for them as well. All I want
is to find out a concrete implementation plan before 23 May, 2016 and for that I
need community's help.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1) Compile time -----
Minimizing register usage penalty at procedure calls -
http://dl.acm.org/citation.cfm?id=53999
>>>>>>>>>>>>> ====================================================================
>>>>>>>>>>>>> In this
approach intra-procedural register allocation is used as base but machine code
generation order is bottom up traversal of call graph and inter-procedural
effect is achieved by propagating register usage information of callee function
to caller (i.e child to parent in CallGraph) so that caller can use different
registers than callee and can save load store cost at procedure call, this is
not trivial as it seems due to recursive calls, library function usage etc. Also
for upper region of the graph in this technique available number of registers
might become zero in that case it should fall back to normal load store at
procedure call. Apart from these difficulties other difficulties have been
identified please follow this mail-chain
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>>>>>>>>>>> My mentor has already
provided me a patch that alters code generation order as per bottom up call
graph traversal, I am working from that point now. Any other help/suggestion is
always welcomed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2) Link time ----- Global
register allocation at link time - http://dl.acm.org/citation.cfm?id=989415
>>>>>>>>>>>>> ====================================================================
>>>>>>>>>>>>> In this
particular approach (sort of true IPRA) registers will be reallocated (this
optimization will be optional if turned off still code will be compiled as per
intra-procedural allocation) at link time. Here modules are first compiled as
per normal compilation but the object code is annotated with details so that
linker can build call graph and also calculate usage information at link time.
Compiler also write hints in object code that if particular variable is
allocated in some other register ( due to new allocation) then how the code
should be changed? Thus linker can use these information to decide which
variables (global) need to be in same register through out the program execution
and also according to register usage information in call graph which procedure
will not be active simultaneously so that locals for that procedures can be in
same registers with out load store at procedure calls.
>>>>>>>>>>>>> For these particular method
help me to analyze feasibility:
>>>>>>>>>>>>> 1) Can LLVM collect the
following information at module level in MachineIR? list of procedures in
module, list of locals in procedures, list of procedures that a particular
procedure can call, and a list of the variables this procedure references. Each
entry in the last two lists includes an estimate of the number of times the
procedure is called or the variable is referenced in each execution of this
procedure
>>>>>>>>>>>>> 2) Can llvm write
informative commands to object files?
>>>>>>>>>>>>> 3) Is LTO capable of
leveraging those commands?
>>>>>>>>>>>>> In terms of scoping the
project for the summer, I definitely recommend that you focus on (1) first. If
you finish that, we can certainly move on to other things.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll add +1 here, but I
already wrote the same thing on IRC when discussing with Vivek. True IPRA
without a proper MachineModule infrastructure won't be doable in my opinion
(even with such infrastructure, it may not be trivial in LLVM in general).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regarding link time, note
that any such a design would likely look much different than in David Wall's
paper however, because our LTO re-codegens everything anyway. The paper says,
"Finally, it keeps us honest as designers of the system; once we postpone
anything until link time, the temptation is great to postpone everything,
..." - Well, we've long since succumbed to that temptation when we LTO.
C'est la vie.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +1 as well, our LTO will
benefit naturally from the leaf-to-root information propagation. ThinLTO will be
more challenging/interesting though!
>>>>>>>>>>>>> For the first part a
mechanism similar to MachineModulePass would be desirable but that may not be
possible during this project, but if we can make some sort of smaller version of
that to suit our purpose.
>>>>>>>>>>>>> I don't think we need
to make any kind of MachineModulePass to make this work. Once we alter the
visitation order based on the CGSCC iteration scheme, we can keep state
in-between functions in the pre-existing hacky way (using static members of the
relevant function passes).
>>>>>>>>>>>> 
>>>>>>>>>>>>  Sorry my mistake here by first
part I mean 1) requirement in the link time approach.
>>>>>>>>>>>>  
>>>>>>>>>>>>> I also don't see
where/why we need a MachineModule(Pass) for the CGSCC scheme, that said I'd
rather avoid using a function pass with static members, if we can have a
ModuleAnalysis that is bookkeeping the results for functions in the module and
queries by the register allocator somehow.
>>>>>>>>>>>>> Matthias/Quentin may have
other inputs on this aspect.
>>>>>>>>>>>  
>>>>>>>>>>> @Hal do you mean to add a simple
MachineFunction pass that will just operate on register allocated function and
prepare a BitVector to indicate which register is being used by MachineFunction,
and then use this pass as analysis pass (i.e just simply return static BitVector
for clobbered register when register allocation for next function begins. This
part is not much clear to me) this thing can be done by scheduling a pass post
register allocation in lib/CodeGen/Passes.cpp
>>>>>>>>>>> 
>>>>>>>>>>> void
TargetPassConfig::addMachinePasses() {
>>>>>>>>>>> . 
>>>>>>>>>>> .
>>>>>>>>>>> .
>>>>>>>>>>>   // Run pre-ra passes.
>>>>>>>>>>>   addPreRegAlloc();
>>>>>>>>>>> 
>>>>>>>>>>>   // Run register allocation and
passes that are tightly coupled with it,
>>>>>>>>>>>   // including phi elimination and
scheduling.
>>>>>>>>>>>   if (getOptimizeRegAlloc())
>>>>>>>>>>>    
addOptimizedRegAlloc(createRegAllocPass(true));
>>>>>>>>>>>   else
>>>>>>>>>>>    
addFastRegAlloc(createRegAllocPass(false));
>>>>>>>>>>> 
>>>>>>>>>>>   // Run post-ra passes.
>>>>>>>>>>>   addPostRegAlloc();
>>>>>>>>>>> // Adding a new pass here which
keeps register mask information across function calls.
>>>>>>>>>>> .
>>>>>>>>>>> .
>>>>>>>>>>> .
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> But this also requires current
register allocators to use this information in someway because RegMaskBits in
LiveIntervalAnalysis.cpp is not static across calls. I mean I am not clear for
how to propagate static info to Intra-procedural Register allocators (if
possible without disturbing their code )
>>>>>>>>>>> First, my hope is that we won't
need to change the register allocators, as such, in order to make use of this
information. Instead, we'll simply be able to alter the register masks
generated for the call instructions. These masks will indicate fewer clobbers
than might otherwise be present based on the ABI because of information gathered
during the codegen of the callee. These masks are generally constructed by
target based on the calling convention. The PowerPC backend, for example, looks
like this:
>>>>>>>>>>> 
>>>>>>>>>>>   // Add a register mask operand
representing the call-preserved registers.
>>>>>>>>>>>   const TargetRegisterInfo *TRI =
Subtarget.getRegisterInfo();
>>>>>>>>>>>   const uint32_t *Mask
>>>>>>>>>>>      
TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>>>>>>>>>   assert(Mask &&
"Missing call preserved mask for calling convention");
>>>>>>>>>>>  
Ops.push_back(DAG.getRegisterMask(Mask));
>>>>>>>>>>> 
>>>>>>>>>>> but it can be more complicated. If
you look for uses of 'getRegisterMask' in Target/*/*ISelLowering.cpp,
you'll see what I mean. Regardless, the code ends up calling some method in
the target's TargetRegisterInfo subclass. These methods generally look something
like this:
>>>>>>>>>>> 
>>>>>>>>>>> const uint32_t *
>>>>>>>>>>>
PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
>>>>>>>>>>>                                    
CallingConv::ID CC) const {
>>>>>>>>>>>   const PPCSubtarget &Subtarget
= MF.getSubtarget<PPCSubtarget>();
>>>>>>>>>>>   ...
>>>>>>>>>>>   return TM.isPPC64() ?
(Subtarget.hasAltivec() ? CSR_SVR464_Altivec_RegMask
>>>>>>>>>>>                                    
: CSR_SVR464_RegMask)
>>>>>>>>>>>                       :
(Subtarget.hasAltivec() ? CSR_SVR432_Altivec_RegMask
>>>>>>>>>>>                                    
: CSR_SVR432_RegMask);
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> In any case, the fundamental idea
here is that, when someone calls getCallPreservedMask in order to set the
regmask on a call, we might not have to use the CC at all. Instead, if we've
already codegened the function, we might use a cache of 'exact' register
masks computed during codegen of the potential callees instead.
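
The "cache of exact masks with a calling-convention fallback" idea can be sketched in a few lines. This is a standalone model, not the LLVM API: `RegUsageCache` and `maskForCall` are hypothetical names, functions are keyed by name for simplicity, and the cache is assumed to own the storage of every mask it hands out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache of exact register masks, filled in as each function
// finishes codegen. It owns the mask storage.
class RegUsageCache {
public:
  void store(const std::string &Fn, std::vector<std::uint32_t> Mask) {
    Masks[Fn] = std::move(Mask);
  }
  // Returns nullptr when the function has not been codegened yet.
  const std::uint32_t *lookup(const std::string &Fn) const {
    auto It = Masks.find(Fn);
    return It == Masks.end() ? nullptr : It->second.data();
  }

private:
  std::map<std::string, std::vector<std::uint32_t>> Masks;
};

// Pick the mask for a call: the exact one if known, else the mask implied
// by the calling convention.
const std::uint32_t *maskForCall(const RegUsageCache &Cache,
                                 const std::string &Callee,
                                 const std::uint32_t *CCMask) {
  assert(CCMask != nullptr && "calling-convention mask is the fallback");
  if (const std::uint32_t *Exact = Cache.lookup(Callee))
    return Exact;
  return CCMask;
}
```

A caller processed before its callee simply gets the conservative calling-convention mask, so correctness does not depend on the visitation order; the order only determines how often the exact mask is available.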
>>>>>>>>>> I am thinking to add a simple Immutable
pass MachineRegisterUsageInfo similar to MachineBranchProbabilityInfo that can
maintain RegisterUsageInformation per function. Can it be simply done by using
UsedPhysRegMask from MachineRegisterInfo ??
>>>>>>>>>> 
>>>>>>>>>> No, like the comment said,
UsedPhysRegMask gives only the registers clobbered by calls:
>>>>>>>>>> // This bit vector represents all the
registers clobbered by function calls.
>>>>>>>>>> 
>>>>>>>>>> You want to build this information
yourself on top of MachineRegisterInfo::isPhysRegModified
>>>>>>>>> Ok but then the time complexity will be
O(n) n = number of physical register on the target. Am I going correct?
>>>>>>>>> 
>>>>>>>>> Yes, this is correct.
>>>>>>>>> 
>>>>>>>>>>   
>>>>>>>>>> Here getCallPreservedMask will call an API
provided by MachineRegisterUsageInfo to get the exact register mask, but how
can it know that the function is already codegened? Or will it query each time
getCallPreservedMask is called, and if available MachineRegisterUsageInfo will
return the details, otherwise simply return NULL?
>>>>>>>>>> So the changes will now be in the
TargetRegisterInfo implementation for each target, right?
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> In order to do this, I think
we'll need to provide a function callable from the target's
getCallPreservedMask implementation, which can return such an 'exact'
regmask when available. I think we need to do it this way for two reasons:
>>>>>>>>>>> 
>>>>>>>>>>>  1. Not all of the target code
calls getCallPreservedMask, but sometimes calls other similar target-specific
functions (e.g. getTLSCallPreservedMask).
>>>>>>>>>>>  2. The targets need to opt-in to
this behavior because only the target can know that all register uses are really
tagged correctly post "pre-emit".
>>>>>>>>>>> 
>>>>>>>>>>> Because the target is free to
introduce uses of registers at essentially any time, we need to do the scanning
for used registers after the "pre-emit" passes run. This can be done
by scheduling some simple register-use scanning pass after the call to
addPreEmitPass in lib/CodeGen/Passes.cpp.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I think this also applies in
someway to Mehdi Amini's idea to keep a ModulePass for bookkeeping but then
existing register allocators will be required to change so that the code can
query the ModulePass for RegMaskBits for particular function.
>>>>>>>>>>> I think that the simplest way to do
this is to create an immutable analysis pass (e.g. BasicAA) that keeps the cache
of the computed register masks. This is somewhat similar in spirit to how the
'AssumptionCache' analysis works at the IR level. This analysis can then
be created by lib/CodeGen/Passes.cpp early, and then queried and passed around
later by the CodeGen/Target code. Because it is an immutable analysis, it
won't get destroyed until the very end, which is also important because, I
imagine, it will need to own the memory associated with the generated register
masks.
>>>>>>>>>>> 
>>>>>>>>>>>  -Hal
>>>>>>>>>>> 
>>>>>>>>>>> Vivek
>>>>>>>>>>> 
>>>>>>>>>>>> Yes for propagating register
usage approach we don't need MachineModulePass
>>>>>>>>>>>>  
>>>>>>>>>>>> Vivek
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> Mehdi
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -- 
>>>>>>>>>>> Hal Finkel
>>>>>>>>>>> Assistant Computational Scientist
>>>>>>>>>>> Leadership Computing Facility
>>>>>>>>>>> Argonne National Laboratory
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Hal Finkel
>>>>>>>> Assistant Computational Scientist
>>>>>>>> Leadership Computing Facility
>>>>>>>> Argonne National Laboratory
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Hal Finkel
>>>>>>> Assistant Computational Scientist
>>>>>>> Leadership Computing Facility
>>>>>>> Argonne National Laboratory

Hal Finkel via llvm-dev

2016-May-25 19:01 UTC

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

----- Original Message -----
> From: "Mehdi Amini" <mehdi.amini at apple.com>
> To: "vivek pandya" <vivekvpandya at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
> <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at
braunis.de>,
> "Quentin Colombet" <qcolombet at apple.com>
> Sent: Wednesday, May 25, 2016 1:26:56 AM
> Subject: Re: [GSoC 2016] Interprocedural Register Allocation -
> Introduction and Feedback
> Sent from my iPhone
> On May 24, 2016, at 11:04 PM, vivek pandya < vivekvpandya at gmail.com > wrote:
> > > On Wed, May 25, 2016 at 10:46 AM, Mehdi Amini < mehdi.amini at apple.com > wrote:
> > > > > On May 24, 2016, at 10:08 PM, vivek pandya < vivekvpandya at gmail.com > wrote:
> > > > > On Wed, May 25, 2016 at 10:08 AM, Mehdi Amini < mehdi.amini at apple.com > wrote:
> > > > > > > On May 24, 2016, at 9:17 PM, vivek pandya < vivekvpandya at gmail.com > wrote:
>
> > > > > > > Dear Mentors,
>
> > > > > > > Please help me to understand our plan to implement Interprocedural
> > > > > > > Register allocator by propagating register usage info. While writing
> > > > > > > this mail I am considering all previous discussion over llvm-dev and
> > > > > > > IRC.
>
> > > > > > > 1) A MachineFunction pass to be executed POST-RA to collect the
> > > > > > > information about the used Registers.
>
> > > > > > > 2) An Immutable pass which will store reg usage info collected by
> > > > > > > previous pass and return it whenever queried.
>
> > > > > > > 3) A target-specific MachineFunction pass that will use the register
> > > > > > > usage info available for each call instruction to achieve IPRA. This
> > > > > > > pass should run at PRE-RA.
>
> > > > > > > Relation among above passes:
>
> > > > > > > 1) pass will store info to 2) pass as well use info for call
> > > > > > > instruction found while processing.
>
> > > > > > > 3) pass only requires to query information from 2) pass.
>
> > > > > > Questions
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > ========> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Which pass is responsible for load/store of callee
saved
> > > > > > register,
> > > > > > at
> > > > > > the begining of each function call? And how does
it uses
> > > > > > RegMask
> > > > > > of
> > > > > > call instruction to generate load/store. I think
> > > > > > Intra-procedural
> > > > > > register allocator is not responsible to generate
> > > > > > load/store
> > > > > > around
> > > > > > the call site.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > >              /- - -> (A) - - -> (D)
> > > > > >             /
> > > > > > (K)- - ->(T)- - -> (B) - - -> (E)
> > > > > >             \
> > > > > >              \- - -> (C) - - -> (F)
> 
> > > > > > So as per our discussion we would require the following passes:
> > > > > >
> > > > > > Suppose that in the given example call graph register allocation for D is completed, so that information is now available. When pass 3) processes A, would it collect the reg usage info for all callees, OR them together, and then update A's regmask by going to the parent procedure that actually calls A?
> 
> > > > > No, pass 3) only looks at every call MI in A and updates the associated regmask by replacing it with the information stored in the immutable pass.
> 
> > > > What is the use of that? Please be more specific.
> 
> > > I'm not sure I get the question. You describe 3 different passes at the beginning, and it seems fairly obvious what the responsibility of each is.
> > >
> > > This seems quite straightforward to me, so I'm not sure what to explain. Here is the logical sequence on D and A:
> > >
> > > 1) Codegen function D, MachineFunction post-RA: collect the register usage for D. Store the new regmask in the immutable pass.
> > >
> > > 2) Move to function A, perform ISel, then run the MachineFunction pre-RA pass that finds the call to D. Query the immutable pass and get the regmask for D. Update the regmask associated with the call.
> > >
> > > Done.
> 
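The two-step sequence on D and A described above can be sketched as a toy model. This is only an illustration under stated assumptions, not LLVM code: `RegUsageInfo`, `collectPreserved` and `updateCallMask` are hypothetical names, the register file has 8 registers, and a set bit means "preserved across the call". The set of registers used by the callee is assumed to already include whatever its own callees clobber.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy register file of 8 physical registers; bit i set means
// "register i is preserved across the call".
constexpr std::size_t NumRegs = 8;
using RegMask = std::bitset<NumRegs>;

// Hypothetical stand-in for the immutable pass: function name -> regmask.
struct RegUsageInfo {
  std::map<std::string, RegMask> Masks;

  void store(const std::string &Fn, RegMask M) { Masks[Fn] = M; }

  // Returns nullptr when nothing has been recorded for Fn yet.
  const RegMask *lookup(const std::string &Fn) const {
    auto It = Masks.find(Fn);
    return It == Masks.end() ? nullptr : &It->second;
  }
};

// Step 1 (post-RA on the callee): a register is preserved if the callee
// never touches it, or if the calling convention makes the callee save it.
RegMask collectPreserved(RegMask UsedByCallee, RegMask PreservedByCC) {
  return ~UsedByCallee | PreservedByCC;
}

// Step 2 (pre-RA on the caller): replace the default regmask of the call
// with the recorded one, if the callee has already been compiled.
RegMask updateCallMask(const RegUsageInfo &Info, const std::string &Callee,
                       RegMask DefaultMask) {
  if (const RegMask *M = Info.lookup(Callee))
    return *M;
  return DefaultMask;
}
```

In this model the caller's allocator needs no other change: it sees a call that preserves more registers than the default convention promises, so fewer values have to be saved around the call.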
> > I understand the sequence we have discussed so far, but as I am also reading the literature about IPRA, I am confused about how this would remove the load/store of registers (so that the callee can clobber them without worrying about the caller's contents in those registers)
>
> From the beginning I told you that it is not true IPRA in this respect: callees have to preserve callee-saved registers according to the CC, unless internal.
>
> > and also how the caller will avoid using registers used by the callee.
>
> The caller can use them, just like it does now: by saving/restoring them before/after the call if it does.
>
> > So this boils down to the question of what magic is done by just updating the regmask at the call site, given that we do not even need to communicate this to the intra-procedural register allocator,
>
> Ask yourself how the RA knows what to save around calls in general (hint: think about what the regmask is used for).
>
> > this also means that somewhere code is written that is responsible for inserting the store/load of registers so that the callee can preserve them, but in the presence of IPRA how will it not add those store/load?
>
> What we're doing here is purely an optimization of the calling convention. This does not have to be more intrusive...
>
> > It seems that my questions have confused Mehdi Amini :P
> >
> > Other masters please help us!
> 
> > > > > > How would the regmask details of the call to D be used by the register allocator while allocating registers for A, and also to avoid generating load/store in the body of D for registers being used by A, given that we have the callee-saved convention?
> 
> > > > > I expect all of that to be handled automatically when updating the regmask.
> 
> > > > Please elaborate on both of your comments, whatever you have in mind.
> 
> > > > > > How will the pass responsible for generating load/store optimize for the child nodes of the call graph, where load/store is not required because the caller will not use registers used by the callee? I mean, how will our IPRA take care of this?
> 
> > > > > I don't understand that.
> 
> > > > In the above example call graph, when pass 3) is executed for D it does not do any work, because there is no call to any other function in the body of D. But actually, in the presence of IPRA there is no need to preserve (store/load) any register in function D (and also in other such leaf nodes of the call graph). The reason is that A is now aware of the register usage in D, so it will not use any register that intersects with D's register usage, and thus D need not worry about anything.
> > > >
> > > > Does this make sense?
> 
> > > No. D has to obey the calling convention if it is not internal. So any callee-saved register has to be ... saved.
>
> > Why is that required when the caller of D does not use any register used by D?
>
> You don't know about the callers when you are doing codegen for D. Again: we're not trying to perform pure IPRA here.

+1 -- Please stick to the initial plan. The literature contains many interesting possibilities, but I think that the register-mask overriding is a useful subset that we can accomplish within the summer. Once we have this working, we'll all be in a better place to think about more-sophisticated possibilities.

-Hal

> --
> Mehdi
>
> > > --
> > > Mehdi
> 
> > > > > > In short, I am not very clear on the method for using this information to get the effect of IPRA without modifying the register allocators themselves (i.e. by updating the regmask of call instructions).
> 
> > > > > > Also, passes 1) and 3) seem to overlap in their work. For example, while scanning register usage info for function T, the final register usage info should be <all regs used by T> OR <reg usage info A> OR <reg usage info B> OR <reg usage info C>, because K should not use any register which is used by T, A, D, B, C, E, F without load/store. The relevant paper also discusses this situation and suggests falling back to the load/store approach. So as we move to the upper region of the call graph, it is very likely that there will not be enough registers to allocate.
> 
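The OR described above for T can be sketched as a bottom-up union over the call graph. This is a toy model under stated assumptions, not the project's implementation: `Node` and `clobberedBy` are hypothetical names, the call graph is assumed to be acyclic (no recursion), and callee-saved registers are ignored, so every register a function touches counts as clobbered.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

constexpr std::size_t NumRegs = 8;
using RegSet = std::bitset<NumRegs>;

// One call-graph node: registers written in the function body, plus the
// names of its direct callees.
struct Node {
  RegSet Used;
  std::vector<std::string> Callees;
};

// Registers clobbered by a call to Fn: its own registers OR-ed with
// everything its callees clobber. Memo caches functions that are done,
// which mirrors compiling callees before callers.
RegSet clobberedBy(const std::string &Fn,
                   const std::map<std::string, Node> &Graph,
                   std::map<std::string, RegSet> &Memo) {
  auto Done = Memo.find(Fn);
  if (Done != Memo.end())
    return Done->second;
  const Node &N = Graph.at(Fn);
  RegSet Result = N.Used;
  for (const std::string &Callee : N.Callees)
    Result |= clobberedBy(Callee, Graph, Memo);
  Memo[Fn] = Result;
  return Result;
}
```

With one distinct register per function in the example graph, the set for T already covers seven of the eight registers, which illustrates the concern that little is left for K near the top of a deep call graph.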
> > > > > The calling convention (for anything other than internal functions) will always have some callee-saved registers.
> 
> > > > > If we have a deep call graph of internal-only functions, then we may generate a lot of spills at the top. Somehow we may have to think about driving some heuristic with PGO.
> 
> > > > This kind of problem is addressed by the paper "Register allocation across procedure and module boundaries" - http://dl.acm.org/citation.cfm?id=93551
> > > >
> > > > In this paper the authors try to eliminate load/store only in call-intensive regions (in contrast to simple bottom-up) by analyzing call frequency at compile time (it also has a facility to plug in profile info), and the rest of the call graph follows the normal register allocator. But yes, we will think about this later.
> 
> > > > > I suspect we can think about that a bit later. Let's focus on the simple case for now.
> > > > >
> > > > > --
> > > > > Mehdi
> 
> > > > > > Please bear with my silly questions.
> > > > > >
> > > > > > Sincerely,
> > > > > > Vivek
> 
> > > > > > On Wed, May 25, 2016 at 8:46 AM, vivek pandya <vivekvpandya at gmail.com> wrote:
> > > > > > > On Wed, May 25, 2016 at 8:44 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > > > > > > > > From: "vivek pandya" <vivekvpandya at gmail.com>
> > > > > > > > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > > > > > > > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>, "Quentin Colombet" <qcolombet at apple.com>
> > > > > > > > > Sent: Tuesday, May 24, 2016 9:34:29 PM
> > > > > > > > > Subject: Re: [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
> 
> > > > > > > > > On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > > > > > > > > > > From: "vivek pandya" <vivekvpandya at gmail.com>
> > > > > > > > > > > To: "Quentin Colombet" <qcolombet at apple.com>
> > > > > > > > > > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>
> > > > > > > > > > > Sent: Tuesday, May 24, 2016 1:00:58 PM
> > > > > > > > > > > Subject: Re: [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
> 
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > I have written the following code to check, for each register, whether it is used by the MachineFunction or not:
> 
> > > > > > > > > > > MachineRegisterInfo *MRI = &MF.getRegInfo();
> > > > > > > > > > > TargetRegisterInfo *TRI = (TargetRegisterInfo *)MF.getSubtarget().getRegisterInfo();
> 
> > > > > > > > > > Some reason you can't use a const pointer here?
> 
> > > > > > > > > MCRegisterInfo is just used to get the conventional name of a register for the given target, like AX, BX on X86.
> 
> > > > > > > > > > > const TargetMachine &TM = MF.getTarget();
> > > > > > > > > > > const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
> 
> > > > > > > > > > > DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n");
> 
> > > > > > > > > > > for (TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++) {
> > > > > > > > > > >   for (TargetRegisterClass::iterator pregi = (*i)->begin(), prege = (*i)->end(); pregi != prege; pregi++) {
> > > > > > > > > > >     DEBUG(dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " is modified " << MRI->isPhysRegModified(*pregi) << " \n");
> 
> > > > > > > > > > Try isPhysRegUsed.
> > > > > > > > > ok
> > > > > > > > > > >   }
> > > > > > > > > > > }
> > > > > > > > > > > DEBUG(dbgs() << "\n");
> 
> > > > > > > > > > > The pass which executes this code is scheduled at the post-RA stage, but it gives me true for all registers, i.e. in each function all registers are being used except EBP and some other similar ones. Is this a correct way to get register usage information? I think I have made some mistake, please help.
> 
> > > > > > > > > > You might look at the implementation of these functions in lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning true because UsedPhysRegMask.test(PhysReg) is true or because reg_nodbg_empty(*AliasReg) is true.
> 
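The two possible reasons for a "used" answer can be told apart with a small toy model. This is a sketch under stated assumptions and not the LLVM classes: `ToyRegInfo` and `whyUsed` are hypothetical, and the model assumes that a register counts as used either because a recorded register mask covers it or because the register or one of its aliases appears in a non-debug instruction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of the two checks discussed above.
struct ToyRegInfo {
  std::vector<bool> CoveredByMask;  // set when a recorded regmask covers the register
  std::vector<bool> HasNonDebugUse; // register appears in a non-debug instruction
  std::vector<std::vector<unsigned>> Aliases; // aliases of each register, itself included

  enum class Reason { NotUsed, RegMask, AliasUse };

  // Reports which of the two checks makes Reg count as used.
  Reason whyUsed(unsigned Reg) const {
    if (CoveredByMask[Reg])
      return Reason::RegMask;
    for (unsigned A : Aliases[Reg])
      if (HasNonDebugUse[A])
        return Reason::AliasUse;
    return Reason::NotUsed;
  }
};
```

If nearly every register reports the mask as the reason, a plausible explanation is that the function contains calls whose register masks cover most of the register file, rather than the function body touching those registers itself.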
> > > > > > > > > Yes, that helped. Now I am getting the actual registers which have been used by the given function, but there is a little problem.
> > > > > > > > >
> > > > > > > > > The updated code is as shown below:
> 
> > > > > > > > > for (TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++) {
> > > > > > > > >   for (TargetRegisterClass::iterator pregi = (*i)->begin(), prege = (*i)->end(); pregi != prege; pregi++) {
> > > > > > > > >     for (MCRegAliasIterator AliasReg(*pregi, TRI, true); AliasReg.isValid(); ++AliasReg) {
> > > > > > > > >       if (!MRI->reg_nodbg_empty(*AliasReg)) {
> > > > > > > > >         DEBUG(dbgs() << "Physical Register : " << MCRI->getName(*pregi) << " is used " << MRI->isPhysRegUsed(*pregi) << " \n");
> > > > > > > > >         break; // no need to process more aliases
> > > > > > > > >       }
> > > > > > > > >     }
> > > > > > > > >   }
> > > > > > > > > }
> 
> > > > > > > > > But here some registers are processed within different classes (unnecessary processing). Is this the only way to iterate through all used registers (using the RegClass iterator)? Is there any way to avoid duplicate regs?
> 
> > > > > > > > > Of course currently I am just printing, but next I am thinking of using a map to track usage info, in which only distinct register info will be stored. Still, due to the loop structure I need to iterate through a single register 3 - 4 times, making it time consuming.
> 
> > > > > > > > Yes, I believe you can just do:
> > > > > > > >
> > > > > > > > for (unsigned Reg = 0; Reg < TRI->getNumRegs(); ++Reg) {
> 
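The difference between the two loop shapes can be seen on a toy target. This is a sketch under stated assumptions, not LLVM code: `RegClass`, `visitsByClass` and `usedRegs` are hypothetical names, register classes are modelled as plain lists of register numbers, and number 0 is skipped on the assumption that it stands for "no register".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy target: a register class is a list of register
// numbers, and one register may belong to several classes.
using RegClass = std::vector<unsigned>;

// Walks registers class by class, as the nested-loop version does, and
// returns how many (register, class) visits that takes.
std::size_t visitsByClass(const std::vector<RegClass> &Classes) {
  std::size_t Visits = 0;
  for (const RegClass &RC : Classes)
    Visits += RC.size();
  return Visits;
}

// Walks register numbers directly, visiting each one exactly once, and
// collects the used ones.
std::vector<unsigned> usedRegs(const std::vector<bool> &IsUsed) {
  std::vector<unsigned> Result;
  for (unsigned Reg = 1; Reg < IsUsed.size(); ++Reg)
    if (IsUsed[Reg])
      Result.push_back(Reg);
  return Result;
}
```

With five real registers spread over three overlapping classes, the class-based walk makes nine visits while the flat walk makes five, and no map is needed to remove duplicates.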
> > > > > > > Oh yes, thanks. I just forgot that PhyReg starts at 0.
> 
> > > > > > > > -Hal
> > > > > > > > > -Vivek
> > > > > > > > > > -Hal
> > > > > > > > > > > Vivek
> 
> > > > > > > > > > > On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> > > > > > > > > > > > > On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at gmail.com> wrote:
> > > > > > > > > > > > > Vivek Pandya
> > > > > > > > > > > > > On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> > > > > > > > > > > > > > > On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at gmail.com> wrote:
> > > > > > > > > > > > > > > Vivek Pandya
> > > > > > > > > > > > > > > On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > > > > > > > > > > > > > > > > From: "vivek pandya" <vivekvpandya at gmail.com>
> > > > > > > > > > > > > > > > > To: "Mehdi Amini" <mehdi.amini at apple.com>
> > > > > > > > > > > > > > > > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" <qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>
> > > > > > > > > > > > > > > > > Sent: Wednesday, May 11, 2016 3:15:03 AM
> > > > > > > > > > > > > > > > > Subject: Re: [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
> 
> > > > > > > > > > > > > > > > > Vivek Pandya
> > > > > > > > > > > > > > > > > On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at gmail.com> wrote:
> > > > > > > > > > > > > > > > > > Vivek Pandya
> > > > > > > > > > > > > > > > > > On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> > > > > > > > > > > > > > > > > > > > On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > > > > > > > > > > > > > > > > > > > > From: "vivek pandya" <vivekvpandya at gmail.com>
> > > > > > > > > > > > > > > > > > > > > To: "llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" <mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
> > > > > > > > > > > > > > > > > > > > > Cc: "Quentin Colombet" <qcolombet at apple.com>
> > > > > > > > > > > > > > > > > > > > > Sent: Tuesday, May 10, 2016 2:59:16 PM
> > > > > > > > > > > > > > > > > > > > > Subject: [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback
> 
> > > > > > > > > > > > > > > >
> > > > > Hello LLVM Community,
> > > > >
> > > > > Sorry for the delay; I was busy with final exams.
> > > > >
> > > > > I am Vivek from India. Thanks for choosing my proposal for
> > > > > Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini
> > > > > and Hal Finkel will be mentoring me for this project.
> > > > >
> > > > > IPRA can reduce the code size and runtime of programs by
> > > > > allocating registers across module and procedure boundaries.
> > > > >
> > > > > I have identified some old but effective research work in this
> > > > > area.
> > > > >
> > > > > I would like the community's feedback on the feasibility of
> > > > > these approaches, and I am aiming to implement two of them
> > > > > during this project.
> > > > >
> > > > > Here is the list of papers. I have read the first two and would
> > > > > like to discuss those approaches first; I will read the other
> > > > > two papers and then initiate discussion on them as well. All I
> > > > > want is to arrive at a concrete implementation plan before
> > > > > 23 May, 2016, and for that I need the community's help.
> > > > >
> > > > > 1) Compile time ----- Minimizing register usage penalty at
> > > > > procedure calls - http://dl.acm.org/citation.cfm?id=53999
> > > > > ====================================================================
> > > > > In this approach intra-procedural register allocation is used as
> > > > > the base, but machine code is generated in a bottom-up traversal
> > > > > of the call graph, and the inter-procedural effect is achieved by
> > > > > propagating register usage information from the callee to the
> > > > > caller (i.e. child to parent in the CallGraph), so that the
> > > > > caller can use different registers than the callee and save
> > > > > load/store cost at procedure calls. This is not as trivial as it
> > > > > seems, due to recursive calls, library function usage, etc. Also,
> > > > > for the upper region of the graph the number of available
> > > > > registers might become zero with this technique; in that case it
> > > > > should fall back to normal load/store at the procedure call.
> > > > > Apart from these difficulties, other difficulties have been
> > > > > identified; please follow this mail chain:
> > > > > https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> > > > >
> > > > > My mentor has already provided me with a patch that alters the
> > > > > code generation order to follow a bottom-up call graph
> > > > > traversal, and I am working from that point now. Any other
> > > > > help or suggestions are always welcome.
> > > > >
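[Editorial sketch] The bottom-up propagation described in approach 1) can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not LLVM code and not the patch mentioned in the thread: the register names, `ALL_REGS`, `propagate_clobbers` and `preserved_across_call` are all hypothetical. Recursive calls and functions without a known body (e.g. library functions) conservatively fall back to "clobbers every register", which corresponds to the normal save/restore behaviour at the call.

```python
# Hypothetical register file, for illustration only.
ALL_REGS = frozenset({"r0", "r1", "r2", "r3", "r4", "r5"})


def propagate_clobbers(calls, local_use):
    """Compute, bottom-up over the call graph, which registers a call may clobber.

    calls:     function name -> list of callee names
    local_use: function name -> registers written by that function's own body;
               a function missing from this map is treated as external.
    """
    clobbers = {}     # finished functions -> frozenset of clobbered registers
    on_stack = set()  # functions currently being visited (detects recursion)

    def visit(fn):
        if fn in clobbers:
            return clobbers[fn]
        if fn not in local_use or fn in on_stack:
            # External or recursive call: assume everything is clobbered.
            return ALL_REGS
        on_stack.add(fn)
        used = set(local_use[fn])
        for callee in calls.get(fn, []):
            used |= visit(callee)  # callee info flows up to the caller
        on_stack.discard(fn)
        clobbers[fn] = frozenset(used)
        return clobbers[fn]

    for fn in local_use:
        visit(fn)
    return clobbers


def preserved_across_call(clobbers, callee):
    """Registers a caller may keep live across a call without save/restore."""
    return ALL_REGS - clobbers.get(callee, ALL_REGS)


# Tiny example call graph: main -> f -> leaf, main -> g, and g is recursive.
calls = {"main": ["f", "g"], "f": ["leaf"], "g": ["g"], "leaf": []}
local_use = {"main": {"r5"}, "f": {"r1"}, "g": {"r2"}, "leaf": {"r0"}}
clobbers = propagate_clobbers(calls, local_use)
```

In this model a caller of `f` could keep values in any register returned by `preserved_across_call(clobbers, "f")`, while a call to the recursive `g` or to an unknown function leaves nothing preserved.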
> > > > > 2) Link time ----- Global register allocation at link time -
> > > > > http://dl.acm.org/citation.cfm?id=989415
> > > > > ====================================================================
> > > > > In this particular approach (sort of true IPRA), registers are
> > > > > reallocated at link time. The optimization is optional; if it is
> > > > > turned off, the code is still compiled as per intra-procedural
> > > > > allocation. Here modules are first compiled as per normal
> > > > > compilation, but the object code is annotated with details so
> > > > > that the linker can build the call graph and also calculate
> > > > > usage information at link time. The compiler also writes hints
> > > > > in the object code describing how the code should be changed if
> > > > > a particular variable is allocated to some other register (due
> > > > > to the new allocation). Thus the linker can use this information
> > > > > to decide which (global) variables need to stay in the same
> > > > > register throughout program execution and, according to the
> > > > > register usage information in the call graph, which procedures
> > > > > will not be active simultaneously, so that the locals of those
> > > > > procedures can share the same registers without load/store at
> > > > > procedure calls.
> > > > >
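[Editorial sketch] The idea that procedures which are never active simultaneously can share registers for their locals can be modelled in a few lines. This is a deliberately simplified sketch of the summary above, not the paper's algorithm and not anything in LLVM: it assumes an acyclic call graph, ignores globals and frequency estimates, and the names `assign_register_ranges`, `calls` and `num_locals` are hypothetical. Each procedure gets a block of register indices that starts above the blocks of everything it can (transitively) call, so unrelated procedures may overlap.

```python
def assign_register_ranges(calls, num_locals):
    """Give each procedure a block of register indices for its locals.

    calls:      procedure name -> list of callee names (assumed acyclic)
    num_locals: procedure name -> number of locals to keep in registers

    A procedure's block starts above the blocks of all its callees, so a
    procedure never shares a register with anything that can be active
    below it on the call stack. Procedures on different call-graph paths
    may receive overlapping blocks.
    """
    base = {}  # procedure -> first register index of its block

    def visit(fn):
        if fn in base:
            return base[fn]
        start = 0
        for callee in calls.get(fn, []):
            start = max(start, visit(callee) + num_locals[callee])
        base[fn] = start
        return start

    for fn in num_locals:
        visit(fn)
    return {fn: list(range(base[fn], base[fn] + num_locals[fn]))
            for fn in num_locals}


# Example: main calls a and b; a calls c. b and c are leaves.
calls = {"main": ["a", "b"], "a": ["c"], "b": [], "c": []}
num_locals = {"main": 2, "a": 1, "b": 3, "c": 2}
ranges = assign_register_ranges(calls, num_locals)
```

Here `b` and `c` can overlap because neither can be on the call stack while the other is running, whereas `main` is placed above everything it calls. A real allocator would also have to cap the indices at the size of the register file and spill the rest.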
> > > > > For this particular method, please help me analyze the
> > > > > feasibility:
> > > > >
> > > > > 1) Can LLVM collect the following information at module level in
> > > > > MachineIR: a list of procedures in the module, a list of locals
> > > > > in each procedure, a list of procedures that a particular
> > > > > procedure can call, and a list of the variables this procedure
> > > > > references? Each entry in the last two lists includes an
> > > > > estimate of the number of times the procedure is called or the
> > > > > variable is referenced in each execution of this procedure.
> > > > >
> > > > > 2) Can LLVM write informative commands to object files?
> > > > >
> > > > > 3) Is LTO capable of leveraging those commands?
> > > > >
> > > > In terms of scoping the project for the summer, I definitely
> > > > recommend that you focus on (1) first. If you finish that, we can
> > > > certainly move on to other things.
> > > >
> > > I'll add +1 here, but I already wrote the same thing on IRC when
> > > discussing with Vivek. True IPRA without a proper MachineModule
> > > infrastructure won't be doable in my opinion (even with such
> > > infrastructure, it may not be trivial in LLVM in general).
> > >
> > > > Regarding link time, note, however, that any such design would
> > > > likely look much different than in David Wall's paper, because our
> > > > LTO re-codegens everything anyway. The paper says, "Finally, it
> > > > keeps us honest as designers of the system; once we postpone
> > > > anything until link time, the temptation is great to postpone
> > > > everything, ..." - Well, we've long since succumbed to that
> > > > temptation when we LTO. C'est la vie.
> > > >
> > > +1 as well, our LTO will benefit
> > > > > > > > > > > > > > > >
> > > naturally
> > > > > > > > > > > > > > > >
> > > from
> > > > > > > > > > > > > > > >
> > > the
> > > > > > > > > > > > > > > >
> > > leaf-to-root
> > > > > > > > > > > > > > > >
> > > information propagation. ThinLTO
> > > > > > > > > > > > > > > >
> > > will
> > > > > > > > > > > > > > > >
> > > be
> > > > > > > > > > > > > > > >
> > > more
> > > > > > > > > > > > > > > >
> > > challenging/interesting though!
> > > > > > > > > > > > > > > >
> >
> > > > > > > > > > > > > > > >
>
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > > > > > > > > > > > For the first part, a mechanism similar to MachineModulePass
> > > > > > > > > > > > > > > > > > > > > > would be desirable, but that may not be possible during this
> > > > > > > > > > > > > > > > > > > > > > project; perhaps we can make some sort of smaller version of
> > > > > > > > > > > > > > > > > > > > > > that to suit our purpose.
>
> > > > > > > > > > > > > > > > > > > > I don't think we need to make any kind of MachineModulePass to
> > > > > > > > > > > > > > > > > > > > make this work. Once we alter the visitation order based on the
> > > > > > > > > > > > > > > > > > > > CGSCC iteration scheme, we can keep state in-between functions
> > > > > > > > > > > > > > > > > > > > in the pre-existing hacky way (using static members of the
> > > > > > > > > > > > > > > > > > > > relevant function passes).
>
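To make the "visitation order" point concrete, here is a minimal, standalone sketch of visiting callees before callers, so that a callee's register usage could already be known when its callers are compiled. It is only a model: `CallGraph`, `visit` and `calleesFirstOrder` are hypothetical names, not LLVM APIs, and unlike the real CGSCC scheme it does not group mutually recursive functions into SCCs (a cycle is merely cut at the first revisit).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order walk: a function is emitted only after every callee
// reachable from it has been emitted.
static void visit(const CallGraph &G, const std::string &F,
                  std::set<std::string> &Seen,
                  std::vector<std::string> &Order) {
  if (!Seen.insert(F).second)
    return; // already visited; this also cuts recursion cycles
  auto It = G.find(F);
  if (It != G.end())
    for (const std::string &Callee : It->second)
      visit(G, Callee, Seen, Order);
  Order.push_back(F);
}

// Returns all functions in a callees-first (leaf-to-root) order.
std::vector<std::string> calleesFirstOrder(const CallGraph &G) {
  std::set<std::string> Seen;
  std::vector<std::string> Order;
  for (const auto &Entry : G)
    visit(G, Entry.first, Seen, Order);
  assert(Order.size() >= G.size() && "every function must be emitted");
  return Order;
}
```

With a graph where `main` calls `A` and `B`, and both call `D`, the order starts with `D` and ends with `main`, which is the property the per-function state sharing relies on.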
> > > > > > > > > > > > > > > > > > Sorry, my mistake here: by "first part" I mean requirement 1) in
> > > > > > > > > > > > > > > > > > the link-time approach.
>
> > > > > > > > > > > > > > > > > > > I also don't see where/why we need a MachineModule(Pass) for the
> > > > > > > > > > > > > > > > > > > CGSCC scheme. That said, I'd rather avoid using a function pass
> > > > > > > > > > > > > > > > > > > with static members if we can have a ModuleAnalysis that keeps
> > > > > > > > > > > > > > > > > > > the results for the functions in the module and is queried by
> > > > > > > > > > > > > > > > > > > the register allocator somehow.
>
> > > > > > > > > > > > > > > > > > > Matthias/Quentin may have other inputs on this aspect.
>
> > > > > > > > > > > > > > > > > @Hal do you mean to add a simple MachineFunction pass that just
> > > > > > > > > > > > > > > > > operates on a register-allocated function and prepares a BitVector
> > > > > > > > > > > > > > > > > indicating which registers are used by the MachineFunction, and
> > > > > > > > > > > > > > > > > then to use this pass as an analysis pass (i.e. simply return a
> > > > > > > > > > > > > > > > > static BitVector of clobbered registers when register allocation
> > > > > > > > > > > > > > > > > for the next function begins; this part is not very clear to me)?
> > > > > > > > > > > > > > > > > This can be done by scheduling a pass after register allocation in
> > > > > > > > > > > > > > > > > lib/CodeGen/Passes.cpp:
>
> > > > > > > > > > > > > > > > > void TargetPassConfig::addMachinePasses() {
> > > > > > > > > > > > > > > > >   .
> > > > > > > > > > > > > > > > >   .
> > > > > > > > > > > > > > > > >   .
> > > > > > > > > > > > > > > > >   // Run pre-ra passes.
> > > > > > > > > > > > > > > > >   addPreRegAlloc();
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   // Run register allocation and passes that are tightly coupled with it,
> > > > > > > > > > > > > > > > >   // including phi elimination and scheduling.
> > > > > > > > > > > > > > > > >   if (getOptimizeRegAlloc())
> > > > > > > > > > > > > > > > >     addOptimizedRegAlloc(createRegAllocPass(true));
> > > > > > > > > > > > > > > > >   else
> > > > > > > > > > > > > > > > >     addFastRegAlloc(createRegAllocPass(false));
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   // Run post-ra passes.
> > > > > > > > > > > > > > > > >   addPostRegAlloc();
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   // Adding a new pass here which keeps register mask information across
> > > > > > > > > > > > > > > > >   // function calls.
> > > > > > > > > > > > > > > > >   .
> > > > > > > > > > > > > > > > >   .
> > > > > > > > > > > > > > > > >   .
> > > > > > > > > > > > > > > > > }
>
> > > > > > > > > > > > > > > > > But this also requires the current register allocators to use this
> > > > > > > > > > > > > > > > > information in some way, because RegMaskBits in
> > > > > > > > > > > > > > > > > LiveIntervalAnalysis.cpp is not static across calls. I mean, I am
> > > > > > > > > > > > > > > > > not clear on how to propagate static info to the intra-procedural
> > > > > > > > > > > > > > > > > register allocators (if possible without disturbing their code).
>
> > > > > > > > > > > > > > > > First, my hope is that we won't need to change the register
> > > > > > > > > > > > > > > > allocators, as such, in order to make use of this information.
> > > > > > > > > > > > > > > > Instead, we'll simply be able to alter the register masks generated
> > > > > > > > > > > > > > > > for the call instructions. These masks will indicate fewer clobbers
> > > > > > > > > > > > > > > > than might otherwise be present based on the ABI because of
> > > > > > > > > > > > > > > > information gathered during the codegen of the callee. These masks
> > > > > > > > > > > > > > > > are generally constructed by the target based on the calling
> > > > > > > > > > > > > > > > convention. The PowerPC backend, for example, looks like this:
>
> > > > > > > > > > > > > > > >   // Add a register mask operand representing the call-preserved registers.
> > > > > > > > > > > > > > > >   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
> > > > > > > > > > > > > > > >   const uint32_t *Mask =
> > > > > > > > > > > > > > > >       TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
> > > > > > > > > > > > > > > >   assert(Mask && "Missing call preserved mask for calling convention");
> > > > > > > > > > > > > > > >   Ops.push_back(DAG.getRegisterMask(Mask));
>
> > > > > > > > > > > > > > > > but it can be more complicated. If you look for uses of
> > > > > > > > > > > > > > > > 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll see what I
> > > > > > > > > > > > > > > > mean. Regardless, the code ends up calling some method in the
> > > > > > > > > > > > > > > > target's TargetRegisterInfo subclass. These methods generally look
> > > > > > > > > > > > > > > > something like this:
>
> > > > > > > > > > > > > > > > const uint32_t *
> > > > > > > > > > > > > > > > PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
> > > > > > > > > > > > > > > >                                       CallingConv::ID CC) const {
> > > > > > > > > > > > > > > >   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
> > > > > > > > > > > > > > > >   ...
> > > > > > > > > > > > > > > >   return TM.isPPC64() ? (Subtarget.hasAltivec() ? CSR_SVR464_Altivec_RegMask
> > > > > > > > > > > > > > > >                                                 : CSR_SVR464_RegMask)
> > > > > > > > > > > > > > > >                       : (Subtarget.hasAltivec() ? CSR_SVR432_Altivec_RegMask
> > > > > > > > > > > > > > > >                                                 : CSR_SVR432_RegMask);
> > > > > > > > > > > > > > > > }
>
> > > > > > > > > > > > > > > > In any case, the fundamental idea here is that, when someone calls
> > > > > > > > > > > > > > > > getCallPreservedMask in order to set the regmask on a call, we might
> > > > > > > > > > > > > > > > not have to use the CC at all. Instead, if we've already codegened
> > > > > > > > > > > > > > > > the function, we might use a cache of 'exact' register masks
> > > > > > > > > > > > > > > > computed during codegen of the potential callees instead.
>
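The cache idea above can be modelled in a few lines of standalone C++. This is only a sketch under assumptions: `ExactMaskCache`, `record`, `lookup` and `maskForCall` are hypothetical names invented for illustration, functions are keyed by name rather than by an LLVM `Function` pointer, and the masks are plain vectors using the register mask polarity (a set bit means the register is preserved).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One bit per physical register; a set bit means "preserved across the call".
using RegMask = std::vector<uint32_t>;

// Hypothetical cache of 'exact' preserved-register masks, keyed by callee name.
class ExactMaskCache {
  std::unordered_map<std::string, RegMask> Masks;

public:
  // Called once a function has been through codegen.
  void record(const std::string &Callee, RegMask Mask) {
    assert(!Mask.empty() && "mask must cover at least one register word");
    Masks[Callee] = std::move(Mask);
  }

  // Returns the exact mask if the callee was already codegened, otherwise
  // null so the caller can fall back to the calling-convention mask.
  const RegMask *lookup(const std::string &Callee) const {
    auto It = Masks.find(Callee);
    return It == Masks.end() ? nullptr : &It->second;
  }
};

// Pick the mask for a call site: prefer the exact one, else the ABI default.
const RegMask &maskForCall(const ExactMaskCache &Cache,
                           const std::string &Callee, const RegMask &CCMask) {
  if (const RegMask *Exact = Cache.lookup(Callee))
    return *Exact;
  return CCMask;
}
```

The fallback keeps the scheme conservative: a callee that has not been compiled yet, or that is not known at all, still gets the mask derived from the calling convention.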
> > > > > > > > > > > > > > > I am thinking of adding a simple Immutable pass,
> > > > > > > > > > > > > > > MachineRegisterUsageInfo, similar to MachineBranchProbabilityInfo,
> > > > > > > > > > > > > > > that can maintain RegisterUsageInformation per function. Can it
> > > > > > > > > > > > > > > simply be done by using UsedPhysRegMask from MachineRegisterInfo?
>
> > > > > > > > > > > > > > No, like the comment said, UsedPhysRegMask gives only the
> > > > > > > > > > > > > > registers clobbered by calls:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   // This bit vector represents all the registers clobbered by
> > > > > > > > > > > > > >   // function calls.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > You want to build this information yourself on top of
> > > > > > > > > > > > > > MachineRegisterInfo::isPhysRegModified
>
> > > > > > > > > > > > > OK, but then the time complexity will be O(n), where n = number
> > > > > > > > > > > > > of physical registers on the target. Am I correct?
>
> > > > > > > > > > > > Yes, this is correct.
>
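For reference, the O(n) scan being discussed can be sketched as follows. This is a standalone model, not LLVM code: `buildPreservedMask` and `clobbers` are hypothetical helpers, and the `isModified` callback stands in for a per-register query such as MachineRegisterInfo::isPhysRegModified. The bit layout follows the register mask convention, where register R lives in word R / 32 at bit R % 32 and a set bit means the register is preserved.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Build a preserved-register mask in one pass over all physical registers,
// i.e. O(NumRegs) calls to the predicate.
// `isModified` is a hypothetical stand-in for a "was this register written
// by the function?" query.
std::vector<uint32_t>
buildPreservedMask(unsigned NumRegs,
                   const std::function<bool(unsigned)> &isModified) {
  std::vector<uint32_t> Mask((NumRegs + 31) / 32, 0);
  for (unsigned Reg = 0; Reg < NumRegs; ++Reg)
    if (!isModified(Reg))
      Mask[Reg / 32] |= 1u << (Reg % 32); // set bit == preserved
  return Mask;
}

// True when a call carrying `Mask` must be assumed to clobber `Reg`.
bool clobbers(const std::vector<uint32_t> &Mask, unsigned Reg) {
  assert(Reg / 32 < Mask.size() && "register out of range");
  return !(Mask[Reg / 32] & (1u << (Reg % 32)));
}
```

Since the scan runs once per function and n is a fixed, target-dependent number of registers, the linear cost is unlikely to matter in practice.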
> > > > > > > > > > > > > > > Here
getCallPreservedMask will call API
> > > > > > > > > > > > > > >
provided
> > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > >
MachineRegisterUsageInfo to avail the
> > > > > > > > > > > > > > > exact
> > > > > > > > > > > > > > >
register
> > > > > > > > > > > > > > > mask
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > can
know that the function is already
> > > > > > > > > > > > > > >
codegen
> > > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > query
> > > > > > > > > > > > > > > each
> > > > > > > > > > > > > > > time
when getCallPreservedMask is called
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > >
available
> > > > > > > > > > > > > > >
MachineRegisterUsageInfo will return the
> > > > > > > > > > > > > > >
details
> > > > > > > > > > > > > > >
otherwise
> > > > > > > > > > > > > > >
simply
> > > > > > > > > > > > > > >
return NULL.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So the changes will now be in the TargetRegisterInfo implementation
> > > > > > > > > > > > > > > for each target, right?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In order to do this, I think we'll need to provide a function
> > > > > > > > > > > > > > > > callable from the target's getCallPreservedMask implementation,
> > > > > > > > > > > > > > > > which can return such an 'exact' regmask when available. I think
> > > > > > > > > > > > > > > > we need to do it this way for two reasons:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. Not all of the target code calls getCallPreservedMask, but
> > > > > > > > > > > > > > > > sometimes calls other similar target-specific functions (e.g.
> > > > > > > > > > > > > > > > getTLSCallPreservedMask).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. The targets need to opt-in to this behavior because only the
> > > > > > > > > > > > > > > > target can know that all register uses are really tagged
> > > > > > > > > > > > > > > > correctly post "pre-emit".
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Because the target is free to introduce uses of registers at
> > > > > > > > > > > > > > > > essentially any time, we need to do the scanning for used
> > > > > > > > > > > > > > > > registers after the "pre-emit" passes run. This can be done by
> > > > > > > > > > > > > > > > scheduling some simple register-use scanning pass after the call
> > > > > > > > > > > > > > > > to addPreEmitPass in lib/CodeGen/Passes.cpp.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think this also applies in some way to Mehdi Amini's idea to
> > > > > > > > > > > > > > > > > keep a ModulePass for bookkeeping, but then the existing register
> > > > > > > > > > > > > > > > > allocators would have to change so that the code can query the
> > > > > > > > > > > > > > > > > ModulePass for the RegMaskBits of a particular function.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think that the simplest way to do this is to create an immutable
> > > > > > > > > > > > > > > > analysis pass (e.g. BasicAA) that keeps the cache of the computed
> > > > > > > > > > > > > > > > register masks. This is somewhat similar in spirit to how the
> > > > > > > > > > > > > > > > 'AssumptionCache' analysis works at the IR level. This analysis
> > > > > > > > > > > > > > > > can then be created by lib/CodeGen/Passes.cpp early, and then
> > > > > > > > > > > > > > > > queried and passed around later by the CodeGen/Target code.
> > > > > > > > > > > > > > > > Because it is an immutable analysis, it won't get destroyed until
> > > > > > > > > > > > > > > > the very end, which is also important because, I imagine, it will
> > > > > > > > > > > > > > > > need to own the memory associated with the generated register
> > > > > > > > > > > > > > > > masks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -Hal
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Vivek
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Yes, for the register-usage propagation approach we don't need
> > > > > > > > > > > > > > > > > > a MachineModulePass
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Vivek
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > Mehdi
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Hal Finkel
> > > > > > > > > > > > > > > > Assistant Computational Scientist
> > > > > > > > > > > > > > > > Leadership Computing Facility
> > > > > > > > > > > > > > > > Argonne National Laboratory
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 

vivek pandya via llvm-dev

2016-May-26 02:26 UTC


[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Yes, we will finish this approach first; it was just me being a little bit
excited :)

On Thu, May 26, 2016 at 12:31 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ------------------------------
>
> *From: *"Mehdi Amini" <mehdi.amini at apple.com>
> *To: *"vivek pandya" <vivekvpandya at gmail.com>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>,
> "Matthias Braun" <matze at braunis.de>, "Quentin Colombet" <qcolombet at apple.com>
> *Sent: *Wednesday, May 25, 2016 1:26:56 AM
> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
> Introduction and Feedback
>
>
>
> Sent from my iPhone
>
> On May 24, 2016, at 11:04 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
>
>
> On Wed, May 25, 2016 at 10:46 AM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>>
>> On May 24, 2016, at 10:08 PM, vivek pandya <vivekvpandya at gmail.com>
>> wrote:
>>
>>
>>
>> On Wed, May 25, 2016 at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On May 24, 2016, at 9:17 PM, vivek pandya <vivekvpandya at gmail.com>
>>> wrote:
>>>
>>> Dear Mentors,
>>>
>>> Please help me understand our plan to implement an interprocedural
>>> register allocator by propagating register usage info. While writing this
>>> mail I am considering all previous discussion on llvm-dev and IRC.
>>>
>>> 1) A MachineFunction pass, executed post-RA, that collects information
>>> about the registers used.
>>> 2) An immutable pass that stores the register usage info collected by the
>>> previous pass and returns it whenever queried.
>>> 3) A target-specific MachineFunction pass that uses the register usage
>>> info available for a call instruction to achieve IPRA. This pass should
>>> run pre-RA.
>>>
>>> Relation among the above passes:
>>>
>>> Pass 1) stores info into pass 2) and also uses the stored info for call
>>> instructions found while processing.
>>>
>>> Pass 3) only needs to query information from pass 2).
>>>
>>>
>>> Questions
>>> =========
>>>
>>> Which pass is responsible for the load/store of callee-saved registers at
>>> the beginning of each function call? And how does it use the RegMask of
>>> the call instruction to generate the load/store? I think the
>>> intra-procedural register allocator is not responsible for generating
>>> load/store around the call site.
>>>
>>>              /- - -> (A) - - -> (D)
>>>             /
>>> (K)- - ->(T)- - -> (B) - - -> (E)
>>>             \
>>>              \- - -> (C) - - -> (F)
>>>
>>> So as per our discussion we would require the following passes:
>>>
>>> Suppose that in the given example call graph, register allocation for D
>>> is complete, so that information is now available. While processing A,
>>> pass 3) would collect the register usage info for all callees and OR
>>> them together; should it then update A's regmask by going to the parent
>>> procedure that actually calls A?
>>>
>>>
>>> No, pass 3) only looks at every call MI in A and updates the associated
>>> regmask by replacing it with the information stored in the immutable
>>> pass.
>>>
>> What is the use of that ? Please be more specific.
>>
>>
>> I'm not sure I get the question. You describe 3 different passes at the
>> beginning, and it seems fairly obvious what the responsibility of each is.
>>
>> This seems quite straightforward to me, so I'm not sure what to explain.
>> Here is the logical sequence on D and A:
>>
>> 1) Codegen function D; MachineFunction post-RA: collect the register
>> usage for D. Store the new regmask in the immutable pass.
>> 2) Move to function A, perform ISel, then run the MachineFunction pre-RA
>> pass that finds the call to D. Query the immutable pass and get the
>> regmask for D. Update the regmask associated with the call.
>>
>> Done.
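
The two-step sequence above can be sketched outside of LLVM as follows. This is only an illustration under stated assumptions: `RegUsageStore`, `CallSite`, `collectPreservedMask`, and `updateCallSites` are hypothetical names that do not exist in LLVM, the target is assumed to have 8 registers, and D is assumed to be a leaf function. As with LLVM register masks, a set bit means the register is preserved across the call.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical 8-register target; a set bit means "preserved across the call".
constexpr std::size_t NumRegs = 8;
using RegMask = std::bitset<NumRegs>;

// Stand-in for the immutable pass: maps a function name to its exact regmask.
struct RegUsageStore {
  std::map<std::string, RegMask> Masks;
  void store(const std::string &Fn, const RegMask &M) { Masks[Fn] = M; }
  // Returns nullptr when the callee has not been code-generated yet.
  const RegMask *lookup(const std::string &Fn) const {
    auto It = Masks.find(Fn);
    return It == Masks.end() ? nullptr : &It->second;
  }
};

struct CallSite {
  std::string Callee;
  RegMask Preserved; // starts out as the calling-convention default
};

// Step 1 (post-RA on the callee): everything the callee does not clobber
// is preserved.
RegMask collectPreservedMask(const RegMask &Clobbered) { return ~Clobbered; }

// Step 2 (pre-RA on the caller): replace the mask on each call whose callee
// already has an entry; unknown callees keep the default mask.
void updateCallSites(std::vector<CallSite> &Calls, const RegUsageStore &Store) {
  for (CallSite &CS : Calls)
    if (const RegMask *Exact = Store.lookup(CS.Callee))
      CS.Preserved = *Exact;
}
```

A call to a function that has no entry yet (an external function, or one later in the codegen order) simply keeps the conservative calling-convention mask, which is why the scheme depends on generating code bottom-up over the call graph.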
>>
> I understand the sequence we have discussed so far, but as I am also reading
> the literature about IPRA, I am confused about how this would remove the
> load/store of registers (so that the callee can clobber them without worrying
> about the caller's contents in those registers)
>
>
> From the beginning I told you that it is not true IPRA in this respect:
> callees have to preserve callee-saved registers according to the CC, unless
> internal.
>
>
>
> and also how the caller will avoid using registers used by the callee.
>
>
> The caller can use them, just like it does now: by saving and restoring them
> before/after the call if it does.
>
> So this boils down to the question of what magic is done by just updating the
> regmask at the call site, when we do not even need to communicate this to the
> intra-procedural register allocator,
>
>
> Ask yourself how the RA knows how to deal with what to save around calls
> in general (hint: think about what is the use of regmask).
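
As a rough illustration of the hint, the role of the regmask can be reduced to one set operation: a value that is live across a call cannot stay in a register the call clobbers, so it has to be saved around the call or kept elsewhere. The sketch below is a simplification and not how LLVM's allocators are implemented; `RegSet` and `mustSaveAroundCall` are hypothetical names and the 8-register target is an assumption.

```cpp
#include <bitset>
#include <cassert>

using RegSet = std::bitset<8>;

// Registers that hold live values across a call need caller-side handling
// unless the call's mask marks them as preserved (set bit = preserved).
RegSet mustSaveAroundCall(const RegSet &LiveAcrossCall, const RegSet &Preserved) {
  return LiveAcrossCall & ~Preserved;
}
```

With a default mask that preserves only the upper four registers, values live in registers 1 and 2 both need saving; with an exact mask showing that the callee clobbers only registers 0 and 1, only register 1 does. Widening the preserved set is the entire effect of updating the regmask.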
>
>
> this also means that somewhere there is code responsible for inserting
> store/load for registers so that the callee can preserve them, but in the
> presence of IPRA, how will it avoid adding those store/load?
>
>
> What we're doing here is purely an optimization of the calling convention.
> This does not have to be more intrusive...
>
>
>
>
> It seems that my questions have confused Mehdi Amini :P
> Other masters please help us !
>
>
>>
>>
>>
>>
>>>
>>> How would the regmask details of the call to D be used by the register
>>> allocator while allocating registers for A, and also to avoid generating
>>> load/store in the body of D for registers used by A, given that we have
>>> a callee-saved convention?
>>>
>>>
>>> I expect all of that to be handled automatically when updating the
>>> regmask.
>>>
>> Please elaborate on both of your comments, whatever you have in mind.
>>
>>
>>>
>>>
>>> How will the pass responsible for generating load/store optimize for a
>>> child node of the call graph, where load/store is not required because
>>> the caller will not use registers used by the callee? I mean, how will our
>>> IPRA take care of this?
>>>
>>>
>>> I don't understand that.
>>>
>> In the above example call graph, when pass 3) is executed for D it does not
>> do any work because there is no call to any other function in the body of
>> D. But in the presence of IPRA there is actually no need to preserve
>> (store/load) any register in function D (or in any other such leaf node of
>> the call graph). The reason is that A is now aware of the register usage in
>> D, so it will not use any register that intersects with D's register usage;
>> thus D need not worry about anything.
>> Does this make sense?
>>
>>
>> No. D has to obey the calling convention if it is not internal. So any
>> callee-saved register has to be ... saved.
>>
> Why is that required when the caller of D does not use any register used by D?
>
>
> You don't know about the callers when you are doing codegen for D.
> Again: we're not trying to perform pure IPRA here.
>
> +1 -- Please stick to the initial plan. The literature contains many
> interesting possibilities, but I think that the register-mask overriding is
> a useful subset that we can accomplish within the summer. Once we have this
> working, we'll all be in a better place to think about more-sophisticated
> possibilities.
>
>  -Hal
>
>
> --
> Mehdi
>
>
>
>>
>> --
>> Mehdi
>>
>>
>>
>>
>>
>>>
>>>
>>> In short, I am not very clear on the method for using this information
>>> to get the effect of IPRA without modifying the register allocators
>>> themselves (i.e. by updating the regmask of call instructions).
>>>
>>> Also, pass 1) and pass 3) seem to overlap in their work. For example,
>>> consider scanning register usage info for function T: the final register
>>> usage info should be <all regs used by T> OR <reg usage info A> OR
>>> <reg usage info B> OR <reg usage info C>, because K should not use any
>>> register which is used by T, A, D, B, C, E, F without load/store. The
>>> relevant paper also discusses this situation and suggests falling back to
>>> the load/store approach. So as we move to the upper region of the call
>>> graph, it is very likely that there are not enough registers left to
>>> allocate.
>>>
>>>
>>> The calling convention (for anything other than internal functions) will
>>> always have some callee-saved registers.
>>> If we have a deep call graph of internal-only functions, then we may
>>> generate a lot of spills at the top. Somehow we may have to think about
>>> driving some heuristic with PGO.
>>>
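
The accumulation described for T (its own registers ORed with those of A, B, and C, which in turn include D, E, and F) can be sketched as a memoized walk over the call graph. This is a minimal illustration, assuming an acyclic call graph (no recursion) and a hypothetical 16-register target; `Node`, `CallGraph`, and `clobbersOf` are made-up names, and "clobbers" here means registers a function modifies without restoring.

```cpp
#include <bitset>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using RegSet = std::bitset<16>;

struct Node {
  RegSet OwnClobbers;               // registers the function itself clobbers
  std::vector<std::string> Callees; // direct callees
};
using CallGraph = std::map<std::string, Node>;

// Registers clobbered by Fn or by anything reachable from it.
// Assumes the graph has no cycles; results are cached in Memo.
RegSet clobbersOf(const CallGraph &G, const std::string &Fn,
                  std::map<std::string, RegSet> &Memo) {
  auto Cached = Memo.find(Fn);
  if (Cached != Memo.end())
    return Cached->second;
  const Node &N = G.at(Fn);
  RegSet Result = N.OwnClobbers;
  for (const std::string &Callee : N.Callees)
    Result |= clobbersOf(G, Callee, Memo);
  Memo[Fn] = Result;
  return Result;
}
```

The set only grows toward the root of the graph, which is the effect discussed above: near the top, few registers remain preserved across calls and the caller falls back to saving values around them.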
>>
>> This kind of problem is addressed by the paper "Register allocation
>> across procedure and module boundaries" -
>> http://dl.acm.org/citation.cfm?id=93551
>> In this paper the authors try to eliminate load/store only in
>> call-intensive regions (in contrast to simple bottom-up) by analyzing call
>> frequency at compile time (with a facility to plug in profile info), and
>> the rest of the call graph follows the normal register allocator. But yes,
>> we will think about this later.
>>
>>> I suspect we can think about that a bit later. Let's focus on the simple
>>> for now.
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>
>>> Please bear with my silly questions.
>>>
>>> Sincerely,
>>> Vivek
>>>
>>>
>>> On Wed, May 25, 2016 at 8:46 AM, vivek pandya <vivekvpandya at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, May 25, 2016 at 8:44 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>>> *To: *"Hal Finkel" <hfinkel at anl.gov>
>>>>> *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun"
>>>>> <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>,
>>>>> "Quentin Colombet" <qcolombet at apple.com>
>>>>> *Sent: *Tuesday, May 24, 2016 9:34:29 PM
>>>>>
>>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>>>> Introduction and Feedback
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>
>>>>>>
>>>>>> ------------------------------
>>>>>>
>>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>>>> *To: *"Quentin Colombet" <qcolombet at apple.com>
>>>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
>>>>>> <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>,
>>>>>> "Mehdi Amini" <mehdi.amini at apple.com>
>>>>>> *Sent: *Tuesday, May 24, 2016 1:00:58 PM
>>>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>>>>> Introduction and Feedback
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have written the following code to check whether each register is
>>>>>> used by the MachineFunction or not:
>>>>>>
>>>>>> MachineRegisterInfo *MRI = &MF.getRegInfo();
>>>>>> TargetRegisterInfo *TRI = (TargetRegisterInfo
>>>>>> *)MF.getSubtarget().getRegisterInfo();
>>>>>>
>>>>>> Some reason you can't use a const pointer here?
>>>>>>
>>>>> MCRegisterInfo is just used to get the conventional name of a register
>>>>> for the given target, like AX, BX on X86.
>>>>>
>>>>>>
>>>>>> const TargetMachine &TM = MF.getTarget();
>>>>>> const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
>>>>>> DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n");
>>>>>>
>>>>>> for(TargetRegisterInfo::regclass_iterator i =
>>>>>> (*TRI).regclass_begin(), e = (*TRI).regclass_end(); i != e; i++ ) {
>>>>>> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege =
>>>>>> (*i)->end(); pregi != prege; pregi++ ) {
>>>>>> DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi)
>>>>>> << " is modified " << MRI->isPhysRegModified(*pregi) << " \n");
>>>>>>
>>>>>> Try isPhysRegUsed.
>>>>>>
>>>>> ok
>>>>>
>>>>>>
>>>>>> }
>>>>>> }
>>>>>> DEBUG(dbgs() << "\n");
>>>>>>
>>>>>> The pass which executes this code is scheduled at the post-RA stage, but
>>>>>> it gives me true for all registers, i.e. in each function all registers
>>>>>> are reported as used except EBP and a few similar ones. Is this a correct
>>>>>> way to get register usage information? I think I have made some mistake;
>>>>>> please help.
>>>>>>
>>>>>>
>>>>>> You might look at the implementation of these functions in
>>>>>> lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning
>>>>>> true because UsedPhysRegMask.test(PhysReg) is true or because
>>>>>> !reg_nodbg_empty(*AliasReg) is true.
>>>>>>
>>>>> Yes, that helped; now I am getting the actual registers which have been
>>>>> used by the given function, but there is a small problem.
>>>>> The updated code is as shown below:
>>>>> for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(),
>>>>> e = (*TRI).regclass_end(); i != e; i++ ) {
>>>>> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege =
>>>>> (*i)->end(); pregi != prege; pregi++ ) {
>>>>> for (MCRegAliasIterator AliasReg(*pregi, TRI, true);
>>>>> AliasReg.isValid(); ++AliasReg) {
>>>>>    if (!MRI->reg_nodbg_empty(*AliasReg)) {
>>>>>     DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi)
>>>>>     << " is used " << MRI->isPhysRegUsed(*pregi) << " \n");
>>>>>     break; // no need to process more alias
>>>>>    }
>>>>>   }
>>>>> }
>>>>> }
>>>>> But here some registers get processed within several different classes
>>>>> (unnecessary processing). Is this the only way to iterate through all
>>>>> used registers (using the RegClass iterator)? Is there any way to avoid
>>>>> duplicate regs?
>>>>> Of course currently I am just printing, but next I am thinking of using a
>>>>> map to track usage info, in which only distinct register info will be
>>>>> stored; still, due to the loop structure I need to iterate over a single
>>>>> register 3 - 4 times, making it time consuming.
>>>>>
>>>>> Yes, I believe you can just do:
>>>>>
>>>>>   for (unsigned Reg = 0; Reg < TRI->getNumRegs(); ++Reg) {
>>>>>
>>>> Oh yes thanks I just forgot that PhyReg starts at 0.
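
A standalone sketch of the single-loop form, which visits each register number exactly once instead of once per register class that contains it. It does not use the LLVM API: `AliasTable` and `computeUsedRegs` are hypothetical, and the alias lists stand in for what `MCRegAliasIterator` would yield. The loop starts at 1 on the assumption that, as in LLVM, register number 0 is the "no register" sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each register number, the registers that alias it (including itself).
using AliasTable = std::vector<std::vector<unsigned>>;

// A register counts as used if it, or any register aliasing it, is
// referenced directly. Each register number is visited once.
std::vector<bool> computeUsedRegs(const AliasTable &Aliases,
                                  const std::vector<bool> &DirectlyUsed) {
  std::vector<bool> Used(Aliases.size(), false);
  for (unsigned Reg = 1; Reg < Aliases.size(); ++Reg) {
    for (unsigned Alias : Aliases[Reg]) {
      if (DirectlyUsed[Alias]) {
        Used[Reg] = true;
        break; // one used alias is enough
      }
    }
  }
  return Used;
}
```

The result is indexed by register number, so it can serve directly as the per-function usage record without a map for de-duplication.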
>>>>
>>>>>
>>>>>
>>>>>  -Hal
>>>>>
>>>>>
>>>>> -Vivek
>>>>>
>>>>>>
>>>>>>
>>>>>>  -Hal
>>>>>>
>>>>>>
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>> On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <
>>>>>> qcolombet at apple.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Vivek Pandya*
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <
>>>>>>> qcolombet at apple.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Vivek Pandya*
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>>
>>>>>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>>>>>>> *To: *"Mehdi Amini" <mehdi.amini at apple.com>
>>>>>>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" <
>>>>>>>>> qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>>>>>>>>> "Matthias Braun" <matze at braunis.de>
>>>>>>>>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM
>>>>>>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>>>>>>>> Introduction and Feedback
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Vivek Pandya*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <
>>>>>>>>> vivekvpandya at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Vivek Pandya*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <
>>>>>>>>>> mehdi.amini at apple.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------
>>>>>>>>>>>
>>>>>>>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>>>>>>>>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling"
>>>>>>>>>>> <mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
>>>>>>>>>>> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
>>>>>>>>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>>>>>>>>>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation -
>>>>>>>>>>> Introduction and Feedback
>>>>>>>>>>>
>>>>>>>>>>> Hello LLVM Community,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the delay; I was busy with final exams.
>>>>>>>>>>>
>>>>>>>>>>> I am Vivek from India. Thanks for choosing my proposal for
>>>>>>>>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and
>>>>>>>>>>> Hal Finkel will be mentoring me for this project.
>>>>>>>>>>>
>>>>>>>>>>> IPRA can reduce the code size and runtime of programs by allocating
>>>>>>>>>>> registers across module and procedure boundaries.
>>>>>>>>>>>
>>>>>>>>>>> I have identified some old but effective research work in this area.
>>>>>>>>>>> I want the community's feedback on the feasibility of these
>>>>>>>>>>> approaches, and I am aiming to implement two of them during this
>>>>>>>>>>> project.
>>>>>>>>>>>
>>>>>>>>>>> Here is the list of papers. I have read the first two and would like
>>>>>>>>>>> to discuss those approaches first; I will read the other two papers
>>>>>>>>>>> and then initiate discussion of them as well. All I want is to arrive
>>>>>>>>>>> at a concrete implementation plan before 23 May, 2016, and for that I
>>>>>>>>>>> need the community's help.
>>>>>>>>>>>
>>>>>>>>>>> 1) Compile time ----- Minimizing register usage penalty at
>>>>>>>>>>> procedure calls - http://dl.acm.org/citation.cfm?id=53999
>>>>>>>>>>> ====================================================================
>>>>>>>>>>> In this approach intra-procedural register allocation is used as the
>>>>>>>>>>> base, but the machine code generation order is a bottom-up traversal
>>>>>>>>>>> of the call graph, and the inter-procedural effect is achieved by
>>>>>>>>>>> propagating register usage information from callee to caller (i.e.
>>>>>>>>>>> child to parent in the call graph) so that the caller can use
>>>>>>>>>>> different registers than the callee and save the load/store cost at
>>>>>>>>>>> procedure calls. This is not as trivial as it seems, due to recursive
>>>>>>>>>>> calls, library function usage, etc. Also, for the upper region of the
>>>>>>>>>>> graph, the number of available registers might drop to zero; in that
>>>>>>>>>>> case it should fall back to normal load/store at procedure calls.
>>>>>>>>>>> Apart from these difficulties, others have been identified; please
>>>>>>>>>>> follow this mail chain:
>>>>>>>>>>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>>>>>>>>> My mentor has already provided me a patch that alters the code
>>>>>>>>>>> generation order to follow a bottom-up call graph traversal, and I am
>>>>>>>>>>> working from that point now. Any other help/suggestion is always
>>>>>>>>>>> welcome.
>>>>>>>>>>>
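[Editor's sketch] The bottom-up propagation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's algorithm and not LLVM code: functions are plain indices, a register set is a 32-bit bitmask, and the names ToyFunction, summarize and computeSummaries are hypothetical. Recursive calls and external (library) callees fall back to "clobbers everything", which mirrors the conservative fallback mentioned in the mail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model: a register set is a 32-bit bitmask.
struct ToyFunction {
  uint32_t OwnClobbers;      // registers written by the function body itself
  std::vector<int> Callees;  // callee indices; -1 stands for an external call
};

constexpr uint32_t AllRegs = 0xFFFFFFFFu;

enum class State { Unvisited, InProgress, Done };

// Post-order (callee before caller) walk of the call graph.
uint32_t summarize(int F, const std::vector<ToyFunction> &Fns,
                   std::vector<State> &St, std::vector<uint32_t> &Summary) {
  if (F < 0)
    return AllRegs;  // external/library callee: nothing is known
  assert(F < static_cast<int>(Fns.size()) && "callee index out of range");
  if (St[F] == State::Done)
    return Summary[F];
  if (St[F] == State::InProgress)
    return AllRegs;  // recursion: fall back to the conservative answer
  St[F] = State::InProgress;
  uint32_t Clobbers = Fns[F].OwnClobbers;
  for (int Callee : Fns[F].Callees)
    Clobbers |= summarize(Callee, Fns, St, Summary);
  Summary[F] = Clobbers;
  St[F] = State::Done;
  return Clobbers;
}

// Returns, per function, the registers that a call to it may clobber.
std::vector<uint32_t> computeSummaries(const std::vector<ToyFunction> &Fns) {
  std::vector<State> St(Fns.size(), State::Unvisited);
  std::vector<uint32_t> Summary(Fns.size(), AllRegs);
  for (int F = 0; F < static_cast<int>(Fns.size()); ++F)
    summarize(F, Fns, St, Summary);
  return Summary;
}
```

A caller that knows its callee's summary can keep live values in any register outside that set without saving them around the call.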
>>>>>>>>>>> 2) Link time ----- Global register allocation at link time -
>>>>>>>>>>> http://dl.acm.org/citation.cfm?id=989415
>>>>>>>>>>> ====================================================================
>>>>>>>>>>> In this particular approach (sort of true IPRA) registers will be
>>>>>>>>>>> reallocated at link time (this optimization will be optional; if it
>>>>>>>>>>> is turned off, code will still be compiled as per intra-procedural
>>>>>>>>>>> allocation). Here modules are first compiled normally, but the
>>>>>>>>>>> object code is annotated with details so that the linker can build
>>>>>>>>>>> the call graph and calculate usage information at link time. The
>>>>>>>>>>> compiler also writes hints into the object code that describe how
>>>>>>>>>>> the code should change if a particular variable is allocated to
>>>>>>>>>>> some other register (due to the new allocation). The linker can
>>>>>>>>>>> then use this information to decide which (global) variables need
>>>>>>>>>>> to stay in the same register throughout program execution and,
>>>>>>>>>>> according to the register usage information in the call graph,
>>>>>>>>>>> which procedures will not be active simultaneously, so that the
>>>>>>>>>>> locals of those procedures can share registers without load/store
>>>>>>>>>>> at procedure calls.
>>>>>>>>>>> For this particular method, help me analyze feasibility:
>>>>>>>>>>> 1) Can LLVM collect the following information at module level in
>>>>>>>>>>> MachineIR? A list of procedures in the module, a list of locals in
>>>>>>>>>>> each procedure, a list of procedures that a particular procedure
>>>>>>>>>>> can call, and a list of the variables this procedure references.
>>>>>>>>>>> Each entry in the last two lists includes an estimate of the number
>>>>>>>>>>> of times the procedure is called or the variable is referenced in
>>>>>>>>>>> each execution of this procedure.
>>>>>>>>>>> 2) Can LLVM write informative commands to object files?
>>>>>>>>>>> 3) Is LTO capable of leveraging those commands?
>>>>>>>>>>>
>>>>>>>>>>> In terms of scoping the project for the summer, I definitely
>>>>>>>>>>> recommend that you focus on (1) first. If you finish that, we can
>>>>>>>>>>> certainly move on to other things.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'll add +1 here, but I already wrote the same thing on IRC when
>>>>>>>>>>> discussing with Vivek. True IPRA without a proper MachineModule
>>>>>>>>>>> infrastructure won't be doable in my opinion (even with such
>>>>>>>>>>> infrastructure, it may not be trivial in LLVM in general).
>>>>>>>>>>>
>>>>>>>>>>> Regarding link time, note that any such design would likely look
>>>>>>>>>>> much different than in David Wall's paper, however, because our LTO
>>>>>>>>>>> re-codegens everything anyway. The paper says, "Finally, it keeps us
>>>>>>>>>>> honest as designers of the system; once we postpone anything until
>>>>>>>>>>> link time, the temptation is great to postpone everything, ..." -
>>>>>>>>>>> Well, we've long since succumbed to that temptation when we LTO.
>>>>>>>>>>> C'est la vie.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +1 as well, our LTO will benefit naturally from the leaf-to-root
>>>>>>>>>>> information propagation. ThinLTO will be more
>>>>>>>>>>> challenging/interesting though!
>>>>>>>>>>>
>>>>>>>>>>> For the first part, a mechanism similar to a MachineModulePass
>>>>>>>>>>> would be desirable. That may not be possible during this project,
>>>>>>>>>>> but perhaps we can make some sort of smaller version of it to suit
>>>>>>>>>>> our purpose.
>>>>>>>>>>>
>>>>>>>>>>> I don't think we need to make any kind of MachineModulePass to make
>>>>>>>>>>> this work. Once we alter the visitation order based on the CGSCC
>>>>>>>>>>> iteration scheme, we can keep state in-between functions in the
>>>>>>>>>>> pre-existing hacky way (using static members of the relevant
>>>>>>>>>>> function passes).
>>>>>>>>>>>
>>>>>>>>>> Sorry, my mistake: by "first part" I meant requirement 1) of the
>>>>>>>>>> link time approach.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I also don't see where/why we need a MachineModule(Pass) for the
>>>>>>>>>>> CGSCC scheme. That said, I'd rather avoid using a function pass with
>>>>>>>>>>> static members if we can have a ModuleAnalysis that keeps the
>>>>>>>>>>> results for the functions in the module and is queried by the
>>>>>>>>>>> register allocator somehow. Matthias/Quentin may have other input
>>>>>>>>>>> on this aspect.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> @Hal, do you mean adding a simple MachineFunction pass that operates
>>>>>>>>> on the register-allocated function and prepares a BitVector
>>>>>>>>> indicating which registers are used by the MachineFunction, and then
>>>>>>>>> using this pass as an analysis pass (i.e. simply returning a static
>>>>>>>>> BitVector of clobbered registers when register allocation for the
>>>>>>>>> next function begins)? This part is not very clear to me. This could
>>>>>>>>> be done by scheduling a pass after register allocation in
>>>>>>>>> lib/CodeGen/Passes.cpp:
>>>>>>>>>
>>>>>>>>> void TargetPassConfig::addMachinePasses() {
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>>   // Run pre-ra passes.
>>>>>>>>>   addPreRegAlloc();
>>>>>>>>>
>>>>>>>>>   // Run register allocation and passes that are tightly coupled
>>>>>>>>>   // with it, including phi elimination and scheduling.
>>>>>>>>>   if (getOptimizeRegAlloc())
>>>>>>>>>     addOptimizedRegAlloc(createRegAllocPass(true));
>>>>>>>>>   else
>>>>>>>>>     addFastRegAlloc(createRegAllocPass(false));
>>>>>>>>>
>>>>>>>>>   // Run post-ra passes.
>>>>>>>>>   addPostRegAlloc();
>>>>>>>>>   // Adding a new pass here which keeps register mask information
>>>>>>>>>   // across function calls.
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> But this also requires the current register allocators to use this
>>>>>>>>> information in some way, because RegMaskBits in
>>>>>>>>> LiveIntervalAnalysis.cpp is not static across calls. I mean, it is
>>>>>>>>> not clear to me how to propagate the static info to the
>>>>>>>>> intra-procedural register allocators (if possible without disturbing
>>>>>>>>> their code).
>>>>>>>>>
>>>>>>>>> First, my hope is that we won't need to change the register
>>>>>>>>> allocators, as such, in order to make use of this information.
>>>>>>>>> Instead, we'll simply be able to alter the register masks generated
>>>>>>>>> for the call instructions. These masks will indicate fewer clobbers
>>>>>>>>> than might otherwise be present based on the ABI because of
>>>>>>>>> information gathered during the codegen of the callee. These masks
>>>>>>>>> are generally constructed by the target based on the calling
>>>>>>>>> convention. The PowerPC backend, for example, looks like this:
>>>>>>>>>
>>>>>>>>>   // Add a register mask operand representing the call-preserved
>>>>>>>>>   // registers.
>>>>>>>>>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>>>>>>>>>   const uint32_t *Mask =
>>>>>>>>>       TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>>>>>>>   assert(Mask && "Missing call preserved mask for calling convention");
>>>>>>>>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>>>>>>>>
>>>>>>>>> but it can be more complicated. If you look for uses of
>>>>>>>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll see what I
>>>>>>>>> mean. Regardless, the code ends up calling some method in the
>>>>>>>>> target's TargetRegisterInfo subclass. These methods generally look
>>>>>>>>> something like this:
>>>>>>>>>
>>>>>>>>> const uint32_t *
>>>>>>>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
>>>>>>>>>                                       CallingConv::ID CC) const {
>>>>>>>>>   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
>>>>>>>>>   ...
>>>>>>>>>   return TM.isPPC64()
>>>>>>>>>              ? (Subtarget.hasAltivec() ? CSR_SVR464_Altivec_RegMask
>>>>>>>>>                                        : CSR_SVR464_RegMask)
>>>>>>>>>              : (Subtarget.hasAltivec() ? CSR_SVR432_Altivec_RegMask
>>>>>>>>>                                        : CSR_SVR432_RegMask);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> In any case, the fundamental idea here is that, when someone calls
>>>>>>>>> getCallPreservedMask in order to set the regmask on a call, we might
>>>>>>>>> not have to use the CC at all. Instead, if we've already codegened
>>>>>>>>> the function, we might use a cache of 'exact' register masks
>>>>>>>>> computed during codegen of the potential callees instead.
>>>>>>>>>
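[Editor's sketch] To make the "exact mask instead of calling-convention mask" idea concrete, here is a minimal standalone sketch. It assumes the usual LLVM register-mask layout (one bit per physical register, a set bit meaning the register is preserved across the call); the helper names clobbers and selectCallMask are hypothetical and are not part of any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// A set bit means "preserved across the call", so a register is clobbered
// when its bit is clear.
inline bool clobbers(const uint32_t *Mask, unsigned Reg) {
  return !(Mask[Reg / 32] & (1u << (Reg % 32)));
}

// Hypothetical helper: prefer the exact mask recorded for an already
// code-generated callee, otherwise fall back to the calling-convention mask.
inline const uint32_t *selectCallMask(const uint32_t *ExactMask,
                                      const uint32_t *CCMask) {
  assert(CCMask && "Missing call preserved mask for calling convention");
  return ExactMask ? ExactMask : CCMask;
}
```

With a calling-convention mask that preserves only registers 8 to 15 and an exact mask showing that the callee touches only registers 0 to 3, register 5 counts as clobbered under the first mask and as preserved under the second, so the caller no longer has to spill it around the call.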
>>>>>>>> I am thinking of adding a simple immutable pass,
>>>>>>>> MachineRegisterUsageInfo, similar to MachineBranchProbabilityInfo,
>>>>>>>> that can maintain RegisterUsageInformation per function. Can this
>>>>>>>> simply be done by using UsedPhysRegMask from MachineRegisterInfo?
>>>>>>>>
>>>>>>>>
>>>>>>>> No, like the comment said, UsedPhysRegMask gives only the registers
>>>>>>>> clobbered by calls:
>>>>>>>> // This bit vector represents all the registers clobbered by
>>>>>>>> // function calls.
>>>>>>>>
>>>>>>>> You want to build this information yourself on top of
>>>>>>>> MachineRegisterInfo::isPhysRegModified
>>>>>>>>
>>>>>>> OK, but then the time complexity will be O(n), where n is the number
>>>>>>> of physical registers on the target. Am I on the right track?
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is correct.
>>>>>>>
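[Editor's sketch] The O(n) scan agreed on above can be sketched without any LLVM dependency. The callback type IsModifiedFn is a hypothetical stand-in for a per-register query such as MachineRegisterInfo::isPhysRegModified, and buildPreservedMask is an illustrative name; the mask layout (set bit = preserved) is the same assumption as for LLVM register masks.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a "was this physical register modified?" query.
using IsModifiedFn = std::function<bool(unsigned)>;

// Visits every physical register once (O(NumRegs)) and sets the bit of each
// register that the function leaves untouched.
std::vector<uint32_t> buildPreservedMask(unsigned NumRegs,
                                         const IsModifiedFn &IsModified) {
  assert(IsModified && "need a register query callback");
  std::vector<uint32_t> Mask((NumRegs + 31) / 32, 0);
  for (unsigned Reg = 0; Reg < NumRegs; ++Reg)
    if (!IsModified(Reg))
      Mask[Reg / 32] |= 1u << (Reg % 32);
  return Mask;
}
```

The scan runs once per function after code generation, so the linear cost in the number of registers is paid per function rather than per call site.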
>>>>>>>
>>>>>>>> Here getCallPreservedMask will call an API provided by
>>>>>>>> MachineRegisterUsageInfo to obtain the exact register mask. But how
>>>>>>>> can it know that the function has already been codegened? Or will it
>>>>>>>> query each time getCallPreservedMask is called, with
>>>>>>>> MachineRegisterUsageInfo returning the details if they are available
>>>>>>>> and otherwise simply returning NULL?
>>>>>>>> So the changes will now be in the TargetRegisterInfo implementation
>>>>>>>> for each target, right?
>>>>>>>>
>>>>>>>>
>>>>>>>>> In order to do this, I think we'll need to provide a function
>>>>>>>>> callable from the target's getCallPreservedMask implementation, which
>>>>>>>>> can return such an 'exact' regmask when available. I think we need to
>>>>>>>>> do it this way for two reasons:
>>>>>>>>>
>>>>>>>>>  1. Not all of the target code calls getCallPreservedMask, but
>>>>>>>>>     sometimes calls other similar target-specific functions (e.g.
>>>>>>>>>     getTLSCallPreservedMask).
>>>>>>>>>  2. The targets need to opt-in to this behavior because only the
>>>>>>>>>     target can know that all register uses are really tagged
>>>>>>>>>     correctly post "pre-emit".
>>>>>>>>>
>>>>>>>>> Because the target is free to introduce uses of registers at
>>>>>>>>> essentially any time, we need to do the scanning for used registers
>>>>>>>>> after the "pre-emit" passes run. This can be done by scheduling some
>>>>>>>>> simple register-use scanning pass after the call to addPreEmitPass in
>>>>>>>>> lib/CodeGen/Passes.cpp.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think this also applies in some way to Mehdi Amini's idea of
>>>>>>>>> keeping a ModulePass for bookkeeping, but then the existing register
>>>>>>>>> allocators would have to change so that the code can query the
>>>>>>>>> ModulePass for the RegMaskBits of a particular function.
>>>>>>>>>
>>>>>>>>> I think that the simplest way to do this is to create an immutable
>>>>>>>>> analysis pass (e.g. BasicAA) that keeps the cache of the computed
>>>>>>>>> register masks. This is somewhat similar in spirit to how the
>>>>>>>>> 'AssumptionCache' analysis works at the IR level. This analysis can
>>>>>>>>> then be created by lib/CodeGen/Passes.cpp early, and then queried and
>>>>>>>>> passed around later by the CodeGen/Target code. Because it is an
>>>>>>>>> immutable analysis, it won't get destroyed until the very end, which
>>>>>>>>> is also important because, I imagine, it will need to own the memory
>>>>>>>>> associated with the generated register masks.
>>>>>>>>>
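[Editor's sketch] The ownership point in the paragraph above (the cache must own the mask memory and outlive every call instruction that points into it) can be sketched as a plain C++ class. This is not an LLVM pass; RegUsageCache, store and get are hypothetical names, and functions are keyed by name purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical long-lived cache that owns one register mask per function.
class RegUsageCache {
  std::unordered_map<std::string, std::vector<uint32_t>> Masks;

public:
  // Records the mask for Fn. An existing entry is never overwritten or
  // erased, so pointers handed out by get() stay valid for the cache's
  // lifetime (std::unordered_map keeps element addresses stable on rehash).
  void store(const std::string &Fn, std::vector<uint32_t> Mask) {
    assert(!Mask.empty() && "a register mask needs at least one word");
    Masks.emplace(Fn, std::move(Mask));
  }

  // Returns the exact mask, or nullptr if Fn has not been code-generated yet,
  // in which case the caller falls back to the calling-convention mask.
  const uint32_t *get(const std::string &Fn) const {
    auto It = Masks.find(Fn);
    return It == Masks.end() ? nullptr : It->second.data();
  }
};
```

Returning nullptr for unknown functions also answers the "how does it know the function is already codegened" question raised earlier in the thread: absence from the cache is the signal.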
>>>>>>>>>  -Hal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vivek
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Yes, for the register-usage propagation approach we don't need a
>>>>>>>>>> MachineModulePass.
>>>>>>>>>>
>>>>>>>>>> Vivek
>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mehdi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Hal Finkel
>>>>>>>>> Assistant Computational Scientist
>>>>>>>>> Leadership Computing Facility
>>>>>>>>> Argonne National Laboratory
>>>>>>>>>