thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Interprocedural Register Allocation

If this information is useful, please help other people find it:
Share via:

vivek pandya via llvm-dev

2016-May-24 18:00 UTC

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Hello,

I have written following code to check each register if it is used by
machineFunction or not :

MachineRegisterInfo *MRI = &MF.getRegInfo();
TargetRegisterInfo *TRI = (TargetRegisterInfo
*)MF.getSubtarget().getRegisterInfo();
const TargetMachine &TM = MF.getTarget();
const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
DEBUG(dbgs() << "Function Name : " << MF.getName()
<< "\n");

for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e
(*TRI).regclass_end(); i != e; i++ ) {
for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege
(*i)->end(); pregi != prege; pregi++ ) {
DEBUG( dbgs() << "Physical Register : " <<
MCRI->getName(*pregi) << " is
modified "<< MRI->isPhysRegModified(*pregi) << "
\n");
}
}
DEBUG(dbgs() << "\n");

The pass which is executing this code is schedule POST-RA stage but this
gives me true for all registers i.e in each function all registers are
being used except EBP and some other similar, Is this a correct way to get
register usage information ? I think I have made some mistake please help.

Vivek

On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at
apple.com>
wrote:
>
> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at
gmail.com> wrote:
>
>
>
> *Vivek Pandya*
>
>
> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <qcolombet at
apple.com>
> wrote:
>
>>
>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>
>>
>> *Vivek Pandya*
>>
>>
>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>>
>>>
>>> ------------------------------
>>>
>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>> *To: *"Mehdi Amini" <mehdi.amini at apple.com>
>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>,
"Quentin Colombet" <
>>> qcolombet at apple.com>, "llvm-dev" <llvm-dev at
lists.llvm.org>, "Matthias
>>> Braun" <matze at braunis.de>
>>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM
>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>> Introduction and Feedback
>>>
>>>
>>>
>>>
>>> *Vivek Pandya*
>>>
>>>
>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> *Vivek Pandya*
>>>>
>>>>
>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at
apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at
anl.gov> wrote:
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> *From: *"vivek pandya" <vivekvpandya at
gmail.com>
>>>>> *To: *"llvm-dev" <llvm-dev at
lists.llvm.org>, "Tim Amini Golling" <
>>>>> mehdi.amini at apple.com>, "Hal Finkel"
<hfinkel at anl.gov>
>>>>> *Cc: *"Quentin Colombet" <qcolombet at
apple.com>
>>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>>>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation
-
>>>>> Introduction and Feedback
>>>>>
>>>>> Hello LLVM Community,
>>>>>
>>>>> Sorry for delay as I was busy in final exams.
>>>>>
>>>>> I am Vivek from India. Thanks for choosing my proposal for
>>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi
Amini and Hal
>>>>> Finkel will be mentoring me for this project.
>>>>>
>>>>> IPRA can reduce code size and runtime of programs by
allocating
>>>>> register across the module and procedure boundaries.
>>>>>
>>>>> I have identified some old but effective research work on
this area.
>>>>> I want community's feedback for feasibility of these
approach and I am
>>>>> targeting to implement two of them during this project.
>>>>>
>>>>> Here is list of the papers, I have read first two papers
and I would
>>>>> like to discuss those approach first, I will read other two
paper then
>>>>> initiate discussion for them as well. All I want is to find
out a concrete
>>>>> implementation plan before 23 May, 2016 and for that I need
community's
>>>>> help.
>>>>>
>>>>> 1) Compile time ----- Minimizing register usage penalty at
procedure
>>>>> calls - http://dl.acm.org/citation.cfm?id=53999
>>>>>
====================================================================In
>>>>> this approach intra-procedural register allocation is used
as base but
>>>>> machine code generation order is bottom up traversal of
call graph and
>>>>> inter-procedural effect is achieved by propagating register
usage
>>>>> information of callee function to caller (i.e child to
parent in CallGraph)
>>>>> so that caller can use different registers than callee and
can save load
>>>>> store cost at procedure call, this is not trivial as it
seems due to
>>>>> recursive calls, library function usage etc. Also for upper
region of the
>>>>> graph in this technique available number of registers might
become zero in
>>>>> that case it should fall back to normal load store at
procedure call. Apart
>>>>> from these difficulties other difficulties have been
identified please
>>>>> follow this mail-chain
>>>>>
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>>> My mentor has already provided me a patch that alters code
generation
>>>>> order as per bottom up call graph traversal, I am working
from that point
>>>>> now. Any other help/suggestion is always welcomed.
>>>>>
>>>>> 2) Link time ----- Global register allocation at link time
-
>>>>> http://dl.acm.org/citation.cfm?id=989415
>>>>>
====================================================================In
>>>>> this particular approach (sort of true IPRA) registers will
be reallocated
>>>>> (this optimization will be optional if turned off still
code will be
>>>>> compiled as per intra-procedural allocation) at link time.
Here modules are
>>>>> first complied as per normal compilation but the object
code is annotated
>>>>> with details so that linker can build call graph and also
calculate usage
>>>>> information at link time. Compiler also write hints in
object code that if
>>>>> particular variable is allocated in some other register (
due to new
>>>>> allocation) then how the code should be changed? Thus
linker can use these
>>>>> information to decide which variables (global) need to be
in same register
>>>>> through out the program execution and also according to
register usage
>>>>> information in call graph which procedure will not be
active simultaneously
>>>>> so that locals for that procedures can be in same registers
with out load
>>>>> store at procedure calls.
>>>>> For these particular method help me to analyze feasibility:
>>>>> 1) Can llvm collects following information at module level
in
>>>>> MachineIR? list of procedures in module, list of locals in
procedures, list
>>>>> of procedures that a particular procedure can call, and a
list of the
>>>>> variables this procedure references. Each entry in the last
two lists
>>>>> includes an estimate of the number of times the procedure
is called or the
>>>>> variable is referenced in each execution of this procedure
>>>>> 2) Can llvm write informative commands to object files?
>>>>> 3) Can LTO is capable of leveraging those commands?
>>>>>
>>>>> In terms of scoping the project for the summer, I
definitely recommend
>>>>> that you focus on (1) first. If you finish that, we can
certainly move on
>>>>> to other things.
>>>>>
>>>>>
>>>>> I'll add +1 here, but I already wrote the same thing on
IRC when
>>>>> discussing with Vivek. True IPRA without a proper
MachineModule
>>>>> infrastructure won't be doable in my opinion (even with
such
>>>>> infrastructure, it may not be trivial in LLVM in general).
>>>>>
>>>>> Regarding link time, note that any such a design would
likely look
>>>>> much different than in David Wall's paper however,
because our LTO
>>>>> re-codegens everything anyway. The paper says,
"Finally, it keeps us honest
>>>>> as designers of the system; once we postpone anything until
link time, the
>>>>> temptation is great to postpone everything, ..." -
Well, we've long-since
>>>>> succumb to that temptation when we LTO. C'est la vie.
>>>>>
>>>>>
>>>>> +1 as well, our LTO will benefit naturally from the
leaf-to-root
>>>>> information propagation. ThinLTO will be more
challenging/interesting
>>>>> though!
>>>>>
>>>>> For the first part a mechanism similar to MachineModulePass
would be
>>>>> desirable but that may not be possible during this project,
but if we can
>>>>> make some sort of smaller version of that to suit our
purpose.
>>>>>
>>>>> I don't think we need to make any kind of
MachineModulePass to make
>>>>> this work. Once we alter the visitation order based on the
CGSCC iteration
>>>>> scheme, we can keep state in-between functions in the
pre-existing hacky
>>>>> way (using static members of the relevant function passes).
>>>>>
>>>>>  Sorry my mistake here by first part I mean 1) requirement
in the
>>>> link time approach.
>>>>
>>>>>
>>>>
>>>>> I also don't see where/why we need a
MachineModule(Pass) for the CGSCC
>>>>> scheme, that said I'd rather avoid using a function
pass with static
>>>>> members, if we can have a ModuleAnalysis that is
bookkeeping the results
>>>>> for functions in the module and queries by the register
allocator somehow.
>>>>> Matthias/Quentin may have other inputs on this aspect.
>>>>>
>>>>
>>> @Hal do you mean to add a simple MachineFunction pass that will
just
>>> operate on register allocated function and prepare a BitVector to
indicate
>>> which register is being used by MachineFunction, and then use this
pass as
>>> analysis pass (i.e just simply return static BitVector for
clobbered
>>> register when register allocation for next function begins. This
part is
>>> not much clear to me) this thing can be done by scheduling a pass
post
>>> register allocation in lib/CodeGen/Passes.cpp
>>>
>>> void TargetPassConfig::addMachinePasses() {
>>> .
>>> .
>>> .
>>>   // Run pre-ra passes.
>>>   addPreRegAlloc();
>>>
>>>   // Run register allocation and passes that are tightly coupled
with
>>> it,
>>>   // including phi elimination and scheduling.
>>>   if (getOptimizeRegAlloc())
>>>     addOptimizedRegAlloc(createRegAllocPass(true));
>>>   else
>>>     addFastRegAlloc(createRegAllocPass(false));
>>>
>>>   // Run post-ra passes.
>>>   addPostRegAlloc();
>>> // Adding a new pass here which keeps register mask information
across
>>> function calls.
>>> .
>>> .
>>> .
>>> }
>>>
>>> But this also requires current register allocators to use this
>>> information in someway because RegMaskBits in
LiveIntervalAnalysis.cpp is
>>> not static across calls. I mean I am not clear for how to propagate
static
>>> info to Intra-procedural Register allocators (if possible without
>>> disturbing their code )
>>>
>>> First, my hope is that we won't need to change the register
allocators,
>>> as such, in order to make use of this information. Instead,
we'll simply be
>>> able to alter the register masks generated for the call
instructions. These
>>> masks will indicate fewer clobbers than might otherwise be present
based on
>>> the ABI because of information gathered during the codegen of the
callee.
>>> These masks are generally constructed by target based on the
calling
>>> convention. The PowerPC backend, for example, looks like this:
>>>
>>>   // Add a register mask operand representing the call-preserved
>>> registers.
>>>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>>>   const uint32_t *Mask >>>      
TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>   assert(Mask && "Missing call preserved mask for
calling convention");
>>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>>
>>> but it can be more complicated. If you look for uses of
>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll
see what I mean.
>>> Regardless, the code ends up calling some method is the targets
>>> TargetRegisterInfo subclass. These methods generally look something
like
>>> this:
>>>
>>> const uint32_t *
>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction
&MF,
>>>                                       CallingConv::ID CC) const {
>>>   const PPCSubtarget &Subtarget =
MF.getSubtarget<PPCSubtarget>();
>>>   ...
>>>   return TM.isPPC64() ? (Subtarget.hasAltivec() ?
>>> CSR_SVR464_Altivec_RegMask
>>>                                                 :
CSR_SVR464_RegMask)
>>>                       : (Subtarget.hasAltivec() ?
>>> CSR_SVR432_Altivec_RegMask
>>>                                                 :
CSR_SVR432_RegMask);
>>> }
>>>
>>> In any case, the fundamental idea here is that, when someone calls
>>> getCallPreservedMask in order to set the regmask on a call, we
might not
>>> have to use the CC at all. Instead, if we've already codegened
the
>>> function, we might use a cache of 'exact' register masks
computed during
>>> codegen of the potential callees instead.
>>>
>> I am thinking to add a simple Immutable pass MachineRegisterUsageInfo
>> similar to MachineBranchProbabilityInfo that can maintain
>> RegisterUsageInformation per function. Can it be simply done by using
>> UsedPhysRegMask from MachineRegisterInfo ??
>>
>>
>> No, like the comment said, UsedPhysRegMask gives only the registers
>> clobbered by calls:
>> // This bit vector represents all the registers clobbered by function
>> calls.
>>
>> You want to build this information yourself on top of
>> MachineRegisterInfo::isPhysRegModified
>>
> Ok but then the time complexity will be O(n) n = number of physical
> register on the target. Am I going correct?
>
>
> Yes, this is correct.
>
>
>> Here getCallPreservedMask will call API provided by
>> MachineRegisterUsageInfo to avail the exact register mask but how it
can
>> know that the function is already codegen or it will query each time
when
>> getCallPreservedMask is called and of available
MachineRegisterUsageInfo
>> will return the details otherwise simply return NULL.
>> So changes will be now in TargetRegisterInfo implementation for each
>> target right ??
>>
>>
>>> In order to do this, I think we'll need to provide a function
callable
>>> from the target's getCallPreservedMask implementation, which
can return
>>> such an 'exact' regmask when available. I think we need to
do it this way
>>> for two reasons:
>>>
>>>  1. Not all of the target code calls getCallPreservedMask, but
sometimes
>>> calls other similar target-specific functions (e.g.
>>> getTLSCallPreservedMask).
>>>  2. The targets need to opt-in to this behavior because only the
target
>>> can know that all register uses are really tagged correctly post
"pre-emit".
>>>
>>> Because the target is free to introduce uses of registers at
essentially
>>> any time, we need to do the scanning for used registers after the
>>> "pre-emit" passes run. This can be done by scheduling
some simple
>>> register-use scanning pass after the call to addPreEmitPass in
>>> lib/CodeGen/Passes.cpp.
>>>
>>>
>>> I think this also applies in someway to Mehdi Amini's idea to
keep a
>>> ModulePass for bookkeeping but then existing register allocators
will be
>>> required to change so that the code can query the ModulePass for
>>> RegMaskBits for particular function.
>>>
>>> I think that the simplest way to do this is to create an immutable
>>> analysis pass (e.g. BasicAA) that keeps the cache of the computed
register
>>> masks. This is somewhat similar in spirit to how the
'AssumptionCache'
>>> analysis works at the IR level. This analysis can then be created
by
>>> lib/CodeGen/Passes.cpp early, and then queried and passed around
later by
>>> the CodeGen/Target code. Because it is an immutable analysis, it
won't get
>>> destroyed until the very end, which is also important because, I
imagine,
>>> it will need to own the memory associated with the generated
register masks.
>>>
>>>  -Hal
>>>
>>>
>>> Vivek
>>>
>>>
>>>>> Yes for propagating register usage approach we don't
need
>>>> MachineModulePass
>>>>
>>>> Vivek
>>>>
>>>>> --
>>>>> Mehdi
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160524/e339b3f5/attachment-0001.html>

Hal Finkel via llvm-dev

2016-May-24 22:23 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

----- Original Message -----
> From: "vivek pandya" <vivekvpandya at gmail.com>
> To: "Quentin Colombet" <qcolombet at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
> <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at
braunis.de>,
> "Mehdi Amini" <mehdi.amini at apple.com>
> Sent: Tuesday, May 24, 2016 1:00:58 PM
> Subject: Re: [GSoC 2016] Interprocedural Register Allocation -
> Introduction and Feedback
> Hello,
> I have written following code to check each register if it is used by
> machineFunction or not :
> MachineRegisterInfo *MRI = &MF.getRegInfo();
> TargetRegisterInfo *TRI = (TargetRegisterInfo
> *)MF.getSubtarget().getRegisterInfo();Some reason you can't use a const pointer here? 
> const TargetMachine &TM = MF.getTarget();
> const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
> DEBUG(dbgs() << "Function Name : " << MF.getName()
<< "\n");
> for(TargetRegisterInfo::regclass_iterator i > (*TRI).regclass_begin(), e
= (*TRI).regclass_end(); i != e; i++ ) {
> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege >
(*i)->end(); pregi != prege; pregi++ ) {
> DEBUG( dbgs() << "Physical Register : " <<
MCRI->getName(*pregi) << "
> is modified "<< MRI->isPhysRegModified(*pregi) <<
" \n");Try isPhysRegUsed. 
> }
> }
> DEBUG(dbgs() << "\n");
> The pass which is executing this code is schedule POST-RA stage but
> this gives me true for all registers i.e in each function all
> registers are being used except EBP and some other similar, Is this
> a correct way to get register usage information ? I think I have
> made some mistake please help.You might look at the implementation of these functions in
lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning true
because UsedPhysRegMask.test(PhysReg) is true or because
reg_nodbg_empty(*AliasReg) is true.

-Hal 
> Vivek
> On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <
> qcolombet at apple.com > wrote:
> > > On May 18, 2016, at 11:00 AM, vivek pandya <
> > > vivekvpandya at gmail.com
> > > >
> > > wrote:
> > 
> 
> > > Vivek Pandya
> > 
> 
> > > On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <
> > > qcolombet at apple.com > wrote:
> > 
> 
> > > > > On May 18, 2016, at 10:46 AM, vivek pandya <
> > > > > vivekvpandya at gmail.com
> > > > > >
> > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > Vivek Pandya
> > > > 
> > > 
> > 
> 
> > > > > On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <
hfinkel at anl.gov
> > > > > >
> > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > > > From: "vivek pandya" <
vivekvpandya at gmail.com >
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > To: "Mehdi Amini" < mehdi.amini
at apple.com >
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Cc: "Hal Finkel" < hfinkel at
anl.gov >, "Quentin Colombet"
> > > > > > > <
> > > > > > > qcolombet at apple.com >,
"llvm-dev" <
> > > > > > > llvm-dev at lists.llvm.org
> > > > > > > >,
> > > > > > > "Matthias Braun" < matze at
braunis.de >
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Sent: Wednesday, May 11, 2016 3:15:03 AM
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Subject: Re: [GSoC 2016] Interprocedural
Register
> > > > > > > Allocation
> > > > > > > -
> > > > > > > Introduction and Feedback
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Vivek Pandya
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > On Wed, May 11, 2016 at 10:02 AM, vivek
pandya <
> > > > > > > vivekvpandya at gmail.com > wrote:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Vivek Pandya
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > On Wed, May 11, 2016 at 9:43 AM, Mehdi
Amini <
> > > > > > > > mehdi.amini at apple.com
> > > > > > > > > wrote:
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > On May 10, 2016, at 6:06 PM,
Hal Finkel <
> > > > > > > > > > hfinkel at anl.gov
> > > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > From: "vivek
pandya" < vivekvpandya at gmail.com >
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > To: "llvm-dev"
< llvm-dev at lists.llvm.org >, "Tim
> > > > > > > > > > > Amini
> > > > > > > > > > > Golling"
> > > > > > > > > > > <
> > > > > > > > > > > mehdi.amini at apple.com
>, "Hal Finkel" <
> > > > > > > > > > > hfinkel at anl.gov
> > > > > > > > > > > >
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Cc: "Quentin
Colombet" < qcolombet at apple.com >
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Sent: Tuesday, May 10,
2016 2:59:16 PM
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Subject: [GSoC 2016]
Interprocedural Register
> > > > > > > > > > > Allocation
> > > > > > > > > > > -
> > > > > > > > > > > Introduction and Feedback
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Hello LLVM Community,
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Sorry for delay as I was
busy in final exams.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > I am Vivek from India.
Thanks for choosing my
> > > > > > > > > > > proposal
> > > > > > > > > > > for
> > > > > > > > > > > Interprocedural Register
Allocation (IPRA) in
> > > > > > > > > > > LLVM.
> > > > > > > > > > > Mehdi
> > > > > > > > > > > Amini
> > > > > > > > > > > and
> > > > > > > > > > > Hal Finkel will be
mentoring me for this project.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > IPRA can reduce code size
and runtime of programs
> > > > > > > > > > > by
> > > > > > > > > > > allocating
> > > > > > > > > > > register across the
module and procedure
> > > > > > > > > > > boundaries.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > I have identified some
old but effective research
> > > > > > > > > > > work
> > > > > > > > > > > on
> > > > > > > > > > > this
> > > > > > > > > > > area.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > I want community's
feedback for feasibility of
> > > > > > > > > > > these
> > > > > > > > > > > approach
> > > > > > > > > > > and
> > > > > > > > > > > I
> > > > > > > > > > > am targeting to implement
two of them during this
> > > > > > > > > > > project.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Here is list of the
papers, I have read first two
> > > > > > > > > > > papers
> > > > > > > > > > > and
> > > > > > > > > > > I
> > > > > > > > > > > would
> > > > > > > > > > > like to discuss those
approach first, I will read
> > > > > > > > > > > other
> > > > > > > > > > > two
> > > > > > > > > > > paper
> > > > > > > > > > > then initiate discussion
for them as well. All I
> > > > > > > > > > > want
> > > > > > > > > > > is
> > > > > > > > > > > to
> > > > > > > > > > > find
> > > > > > > > > > > out
> > > > > > > > > > > a concrete implementation
plan before 23 May,
> > > > > > > > > > > 2016
> > > > > > > > > > > and
> > > > > > > > > > > for
> > > > > > > > > > > that
> > > > > > > > > > > I
> > > > > > > > > > > need community's
help.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > 1) Compile time -----
Minimizing register usage
> > > > > > > > > > > penalty
> > > > > > > > > > > at
> > > > > > > > > > > procedure
> > > > > > > > > > > calls -
http://dl.acm.org/citation.cfm?id=53999
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > >
====================================================================In
> > > > > > > > > > > this approach
intra-procedural register
> > > > > > > > > > > allocation
> > > > > > > > > > > is
> > > > > > > > > > > used
> > > > > > > > > > > as
> > > > > > > > > > > base
> > > > > > > > > > > but machine code
generation order is bottom up
> > > > > > > > > > > traversal
> > > > > > > > > > > of
> > > > > > > > > > > call
> > > > > > > > > > > graph and
inter-procedural effect is achieved by
> > > > > > > > > > > propagating
> > > > > > > > > > > register usage
information of callee function to
> > > > > > > > > > > caller
> > > > > > > > > > > (i.e
> > > > > > > > > > > child
> > > > > > > > > > > to parent in CallGraph)
so that caller can use
> > > > > > > > > > > different
> > > > > > > > > > > registers
> > > > > > > > > > > than callee and can save
load store cost at
> > > > > > > > > > > procedure
> > > > > > > > > > > call,
> > > > > > > > > > > this
> > > > > > > > > > > is
> > > > > > > > > > > not trivial as it seems
due to recursive calls,
> > > > > > > > > > > library
> > > > > > > > > > > function
> > > > > > > > > > > usage etc. Also for upper
region of the graph in
> > > > > > > > > > > this
> > > > > > > > > > > technique
> > > > > > > > > > > available number of
registers might become zero
> > > > > > > > > > > in
> > > > > > > > > > > that
> > > > > > > > > > > case
> > > > > > > > > > > it
> > > > > > > > > > > should fall back to
normal load store at
> > > > > > > > > > > procedure
> > > > > > > > > > > call.
> > > > > > > > > > > Apart
> > > > > > > > > > > from
> > > > > > > > > > > these difficulties other
difficulties have been
> > > > > > > > > > > identified
> > > > > > > > > > > please
> > > > > > > > > > > follow this mail-chain
> > > > > > > > > > >
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > My mentor has already
provided me a patch that
> > > > > > > > > > > alters
> > > > > > > > > > > code
> > > > > > > > > > > generation
> > > > > > > > > > > order as per bottom up
call graph traversal, I am
> > > > > > > > > > > working
> > > > > > > > > > > from
> > > > > > > > > > > that
> > > > > > > > > > > point now. Any other
help/suggestion is always
> > > > > > > > > > > welcomed.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > 2) Link time ----- Global
register allocation at
> > > > > > > > > > > link
> > > > > > > > > > > time
> > > > > > > > > > > -
> > > > > > > > > > >
http://dl.acm.org/citation.cfm?id=989415
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > >
====================================================================In
> > > > > > > > > > > this particular approach
(sort of true IPRA)
> > > > > > > > > > > registers
> > > > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > reallocated (this
optimization will be optional
> > > > > > > > > > > if
> > > > > > > > > > > turned
> > > > > > > > > > > off
> > > > > > > > > > > still
> > > > > > > > > > > code will be compiled as
per intra-procedural
> > > > > > > > > > > allocation)
> > > > > > > > > > > at
> > > > > > > > > > > link
> > > > > > > > > > > time. Here modules are
first complied as per
> > > > > > > > > > > normal
> > > > > > > > > > > compilation
> > > > > > > > > > > but
> > > > > > > > > > > the object code is
annotated with details so that
> > > > > > > > > > > linker
> > > > > > > > > > > can
> > > > > > > > > > > build
> > > > > > > > > > > call graph and also
calculate usage information
> > > > > > > > > > > at
> > > > > > > > > > > link
> > > > > > > > > > > time.
> > > > > > > > > > > Compiler also write hints
in object code that if
> > > > > > > > > > > particular
> > > > > > > > > > > variable
> > > > > > > > > > > is allocated in some
other register ( due to new
> > > > > > > > > > > allocation)
> > > > > > > > > > > then
> > > > > > > > > > > how the code should be
changed? Thus linker can
> > > > > > > > > > > use
> > > > > > > > > > > these
> > > > > > > > > > > information to decide
which variables (global)
> > > > > > > > > > > need
> > > > > > > > > > > to
> > > > > > > > > > > be
> > > > > > > > > > > in
> > > > > > > > > > > same
> > > > > > > > > > > register through out the
program execution and
> > > > > > > > > > > also
> > > > > > > > > > > according
> > > > > > > > > > > to
> > > > > > > > > > > register usage
information in call graph which
> > > > > > > > > > > procedure
> > > > > > > > > > > will
> > > > > > > > > > > not
> > > > > > > > > > > be
> > > > > > > > > > > active simultaneously so
that locals for that
> > > > > > > > > > > procedures
> > > > > > > > > > > can
> > > > > > > > > > > be
> > > > > > > > > > > in
> > > > > > > > > > > same registers with out
load store at procedure
> > > > > > > > > > > calls.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > For these particular
method help me to analyze
> > > > > > > > > > > feasibility:
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > 1) Can llvm collects
following information at
> > > > > > > > > > > module
> > > > > > > > > > > level
> > > > > > > > > > > in
> > > > > > > > > > > MachineIR? list of
procedures in module, list of
> > > > > > > > > > > locals
> > > > > > > > > > > in
> > > > > > > > > > > procedures, list of
procedures that a particular
> > > > > > > > > > > procedure
> > > > > > > > > > > can
> > > > > > > > > > > call,
> > > > > > > > > > > and a list of the
variables this procedure
> > > > > > > > > > > references.
> > > > > > > > > > > Each
> > > > > > > > > > > entry
> > > > > > > > > > > in
> > > > > > > > > > > the last two lists
includes an estimate of the
> > > > > > > > > > > number
> > > > > > > > > > > of
> > > > > > > > > > > times
> > > > > > > > > > > the
> > > > > > > > > > > procedure is called or
the variable is referenced
> > > > > > > > > > > in
> > > > > > > > > > > each
> > > > > > > > > > > execution
> > > > > > > > > > > of this procedure
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > 2) Can llvm write
informative commands to object
> > > > > > > > > > > files?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > 3) Can LTO is capable of
leveraging those
> > > > > > > > > > > commands?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > In terms of scoping the
project for the summer, I
> > > > > > > > > > definitely
> > > > > > > > > > recommend that you focus on
(1) first. If you
> > > > > > > > > > finish
> > > > > > > > > > that,
> > > > > > > > > > we
> > > > > > > > > > can
> > > > > > > > > > certainly move on to other
things.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > I'll add +1 here, but I already
wrote the same thing
> > > > > > > > > on
> > > > > > > > > IRC
> > > > > > > > > when
> > > > > > > > > discussing with Vivek. True IPRA
without a proper
> > > > > > > > > MachineModule
> > > > > > > > > infrastructure won't be doable
in my opinion (even
> > > > > > > > > with
> > > > > > > > > such
> > > > > > > > > infrastructure, it may not be
trivial in LLVM in
> > > > > > > > > general).
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Regarding link time, note that
any such a design
> > > > > > > > > > would
> > > > > > > > > > likely
> > > > > > > > > > look
> > > > > > > > > > much different than in David
Wall's paper however,
> > > > > > > > > > because
> > > > > > > > > > our
> > > > > > > > > > LTO
> > > > > > > > > > re-codegens everything anyway.
The paper says,
> > > > > > > > > > "Finally,
> > > > > > > > > > it
> > > > > > > > > > keeps
> > > > > > > > > > us
> > > > > > > > > > honest as designers of the
system; once we postpone
> > > > > > > > > > anything
> > > > > > > > > > until
> > > > > > > > > > link time, the temptation is
great to postpone
> > > > > > > > > > everything,
> > > > > > > > > > ..."
> > > > > > > > > > -
> > > > > > > > > > Well, we've long-since
succumb to that temptation
> > > > > > > > > > when
> > > > > > > > > > we
> > > > > > > > > > LTO.
> > > > > > > > > > C'est
> > > > > > > > > > la vie.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > +1 as well, our LTO will benefit
naturally from the
> > > > > > > > > leaf-to-root
> > > > > > > > > information propagation. ThinLTO
will be more
> > > > > > > > > challenging/interesting though!
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > For the first part a
mechanism similar to
> > > > > > > > > > > MachineModulePass
> > > > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > desirable but that may
not be possible during
> > > > > > > > > > > this
> > > > > > > > > > > project,
> > > > > > > > > > > but
> > > > > > > > > > > if
> > > > > > > > > > > we can make some sort of
smaller version of that
> > > > > > > > > > > to
> > > > > > > > > > > suit
> > > > > > > > > > > our
> > > > > > > > > > > purpose.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > I don't think we need to
make any kind of
> > > > > > > > > > MachineModulePass
> > > > > > > > > > to
> > > > > > > > > > make
> > > > > > > > > > this work. Once we alter the
visitation order based
> > > > > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > CGSCC
> > > > > > > > > > iteration scheme, we can keep
state in-between
> > > > > > > > > > functions
> > > > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > pre-existing hacky way (using
static members of the
> > > > > > > > > > relevant
> > > > > > > > > > function passes).
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Sorry my mistake here by first part I
mean 1)
> > > > > > > > requirement
> > > > > > > > in
> > > > > > > > the
> > > > > > > > link
> > > > > > > > time approach.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > I also don't see where/why we
need a
> > > > > > > > > MachineModule(Pass)
> > > > > > > > > for
> > > > > > > > > the
> > > > > > > > > CGSCC scheme, that said I'd
rather avoid using a
> > > > > > > > > function
> > > > > > > > > pass
> > > > > > > > > with
> > > > > > > > > static members, if we can have a
ModuleAnalysis that
> > > > > > > > > is
> > > > > > > > > bookkeeping
> > > > > > > > > the results for functions in the
module and queries
> > > > > > > > > by
> > > > > > > > > the
> > > > > > > > > register
> > > > > > > > > allocator somehow.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Matthias/Quentin may have other
inputs on this
> > > > > > > > > aspect.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > @Hal do you mean to add a simple
MachineFunction pass
> > > > > > > that
> > > > > > > will
> > > > > > > just
> > > > > > > operate on register allocated function and
prepare a
> > > > > > > BitVector
> > > > > > > to
> > > > > > > indicate which register is being used by
MachineFunction,
> > > > > > > and
> > > > > > > then
> > > > > > > use this pass as analysis pass (i.e just
simply return
> > > > > > > static
> > > > > > > BitVector for clobbered register when
register allocation
> > > > > > > for
> > > > > > > next
> > > > > > > function begins. This part is not much clear
to me) this
> > > > > > > thing
> > > > > > > can
> > > > > > > be done by scheduling a pass post register
allocation in
> > > > > > > lib/CodeGen/Passes.cpp
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > void TargetPassConfig::addMachinePasses() {
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > .
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > .
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > .
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > // Run pre-ra passes.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > addPreRegAlloc();
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > // Run register allocation and passes that
are tightly
> > > > > > > coupled
> > > > > > > with
> > > > > > > it,
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > // including phi elimination and scheduling.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > if (getOptimizeRegAlloc())
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
addOptimizedRegAlloc(createRegAllocPass(true));
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > else
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > addFastRegAlloc(createRegAllocPass(false));
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > // Run post-ra passes.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > addPostRegAlloc();
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > // Adding a new pass here which keeps
register mask
> > > > > > > information
> > > > > > > across function calls.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > .
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > .
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > .
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > }
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > But this also requires current register
allocators to use
> > > > > > > this
> > > > > > > information in someway because RegMaskBits in
> > > > > > > LiveIntervalAnalysis.cpp is not static across
calls. I
> > > > > > > mean
> > > > > > > I
> > > > > > > am
> > > > > > > not
> > > > > > > clear for how to propagate static info to
> > > > > > > Intra-procedural
> > > > > > > Register
> > > > > > > allocators (if possible without disturbing
their code )
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > First, my hope is that we won't need to change
the register
> > > > > > allocators, as such, in order to make use of this
> > > > > > information.
> > > > > > Instead, we'll simply be able to alter the
register masks
> > > > > > generated
> > > > > > for the call instructions. These masks will
indicate fewer
> > > > > > clobbers
> > > > > > than might otherwise be present based on the ABI
because of
> > > > > > information gathered during the codegen of the
callee.
> > > > > > These
> > > > > > masks
> > > > > > are generally constructed by target based on the
calling
> > > > > > convention.
> > > > > > The PowerPC backend, for example, looks like this:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > // Add a register mask operand representing the
> > > > > > call-preserved
> > > > > > registers.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > const TargetRegisterInfo *TRI > > > >
> > Subtarget.getRegisterInfo();
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > const uint32_t *Mask > > > > > 
> > > > 
> > > 
> > 
> 
> > > > > >
TRI->getCallPreservedMask(DAG.getMachineFunction(),
> > > > > > CallConv);
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > assert(Mask && "Missing call
preserved mask for calling
> > > > > > convention");
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Ops.push_back(DAG.getRegisterMask(Mask));
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > but it can be more complicated. If you look for
uses of
> > > > > > 'getRegisterMask' in
Target/*/*ISelLowering.cpp, you'll see
> > > > > > what
> > > > > > I
> > > > > > mean. Regardless, the code ends up calling some
method is
> > > > > > the
> > > > > > targets TargetRegisterInfo subclass. These methods
> > > > > > generally
> > > > > > look
> > > > > > something like this:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > const uint32_t *
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > PPCRegisterInfo::getCallPreservedMask(const
MachineFunction
> > > > > > &MF,
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > CallingConv::ID CC) const {
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > const PPCSubtarget &Subtarget > > >
> > > MF.getSubtarget<PPCSubtarget>();
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > ...
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > return TM.isPPC64() ? (Subtarget.hasAltivec() ?
> > > > > > CSR_SVR464_Altivec_RegMask
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > : CSR_SVR464_RegMask)
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > : (Subtarget.hasAltivec() ?
CSR_SVR432_Altivec_RegMask
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > : CSR_SVR432_RegMask);
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > }
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > In any case, the fundamental idea here is that,
when
> > > > > > someone
> > > > > > calls
> > > > > > getCallPreservedMask in order to set the regmask
on a call,
> > > > > > we
> > > > > > might
> > > > > > not have to use the CC at all. Instead, if
we've already
> > > > > > codegened
> > > > > > the function, we might use a cache of
'exact' register
> > > > > > masks
> > > > > > computed during codegen of the potential callees
instead.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > I am thinking to add a simple Immutable pass
> > > > > MachineRegisterUsageInfo
> > > > > similar to MachineBranchProbabilityInfo that can
maintain
> > > > > RegisterUsageInformation per function. Can it be simply
done
> > > > > by
> > > > > using UsedPhysRegMask from MachineRegisterInfo ??
> > > > 
> > > 
> > 
> 
> > > > No, like the comment said, UsedPhysRegMask gives only the
> > > > registers
> > > > clobbered by calls:
> > > 
> > 
> 
> > > > // This bit vector represents all the registers clobbered by
> > > > function
> > > > calls.
> > > 
> > 
> 
> > > > You want to build this information yourself on top of
> > > > MachineRegisterInfo:: isPhysRegModified
> > > 
> > 
> 
> > > Ok but then the time complexity will be O(n) n = number of
> > > physical
> > > register on the target. Am I going correct?
> > 
> 
> > Yes, this is correct.
> 
> > > > > Here getCallPreservedMask will call API provided by
> > > > > MachineRegisterUsageInfo to avail the exact register
mask but
> > > > > how
> > > > > it
> > > > > can know that the function is already codegen or it
will
> > > > > query
> > > > > each
> > > > > time when getCallPreservedMask is called and of
available
> > > > > MachineRegisterUsageInfo will return the details
otherwise
> > > > > simply
> > > > > return NULL.
> > > > 
> > > 
> > 
> 
> > > > > So changes will be now in TargetRegisterInfo
implementation
> > > > > for
> > > > > each
> > > > > target right ??
> > > > 
> > > 
> > 
> 
> > > > > > In order to do this, I think we'll need to
provide a
> > > > > > function
> > > > > > callable from the target's
getCallPreservedMask
> > > > > > implementation,
> > > > > > which can return such an 'exact' regmask
when available. I
> > > > > > think
> > > > > > we
> > > > > > need to do it this way for two reasons:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 1. Not all of the target code calls
getCallPreservedMask,
> > > > > > but
> > > > > > sometimes calls other similar target-specific
functions
> > > > > > (e.g.
> > > > > > getTLSCallPreservedMask).
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2. The targets need to opt-in to this behavior
because only
> > > > > > the
> > > > > > target can know that all register uses are really
tagged
> > > > > > correctly
> > > > > > post "pre-emit".
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Because the target is free to introduce uses of
registers
> > > > > > at
> > > > > > essentially any time, we need to do the scanning
for used
> > > > > > registers
> > > > > > after the "pre-emit" passes run. This
can be done by
> > > > > > scheduling
> > > > > > some
> > > > > > simple register-use scanning pass after the call
to
> > > > > > addPreEmitPass
> > > > > > in lib/CodeGen/Passes.cpp.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > I think this also applies in someway to Mehdi
Amini's
> > > > > > > idea
> > > > > > > to
> > > > > > > keep
> > > > > > > a
> > > > > > > ModulePass for bookkeeping but then existing
register
> > > > > > > allocators
> > > > > > > will be required to change so that the code
can query the
> > > > > > > ModulePass
> > > > > > > for RegMaskBits for particular function.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > I think that the simplest way to do this is to
create an
> > > > > > immutable
> > > > > > analysis pass (e.g. BasicAA) that keeps the cache
of the
> > > > > > computed
> > > > > > register masks. This is somewhat similar in spirit
to how
> > > > > > the
> > > > > > 'AssumptionCache' analysis works at the IR
level. This
> > > > > > analysis
> > > > > > can
> > > > > > then be created by lib/CodeGen/Passes.cpp early,
and then
> > > > > > queried
> > > > > > and passed around later by the CodeGen/Target
code. Because
> > > > > > it
> > > > > > is
> > > > > > an
> > > > > > immutable analysis, it won't get destroyed
until the very
> > > > > > end,
> > > > > > which
> > > > > > is also important because, I imagine, it will need
to own
> > > > > > the
> > > > > > memory
> > > > > > associated with the generated register masks.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > -Hal
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Vivek
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Yes for propagating register usage
approach we don't
> > > > > > > > need
> > > > > > > > MachineModulePass
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Vivek
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > --
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Mehdi
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > --
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Hal Finkel
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Assistant Computational Scientist
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Leadership Computing Facility
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Argonne National Laboratory
> > > > > 
> > > > 
> > > 
> > 
> -- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160524/4e184874/attachment-0001.html>

vivek pandya via llvm-dev

2016-May-25 02:34 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ------------------------------
>
> *From: *"vivek pandya" <vivekvpandya at gmail.com>
> *To: *"Quentin Colombet" <qcolombet at apple.com>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>,
"llvm-dev" <llvm-dev at lists.llvm.org>,
> "Matthias Braun" <matze at braunis.de>, "Mehdi
Amini" <mehdi.amini at apple.com>
> *Sent: *Tuesday, May 24, 2016 1:00:58 PM
> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
> Introduction and Feedback
>
> Hello,
>
> I have written following code to check each register if it is used by
> machineFunction or not :
>
> MachineRegisterInfo *MRI = &MF.getRegInfo();
> TargetRegisterInfo *TRI = (TargetRegisterInfo
> *)MF.getSubtarget().getRegisterInfo();
>
> Some reason you can't use a const pointer here?
>MCRegisterInfo is just used to get conventional name of register for given
target like AX, BX on X86.
>
> const TargetMachine &TM = MF.getTarget();
> const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
> DEBUG(dbgs() << "Function Name : " << MF.getName()
<< "\n");
>
> for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e
> (*TRI).regclass_end(); i != e; i++ ) {
> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege >
(*i)->end(); pregi != prege; pregi++ ) {
> DEBUG( dbgs() << "Physical Register : " <<
MCRI->getName(*pregi) << " is
> modified "<< MRI->isPhysRegModified(*pregi) << "
\n");
>
> Try isPhysRegUsed.
>ok
>
> }
> }
> DEBUG(dbgs() << "\n");
>
> The pass which is executing this code is schedule POST-RA stage but this
> gives me true for all registers i.e in each function all registers are
> being used except EBP and some other similar, Is this a correct way to get
> register usage information ? I think I have made some mistake please help.
>
>
> You might look at the implementation of these functions in
> lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning
> true because UsedPhysRegMask.test(PhysReg) is true or because
> reg_nodbg_empty(*AliasReg) is true.
>Yes that helped now I am getting actual register which have been used by
given function, but a little problem
The updated code is as shown below :
for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e
(*TRI).regclass_end(); i != e; i++ ) {
for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege
(*i)->end(); pregi != prege; pregi++ ) {
for (MCRegAliasIterator AliasReg(*pregi, TRI, true); AliasReg.isValid();
++AliasReg) {
   if (!MRI->reg_nodbg_empty(*AliasReg)) {
    DEBUG( dbgs() << "Physical Register : " <<
MCRI->getName(*pregi) << "
is used "<< MRI->isPhysRegUsed(*pregi) << " \n");
    break; // no need to process more alias
   }
  }
}
}
But here some registers are getting processed with in different classes
(unnecessary processing) Is this only way to iterate through all used
register (using RegClass iterator) ? Is there any way to avoid duplicate
regs?
Of course currently I am just printing but next I am thinking to use a map
to track usage info , in that only distinct register info will be stored
but still due to loop structure I need to iterate through a single register
3 - 4 times making it time consuming.

-Vivek
>
>
>  -Hal
>
>
>
> Vivek
>
> On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at
apple.com>
> wrote:
>
>>
>> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>
>>
>> *Vivek Pandya*
>>
>>
>> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <qcolombet at
apple.com>
>> wrote:
>>
>>>
>>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at
gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> *Vivek Pandya*
>>>
>>>
>>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at
anl.gov> wrote:
>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From: *"vivek pandya" <vivekvpandya at
gmail.com>
>>>> *To: *"Mehdi Amini" <mehdi.amini at apple.com>
>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>,
"Quentin Colombet" <
>>>> qcolombet at apple.com>, "llvm-dev" <llvm-dev
at lists.llvm.org>, "Matthias
>>>> Braun" <matze at braunis.de>
>>>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM
>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation
-
>>>> Introduction and Feedback
>>>>
>>>>
>>>>
>>>>
>>>> *Vivek Pandya*
>>>>
>>>>
>>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya
at gmail.com>
>>>>  wrote:
>>>>
>>>>>
>>>>>
>>>>> *Vivek Pandya*
>>>>>
>>>>>
>>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini
<mehdi.amini at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at
anl.gov> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------
>>>>>>
>>>>>> *From: *"vivek pandya" <vivekvpandya at
gmail.com>
>>>>>> *To: *"llvm-dev" <llvm-dev at
lists.llvm.org>, "Tim Amini Golling" <
>>>>>> mehdi.amini at apple.com>, "Hal Finkel"
<hfinkel at anl.gov>
>>>>>> *Cc: *"Quentin Colombet" <qcolombet at
apple.com>
>>>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>>>>>> *Subject: *[GSoC 2016] Interprocedural Register
Allocation -
>>>>>> Introduction and Feedback
>>>>>>
>>>>>> Hello LLVM Community,
>>>>>>
>>>>>> Sorry for delay as I was busy in final exams.
>>>>>>
>>>>>> I am Vivek from India. Thanks for choosing my proposal
for
>>>>>> Interprocedural Register Allocation (IPRA) in LLVM.
Mehdi Amini and Hal
>>>>>> Finkel will be mentoring me for this project.
>>>>>>
>>>>>> IPRA can reduce code size and runtime of programs by
allocating
>>>>>> register across the module and procedure boundaries.
>>>>>>
>>>>>> I have identified some old but effective research work
on this area.
>>>>>> I want community's feedback for feasibility of
these approach and I
>>>>>> am targeting to implement two of them during this
project.
>>>>>>
>>>>>> Here is list of the papers, I have read first two
papers and I would
>>>>>> like to discuss those approach first, I will read other
two paper then
>>>>>> initiate discussion for them as well. All I want is to
find out a concrete
>>>>>> implementation plan before 23 May, 2016 and for that I
need community's
>>>>>> help.
>>>>>>
>>>>>> 1) Compile time ----- Minimizing register usage penalty
at procedure
>>>>>> calls - http://dl.acm.org/citation.cfm?id=53999
>>>>>>
====================================================================In
>>>>>> this approach intra-procedural register allocation is
used as base but
>>>>>> machine code generation order is bottom up traversal of
call graph and
>>>>>> inter-procedural effect is achieved by propagating
register usage
>>>>>> information of callee function to caller (i.e child to
parent in CallGraph)
>>>>>> so that caller can use different registers than callee
and can save load
>>>>>> store cost at procedure call, this is not trivial as it
seems due to
>>>>>> recursive calls, library function usage etc. Also for
upper region of the
>>>>>> graph in this technique available number of registers
might become zero in
>>>>>> that case it should fall back to normal load store at
procedure call. Apart
>>>>>> from these difficulties other difficulties have been
identified please
>>>>>> follow this mail-chain
>>>>>>
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>>>> My mentor has already provided me a patch that alters
code generation
>>>>>> order as per bottom up call graph traversal, I am
working from that point
>>>>>> now. Any other help/suggestion is always welcomed.
>>>>>>
>>>>>> 2) Link time ----- Global register allocation at link
time -
>>>>>> http://dl.acm.org/citation.cfm?id=989415
>>>>>>
====================================================================In
>>>>>> this particular approach (sort of true IPRA) registers
will be reallocated
>>>>>> (this optimization will be optional if turned off still
code will be
>>>>>> compiled as per intra-procedural allocation) at link
time. Here modules are
>>>>>> first complied as per normal compilation but the object
code is annotated
>>>>>> with details so that linker can build call graph and
also calculate usage
>>>>>> information at link time. Compiler also write hints in
object code that if
>>>>>> particular variable is allocated in some other register
( due to new
>>>>>> allocation) then how the code should be changed? Thus
linker can use these
>>>>>> information to decide which variables (global) need to
be in same register
>>>>>> through out the program execution and also according to
register usage
>>>>>> information in call graph which procedure will not be
active simultaneously
>>>>>> so that locals for that procedures can be in same
registers with out load
>>>>>> store at procedure calls.
>>>>>> For these particular method help me to analyze
feasibility:
>>>>>> 1) Can llvm collects following information at module
level in
>>>>>> MachineIR? list of procedures in module, list of locals
in procedures, list
>>>>>> of procedures that a particular procedure can call, and
a list of the
>>>>>> variables this procedure references. Each entry in the
last two lists
>>>>>> includes an estimate of the number of times the
procedure is called or the
>>>>>> variable is referenced in each execution of this
procedure
>>>>>> 2) Can llvm write informative commands to object files?
>>>>>> 3) Can LTO is capable of leveraging those commands?
>>>>>>
>>>>>> In terms of scoping the project for the summer, I
definitely
>>>>>> recommend that you focus on (1) first. If you finish
that, we can certainly
>>>>>> move on to other things.
>>>>>>
>>>>>>
>>>>>> I'll add +1 here, but I already wrote the same
thing on IRC when
>>>>>> discussing with Vivek. True IPRA without a proper
MachineModule
>>>>>> infrastructure won't be doable in my opinion (even
with such
>>>>>> infrastructure, it may not be trivial in LLVM in
general).
>>>>>>
>>>>>> Regarding link time, note that any such a design would
likely look
>>>>>> much different than in David Wall's paper however,
because our LTO
>>>>>> re-codegens everything anyway. The paper says,
"Finally, it keeps us honest
>>>>>> as designers of the system; once we postpone anything
until link time, the
>>>>>> temptation is great to postpone everything, ..." -
Well, we've long-since
>>>>>> succumb to that temptation when we LTO. C'est la
vie.
>>>>>>
>>>>>>
>>>>>> +1 as well, our LTO will benefit naturally from the
leaf-to-root
>>>>>> information propagation. ThinLTO will be more
challenging/interesting
>>>>>> though!
>>>>>>
>>>>>> For the first part a mechanism similar to
MachineModulePass would be
>>>>>> desirable but that may not be possible during this
project, but if we can
>>>>>> make some sort of smaller version of that to suit our
purpose.
>>>>>>
>>>>>> I don't think we need to make any kind of
MachineModulePass to make
>>>>>> this work. Once we alter the visitation order based on
the CGSCC iteration
>>>>>> scheme, we can keep state in-between functions in the
pre-existing hacky
>>>>>> way (using static members of the relevant function
passes).
>>>>>>
>>>>>>  Sorry my mistake here by first part I mean 1)
requirement in the
>>>>> link time approach.
>>>>>
>>>>>>
>>>>>
>>>>>> I also don't see where/why we need a
MachineModule(Pass) for the
>>>>>> CGSCC scheme, that said I'd rather avoid using a
function pass with static
>>>>>> members, if we can have a ModuleAnalysis that is
bookkeeping the results
>>>>>> for functions in the module and queries by the register
allocator somehow.
>>>>>> Matthias/Quentin may have other inputs on this aspect.
>>>>>>
>>>>>
>>>> @Hal do you mean to add a simple MachineFunction pass that will
just
>>>> operate on register allocated function and prepare a BitVector
to indicate
>>>> which register is being used by MachineFunction, and then use
this pass as
>>>> analysis pass (i.e just simply return static BitVector for
clobbered
>>>> register when register allocation for next function begins.
This part is
>>>> not much clear to me) this thing can be done by scheduling a
pass post
>>>> register allocation in lib/CodeGen/Passes.cpp
>>>>
>>>> void TargetPassConfig::addMachinePasses() {
>>>> .
>>>> .
>>>> .
>>>>   // Run pre-ra passes.
>>>>   addPreRegAlloc();
>>>>
>>>>   // Run register allocation and passes that are tightly
coupled with
>>>> it,
>>>>   // including phi elimination and scheduling.
>>>>   if (getOptimizeRegAlloc())
>>>>     addOptimizedRegAlloc(createRegAllocPass(true));
>>>>   else
>>>>     addFastRegAlloc(createRegAllocPass(false));
>>>>
>>>>   // Run post-ra passes.
>>>>   addPostRegAlloc();
>>>> // Adding a new pass here which keeps register mask information
across
>>>> function calls.
>>>> .
>>>> .
>>>> .
>>>> }
>>>>
>>>> But this also requires current register allocators to use this
>>>> information in someway because RegMaskBits in
LiveIntervalAnalysis.cpp is
>>>> not static across calls. I mean I am not clear for how to
propagate static
>>>> info to Intra-procedural Register allocators (if possible
without
>>>> disturbing their code )
>>>>
>>>> First, my hope is that we won't need to change the register
allocators,
>>>> as such, in order to make use of this information. Instead,
we'll simply be
>>>> able to alter the register masks generated for the call
instructions. These
>>>> masks will indicate fewer clobbers than might otherwise be
present based on
>>>> the ABI because of information gathered during the codegen of
the callee.
>>>> These masks are generally constructed by target based on the
calling
>>>> convention. The PowerPC backend, for example, looks like this:
>>>>
>>>>   // Add a register mask operand representing the
call-preserved
>>>> registers.
>>>>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>>>>   const uint32_t *Mask >>>>      
TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>>   assert(Mask && "Missing call preserved mask for
calling convention");
>>>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>>>
>>>> but it can be more complicated. If you look for uses of
>>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp,
you'll see what I mean.
>>>> Regardless, the code ends up calling some method is the targets
>>>> TargetRegisterInfo subclass. These methods generally look
something like
>>>> this:
>>>>
>>>> const uint32_t *
>>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction
&MF,
>>>>                                       CallingConv::ID CC) const
{
>>>>   const PPCSubtarget &Subtarget =
MF.getSubtarget<PPCSubtarget>();
>>>>   ...
>>>>   return TM.isPPC64() ? (Subtarget.hasAltivec() ?
>>>> CSR_SVR464_Altivec_RegMask
>>>>                                                 :
CSR_SVR464_RegMask)
>>>>                       : (Subtarget.hasAltivec() ?
>>>> CSR_SVR432_Altivec_RegMask
>>>>                                                 :
CSR_SVR432_RegMask);
>>>> }
>>>>
>>>> In any case, the fundamental idea here is that, when someone
calls
>>>> getCallPreservedMask in order to set the regmask on a call, we
might not
>>>> have to use the CC at all. Instead, if we've already
codegened the
>>>> function, we might use a cache of 'exact' register
masks computed during
>>>> codegen of the potential callees instead.
>>>>
>>> I am thinking to add a simple Immutable pass
MachineRegisterUsageInfo
>>> similar to MachineBranchProbabilityInfo that can maintain
>>> RegisterUsageInformation per function. Can it be simply done by
using
>>> UsedPhysRegMask from MachineRegisterInfo ??
>>>
>>>
>>> No, like the comment said, UsedPhysRegMask gives only the registers
>>> clobbered by calls:
>>> // This bit vector represents all the registers clobbered by
function
>>> calls.
>>>
>>> You want to build this information yourself on top of
>>> MachineRegisterInfo::isPhysRegModified
>>>
>> Ok but then the time complexity will be O(n) n = number of physical
>> register on the target. Am I going correct?
>>
>>
>> Yes, this is correct.
>>
>>
>>> Here getCallPreservedMask will call API provided by
>>> MachineRegisterUsageInfo to avail the exact register mask but how
it can
>>> know that the function is already codegen or it will query each
time when
>>> getCallPreservedMask is called and of available
MachineRegisterUsageInfo
>>> will return the details otherwise simply return NULL.
>>> So changes will be now in TargetRegisterInfo implementation for
each
>>> target right ??
>>>
>>>
>>>> In order to do this, I think we'll need to provide a
function callable
>>>> from the target's getCallPreservedMask implementation,
which can return
>>>> such an 'exact' regmask when available. I think we need
to do it this way
>>>> for two reasons:
>>>>
>>>>  1. Not all of the target code calls getCallPreservedMask, but
>>>> sometimes calls other similar target-specific functions (e.g.
>>>> getTLSCallPreservedMask).
>>>>  2. The targets need to opt-in to this behavior because only
the target
>>>> can know that all register uses are really tagged correctly
post "pre-emit".
>>>>
>>>> Because the target is free to introduce uses of registers at
>>>> essentially any time, we need to do the scanning for used
registers after
>>>> the "pre-emit" passes run. This can be done by
scheduling some simple
>>>> register-use scanning pass after the call to addPreEmitPass in
>>>> lib/CodeGen/Passes.cpp.
>>>>
>>>>
>>>> I think this also applies in someway to Mehdi Amini's idea
to keep a
>>>> ModulePass for bookkeeping but then existing register
allocators will be
>>>> required to change so that the code can query the ModulePass
for
>>>> RegMaskBits for particular function.
>>>>
>>>> I think that the simplest way to do this is to create an
immutable
>>>> analysis pass (e.g. BasicAA) that keeps the cache of the
computed register
>>>> masks. This is somewhat similar in spirit to how the
'AssumptionCache'
>>>> analysis works at the IR level. This analysis can then be
created by
>>>> lib/CodeGen/Passes.cpp early, and then queried and passed
around later by
>>>> the CodeGen/Target code. Because it is an immutable analysis,
it won't get
>>>> destroyed until the very end, which is also important because,
I imagine,
>>>> it will need to own the memory associated with the generated
register masks.
>>>>
>>>>  -Hal
>>>>
>>>>
>>>> Vivek
>>>>
>>>>
>>>>>> Yes for propagating register usage approach we
don't need
>>>>> MachineModulePass
>>>>>
>>>>> Vivek
>>>>>
>>>>>> --
>>>>>> Mehdi
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>
>>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160525/2e928ec0/attachment-0001.html>

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - May 2016 - [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Reasonably Related Threads