thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Interprocedural Register Allocation

If this information is useful, please help other people find it:
Share via:

Mehdi Amini via llvm-dev

2016-May-11 04:13 UTC

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> 
> 
> From: "vivek pandya" <vivekvpandya at gmail.com>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>, "Tim
Amini Golling" <mehdi.amini at apple.com>, "Hal Finkel"
<hfinkel at anl.gov>
> Cc: "Quentin Colombet" <qcolombet at apple.com>
> Sent: Tuesday, May 10, 2016 2:59:16 PM
> Subject: [GSoC 2016] Interprocedural Register Allocation - Introduction and
Feedback
> 
> Hello LLVM Community,
> 
> Sorry for delay as I was busy in final exams.
> 
> I am Vivek from India. Thanks for choosing my proposal for Interprocedural
Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal Finkel will be mentoring
me for this project.
> 
> IPRA can reduce code size and runtime of programs by allocating register
across the module and procedure boundaries.
> 
> I have identified some old but effective research work on this area.
> I want community's feedback for feasibility of these approach and I am
targeting to implement two of them during this project.
> 
> Here is list of the papers, I have read first two papers and I would like
to discuss those approach first, I will read other two paper then initiate
discussion for them as well. All I want is to find out a concrete implementation
plan before 23 May, 2016 and for that I need community's help.
> 
> 1) Compile time ----- Minimizing register usage penalty at procedure calls
- http://dl.acm.org/citation.cfm?id=53999
<http://dl.acm.org/citation.cfm?id=53999>
> ====================================================================In this
approach intra-procedural register allocation is used as base but machine code
generation order is bottom up traversal of call graph and inter-procedural
effect is achieved by propagating register usage information of callee function
to caller (i.e child to parent in CallGraph) so that caller can use different
registers than callee and can save load store cost at procedure call, this is
not trivial as it seems due to recursive calls, library function usage etc. Also
for upper region of the graph in this technique available number of registers
might become zero in that case it should fall back to normal load store at
procedure call. Apart from these difficulties other difficulties have been
identified please follow this mail-chain
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
<https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion>
> My mentor has already provided me a patch that alters code generation order
as per bottom up call graph traversal, I am working from that point now. Any
other help/suggestion is always welcomed.
> 
> 2) Link time ----- Global register allocation at link time -
http://dl.acm.org/citation.cfm?id=989415
<http://dl.acm.org/citation.cfm?id=989415>
> ====================================================================In this
particular approach (sort of true IPRA) registers will be reallocated (this
optimization will be optional if turned off still code will be compiled as per
intra-procedural allocation) at link time. Here modules are first complied as
per normal compilation but the object code is annotated with details so that
linker can build call graph and also calculate usage information at link time.
Compiler also write hints in object code that if particular variable is
allocated in some other register ( due to new allocation) then how the code
should be changed? Thus linker can use these information to decide which
variables (global) need to be in same register through out the program execution
and also according to register usage information in call graph which procedure
will not be active simultaneously so that locals for that procedures can be in
same registers with out load store at procedure calls.
> For these particular method help me to analyze feasibility: 
> 1) Can llvm collects following information at module level in MachineIR?
list of procedures in module, list of locals in procedures, list of procedures
that a particular procedure can call, and a list of the variables this procedure
references. Each entry in the last two lists includes an estimate of the number
of times the procedure is called or the variable is referenced in each execution
of this procedure
> 2) Can llvm write informative commands to object files?
> 3) Can LTO is capable of leveraging those commands? 
> In terms of scoping the project for the summer, I definitely recommend that
you focus on (1) first. If you finish that, we can certainly move on to other
things.
I'll add +1 here, but I already wrote the same thing on IRC when discussing
with Vivek. True IPRA without a proper MachineModule infrastructure won't be
doable in my opinion (even with such infrastructure, it may not be trivial in
LLVM in general).
> Regarding link time, note that any such a design would likely look much
different than in David Wall's paper however, because our LTO re-codegens
everything anyway. The paper says, "Finally, it keeps us honest as
designers of the system; once we postpone anything until link time, the
temptation is great to postpone everything, ..." - Well, we've
long-since succumb to that temptation when we LTO. C'est la vie.
+1 as well, our LTO will benefit naturally from the leaf-to-root information
propagation. ThinLTO will be more challenging/interesting
though!> For the first part a mechanism similar to MachineModulePass would be
desirable but that may not be possible during this project, but if we can make
some sort of smaller version of that to suit our purpose.
> I don't think we need to make any kind of MachineModulePass to make
this work. Once we alter the visitation order based on the CGSCC iteration
scheme, we can keep state in-between functions in the pre-existing hacky way
(using static members of the relevant function passes).
I also don't see where/why we need a MachineModule(Pass) for the CGSCC
scheme, that said I'd rather avoid using a function pass with static
members, if we can have a ModuleAnalysis that is bookkeeping the results for
functions in the module and queries by the register allocator somehow.
Matthias/Quentin may have other inputs on this aspect.

-- 
Mehdi

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160510/090b958d/attachment.html>

vivek pandya via llvm-dev

2016-May-11 04:32 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

*Vivek Pandya*


On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
>
> ------------------------------
>
> *From: *"vivek pandya" <vivekvpandya at gmail.com>
> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim
Amini Golling" <
> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at
anl.gov>
> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
> *Subject: *[GSoC 2016] Interprocedural Register Allocation - Introduction
> and Feedback
>
> Hello LLVM Community,
>
> Sorry for delay as I was busy in final exams.
>
> I am Vivek from India. Thanks for choosing my proposal for Interprocedural
> Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal Finkel will be
> mentoring me for this project.
>
> IPRA can reduce code size and runtime of programs by allocating register
> across the module and procedure boundaries.
>
> I have identified some old but effective research work on this area.
> I want community's feedback for feasibility of these approach and I am
> targeting to implement two of them during this project.
>
> Here is list of the papers, I have read first two papers and I would like
> to discuss those approach first, I will read other two paper then initiate
> discussion for them as well. All I want is to find out a concrete
> implementation plan before 23 May, 2016 and for that I need community's
> help.
>
> 1) Compile time ----- Minimizing register usage penalty at procedure calls
> - http://dl.acm.org/citation.cfm?id=53999
> ====================================================================In
> this approach intra-procedural register allocation is used as base but
> machine code generation order is bottom up traversal of call graph and
> inter-procedural effect is achieved by propagating register usage
> information of callee function to caller (i.e child to parent in CallGraph)
> so that caller can use different registers than callee and can save load
> store cost at procedure call, this is not trivial as it seems due to
> recursive calls, library function usage etc. Also for upper region of the
> graph in this technique available number of registers might become zero in
> that case it should fall back to normal load store at procedure call. Apart
> from these difficulties other difficulties have been identified please
> follow this mail-chain
> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> My mentor has already provided me a patch that alters code generation
> order as per bottom up call graph traversal, I am working from that point
> now. Any other help/suggestion is always welcomed.
>
> 2) Link time ----- Global register allocation at link time -
> http://dl.acm.org/citation.cfm?id=989415
> ====================================================================In
> this particular approach (sort of true IPRA) registers will be reallocated
> (this optimization will be optional if turned off still code will be
> compiled as per intra-procedural allocation) at link time. Here modules are
> first complied as per normal compilation but the object code is annotated
> with details so that linker can build call graph and also calculate usage
> information at link time. Compiler also write hints in object code that if
> particular variable is allocated in some other register ( due to new
> allocation) then how the code should be changed? Thus linker can use these
> information to decide which variables (global) need to be in same register
> through out the program execution and also according to register usage
> information in call graph which procedure will not be active simultaneously
> so that locals for that procedures can be in same registers with out load
> store at procedure calls.
> For these particular method help me to analyze feasibility:
> 1) Can llvm collects following information at module level in MachineIR?
> list of procedures in module, list of locals in procedures, list of
> procedures that a particular procedure can call, and a list of the
> variables this procedure references. Each entry in the last two lists
> includes an estimate of the number of times the procedure is called or the
> variable is referenced in each execution of this procedure
> 2) Can llvm write informative commands to object files?
> 3) Can LTO is capable of leveraging those commands?
>
> In terms of scoping the project for the summer, I definitely recommend
> that you focus on (1) first. If you finish that, we can certainly move on
> to other things.
>
>
> I'll add +1 here, but I already wrote the same thing on IRC when
> discussing with Vivek. True IPRA without a proper MachineModule
> infrastructure won't be doable in my opinion (even with such
> infrastructure, it may not be trivial in LLVM in general).
>
> Regarding link time, note that any such a design would likely look much
> different than in David Wall's paper however, because our LTO
re-codegens
> everything anyway. The paper says, "Finally, it keeps us honest as
> designers of the system; once we postpone anything until link time, the
> temptation is great to postpone everything, ..." - Well, we've
long-since
> succumb to that temptation when we LTO. C'est la vie.
>
>
> +1 as well, our LTO will benefit naturally from the leaf-to-root
> information propagation. ThinLTO will be more challenging/interesting
> though!
>
> For the first part a mechanism similar to MachineModulePass would be
> desirable but that may not be possible during this project, but if we can
> make some sort of smaller version of that to suit our purpose.
>
> I don't think we need to make any kind of MachineModulePass to make
this
> work. Once we alter the visitation order based on the CGSCC iteration
> scheme, we can keep state in-between functions in the pre-existing hacky
> way (using static members of the relevant function passes).
>
>  Sorry my mistake here by first part I mean 1) requirement in the linktime approach.
>
> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
> scheme, that said I'd rather avoid using a function pass with static
> members, if we can have a ModuleAnalysis that is bookkeeping the results
> for functions in the module and queries by the register allocator somehow.
> Matthias/Quentin may have other inputs on this aspect.
>
> Yes for propagating register usage approach we don't need
MachineModulePass
Vivek
> --
> Mehdi
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160511/968e45d3/attachment.html>

vivek pandya via llvm-dev

2016-May-11 08:15 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

*Vivek Pandya*


On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
>
>
> *Vivek Pandya*
>
>
> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at
apple.com>
> wrote:
>
>>
>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>>
>>
>>
>> ------------------------------
>>
>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>,
"Tim Amini Golling" <
>> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at
anl.gov>
>> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>> *Subject: *[GSoC 2016] Interprocedural Register Allocation -
>> Introduction and Feedback
>>
>> Hello LLVM Community,
>>
>> Sorry for delay as I was busy in final exams.
>>
>> I am Vivek from India. Thanks for choosing my proposal for
>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal
>> Finkel will be mentoring me for this project.
>>
>> IPRA can reduce code size and runtime of programs by allocating
register
>> across the module and procedure boundaries.
>>
>> I have identified some old but effective research work on this area.
>> I want community's feedback for feasibility of these approach and I
am
>> targeting to implement two of them during this project.
>>
>> Here is list of the papers, I have read first two papers and I would
like
>> to discuss those approach first, I will read other two paper then
initiate
>> discussion for them as well. All I want is to find out a concrete
>> implementation plan before 23 May, 2016 and for that I need
community's
>> help.
>>
>> 1) Compile time ----- Minimizing register usage penalty at procedure
>> calls - http://dl.acm.org/citation.cfm?id=53999
>> ====================================================================In
>> this approach intra-procedural register allocation is used as base but
>> machine code generation order is bottom up traversal of call graph and
>> inter-procedural effect is achieved by propagating register usage
>> information of callee function to caller (i.e child to parent in
CallGraph)
>> so that caller can use different registers than callee and can save
load
>> store cost at procedure call, this is not trivial as it seems due to
>> recursive calls, library function usage etc. Also for upper region of
the
>> graph in this technique available number of registers might become zero
in
>> that case it should fall back to normal load store at procedure call.
Apart
>> from these difficulties other difficulties have been identified please
>> follow this mail-chain
>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>> My mentor has already provided me a patch that alters code generation
>> order as per bottom up call graph traversal, I am working from that
point
>> now. Any other help/suggestion is always welcomed.
>>
>> 2) Link time ----- Global register allocation at link time -
>> http://dl.acm.org/citation.cfm?id=989415
>> ====================================================================In
>> this particular approach (sort of true IPRA) registers will be
reallocated
>> (this optimization will be optional if turned off still code will be
>> compiled as per intra-procedural allocation) at link time. Here modules
are
>> first complied as per normal compilation but the object code is
annotated
>> with details so that linker can build call graph and also calculate
usage
>> information at link time. Compiler also write hints in object code that
if
>> particular variable is allocated in some other register ( due to new
>> allocation) then how the code should be changed? Thus linker can use
these
>> information to decide which variables (global) need to be in same
register
>> through out the program execution and also according to register usage
>> information in call graph which procedure will not be active
simultaneously
>> so that locals for that procedures can be in same registers with out
load
>> store at procedure calls.
>> For these particular method help me to analyze feasibility:
>> 1) Can llvm collects following information at module level in
MachineIR?
>> list of procedures in module, list of locals in procedures, list of
>> procedures that a particular procedure can call, and a list of the
>> variables this procedure references. Each entry in the last two lists
>> includes an estimate of the number of times the procedure is called or
the
>> variable is referenced in each execution of this procedure
>> 2) Can llvm write informative commands to object files?
>> 3) Can LTO is capable of leveraging those commands?
>>
>> In terms of scoping the project for the summer, I definitely recommend
>> that you focus on (1) first. If you finish that, we can certainly move
on
>> to other things.
>>
>>
>> I'll add +1 here, but I already wrote the same thing on IRC when
>> discussing with Vivek. True IPRA without a proper MachineModule
>> infrastructure won't be doable in my opinion (even with such
>> infrastructure, it may not be trivial in LLVM in general).
>>
>> Regarding link time, note that any such a design would likely look much
>> different than in David Wall's paper however, because our LTO
re-codegens
>> everything anyway. The paper says, "Finally, it keeps us honest as
>> designers of the system; once we postpone anything until link time, the
>> temptation is great to postpone everything, ..." - Well, we've
long-since
>> succumb to that temptation when we LTO. C'est la vie.
>>
>>
>> +1 as well, our LTO will benefit naturally from the leaf-to-root
>> information propagation. ThinLTO will be more challenging/interesting
>> though!
>>
>> For the first part a mechanism similar to MachineModulePass would be
>> desirable but that may not be possible during this project, but if we
can
>> make some sort of smaller version of that to suit our purpose.
>>
>> I don't think we need to make any kind of MachineModulePass to make
this
>> work. Once we alter the visitation order based on the CGSCC iteration
>> scheme, we can keep state in-between functions in the pre-existing
hacky
>> way (using static members of the relevant function passes).
>>
>>  Sorry my mistake here by first part I mean 1) requirement in the link
> time approach.
>
>>
>
>> I also don't see where/why we need a MachineModule(Pass) for the
CGSCC
>> scheme, that said I'd rather avoid using a function pass with
static
>> members, if we can have a ModuleAnalysis that is bookkeeping the
results
>> for functions in the module and queries by the register allocator
somehow.
>> Matthias/Quentin may have other inputs on this aspect.
>>
>@Hal do you mean to add a simple MachineFunction pass that will just
operate on register allocated function and prepare a BitVector to indicate
which register is being used by MachineFunction, and then use this pass as
analysis pass (i.e just simply return static BitVector for clobbered
register when register allocation for next function begins. This part is
not much clear to me) this thing can be done by scheduling a pass post
register allocation in lib/CodeGen/Passes.cpp

void TargetPassConfig::addMachinePasses() {
.
.
.
  // Run pre-ra passes.
  addPreRegAlloc();

  // Run register allocation and passes that are tightly coupled with it,
  // including phi elimination and scheduling.
  if (getOptimizeRegAlloc())
    addOptimizedRegAlloc(createRegAllocPass(true));
  else
    addFastRegAlloc(createRegAllocPass(false));

  // Run post-ra passes.
  addPostRegAlloc();
// Adding a new pass here which keeps register mask information across
function calls.
.
.
.
}

But this also requires current register allocators to use this information
in someway because RegMaskBits in LiveIntervalAnalysis.cpp is not static
across calls. I mean I am not clear for how to propagate static info to
Intra-procedural Register allocators (if possible without disturbing their
code )

I think this also applies in someway to Mehdi Amini's idea to keep a
ModulePass for bookkeeping but then existing register allocators will be
required to change so that the code can query the ModulePass for
RegMaskBits for particular function.

Vivek

>> Yes for propagating register usage approach we don't need
> MachineModulePass
>
> Vivek
>
>> --
>> Mehdi
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160511/b66f1611/attachment.html>

vivek pandya via llvm-dev

2016-May-15 07:43 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

*Vivek Pandya*


On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
>
> ------------------------------
>
> *From: *"vivek pandya" <vivekvpandya at gmail.com>
> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim
Amini Golling" <
> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at
anl.gov>
> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
> *Subject: *[GSoC 2016] Interprocedural Register Allocation - Introduction
> and Feedback
>
> Hello LLVM Community,
>
> Sorry for delay as I was busy in final exams.
>
> I am Vivek from India. Thanks for choosing my proposal for Interprocedural
> Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal Finkel will be
> mentoring me for this project.
>
> IPRA can reduce code size and runtime of programs by allocating register
> across the module and procedure boundaries.
>
> I have identified some old but effective research work on this area.
> I want community's feedback for feasibility of these approach and I am
> targeting to implement two of them during this project.
>
> Here is list of the papers, I have read first two papers and I would like
> to discuss those approach first, I will read other two paper then initiate
> discussion for them as well. All I want is to find out a concrete
> implementation plan before 23 May, 2016 and for that I need community's
> help.
>
> 1) Compile time ----- Minimizing register usage penalty at procedure calls
> - http://dl.acm.org/citation.cfm?id=53999
> ====================================================================In
> this approach intra-procedural register allocation is used as base but
> machine code generation order is bottom up traversal of call graph and
> inter-procedural effect is achieved by propagating register usage
> information of callee function to caller (i.e child to parent in CallGraph)
> so that caller can use different registers than callee and can save load
> store cost at procedure call, this is not trivial as it seems due to
> recursive calls, library function usage etc. Also for upper region of the
> graph in this technique available number of registers might become zero in
> that case it should fall back to normal load store at procedure call. Apart
> from these difficulties other difficulties have been identified please
> follow this mail-chain
> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> My mentor has already provided me a patch that alters code generation
> order as per bottom up call graph traversal, I am working from that point
> now. Any other help/suggestion is always welcomed.
>
> 2) Link time ----- Global register allocation at link time -
> http://dl.acm.org/citation.cfm?id=989415
> ====================================================================In
> this particular approach (sort of true IPRA) registers will be reallocated
> (this optimization will be optional if turned off still code will be
> compiled as per intra-procedural allocation) at link time. Here modules are
> first complied as per normal compilation but the object code is annotated
> with details so that linker can build call graph and also calculate usage
> information at link time. Compiler also write hints in object code that if
> particular variable is allocated in some other register ( due to new
> allocation) then how the code should be changed? Thus linker can use these
> information to decide which variables (global) need to be in same register
> through out the program execution and also according to register usage
> information in call graph which procedure will not be active simultaneously
> so that locals for that procedures can be in same registers with out load
> store at procedure calls.
> For these particular method help me to analyze feasibility:
> 1) Can llvm collects following information at module level in MachineIR?
> list of procedures in module, list of locals in procedures, list of
> procedures that a particular procedure can call, and a list of the
> variables this procedure references. Each entry in the last two lists
> includes an estimate of the number of times the procedure is called or the
> variable is referenced in each execution of this procedure
> 2) Can llvm write informative commands to object files?
> 3) Can LTO is capable of leveraging those commands?
>
> In terms of scoping the project for the summer, I definitely recommend
> that you focus on (1) first. If you finish that, we can certainly move on
> to other things.
>
>
> I'll add +1 here, but I already wrote the same thing on IRC when
> discussing with Vivek. True IPRA without a proper MachineModule
> infrastructure won't be doable in my opinion (even with such
> infrastructure, it may not be trivial in LLVM in general).
>
> Regarding link time, note that any such a design would likely look much
> different than in David Wall's paper however, because our LTO
re-codegens
> everything anyway. The paper says, "Finally, it keeps us honest as
> designers of the system; once we postpone anything until link time, the
> temptation is great to postpone everything, ..." - Well, we've
long-since
> succumb to that temptation when we LTO. C'est la vie.
>
>
> +1 as well, our LTO will benefit naturally from the leaf-to-root
> information propagation. ThinLTO will be more challenging/interesting
> though!
>@Mehdi I don't understand this comment, it seems to me that two different
approaches has been merged here. Information propagation in calligraph will
be done without LTO.
> For the first part a mechanism similar to MachineModulePass would be
> desirable but that may not be possible during this project, but if we can
> make some sort of smaller version of that to suit our purpose.
>
> I don't think we need to make any kind of MachineModulePass to make
this
> work. Once we alter the visitation order based on the CGSCC iteration
> scheme, we can keep state in-between functions in the pre-existing hacky
> way (using static members of the relevant function passes).
>
>
> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
> scheme, that said I'd rather avoid using a function pass with static
> members, if we can have a ModuleAnalysis that is bookkeeping the results
> for functions in the module and queries by the register allocator somehow.
> Matthias/Quentin may have other inputs on this aspect.
>
> --
> Mehdi
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160515/51d21893/attachment-0001.html>

Mehdi Amini via llvm-dev

2016-May-15 18:21 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

> On May 15, 2016, at 12:43 AM, vivek pandya <vivekvpandya at
gmail.com> wrote:
> 
> 
> 
> Vivek Pandya
> 
> 
> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com
<mailto:mehdi.amini at apple.com>> wrote:
> 
>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov
<mailto:hfinkel at anl.gov>> wrote:
>> 
>> 
>> 
>> From: "vivek pandya" <vivekvpandya at gmail.com
<mailto:vivekvpandya at gmail.com>>
>> To: "llvm-dev" <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>, "Tim Amini Golling"
<mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>>,
"Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at
anl.gov>>
>> Cc: "Quentin Colombet" <qcolombet at apple.com
<mailto:qcolombet at apple.com>>
>> Sent: Tuesday, May 10, 2016 2:59:16 PM
>> Subject: [GSoC 2016] Interprocedural Register Allocation - Introduction
and Feedback
>> 
>> Hello LLVM Community,
>> 
>> Sorry for delay as I was busy in final exams.
>> 
>> I am Vivek from India. Thanks for choosing my proposal for
Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal Finkel
will be mentoring me for this project.
>> 
>> IPRA can reduce code size and runtime of programs by allocating
register across the module and procedure boundaries.
>> 
>> I have identified some old but effective research work on this area.
>> I want community's feedback for feasibility of these approach and I
am targeting to implement two of them during this project.
>> 
>> Here is list of the papers, I have read first two papers and I would
like to discuss those approach first, I will read other two paper then initiate
discussion for them as well. All I want is to find out a concrete implementation
plan before 23 May, 2016 and for that I need community's help.
>> 
>> 1) Compile time ----- Minimizing register usage penalty at procedure
calls - http://dl.acm.org/citation.cfm?id=53999
<http://dl.acm.org/citation.cfm?id=53999>
>> ====================================================================In
this approach intra-procedural register allocation is used as base but machine
code generation order is bottom up traversal of call graph and inter-procedural
effect is achieved by propagating register usage information of callee function
to caller (i.e child to parent in CallGraph) so that caller can use different
registers than callee and can save load store cost at procedure call, this is
not trivial as it seems due to recursive calls, library function usage etc. Also
for upper region of the graph in this technique available number of registers
might become zero in that case it should fall back to normal load store at
procedure call. Apart from these difficulties other difficulties have been
identified please follow this mail-chain
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
<https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion>
>> My mentor has already provided me a patch that alters code generation
order as per bottom up call graph traversal, I am working from that point now.
Any other help/suggestion is always welcomed.
>> 
>> 2) Link time ----- Global register allocation at link time -
http://dl.acm.org/citation.cfm?id=989415
<http://dl.acm.org/citation.cfm?id=989415>
>> ====================================================================In
this particular approach (sort of true IPRA) registers will be reallocated (this
optimization will be optional if turned off still code will be compiled as per
intra-procedural allocation) at link time. Here modules are first complied as
per normal compilation but the object code is annotated with details so that
linker can build call graph and also calculate usage information at link time.
Compiler also write hints in object code that if particular variable is
allocated in some other register ( due to new allocation) then how the code
should be changed? Thus linker can use these information to decide which
variables (global) need to be in same register through out the program execution
and also according to register usage information in call graph which procedure
will not be active simultaneously so that locals for that procedures can be in
same registers with out load store at procedure calls.
>> For these particular method help me to analyze feasibility: 
>> 1) Can llvm collects following information at module level in
MachineIR? list of procedures in module, list of locals in procedures, list of
procedures that a particular procedure can call, and a list of the variables
this procedure references. Each entry in the last two lists includes an estimate
of the number of times the procedure is called or the variable is referenced in
each execution of this procedure
>> 2) Can llvm write informative commands to object files?
>> 3) Can LTO is capable of leveraging those commands? 
>> In terms of scoping the project for the summer, I definitely recommend
that you focus on (1) first. If you finish that, we can certainly move on to
other things.
> 
> I'll add +1 here, but I already wrote the same thing on IRC when
discussing with Vivek. True IPRA without a proper MachineModule infrastructure
won't be doable in my opinion (even with such infrastructure, it may not be
trivial in LLVM in general).
> 
>> Regarding link time, note that any such a design would likely look much
different than in David Wall's paper however, because our LTO re-codegens
everything anyway. The paper says, "Finally, it keeps us honest as
designers of the system; once we postpone anything until link time, the
temptation is great to postpone everything, ..." - Well, we've
long-since succumb to that temptation when we LTO. C'est la vie.
> 
> +1 as well, our LTO will benefit naturally from the leaf-to-root
information propagation. ThinLTO will be more challenging/interesting though!
> @Mehdi I don't understand this comment, it seems to me that two
different approaches has been merged here. Information propagation in calligraph
will be done without LTO.
You are propagating information on the callgraph, but this is efficient only
when you have the function definition that is codegen in the same module as the
call site.
So why LTO? 
Because you see "all" the definition for "all" the call
sites in your program.

-- 
Mehdi
 


>> For the first part a mechanism similar to MachineModulePass would be
desirable but that may not be possible during this project, but if we can make
some sort of smaller version of that to suit our purpose.
>> I don't think we need to make any kind of MachineModulePass to make
this work. Once we alter the visitation order based on the CGSCC iteration
scheme, we can keep state in-between functions in the pre-existing hacky way
(using static members of the relevant function passes).
> 
> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
scheme, that said I'd rather avoid using a function pass with static
members, if we can have a ModuleAnalysis that is bookkeeping the results for
functions in the module and queries by the register allocator somehow.
> Matthias/Quentin may have other inputs on this aspect.
> 
> -- 
> Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160515/b5f805cc/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - May 2016 - [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Possibly Parallel Threads