thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Interprocedural Register Allocation

If this information is useful, please help other people find it:
Share via:

Hal Finkel via llvm-dev

2016-May-11 18:16 UTC

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

----- Original Message -----
> From: "vivek pandya" <vivekvpandya at gmail.com>
> To: "Matthias Braun" <matze at braunis.de>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
> Sent: Wednesday, May 11, 2016 1:14:07 PM
> Subject: Re: [llvm-dev] [GSoC 2016] Interprocedural Register
> Allocation - Introduction and Feedback
> Vivek Pandya
> On Wed, May 11, 2016 at 11:19 PM, Matthias Braun < matze at braunis.de
>
> wrote:
> > Yes there is also MachineRegisterInfo::UsedPhysRegMask which should
> > be the union of all regmasks in the function.
> 
> Does this means there is no requirement of a separate pass for
> propagating register usage information ? And also only codegen order
> needs to be changed to DFS on call graph. And currently no
> intra-procedural register is using UsedPhysRegMask to avoid
> load/store. Does it mean we need to change that ? Or I am getting it
> wrong.No, it just means that the "scanning" procedure is simple, it does not
actually need to visit each instruction.

-Hal 
> - Vivek
> > > On May 11, 2016, at 10:47 AM, Hal Finkel via llvm-dev <
> > > llvm-dev at lists.llvm.org > wrote:
> > 
> 
> > > > From: "Matthias Braun" < matze at braunis.de
>
> > > 
> > 
> 
> > > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > 
> > 
> 
> > > > Cc: "vivek pandya" < vivekvpandya at gmail.com
>, "llvm-dev" <
> > > > llvm-dev at lists.llvm.org >
> > > 
> > 
> 
> > > > Sent: Wednesday, May 11, 2016 12:46:25 PM
> > > 
> > 
> 
> > > > Subject: Re: [llvm-dev] [GSoC 2016] Interprocedural Register
> > > > Allocation - Introduction and Feedback
> > > 
> > 
> 
> > > > > On May 11, 2016, at 3:31 AM, Hal Finkel via llvm-dev
<
> > > > > llvm-dev at lists.llvm.org > wrote:
> > > > 
> > > 
> > 
> 
> > > > > > From: "vivek pandya" < vivekvpandya
at gmail.com >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > To: "Mehdi Amini" < mehdi.amini at
apple.com >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Cc: "Hal Finkel" < hfinkel at anl.gov
>, "Quentin Colombet" <
> > > > > > qcolombet at apple.com >, "llvm-dev"
< llvm-dev at lists.llvm.org
> > > > > > >,
> > > > > > "Matthias Braun" < matze at
braunis.de >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Sent: Wednesday, May 11, 2016 3:15:03 AM
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Subject: Re: [GSoC 2016] Interprocedural Register
> > > > > > Allocation
> > > > > > -
> > > > > > Introduction and Feedback
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Vivek Pandya
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > On Wed, May 11, 2016 at 10:02 AM, vivek pandya
<
> > > > > > vivekvpandya at gmail.com > wrote:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Vivek Pandya
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini
<
> > > > > > > mehdi.amini at apple.com
> > > > > > > > wrote:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > On May 10, 2016, at 6:06 PM, Hal
Finkel <
> > > > > > > > > hfinkel at anl.gov
> > > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > From: "vivek pandya"
< vivekvpandya at gmail.com >
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > To: "llvm-dev" <
llvm-dev at lists.llvm.org >, "Tim
> > > > > > > > > > Amini
> > > > > > > > > > Golling"
> > > > > > > > > > <
> > > > > > > > > > mehdi.amini at apple.com >,
"Hal Finkel" <
> > > > > > > > > > hfinkel at anl.gov
> > > > > > > > > > >
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Cc: "Quentin
Colombet" < qcolombet at apple.com >
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Sent: Tuesday, May 10, 2016
2:59:16 PM
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Subject: [GSoC 2016]
Interprocedural Register
> > > > > > > > > > Allocation
> > > > > > > > > > -
> > > > > > > > > > Introduction and Feedback
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Hello LLVM Community,
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Sorry for delay as I was busy
in final exams.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > I am Vivek from India. Thanks
for choosing my
> > > > > > > > > > proposal
> > > > > > > > > > for
> > > > > > > > > > Interprocedural Register
Allocation (IPRA) in LLVM.
> > > > > > > > > > Mehdi
> > > > > > > > > > Amini
> > > > > > > > > > and
> > > > > > > > > > Hal Finkel will be mentoring
me for this project.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > IPRA can reduce code size and
runtime of programs
> > > > > > > > > > by
> > > > > > > > > > allocating
> > > > > > > > > > register across the module and
procedure
> > > > > > > > > > boundaries.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > I have identified some old but
effective research
> > > > > > > > > > work
> > > > > > > > > > on
> > > > > > > > > > this
> > > > > > > > > > area.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > I want community's
feedback for feasibility of
> > > > > > > > > > these
> > > > > > > > > > approach
> > > > > > > > > > and
> > > > > > > > > > I
> > > > > > > > > > am targeting to implement two
of them during this
> > > > > > > > > > project.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Here is list of the papers, I
have read first two
> > > > > > > > > > papers
> > > > > > > > > > and
> > > > > > > > > > I
> > > > > > > > > > would
> > > > > > > > > > like to discuss those approach
first, I will read
> > > > > > > > > > other
> > > > > > > > > > two
> > > > > > > > > > paper
> > > > > > > > > > then initiate discussion for
them as well. All I
> > > > > > > > > > want
> > > > > > > > > > is
> > > > > > > > > > to
> > > > > > > > > > find
> > > > > > > > > > out
> > > > > > > > > > a concrete implementation plan
before 23 May, 2016
> > > > > > > > > > and
> > > > > > > > > > for
> > > > > > > > > > that
> > > > > > > > > > I
> > > > > > > > > > need community's help.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > 1) Compile time -----
Minimizing register usage
> > > > > > > > > > penalty
> > > > > > > > > > at
> > > > > > > > > > procedure
> > > > > > > > > > calls -
http://dl.acm.org/citation.cfm?id=53999
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > >
====================================================================In
> > > > > > > > > > this approach intra-procedural
register allocation
> > > > > > > > > > is
> > > > > > > > > > used
> > > > > > > > > > as
> > > > > > > > > > base
> > > > > > > > > > but machine code generation
order is bottom up
> > > > > > > > > > traversal
> > > > > > > > > > of
> > > > > > > > > > call
> > > > > > > > > > graph and inter-procedural
effect is achieved by
> > > > > > > > > > propagating
> > > > > > > > > > register usage information of
callee function to
> > > > > > > > > > caller
> > > > > > > > > > (i.e
> > > > > > > > > > child
> > > > > > > > > > to parent in CallGraph) so
that caller can use
> > > > > > > > > > different
> > > > > > > > > > registers
> > > > > > > > > > than callee and can save load
store cost at
> > > > > > > > > > procedure
> > > > > > > > > > call,
> > > > > > > > > > this
> > > > > > > > > > is
> > > > > > > > > > not trivial as it seems due to
recursive calls,
> > > > > > > > > > library
> > > > > > > > > > function
> > > > > > > > > > usage etc. Also for upper
region of the graph in
> > > > > > > > > > this
> > > > > > > > > > technique
> > > > > > > > > > available number of registers
might become zero in
> > > > > > > > > > that
> > > > > > > > > > case
> > > > > > > > > > it
> > > > > > > > > > should fall back to normal
load store at procedure
> > > > > > > > > > call.
> > > > > > > > > > Apart
> > > > > > > > > > from
> > > > > > > > > > these difficulties other
difficulties have been
> > > > > > > > > > identified
> > > > > > > > > > please
> > > > > > > > > > follow this mail-chain
> > > > > > > > > >
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > My mentor has already provided
me a patch that
> > > > > > > > > > alters
> > > > > > > > > > code
> > > > > > > > > > generation
> > > > > > > > > > order as per bottom up call
graph traversal, I am
> > > > > > > > > > working
> > > > > > > > > > from
> > > > > > > > > > that
> > > > > > > > > > point now. Any other
help/suggestion is always
> > > > > > > > > > welcomed.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > 2) Link time ----- Global
register allocation at
> > > > > > > > > > link
> > > > > > > > > > time
> > > > > > > > > > -
> > > > > > > > > >
http://dl.acm.org/citation.cfm?id=989415
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > >
====================================================================In
> > > > > > > > > > this particular approach (sort
of true IPRA)
> > > > > > > > > > registers
> > > > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > reallocated (this optimization
will be optional if
> > > > > > > > > > turned
> > > > > > > > > > off
> > > > > > > > > > still
> > > > > > > > > > code will be compiled as per
intra-procedural
> > > > > > > > > > allocation)
> > > > > > > > > > at
> > > > > > > > > > link
> > > > > > > > > > time. Here modules are first
complied as per normal
> > > > > > > > > > compilation
> > > > > > > > > > but
> > > > > > > > > > the object code is annotated
with details so that
> > > > > > > > > > linker
> > > > > > > > > > can
> > > > > > > > > > build
> > > > > > > > > > call graph and also calculate
usage information at
> > > > > > > > > > link
> > > > > > > > > > time.
> > > > > > > > > > Compiler also write hints in
object code that if
> > > > > > > > > > particular
> > > > > > > > > > variable
> > > > > > > > > > is allocated in some other
register ( due to new
> > > > > > > > > > allocation)
> > > > > > > > > > then
> > > > > > > > > > how the code should be
changed? Thus linker can use
> > > > > > > > > > these
> > > > > > > > > > information to decide which
variables (global) need
> > > > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > in
> > > > > > > > > > same
> > > > > > > > > > register through out the
program execution and also
> > > > > > > > > > according
> > > > > > > > > > to
> > > > > > > > > > register usage information in
call graph which
> > > > > > > > > > procedure
> > > > > > > > > > will
> > > > > > > > > > not
> > > > > > > > > > be
> > > > > > > > > > active simultaneously so that
locals for that
> > > > > > > > > > procedures
> > > > > > > > > > can
> > > > > > > > > > be
> > > > > > > > > > in
> > > > > > > > > > same registers with out load
store at procedure
> > > > > > > > > > calls.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > For these particular method
help me to analyze
> > > > > > > > > > feasibility:
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > 1) Can llvm collects following
information at
> > > > > > > > > > module
> > > > > > > > > > level
> > > > > > > > > > in
> > > > > > > > > > MachineIR? list of procedures
in module, list of
> > > > > > > > > > locals
> > > > > > > > > > in
> > > > > > > > > > procedures, list of procedures
that a particular
> > > > > > > > > > procedure
> > > > > > > > > > can
> > > > > > > > > > call,
> > > > > > > > > > and a list of the variables
this procedure
> > > > > > > > > > references.
> > > > > > > > > > Each
> > > > > > > > > > entry
> > > > > > > > > > in
> > > > > > > > > > the last two lists includes an
estimate of the
> > > > > > > > > > number
> > > > > > > > > > of
> > > > > > > > > > times
> > > > > > > > > > the
> > > > > > > > > > procedure is called or the
variable is referenced
> > > > > > > > > > in
> > > > > > > > > > each
> > > > > > > > > > execution
> > > > > > > > > > of this procedure
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > 2) Can llvm write informative
commands to object
> > > > > > > > > > files?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > 3) Can LTO is capable of
leveraging those commands?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > In terms of scoping the project for
the summer, I
> > > > > > > > > definitely
> > > > > > > > > recommend that you focus on (1)
first. If you finish
> > > > > > > > > that,
> > > > > > > > > we
> > > > > > > > > can
> > > > > > > > > certainly move on to other things.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > I'll add +1 here, but I already
wrote the same thing on
> > > > > > > > IRC
> > > > > > > > when
> > > > > > > > discussing with Vivek. True IPRA without
a proper
> > > > > > > > MachineModule
> > > > > > > > infrastructure won't be doable in my
opinion (even with
> > > > > > > > such
> > > > > > > > infrastructure, it may not be trivial in
LLVM in
> > > > > > > > general).
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Regarding link time, note that any
such a design
> > > > > > > > > would
> > > > > > > > > likely
> > > > > > > > > look
> > > > > > > > > much different than in David
Wall's paper however,
> > > > > > > > > because
> > > > > > > > > our
> > > > > > > > > LTO
> > > > > > > > > re-codegens everything anyway. The
paper says,
> > > > > > > > > "Finally,
> > > > > > > > > it
> > > > > > > > > keeps
> > > > > > > > > us
> > > > > > > > > honest as designers of the system;
once we postpone
> > > > > > > > > anything
> > > > > > > > > until
> > > > > > > > > link time, the temptation is great
to postpone
> > > > > > > > > everything,
> > > > > > > > > ..."
> > > > > > > > > -
> > > > > > > > > Well, we've long-since succumb
to that temptation
> > > > > > > > > when
> > > > > > > > > we
> > > > > > > > > LTO.
> > > > > > > > > C'est
> > > > > > > > > la vie.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > +1 as well, our LTO will benefit
naturally from the
> > > > > > > > leaf-to-root
> > > > > > > > information propagation. ThinLTO will be
more
> > > > > > > > challenging/interesting though!
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > For the first part a mechanism
similar to
> > > > > > > > > > MachineModulePass
> > > > > > > > > > would
> > > > > > > > > > be
> > > > > > > > > > desirable but that may not be
possible during this
> > > > > > > > > > project,
> > > > > > > > > > but
> > > > > > > > > > if
> > > > > > > > > > we can make some sort of
smaller version of that to
> > > > > > > > > > suit
> > > > > > > > > > our
> > > > > > > > > > purpose.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > I don't think we need to make
any kind of
> > > > > > > > > MachineModulePass
> > > > > > > > > to
> > > > > > > > > make
> > > > > > > > > this work. Once we alter the
visitation order based
> > > > > > > > > on
> > > > > > > > > the
> > > > > > > > > CGSCC
> > > > > > > > > iteration scheme, we can keep state
in-between
> > > > > > > > > functions
> > > > > > > > > in
> > > > > > > > > the
> > > > > > > > > pre-existing hacky way (using
static members of the
> > > > > > > > > relevant
> > > > > > > > > function passes).
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Sorry my mistake here by first part I mean 1)
requirement
> > > > > > > in
> > > > > > > the
> > > > > > > link
> > > > > > > time approach.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > I also don't see where/why we need a
> > > > > > > > MachineModule(Pass)
> > > > > > > > for
> > > > > > > > the
> > > > > > > > CGSCC scheme, that said I'd rather
avoid using a
> > > > > > > > function
> > > > > > > > pass
> > > > > > > > with
> > > > > > > > static members, if we can have a
ModuleAnalysis that is
> > > > > > > > bookkeeping
> > > > > > > > the results for functions in the module
and queries by
> > > > > > > > the
> > > > > > > > register
> > > > > > > > allocator somehow.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Matthias/Quentin may have other inputs
on this aspect.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > @Hal do you mean to add a simple MachineFunction
pass that
> > > > > > will
> > > > > > just
> > > > > > operate on register allocated function and prepare
a
> > > > > > BitVector
> > > > > > to
> > > > > > indicate which register is being used by
MachineFunction,
> > > > > > and
> > > > > > then
> > > > > > use this pass as analysis pass (i.e just simply
return
> > > > > > static
> > > > > > BitVector for clobbered register when register
allocation
> > > > > > for
> > > > > > next
> > > > > > function begins. This part is not much clear to
me) this
> > > > > > thing
> > > > > > can
> > > > > > be done by scheduling a pass post register
allocation in
> > > > > > lib/CodeGen/Passes.cpp
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > void TargetPassConfig::addMachinePasses() {
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > .
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > .
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > .
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > // Run pre-ra passes.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > addPreRegAlloc();
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > // Run register allocation and passes that are
tightly
> > > > > > coupled
> > > > > > with
> > > > > > it,
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > // including phi elimination and scheduling.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > if (getOptimizeRegAlloc())
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > addOptimizedRegAlloc(createRegAllocPass(true));
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > else
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > addFastRegAlloc(createRegAllocPass(false));
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > // Run post-ra passes.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > addPostRegAlloc();
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > // Adding a new pass here which keeps register
mask
> > > > > > information
> > > > > > across function calls.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > .
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > .
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > .
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > }
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > But this also requires current register allocators
to use
> > > > > > this
> > > > > > information in someway because RegMaskBits in
> > > > > > LiveIntervalAnalysis.cpp is not static across
calls. I mean
> > > > > > I
> > > > > > am
> > > > > > not
> > > > > > clear for how to propagate static info to
Intra-procedural
> > > > > > Register
> > > > > > allocators (if possible without disturbing their
code )
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > First, my hope is that we won't need to change the
register
> > > > > allocators, as such, in order to make use of this
> > > > > information.
> > > > > Instead, we'll simply be able to alter the register
masks
> > > > > generated
> > > > > for the call instructions. These masks will indicate
fewer
> > > > > clobbers
> > > > > than might otherwise be present based on the ABI
because of
> > > > > information gathered during the codegen of the callee.
These
> > > > > masks
> > > > > are generally constructed by target based on the
calling
> > > > > convention.
> > > > > The PowerPC backend, for example, looks like this:
> > > > 
> > > 
> > 
> 
> > > > > // Add a register mask operand representing the
> > > > > call-preserved
> > > > > registers.
> > > > 
> > > 
> > 
> 
> > > > > const TargetRegisterInfo *TRI =
Subtarget.getRegisterInfo();
> > > > 
> > > 
> > 
> 
> > > > > const uint32_t *Mask > > > > 
> > > 
> > 
> 
> > > > > TRI->getCallPreservedMask(DAG.getMachineFunction(),
> > > > > CallConv);
> > > > 
> > > 
> > 
> 
> > > > > assert(Mask && "Missing call preserved
mask for calling
> > > > > convention");
> > > > 
> > > 
> > 
> 
> > > > > Ops.push_back(DAG.getRegisterMask(Mask));
> > > > 
> > > 
> > 
> 
> > > > > but it can be more complicated. If you look for uses of
> > > > > 'getRegisterMask' in
Target/*/*ISelLowering.cpp, you'll see
> > > > > what
> > > > > I
> > > > > mean. Regardless, the code ends up calling some method
is the
> > > > > targets TargetRegisterInfo subclass. These methods
generally
> > > > > look
> > > > > something like this:
> > > > 
> > > 
> > 
> 
> > > > > const uint32_t *
> > > > 
> > > 
> > 
> 
> > > > > PPCRegisterInfo::getCallPreservedMask(const
MachineFunction
> > > > > &MF,
> > > > 
> > > 
> > 
> 
> > > > > CallingConv::ID CC) const {
> > > > 
> > > 
> > 
> 
> > > > > const PPCSubtarget &Subtarget > > > >
> MF.getSubtarget<PPCSubtarget>();
> > > > 
> > > 
> > 
> 
> > > > > ...
> > > > 
> > > 
> > 
> 
> > > > > return TM.isPPC64() ? (Subtarget.hasAltivec() ?
> > > > > CSR_SVR464_Altivec_RegMask
> > > > 
> > > 
> > 
> 
> > > > > : CSR_SVR464_RegMask)
> > > > 
> > > 
> > 
> 
> > > > > : (Subtarget.hasAltivec() ? CSR_SVR432_Altivec_RegMask
> > > > 
> > > 
> > 
> 
> > > > > : CSR_SVR432_RegMask);
> > > > 
> > > 
> > 
> 
> > > > > }
> > > > 
> > > 
> > 
> 
> > > > > In any case, the fundamental idea here is that, when
someone
> > > > > calls
> > > > > getCallPreservedMask in order to set the regmask on a
call,
> > > > > we
> > > > > might
> > > > > not have to use the CC at all. Instead, if we've
already
> > > > > codegened
> > > > > the function, we might use a cache of 'exact'
register masks
> > > > > computed during codegen of the potential callees
instead.
> > > > 
> > > 
> > 
> 
> > > > > In order to do this, I think we'll need to provide
a function
> > > > > callable from the target's getCallPreservedMask
> > > > > implementation,
> > > > > which can return such an 'exact' regmask when
available. I
> > > > > think
> > > > > we
> > > > > need to do it this way for two reasons:
> > > > 
> > > 
> > 
> 
> > > > > 1. Not all of the target code calls
getCallPreservedMask, but
> > > > > sometimes calls other similar target-specific functions
(e.g.
> > > > > getTLSCallPreservedMask).
> > > > 
> > > 
> > 
> 
> > > > > 2. The targets need to opt-in to this behavior because
only
> > > > > the
> > > > > target can know that all register uses are really
tagged
> > > > > correctly
> > > > > post "pre-emit".
> > > > 
> > > 
> > 
> 
> > > > > Because the target is free to introduce uses of
registers at
> > > > > essentially any time, we need to do the scanning for
used
> > > > > registers
> > > > > after the "pre-emit" passes run. This can be
done by
> > > > > scheduling
> > > > > some
> > > > > simple register-use scanning pass after the call to
> > > > > addPreEmitPass
> > > > > in lib/CodeGen/Passes.cpp.
> > > > 
> > > 
> > 
> 
> > > > MachineRegister maintains linked lists with defs/uses for
each
> > > > register so you can determine whether a specific register is
> > > > used
> > > > or
> > > > not without scanning.
> > > 
> > 
> 
> > > Does this include regmask-clobbered registers?
> > 
> 
> > > -Hal
> > 
> 
> > > > - Matthias
> > > 
> > 
> 
> > > --
> > 
> 
> > > Hal Finkel
> > 
> 
> > > Assistant Computational Scientist
> > 
> 
> > > Leadership Computing Facility
> > 
> 
> > > Argonne National Laboratory
> > 
> 
> > > _______________________________________________
> > 
> 
> > > LLVM Developers mailing list
> > 
> 
> > > llvm-dev at lists.llvm.org
> > 
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> -- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160511/73037f41/attachment.html>

vivek pandya via llvm-dev

2016-May-11 18:19 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

*Vivek Pandya*


On Wed, May 11, 2016 at 11:46 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> ------------------------------
>
> *From: *"vivek pandya" <vivekvpandya at gmail.com>
> *To: *"Matthias Braun" <matze at braunis.de>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>,
"llvm-dev" <llvm-dev at lists.llvm.org>
> *Sent: *Wednesday, May 11, 2016 1:14:07 PM
> *Subject: *Re: [llvm-dev] [GSoC 2016] Interprocedural Register Allocation
> - Introduction and Feedback
>
>
>
> *Vivek Pandya*
>
>
> On Wed, May 11, 2016 at 11:19 PM, Matthias Braun <matze at
braunis.de> wrote:
>
>> Yes there is also MachineRegisterInfo::UsedPhysRegMask which should be
>> the union of all regmasks in the function.
>>
>> Does this means there is no requirement of a separate pass for
> propagating register usage information ? And also only codegen order needs
> to be changed to DFS on call graph. And currently no intra-procedural
> register is using UsedPhysRegMask to avoid load/store. Does it mean we need
> to change that ? Or I am getting it wrong.
>
>
> No, it just means that the "scanning" procedure is simple, it
does not
> actually need to visit each instruction.
>ok so still we need a ModulePass that will be executed as pre-emit pass but
it will do propagation in O(1) time  right ?

- Vivek
>  -Hal
>
>
>
> - Vivek
>
>> On May 11, 2016, at 10:47 AM, Hal Finkel via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>> ------------------------------
>>
>> *From: *"Matthias Braun" <matze at braunis.de>
>>
>> *To: *"Hal Finkel" <hfinkel at anl.gov>
>> *Cc: *"vivek pandya" <vivekvpandya at gmail.com>,
"llvm-dev" <
>> llvm-dev at lists.llvm.org>
>> *Sent: *Wednesday, May 11, 2016 12:46:25 PM
>> *Subject: *Re: [llvm-dev] [GSoC 2016] Interprocedural Register
>> Allocation - Introduction and Feedback
>>
>>
>> On May 11, 2016, at 3:31 AM, Hal Finkel via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>> ------------------------------
>>
>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>> *To: *"Mehdi Amini" <mehdi.amini at apple.com>
>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "Quentin
Colombet" <
>> qcolombet at apple.com>, "llvm-dev" <llvm-dev at
lists.llvm.org>, "Matthias
>> Braun" <matze at braunis.de>
>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM
>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>> Introduction and Feedback
>>
>>
>>
>> *Vivek Pandya*
>>
>>
>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> *Vivek Pandya*
>>>
>>>
>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at
apple.com>
>>> wrote:
>>>
>>>>
>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at
anl.gov> wrote:
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From: *"vivek pandya" <vivekvpandya at
gmail.com>
>>>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>,
"Tim Amini Golling" <
>>>> mehdi.amini at apple.com>, "Hal Finkel"
<hfinkel at anl.gov>
>>>> *Cc: *"Quentin Colombet" <qcolombet at
apple.com>
>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation -
>>>> Introduction and Feedback
>>>>
>>>> Hello LLVM Community,
>>>>
>>>> Sorry for delay as I was busy in final exams.
>>>>
>>>> I am Vivek from India. Thanks for choosing my proposal for
>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini
and Hal
>>>> Finkel will be mentoring me for this project.
>>>>
>>>> IPRA can reduce code size and runtime of programs by allocating
>>>> register across the module and procedure boundaries.
>>>>
>>>> I have identified some old but effective research work on this
area.
>>>> I want community's feedback for feasibility of these
approach and I am
>>>> targeting to implement two of them during this project.
>>>>
>>>> Here is list of the papers, I have read first two papers and I
would
>>>> like to discuss those approach first, I will read other two
paper then
>>>> initiate discussion for them as well. All I want is to find out
a concrete
>>>> implementation plan before 23 May, 2016 and for that I need
community's
>>>> help.
>>>>
>>>> 1) Compile time ----- Minimizing register usage penalty at
procedure
>>>> calls - http://dl.acm.org/citation.cfm?id=53999
>>>>
====================================================================In
>>>> this approach intra-procedural register allocation is used as
base but
>>>> machine code generation order is bottom up traversal of call
graph and
>>>> inter-procedural effect is achieved by propagating register
usage
>>>> information of callee function to caller (i.e child to parent
in CallGraph)
>>>> so that caller can use different registers than callee and can
save load
>>>> store cost at procedure call, this is not trivial as it seems
due to
>>>> recursive calls, library function usage etc. Also for upper
region of the
>>>> graph in this technique available number of registers might
become zero in
>>>> that case it should fall back to normal load store at procedure
call. Apart
>>>> from these difficulties other difficulties have been identified
please
>>>> follow this mail-chain
>>>>
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>> My mentor has already provided me a patch that alters code
generation
>>>> order as per bottom up call graph traversal, I am working from
that point
>>>> now. Any other help/suggestion is always welcomed.
>>>>
>>>> 2) Link time ----- Global register allocation at link time -
>>>> http://dl.acm.org/citation.cfm?id=989415
>>>>
====================================================================In
>>>> this particular approach (sort of true IPRA) registers will be
reallocated
>>>> (this optimization will be optional if turned off still code
will be
>>>> compiled as per intra-procedural allocation) at link time. Here
modules are
>>>> first complied as per normal compilation but the object code is
annotated
>>>> with details so that linker can build call graph and also
calculate usage
>>>> information at link time. Compiler also write hints in object
code that if
>>>> particular variable is allocated in some other register ( due
to new
>>>> allocation) then how the code should be changed? Thus linker
can use these
>>>> information to decide which variables (global) need to be in
same register
>>>> through out the program execution and also according to
register usage
>>>> information in call graph which procedure will not be active
simultaneously
>>>> so that locals for that procedures can be in same registers
with out load
>>>> store at procedure calls.
>>>> For these particular method help me to analyze feasibility:
>>>> 1) Can llvm collects following information at module level in
>>>> MachineIR? list of procedures in module, list of locals in
procedures, list
>>>> of procedures that a particular procedure can call, and a list
of the
>>>> variables this procedure references. Each entry in the last two
lists
>>>> includes an estimate of the number of times the procedure is
called or the
>>>> variable is referenced in each execution of this procedure
>>>> 2) Can llvm write informative commands to object files?
>>>> 3) Can LTO is capable of leveraging those commands?
>>>>
>>>> In terms of scoping the project for the summer, I definitely
recommend
>>>> that you focus on (1) first. If you finish that, we can
certainly move on
>>>> to other things.
>>>>
>>>>
>>>> I'll add +1 here, but I already wrote the same thing on IRC
when
>>>> discussing with Vivek. True IPRA without a proper MachineModule
>>>> infrastructure won't be doable in my opinion (even with
such
>>>> infrastructure, it may not be trivial in LLVM in general).
>>>>
>>>> Regarding link time, note that any such a design would likely
look much
>>>> different than in David Wall's paper however, because our
LTO re-codegens
>>>> everything anyway. The paper says, "Finally, it keeps us
honest as
>>>> designers of the system; once we postpone anything until link
time, the
>>>> temptation is great to postpone everything, ..." - Well,
we've long-since
>>>> succumb to that temptation when we LTO. C'est la vie.
>>>>
>>>>
>>>> +1 as well, our LTO will benefit naturally from the
leaf-to-root
>>>> information propagation. ThinLTO will be more
challenging/interesting
>>>> though!
>>>>
>>>> For the first part a mechanism similar to MachineModulePass
would be
>>>> desirable but that may not be possible during this project, but
if we can
>>>> make some sort of smaller version of that to suit our purpose.
>>>>
>>>> I don't think we need to make any kind of MachineModulePass
to make
>>>> this work. Once we alter the visitation order based on the
CGSCC iteration
>>>> scheme, we can keep state in-between functions in the
pre-existing hacky
>>>> way (using static members of the relevant function passes).
>>>>
>>>>  Sorry my mistake here by first part I mean 1) requirement in
the link
>>> time approach.
>>>
>>>>
>>>
>>>> I also don't see where/why we need a MachineModule(Pass)
for the CGSCC
>>>> scheme, that said I'd rather avoid using a function pass
with static
>>>> members, if we can have a ModuleAnalysis that is bookkeeping
the results
>>>> for functions in the module and queries by the register
allocator somehow.
>>>> Matthias/Quentin may have other inputs on this aspect.
>>>>
>>>
>> @Hal do you mean to add a simple MachineFunction pass that will just
>> operate on register allocated function and prepare a BitVector to
indicate
>> which register is being used by MachineFunction, and then use this pass
as
>> analysis pass (i.e just simply return static BitVector for clobbered
>> register when register allocation for next function begins. This part
is
>> not much clear to me) this thing can be done by scheduling a pass post
>> register allocation in lib/CodeGen/Passes.cpp
>>
>> void TargetPassConfig::addMachinePasses() {
>> .
>> .
>> .
>>   // Run pre-ra passes.
>>   addPreRegAlloc();
>>
>>   // Run register allocation and passes that are tightly coupled with
it,
>>   // including phi elimination and scheduling.
>>   if (getOptimizeRegAlloc())
>>     addOptimizedRegAlloc(createRegAllocPass(true));
>>   else
>>     addFastRegAlloc(createRegAllocPass(false));
>>
>>   // Run post-ra passes.
>>   addPostRegAlloc();
>> // Adding a new pass here which keeps register mask information across
>> function calls.
>> .
>> .
>> .
>> }
>>
>> But this also requires current register allocators to use this
>> information in someway because RegMaskBits in LiveIntervalAnalysis.cpp
is
>> not static across calls. I mean I am not clear for how to propagate
static
>> info to Intra-procedural Register allocators (if possible without
>> disturbing their code )
>>
>> First, my hope is that we won't need to change the register
allocators,
>> as such, in order to make use of this information. Instead, we'll
simply be
>> able to alter the register masks generated for the call instructions.
These
>> masks will indicate fewer clobbers than might otherwise be present
based on
>> the ABI because of information gathered during the codegen of the
callee.
>> These masks are generally constructed by target based on the calling
>> convention. The PowerPC backend, for example, looks like this:
>>
>>   // Add a register mask operand representing the call-preserved
>> registers.
>>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>>   const uint32_t *Mask >>      
TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>   assert(Mask && "Missing call preserved mask for calling
convention");
>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>
>> but it can be more complicated. If you look for uses of
'getRegisterMask'
>> in Target/*/*ISelLowering.cpp, you'll see what I mean. Regardless,
the code
>> ends up calling some method is the targets TargetRegisterInfo subclass.
>> These methods generally look something like this:
>>
>> const uint32_t *
>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
>>                                       CallingConv::ID CC) const {
>>   const PPCSubtarget &Subtarget =
MF.getSubtarget<PPCSubtarget>();
>>   ...
>>   return TM.isPPC64() ? (Subtarget.hasAltivec() ?
>> CSR_SVR464_Altivec_RegMask
>>                                                 : CSR_SVR464_RegMask)
>>                       : (Subtarget.hasAltivec() ?
>> CSR_SVR432_Altivec_RegMask
>>                                                 : CSR_SVR432_RegMask);
>> }
>>
>> In any case, the fundamental idea here is that, when someone calls
>> getCallPreservedMask in order to set the regmask on a call, we might
not
>> have to use the CC at all. Instead, if we've already codegened the
>> function, we might use a cache of 'exact' register masks
computed during
>> codegen of the potential callees instead.
>>
>> In order to do this, I think we'll need to provide a function
callable
>> from the target's getCallPreservedMask implementation, which can
return
>> such an 'exact' regmask when available. I think we need to do
it this way
>> for two reasons:
>>
>>  1. Not all of the target code calls getCallPreservedMask, but
sometimes
>> calls other similar target-specific functions (e.g.
>> getTLSCallPreservedMask).
>>  2. The targets need to opt-in to this behavior because only the target
>> can know that all register uses are really tagged correctly post
"pre-emit".
>>
>> Because the target is free to introduce uses of registers at
essentially
>> any time, we need to do the scanning for used registers after the
>> "pre-emit" passes run. This can be done by scheduling some
simple
>> register-use scanning pass after the call to addPreEmitPass in
>> lib/CodeGen/Passes.cpp.
>>
>> MachineRegister maintains linked lists with defs/uses for each register
>> so you can determine whether a specific register is used or not without
>> scanning.
>>
>> Does this include regmask-clobbered registers?
>>
>>  -Hal
>>
>>
>> - Matthias
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160511/56b19eb1/attachment-0001.html>

Hal Finkel via llvm-dev

2016-May-11 18:22 UTC

head link

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

----- Original Message -----
> From: "vivek pandya" <vivekvpandya at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias
Braun"
> <matze at braunis.de>
> Sent: Wednesday, May 11, 2016 1:19:18 PM
> Subject: Re: [llvm-dev] [GSoC 2016] Interprocedural Register
> Allocation - Introduction and Feedback
> Vivek Pandya
> On Wed, May 11, 2016 at 11:46 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> > > From: "vivek pandya" < vivekvpandya at gmail.com
>
> > 
> 
> > > To: "Matthias Braun" < matze at braunis.de >
> > 
> 
> > > Cc: "Hal Finkel" < hfinkel at anl.gov >,
"llvm-dev" <
> > > llvm-dev at lists.llvm.org >
> > 
> 
> > > Sent: Wednesday, May 11, 2016 1:14:07 PM
> > 
> 
> > > Subject: Re: [llvm-dev] [GSoC 2016] Interprocedural Register
> > > Allocation - Introduction and Feedback
> > 
> 
> > > Vivek Pandya
> > 
> 
> > > On Wed, May 11, 2016 at 11:19 PM, Matthias Braun <
> > > matze at braunis.de
> > > >
> > > wrote:
> > 
> 
> > > > Yes there is also MachineRegisterInfo::UsedPhysRegMask which
> > > > should
> > > > be the union of all regmasks in the function.
> > > 
> > 
> 
> > > Does this means there is no requirement of a separate pass for
> > > propagating register usage information ? And also only codegen
> > > order
> > > needs to be changed to DFS on call graph. And currently no
> > > intra-procedural register is using UsedPhysRegMask to avoid
> > > load/store. Does it mean we need to change that ? Or I am getting
> > > it
> > > wrong.
> > 
> 
> > No, it just means that the "scanning" procedure is simple,
it does
> > not actually need to visit each instruction.
> 
> ok so still we need a ModulePass that will be executed as pre-emit
> pass but it will do propagation in O(1) time right ?No, we need a function pass that runs after the pre-emit passes. This pass will
update some module-level analysis (which will need to be an immutable pass,
likely, to fit within the current infrastructure, even though it is not
technically immutable - like our AssumptionCache). But, yes, this post-pre-emit
function pass will be O(1).

-Hal 
> - Vivek
> > -Hal
> 
> > > - Vivek
> > 
> 
> > > > > On May 11, 2016, at 10:47 AM, Hal Finkel via llvm-dev
<
> > > > > llvm-dev at lists.llvm.org > wrote:
> > > > 
> > > 
> > 
> 
> > > > > > From: "Matthias Braun" < matze at
braunis.de >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > To: "Hal Finkel" < hfinkel at anl.gov
>
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Cc: "vivek pandya" < vivekvpandya at
gmail.com >, "llvm-dev" <
> > > > > > llvm-dev at lists.llvm.org >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Sent: Wednesday, May 11, 2016 12:46:25 PM
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Subject: Re: [llvm-dev] [GSoC 2016]
Interprocedural
> > > > > > Register
> > > > > > Allocation - Introduction and Feedback
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > On May 11, 2016, at 3:31 AM, Hal Finkel via
llvm-dev <
> > > > > > > llvm-dev at lists.llvm.org > wrote:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > From: "vivek pandya" <
vivekvpandya at gmail.com >
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > To: "Mehdi Amini" <
mehdi.amini at apple.com >
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Cc: "Hal Finkel" < hfinkel
at anl.gov >, "Quentin
> > > > > > > > Colombet"
> > > > > > > > <
> > > > > > > > qcolombet at apple.com >,
"llvm-dev" <
> > > > > > > > llvm-dev at lists.llvm.org
> > > > > > > > >,
> > > > > > > > "Matthias Braun" < matze at
braunis.de >
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Sent: Wednesday, May 11, 2016 3:15:03 AM
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Subject: Re: [GSoC 2016] Interprocedural
Register
> > > > > > > > Allocation
> > > > > > > > -
> > > > > > > > Introduction and Feedback
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Vivek Pandya
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > On Wed, May 11, 2016 at 10:02 AM, vivek
pandya <
> > > > > > > > vivekvpandya at gmail.com > wrote:
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Vivek Pandya
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > On Wed, May 11, 2016 at 9:43 AM,
Mehdi Amini <
> > > > > > > > > mehdi.amini at apple.com
> > > > > > > > > > wrote:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > On May 10, 2016, at 6:06
PM, Hal Finkel <
> > > > > > > > > > > hfinkel at anl.gov
> > > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > From: "vivek
pandya" < vivekvpandya at gmail.com >
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > To:
"llvm-dev" < llvm-dev at lists.llvm.org >,
> > > > > > > > > > > > "Tim
> > > > > > > > > > > > Amini
> > > > > > > > > > > > Golling"
> > > > > > > > > > > > <
> > > > > > > > > > > > mehdi.amini at
apple.com >, "Hal Finkel" <
> > > > > > > > > > > > hfinkel at anl.gov
> > > > > > > > > > > > >
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > Cc: "Quentin
Colombet" < qcolombet at apple.com >
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > Sent: Tuesday, May
10, 2016 2:59:16 PM
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > Subject: [GSoC 2016]
Interprocedural Register
> > > > > > > > > > > > Allocation
> > > > > > > > > > > > -
> > > > > > > > > > > > Introduction and
Feedback
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > Hello LLVM
Community,
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > Sorry for delay as I
was busy in final exams.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > I am Vivek from
India. Thanks for choosing my
> > > > > > > > > > > > proposal
> > > > > > > > > > > > for
> > > > > > > > > > > > Interprocedural
Register Allocation (IPRA) in
> > > > > > > > > > > > LLVM.
> > > > > > > > > > > > Mehdi
> > > > > > > > > > > > Amini
> > > > > > > > > > > > and
> > > > > > > > > > > > Hal Finkel will be
mentoring me for this
> > > > > > > > > > > > project.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > IPRA can reduce code
size and runtime of
> > > > > > > > > > > > programs
> > > > > > > > > > > > by
> > > > > > > > > > > > allocating
> > > > > > > > > > > > register across the
module and procedure
> > > > > > > > > > > > boundaries.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > I have identified
some old but effective
> > > > > > > > > > > > research
> > > > > > > > > > > > work
> > > > > > > > > > > > on
> > > > > > > > > > > > this
> > > > > > > > > > > > area.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > I want
community's feedback for feasibility of
> > > > > > > > > > > > these
> > > > > > > > > > > > approach
> > > > > > > > > > > > and
> > > > > > > > > > > > I
> > > > > > > > > > > > am targeting to
implement two of them during
> > > > > > > > > > > > this
> > > > > > > > > > > > project.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > Here is list of the
papers, I have read first
> > > > > > > > > > > > two
> > > > > > > > > > > > papers
> > > > > > > > > > > > and
> > > > > > > > > > > > I
> > > > > > > > > > > > would
> > > > > > > > > > > > like to discuss
those approach first, I will
> > > > > > > > > > > > read
> > > > > > > > > > > > other
> > > > > > > > > > > > two
> > > > > > > > > > > > paper
> > > > > > > > > > > > then initiate
discussion for them as well. All
> > > > > > > > > > > > I
> > > > > > > > > > > > want
> > > > > > > > > > > > is
> > > > > > > > > > > > to
> > > > > > > > > > > > find
> > > > > > > > > > > > out
> > > > > > > > > > > > a concrete
implementation plan before 23 May,
> > > > > > > > > > > > 2016
> > > > > > > > > > > > and
> > > > > > > > > > > > for
> > > > > > > > > > > > that
> > > > > > > > > > > > I
> > > > > > > > > > > > need community's
help.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > 1) Compile time
----- Minimizing register usage
> > > > > > > > > > > > penalty
> > > > > > > > > > > > at
> > > > > > > > > > > > procedure
> > > > > > > > > > > > calls -
http://dl.acm.org/citation.cfm?id=53999
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > >
====================================================================In
> > > > > > > > > > > > this approach
intra-procedural register
> > > > > > > > > > > > allocation
> > > > > > > > > > > > is
> > > > > > > > > > > > used
> > > > > > > > > > > > as
> > > > > > > > > > > > base
> > > > > > > > > > > > but machine code
generation order is bottom up
> > > > > > > > > > > > traversal
> > > > > > > > > > > > of
> > > > > > > > > > > > call
> > > > > > > > > > > > graph and
inter-procedural effect is achieved
> > > > > > > > > > > > by
> > > > > > > > > > > > propagating
> > > > > > > > > > > > register usage
information of callee function
> > > > > > > > > > > > to
> > > > > > > > > > > > caller
> > > > > > > > > > > > (i.e
> > > > > > > > > > > > child
> > > > > > > > > > > > to parent in
CallGraph) so that caller can use
> > > > > > > > > > > > different
> > > > > > > > > > > > registers
> > > > > > > > > > > > than callee and can
save load store cost at
> > > > > > > > > > > > procedure
> > > > > > > > > > > > call,
> > > > > > > > > > > > this
> > > > > > > > > > > > is
> > > > > > > > > > > > not trivial as it
seems due to recursive calls,
> > > > > > > > > > > > library
> > > > > > > > > > > > function
> > > > > > > > > > > > usage etc. Also for
upper region of the graph
> > > > > > > > > > > > in
> > > > > > > > > > > > this
> > > > > > > > > > > > technique
> > > > > > > > > > > > available number of
registers might become zero
> > > > > > > > > > > > in
> > > > > > > > > > > > that
> > > > > > > > > > > > case
> > > > > > > > > > > > it
> > > > > > > > > > > > should fall back to
normal load store at
> > > > > > > > > > > > procedure
> > > > > > > > > > > > call.
> > > > > > > > > > > > Apart
> > > > > > > > > > > > from
> > > > > > > > > > > > these difficulties
other difficulties have been
> > > > > > > > > > > > identified
> > > > > > > > > > > > please
> > > > > > > > > > > > follow this
mail-chain
> > > > > > > > > > > >
https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > My mentor has
already provided me a patch that
> > > > > > > > > > > > alters
> > > > > > > > > > > > code
> > > > > > > > > > > > generation
> > > > > > > > > > > > order as per bottom
up call graph traversal, I
> > > > > > > > > > > > am
> > > > > > > > > > > > working
> > > > > > > > > > > > from
> > > > > > > > > > > > that
> > > > > > > > > > > > point now. Any other
help/suggestion is always
> > > > > > > > > > > > welcomed.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > 2) Link time -----
Global register allocation
> > > > > > > > > > > > at
> > > > > > > > > > > > link
> > > > > > > > > > > > time
> > > > > > > > > > > > -
> > > > > > > > > > > >
http://dl.acm.org/citation.cfm?id=989415
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > >
====================================================================In
> > > > > > > > > > > > this particular
approach (sort of true IPRA)
> > > > > > > > > > > > registers
> > > > > > > > > > > > will
> > > > > > > > > > > > be
> > > > > > > > > > > > reallocated (this
optimization will be optional
> > > > > > > > > > > > if
> > > > > > > > > > > > turned
> > > > > > > > > > > > off
> > > > > > > > > > > > still
> > > > > > > > > > > > code will be
compiled as per intra-procedural
> > > > > > > > > > > > allocation)
> > > > > > > > > > > > at
> > > > > > > > > > > > link
> > > > > > > > > > > > time. Here modules
are first complied as per
> > > > > > > > > > > > normal
> > > > > > > > > > > > compilation
> > > > > > > > > > > > but
> > > > > > > > > > > > the object code is
annotated with details so
> > > > > > > > > > > > that
> > > > > > > > > > > > linker
> > > > > > > > > > > > can
> > > > > > > > > > > > build
> > > > > > > > > > > > call graph and also
calculate usage information
> > > > > > > > > > > > at
> > > > > > > > > > > > link
> > > > > > > > > > > > time.
> > > > > > > > > > > > Compiler also write
hints in object code that
> > > > > > > > > > > > if
> > > > > > > > > > > > particular
> > > > > > > > > > > > variable
> > > > > > > > > > > > is allocated in some
other register ( due to
> > > > > > > > > > > > new
> > > > > > > > > > > > allocation)
> > > > > > > > > > > > then
> > > > > > > > > > > > how the code should
be changed? Thus linker can
> > > > > > > > > > > > use
> > > > > > > > > > > > these
> > > > > > > > > > > > information to
decide which variables (global)
> > > > > > > > > > > > need
> > > > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > in
> > > > > > > > > > > > same
> > > > > > > > > > > > register through out
the program execution and
> > > > > > > > > > > > also
> > > > > > > > > > > > according
> > > > > > > > > > > > to
> > > > > > > > > > > > register usage
information in call graph which
> > > > > > > > > > > > procedure
> > > > > > > > > > > > will
> > > > > > > > > > > > not
> > > > > > > > > > > > be
> > > > > > > > > > > > active
simultaneously so that locals for that
> > > > > > > > > > > > procedures
> > > > > > > > > > > > can
> > > > > > > > > > > > be
> > > > > > > > > > > > in
> > > > > > > > > > > > same registers with
out load store at procedure
> > > > > > > > > > > > calls.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > For these particular
method help me to analyze
> > > > > > > > > > > > feasibility:
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > 1) Can llvm collects
following information at
> > > > > > > > > > > > module
> > > > > > > > > > > > level
> > > > > > > > > > > > in
> > > > > > > > > > > > MachineIR? list of
procedures in module, list
> > > > > > > > > > > > of
> > > > > > > > > > > > locals
> > > > > > > > > > > > in
> > > > > > > > > > > > procedures, list of
procedures that a
> > > > > > > > > > > > particular
> > > > > > > > > > > > procedure
> > > > > > > > > > > > can
> > > > > > > > > > > > call,
> > > > > > > > > > > > and a list of the
variables this procedure
> > > > > > > > > > > > references.
> > > > > > > > > > > > Each
> > > > > > > > > > > > entry
> > > > > > > > > > > > in
> > > > > > > > > > > > the last two lists
includes an estimate of the
> > > > > > > > > > > > number
> > > > > > > > > > > > of
> > > > > > > > > > > > times
> > > > > > > > > > > > the
> > > > > > > > > > > > procedure is called
or the variable is
> > > > > > > > > > > > referenced
> > > > > > > > > > > > in
> > > > > > > > > > > > each
> > > > > > > > > > > > execution
> > > > > > > > > > > > of this procedure
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > 2) Can llvm write
informative commands to
> > > > > > > > > > > > object
> > > > > > > > > > > > files?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > 3) Can LTO is
capable of leveraging those
> > > > > > > > > > > > commands?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > In terms of scoping the
project for the summer, I
> > > > > > > > > > > definitely
> > > > > > > > > > > recommend that you focus
on (1) first. If you
> > > > > > > > > > > finish
> > > > > > > > > > > that,
> > > > > > > > > > > we
> > > > > > > > > > > can
> > > > > > > > > > > certainly move on to
other things.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > I'll add +1 here, but I
already wrote the same
> > > > > > > > > > thing
> > > > > > > > > > on
> > > > > > > > > > IRC
> > > > > > > > > > when
> > > > > > > > > > discussing with Vivek. True
IPRA without a proper
> > > > > > > > > > MachineModule
> > > > > > > > > > infrastructure won't be
doable in my opinion (even
> > > > > > > > > > with
> > > > > > > > > > such
> > > > > > > > > > infrastructure, it may not be
trivial in LLVM in
> > > > > > > > > > general).
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > Regarding link time, note
that any such a design
> > > > > > > > > > > would
> > > > > > > > > > > likely
> > > > > > > > > > > look
> > > > > > > > > > > much different than in
David Wall's paper
> > > > > > > > > > > however,
> > > > > > > > > > > because
> > > > > > > > > > > our
> > > > > > > > > > > LTO
> > > > > > > > > > > re-codegens everything
anyway. The paper says,
> > > > > > > > > > > "Finally,
> > > > > > > > > > > it
> > > > > > > > > > > keeps
> > > > > > > > > > > us
> > > > > > > > > > > honest as designers of
the system; once we
> > > > > > > > > > > postpone
> > > > > > > > > > > anything
> > > > > > > > > > > until
> > > > > > > > > > > link time, the temptation
is great to postpone
> > > > > > > > > > > everything,
> > > > > > > > > > > ..."
> > > > > > > > > > > -
> > > > > > > > > > > Well, we've
long-since succumb to that temptation
> > > > > > > > > > > when
> > > > > > > > > > > we
> > > > > > > > > > > LTO.
> > > > > > > > > > > C'est
> > > > > > > > > > > la vie.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > +1 as well, our LTO will
benefit naturally from the
> > > > > > > > > > leaf-to-root
> > > > > > > > > > information propagation.
ThinLTO will be more
> > > > > > > > > > challenging/interesting
though!
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > > For the first part a
mechanism similar to
> > > > > > > > > > > > MachineModulePass
> > > > > > > > > > > > would
> > > > > > > > > > > > be
> > > > > > > > > > > > desirable but that
may not be possible during
> > > > > > > > > > > > this
> > > > > > > > > > > > project,
> > > > > > > > > > > > but
> > > > > > > > > > > > if
> > > > > > > > > > > > we can make some
sort of smaller version of
> > > > > > > > > > > > that
> > > > > > > > > > > > to
> > > > > > > > > > > > suit
> > > > > > > > > > > > our
> > > > > > > > > > > > purpose.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > > I don't think we need
to make any kind of
> > > > > > > > > > > MachineModulePass
> > > > > > > > > > > to
> > > > > > > > > > > make
> > > > > > > > > > > this work. Once we alter
the visitation order
> > > > > > > > > > > based
> > > > > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > CGSCC
> > > > > > > > > > > iteration scheme, we can
keep state in-between
> > > > > > > > > > > functions
> > > > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > pre-existing hacky way
(using static members of
> > > > > > > > > > > the
> > > > > > > > > > > relevant
> > > > > > > > > > > function passes).
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Sorry my mistake here by first part
I mean 1)
> > > > > > > > > requirement
> > > > > > > > > in
> > > > > > > > > the
> > > > > > > > > link
> > > > > > > > > time approach.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > I also don't see where/why
we need a
> > > > > > > > > > MachineModule(Pass)
> > > > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > CGSCC scheme, that said
I'd rather avoid using a
> > > > > > > > > > function
> > > > > > > > > > pass
> > > > > > > > > > with
> > > > > > > > > > static members, if we can have
a ModuleAnalysis
> > > > > > > > > > that
> > > > > > > > > > is
> > > > > > > > > > bookkeeping
> > > > > > > > > > the results for functions in
the module and queries
> > > > > > > > > > by
> > > > > > > > > > the
> > > > > > > > > > register
> > > > > > > > > > allocator somehow.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > > Matthias/Quentin may have
other inputs on this
> > > > > > > > > > aspect.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > @Hal do you mean to add a simple
MachineFunction pass
> > > > > > > > that
> > > > > > > > will
> > > > > > > > just
> > > > > > > > operate on register allocated function
and prepare a
> > > > > > > > BitVector
> > > > > > > > to
> > > > > > > > indicate which register is being used by
> > > > > > > > MachineFunction,
> > > > > > > > and
> > > > > > > > then
> > > > > > > > use this pass as analysis pass (i.e just
simply return
> > > > > > > > static
> > > > > > > > BitVector for clobbered register when
register
> > > > > > > > allocation
> > > > > > > > for
> > > > > > > > next
> > > > > > > > function begins. This part is not much
clear to me)
> > > > > > > > this
> > > > > > > > thing
> > > > > > > > can
> > > > > > > > be done by scheduling a pass post
register allocation
> > > > > > > > in
> > > > > > > > lib/CodeGen/Passes.cpp
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > void
TargetPassConfig::addMachinePasses() {
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > .
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > .
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > .
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > // Run pre-ra passes.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > addPreRegAlloc();
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > // Run register allocation and passes
that are tightly
> > > > > > > > coupled
> > > > > > > > with
> > > > > > > > it,
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > // including phi elimination and
scheduling.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > if (getOptimizeRegAlloc())
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > >
addOptimizedRegAlloc(createRegAllocPass(true));
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > else
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > >
addFastRegAlloc(createRegAllocPass(false));
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > // Run post-ra passes.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > addPostRegAlloc();
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > // Adding a new pass here which keeps
register mask
> > > > > > > > information
> > > > > > > > across function calls.
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > .
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > .
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > .
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > }
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > But this also requires current register
allocators to
> > > > > > > > use
> > > > > > > > this
> > > > > > > > information in someway because
RegMaskBits in
> > > > > > > > LiveIntervalAnalysis.cpp is not static
across calls. I
> > > > > > > > mean
> > > > > > > > I
> > > > > > > > am
> > > > > > > > not
> > > > > > > > clear for how to propagate static info
to
> > > > > > > > Intra-procedural
> > > > > > > > Register
> > > > > > > > allocators (if possible without
disturbing their code )
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > First, my hope is that we won't need to
change the
> > > > > > > register
> > > > > > > allocators, as such, in order to make use of
this
> > > > > > > information.
> > > > > > > Instead, we'll simply be able to alter
the register masks
> > > > > > > generated
> > > > > > > for the call instructions. These masks will
indicate
> > > > > > > fewer
> > > > > > > clobbers
> > > > > > > than might otherwise be present based on the
ABI because
> > > > > > > of
> > > > > > > information gathered during the codegen of
the callee.
> > > > > > > These
> > > > > > > masks
> > > > > > > are generally constructed by target based on
the calling
> > > > > > > convention.
> > > > > > > The PowerPC backend, for example, looks like
this:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > // Add a register mask operand representing
the
> > > > > > > call-preserved
> > > > > > > registers.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > const TargetRegisterInfo *TRI > > >
> > > > Subtarget.getRegisterInfo();
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > const uint32_t *Mask > > > > >
>
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
TRI->getCallPreservedMask(DAG.getMachineFunction(),
> > > > > > > CallConv);
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > assert(Mask && "Missing call
preserved mask for calling
> > > > > > > convention");
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Ops.push_back(DAG.getRegisterMask(Mask));
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > but it can be more complicated. If you look
for uses of
> > > > > > > 'getRegisterMask' in
Target/*/*ISelLowering.cpp, you'll
> > > > > > > see
> > > > > > > what
> > > > > > > I
> > > > > > > mean. Regardless, the code ends up calling
some method is
> > > > > > > the
> > > > > > > targets TargetRegisterInfo subclass. These
methods
> > > > > > > generally
> > > > > > > look
> > > > > > > something like this:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > const uint32_t *
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > PPCRegisterInfo::getCallPreservedMask(const
> > > > > > > MachineFunction
> > > > > > > &MF,
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > CallingConv::ID CC) const {
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > const PPCSubtarget &Subtarget > >
> > > > > MF.getSubtarget<PPCSubtarget>();
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > ...
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > return TM.isPPC64() ? (Subtarget.hasAltivec()
?
> > > > > > > CSR_SVR464_Altivec_RegMask
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > : CSR_SVR464_RegMask)
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > : (Subtarget.hasAltivec() ?
CSR_SVR432_Altivec_RegMask
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > : CSR_SVR432_RegMask);
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > }
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > In any case, the fundamental idea here is
that, when
> > > > > > > someone
> > > > > > > calls
> > > > > > > getCallPreservedMask in order to set the
regmask on a
> > > > > > > call,
> > > > > > > we
> > > > > > > might
> > > > > > > not have to use the CC at all. Instead, if
we've already
> > > > > > > codegened
> > > > > > > the function, we might use a cache of
'exact' register
> > > > > > > masks
> > > > > > > computed during codegen of the potential
callees instead.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > In order to do this, I think we'll need
to provide a
> > > > > > > function
> > > > > > > callable from the target's
getCallPreservedMask
> > > > > > > implementation,
> > > > > > > which can return such an 'exact'
regmask when available.
> > > > > > > I
> > > > > > > think
> > > > > > > we
> > > > > > > need to do it this way for two reasons:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > 1. Not all of the target code calls
getCallPreservedMask,
> > > > > > > but
> > > > > > > sometimes calls other similar target-specific
functions
> > > > > > > (e.g.
> > > > > > > getTLSCallPreservedMask).
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > 2. The targets need to opt-in to this
behavior because
> > > > > > > only
> > > > > > > the
> > > > > > > target can know that all register uses are
really tagged
> > > > > > > correctly
> > > > > > > post "pre-emit".
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Because the target is free to introduce uses
of registers
> > > > > > > at
> > > > > > > essentially any time, we need to do the
scanning for used
> > > > > > > registers
> > > > > > > after the "pre-emit" passes run.
This can be done by
> > > > > > > scheduling
> > > > > > > some
> > > > > > > simple register-use scanning pass after the
call to
> > > > > > > addPreEmitPass
> > > > > > > in lib/CodeGen/Passes.cpp.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > MachineRegister maintains linked lists with
defs/uses for
> > > > > > each
> > > > > > register so you can determine whether a specific
register
> > > > > > is
> > > > > > used
> > > > > > or
> > > > > > not without scanning.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > Does this include regmask-clobbered registers?
> > > > 
> > > 
> > 
> 
> > > > > -Hal
> > > > 
> > > 
> > 
> 
> > > > > > - Matthias
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > --
> > > > 
> > > 
> > 
> 
> > > > > Hal Finkel
> > > > 
> > > 
> > 
> 
> > > > > Assistant Computational Scientist
> > > > 
> > > 
> > 
> 
> > > > > Leadership Computing Facility
> > > > 
> > > 
> > 
> 
> > > > > Argonne National Laboratory
> > > > 
> > > 
> > 
> 
> > > > > _______________________________________________
> > > > 
> > > 
> > 
> 
> > > > > LLVM Developers mailing list
> > > > 
> > > 
> > 
> 
> > > > > llvm-dev at lists.llvm.org
> > > > 
> > > 
> > 
> 
> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > 
> > > 
> > 
> 
> > --
> 
> > Hal Finkel
> 
> > Assistant Computational Scientist
> 
> > Leadership Computing Facility
> 
> > Argonne National Laboratory
> 
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160511/aede503b/attachment.html>

llvm dev - May 2016 - [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback