Hi Jakob,
I have spent last 4 weeks trying to figure out how to implement
Interprocedural Register Allocation. I must admit that I was really
overwhelmed with LLVM's codebase while trying to figure this out :)
There is so much to know! I think I have reached a point where I
have some sort of basic understanding of what needs to be done,
but I need some help from here on. So here is the summary of
what I know and for what I need more help.
I see that lib/CodeGen/Passes.cpp adds PrologEpilogInserter (PEI
henceforth) pass after the RegAlloc pass. Do I understand correctly
that LLVM first performs a given pass on all the functions before
moving to the next pass? In order to proceed further with this analysis
and to reduce the number of email exchanges back and forth, I will
assume that my understanding is correct for now (branch prediction in
real life? :P). Please let me know if I am wrong, I will restart the
analysis from this point in the subsequent emails. Assuming this is the
case, as with many things compiler optimizations, this is somewhat
opposite to what we want :) So let me start from the beginning of what
we want.
Let us say we want to allocate registers for node X in the call graph.
We want our interprocedural register allocator to have run on all the
child nodes of X in the call graph before it starts the register
allocation on the given node X itself. We not only need this, but
we also want all Callee Saved Registers (CSR henceforth)
information to be available for all the child nodes of X.
This is because we want to compute the physical registers available
for node X as:
set(all the physical registers available) - Union over all the
children(X) (set(CSR for the child node)) - Union over all the
children(X) (set(used physical registers including calling
convention registers for the child node))
However in the current implementation CSR info is calculated in the PEI
pass which is done after register allocation. I am guessing that this
was because, CSR spill insertion is done in the Prolog and the Epilog
in the case when we do not shrink wrap and there was no need to
calculate CSR elsewhere and this was the reasonable location to place
that calculation. However, we have a need to have the calculation done
in the register allocator now. So IMHO, we should be moving the
calculation of CSR out of PEI pass but should save that info to be made
available to PEI pass for spill code insertion. If we can actually do
this, we should be able to combine this CSR calculation with register
allocation. In fact, unless it is absolutely necessary to keep them
separate for modularity or other such reasons, we can merge the PEI pass
and the RegAlloc pass together. What are the cases where one uses a
RegAlloc pass but not PEI pass?
Another concern is about the re-use of CallGraphSCC pass. I see that in
lib/Analysis/IPA/CallGraphSCCPass.cpp's CGPassManager::RunPassOnSCC()
method there is this line, Changed |= FPP->runOnFunction(*F); and FPP
is cast to (FPPassManager*). However, if we want to re-use
CallGraphSCCPass in CodeGen, shouldn't we be casting FPP to the
MachineFunction equivalent and be calling the runOnMachineFunction()
method instead? So should we inherit CallGraphSCCPass to create
MachineCallGraphSCCPass like what we do for FunctionPass, and replace
all the casts/references to non-machine passes to equivalent machine
passes in the overridden methods?
So given all these things, here is my plan on how to go about
implementing an Interprocedural Register Allocator (IPRA henceforth).
1. Decide if we have to create MachineCallGraphSCCPass.
2. If we have to create MachineCallGraphSCCPass, inherit from
CallGraphSCCPass and override the methods as required.
3. Create a new IPRA pass which inherits from MachineCallGraphSCCPass
or CallGraphPass depending on whether we create
MachineCallGraphSCCPass.
4. Implement a loop in runOnMachineSCC that iterates over the CallGraph
nodes in the bottom-up order:
A. For each node, run one of the allocators as chosen by the command
line flag or the optimization level required. This can be
implemented as an argument to createIPRAPass() method.
B. While running the register allocator for a node, make use of the
RegMask from the child nodes to compute the registers available
for this node.
C. Immediately after the register allocator is run for the node, run
calculateCalleeSavedRegisters() on that node and save that info.
I think the current method already saves this info in MFI, so we
do not have to do any extra work to save it for later use.
D. Update the RegMask for the node so that its parents know what
registers to use.
5. Since calculateCalleeSavedRegisters() is now part of register
allocator, remove it from PEI pass but use that info where required.
6. Add a command line flag to enable IPRA (turned off by default) and
add the pass to passes.cpp and the non-IPRAs work exactly as they
are now from the outside, but calculateCalleeSavedRegisters() is
moved to a different place.
Other things like, not using the RegMask of the child nodes which have
calls to external functions or have function pointers as function
parameters should be taken care of.
Please let me know if this makes any sense at all. Also if you would
like to have a quicker turn around time for a conversation, I can login
to LLVM's IRC channel when you have some time or please feel free
to ping me on IM (I use this same email ID for GTalk).
On Thu, Oct 4, 2012 at 2:38 PM, Madhusudan C.S <madhusudancs at
gmail.com>wrote:
> Hi Jakob,
>
> On Thu, Oct 4, 2012 at 2:31 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk>wrote:
>
>>
>> On Oct 4, 2012, at 2:27 PM, "Madhusudan C.S" <madhusudancs
at gmail.com>
>> wrote:
>>
>> Basically, the PrologEpilogInsertion pass will add a bit mask to
>>> MachineModuleInfo describing which registers are clobbered by the
function
>>> being compiled. Later, when compiling the callers, that bit mask is
used to
>>> initialize the regmask operands on call instructions.
>>>
>>
>> So the idea is to sidestep from the calling convention a bit if we
>> already know that the called function will not be using all the
>> registers required by the convention and instead use those registers
>> in the caller?
>>
>>
>> That's right.
>>
>> If I am understanding this correctly, is this something desirable for
>> LLVM, even if it is not high priority?
>>
>>
>> Yes.
>>
>> Would you guys be Ok if I try
>> to implement this without disturbing the project priorities but with
>> a little help/guidance?
>>
>>
>> Absolutely.
>>
>
> Great! Thank you very much. Then, I will do some homework on how
> I plan to implement this, a _very_ rough sketch if not the actual
> design, and come back.
>
> Btw, I just found this http://optimisticcompilers.blogspot.com/ So
> I will just take a look at what happened to that project.
>
>>
>> /jakob
>>
>>
>
--
Thanks and regards,
Madhusudan.C.S
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20121031/8f1bfda5/attachment.html>