thr3ads.net - llvm dev - [LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling [Aug 2011]

If this information is useful, please help other people find it:
Share via:

Ghassan Shobaki

2011-Aug-15 08:27 UTC

[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

Hi,

We are working on a research project whose objective is developing a
pre-allocation scheduling algorithm that achieves the optimal balance between
exploiting ILP (hiding latencies) and minimizing register pressure.  A prototype
of our algorithm has been implemented and integrated into an experimental
version of LLVM 2.9. Our algorithm is based on a combinatorial optimization
approach, which is naturally slower than heuristic approaches. However, our
benchmarking (using SPEC CPU2006) shows that for some (but not all) benchmarks
our algorithm can produce significantly faster code with a reasonable increase
in compile time if we apply it selectively to the hot spots. Work on generating
faster code with less increase in compile time is in progress.   


One of the issues that we currently trying to resolve is computing a precise
register pressure estimate that correlates very well with the amount of spill
code generated by the register allocator (we are using the default linear scan
one). Of course, we won't be able to achieve 100% correlation unless we do
allocation and scheduling simultaneously, which is not our current goal. Our
current goal is limited to enhancing the pre-allocation scheduling phase to
achieve the best possible reduction in register pressure without sacrificing
ILP. Note that pre-allocation scheduling is done within the basic block while
allocation is done globally for the whole function, which makes it even harder
to achieve a good correlation.


One factor that is causing our current register pressure estimate to be off is
not being able to properly account for live-in and live-out registers (both
virtual and physical). As far as we can tell, LLVM represents live-in regs with
CopyFromReg instrs and live-out regs with CopyToReg instrs. However, it looks
that in a given basic block, LLVM does not generate CopyToReg instrs for
registers that are not defined within that block and does not generate
CopyFromReg instsr for regs that are not used within the block. This is causing
a problem for us, because a precise register pressure estimate has to take into
account all live regs at a given point in the block whether they are defined or
used in the block itself or not. Our questions are:

(1) How can we get information about the registers that are live within a basic
block but are not defined within the block? And how can we get that information
for regs that are not used in the block?


(2) Can we safely assume that CopyFromReg instrs and CopyToReg instrs are used
for the sole purpose of representing live-in and live-our regs? In other words,
are there other uses for these?
If yes, how do we identify the ones that represent live-in and live-out regs.

(3) The current physical regsiter limits obtained using 

TargetRegisterInfo::getRegPressureLimit()
seem to be too low (for example, 3 integer regs on x86 32-bit mode). Are these
good limits to use in our case? If not, how can we get better limits?

Thank you in advance!
-Ghassan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110815/ed0d68d1/attachment.html>

Jakob Stoklund Olesen

2011-Aug-15 21:52 UTC

head link

[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

On Aug 15, 2011, at 1:27 AM, Ghassan Shobaki wrote:
> One factor that is causing our current register pressure estimate to be off
is not being able to properly account for live-in and live-out registers (both
virtual and physical). As far as we can tell, LLVM represents live-in regs with
CopyFromReg instrs and live-out regs with CopyToReg instrs. However, it looks
that in a given basic block, LLVM does not generate CopyToReg instrs for
registers that are not defined within that block and does not generate
CopyFromReg instsr for regs that are not used within the block. This is causing
a problem for us, because a precise register pressure estimate has to take into
account all live regs at a given point in the block whether they are defined or
used in the block itself or not. Our questions are:
> 
> (1) How can we get information about the registers that are live within a
basic block but are not defined within the block? And how can we get that
information for regs that are not used in the block?
This information is only computed immediately before register allocation. Passes
that run after scheduling can significantly change the register pressure. In
particular MachineCSE and MachineLICM do this.

We also run value-based copy coalescing before register allocation. That usually
reduces register pressure.
> (2) Can we safely assume that CopyFromReg instrs and CopyToReg instrs are
used for the sole purpose of representing live-in and live-our regs? In other
words, are there other uses for these?
> If yes, how do we identify the ones that represent live-in and live-out
regs.
These DAG nodes are also used to copy to/from physical registers before and
after calls.

Virtual registers defined by PHI instructions will also appear as CopyFromReg
operands.
> (3) The current physical regsiter limits obtained using 
> TargetRegisterInfo::getRegPressureLimit()
> seem to be too low (for example, 3 integer regs on x86 32-bit mode). Are
these good limits to use in our case? If not, how can we get better limits?
These limits are low exactly to make room for the global registers passing
through a block without being used. The x86-32 architecture has so few registers
that you pretty much always want to schedule for register pressure.

Note that LLVM 3.0 has a new register allocator with live range splitting. That
makes it less important to accurately model the register pressure from global
live ranges. These ranges are likely to be split and spilled anyway, so the
schedule should not necessarily make room for them.

/jakob

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110815/ed4d802f/attachment.html>

Ghassan Shobaki

2011-Aug-16 09:08 UTC

head link

[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

Thank you for the answers, Jakob! That's really informative for someone who
is still new to LLVM like me. Please see my responses below.

-Ghassan  



________________________________
From: Jakob Stoklund Olesen <stoklund at 2pi.dk>
To: Ghassan Shobaki <ghassan_shobaki at yahoo.com>
Cc: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>
Sent: Tuesday, August 16, 2011 12:52 AM
Subject: Re: [LLVMdev] Register Pressure Computation during Pre-Allocation
Scheduling




On Aug 15, 2011, at 1:27 AM, Ghassan Shobaki wrote:
>This information is only computed immediately before register allocation.
Passes that run after scheduling can significantly change the register pressure.
In
> particular MachineCSE and MachineLICM do this.
Ghassan: I know that phase ordering is a non-trivial problem that does not have
a perfect solution (like most compiler optimization problems!), but I wonder why
LLVM runs such passes between scheduling and allocation. One would expect a
register pressure reduction pass to be placed as close as possible to the
register allocation pass. Ideally, you would like to have an integrated
algorithm that does allocation and scheduling simultaneously, but such an
integrated solution is usually not implemented due to its complexity. So, my
questions are:
(1) CSE naturally tends to increase reg pressure. Is there any particular reason
for doing CSE between scheduling and allocation instead of doing it, say before
scheduling?
(2) How easy will it be to change the phase ordering in LLVM without breaking
things? Where is the phase ordering done? How do we know if there are
dependencies among certain phases?
(3) Does LLVM have command-line options for turning off phases like CSE and
LICM?



>These DAG nodes are also used to copy to/from physical registers before and
after calls.
>Virtual registers defined by PHI instructions will also appear as
CopyFromReg operands.Ghassan: So, is there a way to distinguish the ones that represent live-in and
live-out regs?
Failing to recognize CopyFromReg and CopyToReg instrs that represent
live-in/live-out regs, our scheduler will treat them as regular instructions and
may move some of them to the middle of the block, thus resulting in an incorrect
under-estimate of reg pressure. We would like to avoid that if possible. 

/jakob
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110816/1e1c6b36/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Aug 2011 - [LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling

Possibly Parallel Threads