Ghassan Shobaki
2011-Aug-15 08:27 UTC
[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling
Hi, We are working on a research project whose objective is developing a pre-allocation scheduling algorithm that achieves the optimal balance between exploiting ILP (hiding latencies) and minimizing register pressure. A prototype of our algorithm has been implemented and integrated into an experimental version of LLVM 2.9. Our algorithm is based on a combinatorial optimization approach, which is naturally slower than heuristic approaches. However, our benchmarking (using SPEC CPU2006) shows that for some (but not all) benchmarks our algorithm can produce significantly faster code with a reasonable increase in compile time if we apply it selectively to the hot spots. Work on generating faster code with less increase in compile time is in progress. One of the issues that we currently trying to resolve is computing a precise register pressure estimate that correlates very well with the amount of spill code generated by the register allocator (we are using the default linear scan one). Of course, we won't be able to achieve 100% correlation unless we do allocation and scheduling simultaneously, which is not our current goal. Our current goal is limited to enhancing the pre-allocation scheduling phase to achieve the best possible reduction in register pressure without sacrificing ILP. Note that pre-allocation scheduling is done within the basic block while allocation is done globally for the whole function, which makes it even harder to achieve a good correlation. One factor that is causing our current register pressure estimate to be off is not being able to properly account for live-in and live-out registers (both virtual and physical). As far as we can tell, LLVM represents live-in regs with CopyFromReg instrs and live-out regs with CopyToReg instrs. However, it looks that in a given basic block, LLVM does not generate CopyToReg instrs for registers that are not defined within that block and does not generate CopyFromReg instsr for regs that are not used within the block. This is causing a problem for us, because a precise register pressure estimate has to take into account all live regs at a given point in the block whether they are defined or used in the block itself or not. Our questions are: (1) How can we get information about the registers that are live within a basic block but are not defined within the block? And how can we get that information for regs that are not used in the block? (2) Can we safely assume that CopyFromReg instrs and CopyToReg instrs are used for the sole purpose of representing live-in and live-our regs? In other words, are there other uses for these? If yes, how do we identify the ones that represent live-in and live-out regs. (3) The current physical regsiter limits obtained using TargetRegisterInfo::getRegPressureLimit() seem to be too low (for example, 3 integer regs on x86 32-bit mode). Are these good limits to use in our case? If not, how can we get better limits? Thank you in advance! -Ghassan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110815/ed0d68d1/attachment.html>
Jakob Stoklund Olesen
2011-Aug-15 21:52 UTC
[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling
On Aug 15, 2011, at 1:27 AM, Ghassan Shobaki wrote:> One factor that is causing our current register pressure estimate to be off is not being able to properly account for live-in and live-out registers (both virtual and physical). As far as we can tell, LLVM represents live-in regs with CopyFromReg instrs and live-out regs with CopyToReg instrs. However, it looks that in a given basic block, LLVM does not generate CopyToReg instrs for registers that are not defined within that block and does not generate CopyFromReg instsr for regs that are not used within the block. This is causing a problem for us, because a precise register pressure estimate has to take into account all live regs at a given point in the block whether they are defined or used in the block itself or not. Our questions are: > > (1) How can we get information about the registers that are live within a basic block but are not defined within the block? And how can we get that information for regs that are not used in the block?This information is only computed immediately before register allocation. Passes that run after scheduling can significantly change the register pressure. In particular MachineCSE and MachineLICM do this. We also run value-based copy coalescing before register allocation. That usually reduces register pressure.> (2) Can we safely assume that CopyFromReg instrs and CopyToReg instrs are used for the sole purpose of representing live-in and live-our regs? In other words, are there other uses for these? > If yes, how do we identify the ones that represent live-in and live-out regs.These DAG nodes are also used to copy to/from physical registers before and after calls. Virtual registers defined by PHI instructions will also appear as CopyFromReg operands.> (3) The current physical regsiter limits obtained using > TargetRegisterInfo::getRegPressureLimit() > seem to be too low (for example, 3 integer regs on x86 32-bit mode). Are these good limits to use in our case? If not, how can we get better limits?These limits are low exactly to make room for the global registers passing through a block without being used. The x86-32 architecture has so few registers that you pretty much always want to schedule for register pressure. Note that LLVM 3.0 has a new register allocator with live range splitting. That makes it less important to accurately model the register pressure from global live ranges. These ranges are likely to be split and spilled anyway, so the schedule should not necessarily make room for them. /jakob -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110815/ed4d802f/attachment.html>
Ghassan Shobaki
2011-Aug-16 09:08 UTC
[LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling
Thank you for the answers, Jakob! That's really informative for someone who is still new to LLVM like me. Please see my responses below. -Ghassan ________________________________ From: Jakob Stoklund Olesen <stoklund at 2pi.dk> To: Ghassan Shobaki <ghassan_shobaki at yahoo.com> Cc: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu> Sent: Tuesday, August 16, 2011 12:52 AM Subject: Re: [LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling On Aug 15, 2011, at 1:27 AM, Ghassan Shobaki wrote:>This information is only computed immediately before register allocation. Passes that run after scheduling can significantly change the register pressure. In > particular MachineCSE and MachineLICM do this.Ghassan: I know that phase ordering is a non-trivial problem that does not have a perfect solution (like most compiler optimization problems!), but I wonder why LLVM runs such passes between scheduling and allocation. One would expect a register pressure reduction pass to be placed as close as possible to the register allocation pass. Ideally, you would like to have an integrated algorithm that does allocation and scheduling simultaneously, but such an integrated solution is usually not implemented due to its complexity. So, my questions are: (1) CSE naturally tends to increase reg pressure. Is there any particular reason for doing CSE between scheduling and allocation instead of doing it, say before scheduling? (2) How easy will it be to change the phase ordering in LLVM without breaking things? Where is the phase ordering done? How do we know if there are dependencies among certain phases? (3) Does LLVM have command-line options for turning off phases like CSE and LICM?>These DAG nodes are also used to copy to/from physical registers before and after calls. >Virtual registers defined by PHI instructions will also appear as CopyFromReg operands.Ghassan: So, is there a way to distinguish the ones that represent live-in and live-out regs? Failing to recognize CopyFromReg and CopyToReg instrs that represent live-in/live-out regs, our scheduler will treat them as regular instructions and may move some of them to the middle of the block, thus resulting in an incorrect under-estimate of reg pressure. We would like to avoid that if possible. /jakob -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110816/1e1c6b36/attachment.html>
Apparently Analagous Threads
- [LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling
- [LLVMdev] Register Pressure Computation during Pre-Allocation Scheduling
- [LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3
- [LLVMdev] MI Scheduler vs SD Scheduler?
- [LLVMdev] Pre-Allocation Schedulers in LLVM