James R. Byerly via llvm-dev
2016-Jan-06 19:41 UTC
[llvm-dev] Using constraint solving to support complex unspillable register classes?
In the backend that I'm making for my custom architecture, I have some registers that I want to be allocatable but not spillable. I know for a fact that the registers can always be allocated without spilling this class if the scheduling is done correctly - they're only used by code inserted by my backend, and each insertion defines and kills them. My architecture has several ALUs, which I've exposed using a few special instructions along with one super-register per ALU. For example, in my architecture, something like an 'add' is 4 completely separate instructions: set_alu_in1 aluA, R1 set_alu_in2, aluA, R2 add aluA get_alu_out R3, aluA Unfortunately, once scheduling is enabled, these instructions may be interleaved such that spilling parts of the ALU registers is required (which causes a failure when attempting to store it to the stack). Currently, I have a custom machine scheduler strategy that schedules bottom-up and picks the 'best' instruction that keeps the register pressure of the unspillable class from rising past the limit. This is simple enough right now, because I have each alu super-register as a completely distinct entity. I'd like to make my alu registers overlap, so that aluA.out is the same register as aluC.in1 and aluB.out is the same register as aluC.in2, for example. However, my current simple machine scheduler wouldn't handle this properly, because it only deals with register classes and doesn't generalize easily. The real goal of my scheduler is to pick the "best" instruction that still allows for register allocation to succeed, so an obvious direction to go is towards doing some of the work that the register allocator does. After some reading up on register allocation theory and practice, it seems like PBQP is the best approach for "irregular" architectures like I'm trying to make. In fact, switching to the PBQP allocator allowed me to remove a "register pre-allocation" pass that added allocation hints to all ALU instructions with my current non-overlapping ALU architecture. For the scheduling phase, I'm only interested in whether or not a solution (with non-infinite cost) exists, so it seems like I could drop a lot of data from the full PBQP solver - I only need binary weights (zero or infinite), and I can leave out all nodes that don't include any unspillable registers. Is using a PBQP model and solver the best way to solve this? It seems like it should be a simpler problem than normal register allocation if the choice of spilling is removed. After reading up on constraint solving a little, it seems to me that without a special spill class, the assignments could be expressed using a horn formula, and horn satisfiability can be determined in linear time (to the number of symbols, which would be equal to the sum of valid choices for each register). Would that be a better way? Any feedback on solving this constraint problem to allow scheduling with complicated unspillable register is appreciated. Thanks, James Byerly