Justin,
here's another way to look at it....
you are asking for a "half" [*] of a register allocator, in that a
traditional graph coloring allocator
is divided into a coloring algorithm followed by (sometimes
integrated with) a spilling
algorithm, but what you want is just the coloring part without the
spilling part.
the coloring algorithm is so trivial that you might want to simply
write one from scratch.
(or even copy it out of an existing allocator if you really don't
want to write any new code).
as opposed to your current choices which seem to be to take an
existing allocator and
find clever ways to trick it into not doing spilling through the
target machine description.
While that might work eventually, it requires way more intimate
knowledge of the
allocators and the machine descriptions, and could be flaky in that
future register-allocator
implementors might not be able to figure out that some "feature" of
the machine description
interface allowed for this "trick" and the functionality might
accidentally disappear.
just my $0.02, you milage my vary...
-Peter Lawrence.
[*] "half" isn't fair, the coloring algorithm is a very very
small
fraction of the work, spilling is
a hard problem, and that is where all the real work in a quality
allocator goes, as I am
sure the folks recently working on Linear and Greedy will tell you.!.
> ------------------------------
>
> Message: 7
> Date: Mon, 16 May 2011 09:52:25 -0400
> From: Justin Holewinski <justin.holewinski at gmail.com>
> Subject: [LLVMdev] TargetRegisterInfo and "infinite" register
files
> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Message-ID: <BANLkTinSWVMs_HB8-SPHjDmEZPoRMuwJYQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Currently, the TableGen register info files for all of the back-
> ends define
> concrete registers and divide them into logical register classes.
> I would
> like to get some input from the LLVM experts around here on how
> best to map
> this model to an architecture that does *not* have a concrete, pre-
> defined
> register file. The architecture is PTX, which is more of an
> intermediate
> form than a final assembly language. The format is essentially
> three-address code, with "virtual" registers instead of
"physical"
> registers. After PTX code generation, the PTX assembly is compiled
> to a
> device binary with a proprietary tool (ptxas) that does final register
> allocation (based on device and user constraints). However,
> exploiting
> register re-use at the LLVM/PTX level has shown performance
> improvement over
> blindly using a new "physical" register for each def and letting
ptxas
> figure out all of the register allocation details, so I would like
> to take
> advantage of the LLVM register allocation infrastructure if at all
> possible.
>
> Generally stated, I would like to solve the register allocation
> problem as
> "allocate the minimum number of registers from an arbitrary set
> without
> spill code" instead of the more traditional "allocate the minimum
> number of
> registers from a fixed set."
>
> The current implementation defines an arbitrary set of registers
> that the
> register allocator can use during code-gen. This works, but is not
> scalable. If the register allocator runs out of registers, spill
> code must
> be generated. However, the "optimal" solution in this case would
> be to
> extend the register file. A few alternatives I have come up with are:
>
> 1. Bypass register allocation completely and just emit virtual
> registers,
> 2. Remove register definitions from the TableGen files and
> create them at
> run-time using the virtual register counts as an upper bound on
> the number
> of registers needed, or
> 3. Keep a small set of pre-defined physical registers, and craft
> spill
> code that really just puts a new register definition in the
> final PTX and
> copies to/from this register when spilling/restoring is needed
>
> I hesitate to use (1) or (3) as they rely too heavily on the final
> ptxas
> tool to perform reasonable register allocation, which may not lead to
> optimal code. Option (2) seems promising, though I worry about the
> feasibility of the approach. Specifically, I am not yet sure if
> generating
> TargetRegisterInfo and TargetRegisterClass instances on-the-fly
> will fit
> into the existing architecture.
>
> Any thoughts from the experts out there? Specifically, I am
> interested in
> any non-trivial pros/cons for any of these approaches, or any new
> approaches
> I have not considered.
>
> Thanks!
>
>
>
> --
>
> Thanks,
>
> Justin Holewinski