On Sat, 17 Sep 2005, Tzu-Chien Chiu wrote:> I have a question about implementing subword register allocation
> problems (see the REFERENCES in the end of this message) on LLVM. I
> have algorithms, but don't know the best way to implement them in
> LLVM.
>
> I asked similar question before:
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2005-
> May/004001.html
>
> Because I still don't have a satisfying solution now, I try to
> elaborate it again. Pardon.
Ok.
> All registers are 128-bit. Each register can be divided into four
> 32-bit subwords. Each subword can be independently read and written. A
> symbolic name is given to each subword: x, y, z, w.
>
> MUL r0.xyz, r1.xyz, r2.xxx
> SUB r0.w, r3,y, r4.z
> ADD r5.xyzw, r0.xyzw, r2.xyzw
>
> MUL defines the three subwords of r0, and SUB defines the rest one.
> Note that ADD uses the four subwords defined by the previous two
> instructions.
Understood. This is exactly what the register aliases concept is for.
Basically you have "R0" and "R0.x", which alias. Likewise,
R0 aliases
"R0.y", but "R0.y" does not alias "R0.x".
> The register allocate must be aware of this, otherwise
> additional MOV instructions may have to be inserted:
>
> MUL r0.xyz, r1.xyz, r2.xxx
> SUB r9.x, r3,y, r4.z
> MOV r0.w, r9.x
> ADD r5.xyzw, r0.xyzw, r2.xyzw
The funny thing about this is that you actually have different opcodes.
Consider ADD for example. You actually have four different operations:
ADD r, r, r
ADD rr, rr, rr
ADD rrr, rrr, rrr
ADD rrrr, rrrr, rrrr
My guess is that you'll get the most mileage out of describing these to
the target as different operations. An alternative would be to make each
operand a reg #/permute mask pair.
Not having worked on an arch like this before, I can't say which way would
be the best approach.
> In the previous code snippet, when allocating the destination register
> for SUB, the register allocator doesn't choose r0.w, and later when
> it's found the four subwords are referenced in ADD, an additional MOV
> must be inserted to move the subword from r9.x to r9.w.
For this, if you're modeling the registers as wholes, which have pieces
modified out of them, you really want to model the instruction as a two
address instruction, which reads the register, modifies it, then writes it
out. See the 'isTwoAddress' opcodes in the X86 or PPC backend for
examples.
Note that you'd probably want a seperate opcode that destroys the whole
input (if wxyz are all set), so that the register allocator doesn't think
it need to preserve the input in this case.
> Even if the subwords in a 128-bit registers are never referenced
> together in an instruction, minimizing the number of registers is
> always preferred in many architectures to avoid spills.
>
> For example, the code
>
> ADD r0.xy, r1.xy, r2.xy
> MUL r4.xy, r1.xy, r2.xy
>
> cn be improved to save r4, since the rest of the two subwords, z and w
> of r0, are not available:
>
> ADD r0.xy, r1.xy, r2.xy
> MUL r0.zw, r1.xy, r2.xy
Sure, ok. If you modeled your register file as a whole bunch of partial
registers, this would come naturally.
> I know some register algorithms [1][2]. My question is how to
> implement them in LLVM. I try to avoid making too much changes to
> existing LLVM live interval analysis and register allocators. I wish
> them could be re-used without modification, and perhaps using some
> tricks in the TableGen .td file.
The real problem here is that you in some cases you want whole registers,
and in some cases you want partial registers. One bit of unimplemented
functionality is support for sub-registers. This is something we *will*
implement, but I don't know precisely when. The idea is that it will
allow specifying constraints on registers.
For example, on X86, we'd say that we have a register class with the
32-bit X registers in it [EAX, ECX, EBX, EDX], and a register class with
the low-8-bit counterparts [AL, CL, BL, DL]. We want to say that the
second is a designated subreg (say #0) of the first class. This would
allow us to use subregisters of virtual registers as operands to
instructions, and have the register allocator change them to the
appropriate physregs when appropriate.
This partially helps your problem, but not entirely: you still need to
decide how to model the instructions in the first place. Also, this
support isn't implemented yet. :(
> I don't know how to do it, but it may be like what is done in
> X86RegisterInfo.td. AL and AH are defined to be the alias of AX by
> RegisterGroup.
Yup, this is the simple alias part.
> But this method doesn't seem work. I'm not sure, and I
> need your comments before implementing this similar techniques because
> I have a tight schedule.
Unfortunately, I don't know a magic answer. My advice would be to start
with something simple (but suboptimal), and refine it over time.
-Chris
> REFERENCES
>
> [1] S. Tallam and R. Gupta, "Bitwidth aware global register
> allocation", Annual Symposium on Principles of Programming Languages,
> pp.85 - 96, 2003.
>
> [2] Bengu Li, Youtao Zhang, and Rajiv Gupta, "Speculative Subword
> Register Allocation in Embedded Processors", The 17th International
> Workshop on Languages and Compilers
> for Parallel Computing, 2004.
>
>
>
-Chris
--
http://nondot.org/sabre/
http://llvm.org/