Hi,

I have a question about the way sub-registers are spilled and restored that
is related to the changes I made in r192119.

Suppose I have the following piece of code with four instructions. %vreg0
and %vreg1 each consist of two sub-registers indexed by sub_lo and sub_hi.

instr0 %vreg0<def>
instr1 %vreg1:sub_lo<def,read-undef>
instr2 %vreg0<use>
instr3 %vreg1:sub_hi<def>

If the register allocator decides to insert spill and restore instructions
for %vreg0, will it spill the whole register, including both sub-registers
lo and hi?

instr0 %vreg0<def>
spill0 %vreg0
instr1 %vreg1:sub_lo<def,read-undef>
spill1 %vreg1:sub_lo
restore0 %vreg0
instr2 %vreg0<use>
restore1 %vreg1:sub_lo
instr3 %vreg1:sub_hi<def>

Or will it spill just the lo sub-register?

instr0 %vreg0<def>
spill0 %vreg0:sub_lo
instr1 %vreg1:sub_lo<def,read-undef>
spill1 %vreg1:sub_lo
restore0 %vreg0:sub_lo
instr2 %vreg0<use>
restore1 %vreg1:sub_lo
instr3 %vreg1:sub_hi<def>

If it spills the whole register (both sub-registers lo and hi), the changes
I made should be fine. Otherwise, I will have to find another way to
prevent the problems I mentioned in r192119's commit log.

On Mon, Oct 7, 2013 at 1:11 PM, Matthias Braun <matze at braunis.de> wrote:
> I've been working on patches to improve subregister liveness tracking in
> llvm and I wanted to inform the llvm community about the overall
> design/motivation for them. I will send the patches to llvm-commits later
> today.
>
> Greetings
> Matthias Braun
>
>
> Subregisters in llvm
> ====================
>
> Some targets can access registers in different ways resulting in wider or
> narrower accesses. For example on ARM NEON one of the single precision
> floating point registers is called 'S0'. You may also access 'D0' on ARM,
> which is the combination of 'S0' and 'S1' and can store a double precision
> number or 2 single precision floats. 'Q0' is the combination of 'S0', 'S1',
> 'S2' and 'S3' (or 'D0' and 'D1') and so on.
>
> Before register allocation llvm machine code accesses values through
> virtual registers; these get assigned to physical registers later. Each
> virtual register has an assigned register class, which is a set of physical
> registers. So for example on ARM you have a register class containing all
> the 'SXX' registers and another one containing all the 'DXX' registers, ...
>
> But sometimes you want to mix narrow and wide accesses to values, like
> loading the 'D0' register but later reading the 'S0' and 'S1' components
> separately. This is modeled with subregister operands, which specify that
> only parts of a wider value are accessed. For example the register class of
> the 'DXX' registers supports subregisters called 'ssub_0' and 'ssub_1',
> which would result in 'S4' and 'S5' getting used if 'D2' is assigned to the
> virtual register later.
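
The aliasing described above can be sketched in a few lines of Python. This is only an illustration of the naming arithmetic, not LLVM code: the helper names `s_subregisters` and `resolve` are hypothetical, and the model assumes the S registers overlay only the low part of the D/Q register file (S0-S31), so wider register numbers are rejected.

```python
import re


def s_subregisters(reg):
    """Return the S registers aliased by an S, D or Q register name."""
    m = re.fullmatch(r"([SDQ])(\d+)", reg)
    if m is None:
        raise ValueError("not an S/D/Q register name: %r" % reg)
    kind, n = m.group(1), int(m.group(2))
    width = {"S": 1, "D": 2, "Q": 4}[kind]  # number of S lanes covered
    first = n * width
    if first + width > 32:
        # Assumption of this sketch: only S0-S31 exist.
        raise ValueError("%s has no S subregisters in this model" % reg)
    return ["S%d" % i for i in range(first, first + width)]


def resolve(assigned, subreg_index):
    """Map a subregister index such as 'ssub_1' to a physical S register,
    given the physical register assigned to the virtual register."""
    lane = int(subreg_index.rsplit("_", 1)[1])
    return s_subregisters(assigned)[lane]


print(s_subregisters("D0"))   # ['S0', 'S1']
print(s_subregisters("Q0"))   # ['S0', 'S1', 'S2', 'S3']
print(resolve("D2", "ssub_0"), resolve("D2", "ssub_1"))  # S4 S5
```
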
>
> Typical operations are decomposing wider values or composing wide values
> with multiple smaller defs:
>
> Decomposing:
> %vreg1<def> = produce a 'D' value
> = use 'S' value %vreg1:ssub_0
> = use 'S' value %vreg1:ssub_1
>
> Composing:
> %vreg1:ssub_0<def,read-undef> = produce an 'S' value
> %vreg1:ssub_1<def> = produce an 'S' value
> = use a 'D' value %vreg1
>
> Problems / Motivation
> =====================
>
> Currently the llvm register allocator tracks liveness for whole virtual
> registers. This can lead to suboptimal code:
>
> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
> %vreg0:ssub_1<def> = produce an 'S' value
> = use a 'D' value %vreg0
> %vreg1 = produce an 'S' value
> = use an 'S' value %vreg1
> = use an 'S' value %vreg0:ssub_0
>
> The current code will realize that vreg0 and vreg1 interfere and assign
> them to different registers like D0+S2 aka S0+S1+S2, while in reality after
> the full use of %vreg0 only %vreg0:ssub_0 must remain in a register while
> the subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An
> ideal assignment would be D0+S1 aka S0+S1.
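
To make the difference concrete, here is a toy Python model of the six-instruction example above. It is a sketch under simplifying assumptions (straight-line code, one half-open span per value from first def to last use), and every name in it is made up for illustration; it does not reflect how LLVM computes live intervals.

```python
# (instruction index, vreg, lanes touched, "def" or "use")
PROGRAM = [
    (0, "vreg0", {"ssub_0"}, "def"),
    (1, "vreg0", {"ssub_1"}, "def"),
    (2, "vreg0", {"ssub_0", "ssub_1"}, "use"),
    (3, "vreg1", {"whole"}, "def"),
    (4, "vreg1", {"whole"}, "use"),
    (5, "vreg0", {"ssub_0"}, "use"),
]


def live_span(events):
    """Half-open span [first def, last use) for a list of (index, kind)."""
    defs = [i for i, kind in events if kind == "def"]
    uses = [i for i, kind in events if kind == "use"]
    return (min(defs), max(uses))


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


def whole_register_spans(program):
    by_vreg = {}
    for idx, vreg, _lanes, kind in program:
        by_vreg.setdefault(vreg, []).append((idx, kind))
    return {v: live_span(ev) for v, ev in by_vreg.items()}


def per_lane_spans(program):
    by_lane = {}
    for idx, vreg, lanes, kind in program:
        for lane in lanes:
            by_lane.setdefault((vreg, lane), []).append((idx, kind))
    return {k: live_span(ev) for k, ev in by_lane.items()}


whole = whole_register_spans(PROGRAM)
lanes = per_lane_spans(PROGRAM)
v1 = lanes[("vreg1", "whole")]

# Whole-register tracking: vreg0 and vreg1 interfere.
print(overlaps(whole["vreg0"], whole["vreg1"]))   # True
# Lane tracking: vreg1 only interferes with ssub_0, not with ssub_1.
print(overlaps(lanes[("vreg0", "ssub_0")], v1))   # True
print(overlaps(lanes[("vreg0", "ssub_1")], v1))   # False
```

In this model the register holding %vreg0:ssub_1 is free by the time %vreg1 is defined, which is what allows the D0+S1 assignment.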
>
> An even more pressing problem is artificial dependencies in the schedule
> graph. This is a side effect of llvm's live range information being
> represented in a static single assignment like fashion: every definition of
> a vreg starts a new interval with a new value number. This means that
> partial register writes must be modeled as an implicit use of the unwritten
> parts of a register and force the creation of a new value number. This in
> turn leads to artificial dependencies in the schedule graph for code like
> the following, where all defs should be independent:
>
> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
> %vreg0:ssub_1<def> = produce an 'S' value
> %vreg0:ssub_2<def> = produce an 'S' value
> %vreg0:ssub_3<def> = produce an 'S' value
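
The artificial dependencies can be illustrated with a small Python sketch. It is a toy model, not the scheduler's actual dependency analysis: the function `dependency_edges` and its `track_lanes` switch are hypothetical, and the whole-register mode simply assumes that a partial def without read-undef reads the value produced by the previous def of the same vreg.

```python
# (lane written, read-undef flag) for the four partial defs of %vreg0
DEFS = [
    ("ssub_0", True),
    ("ssub_1", False),
    ("ssub_2", False),
    ("ssub_3", False),
]


def dependency_edges(defs, track_lanes):
    """Return (earlier, later) ordering edges between the partial defs."""
    edges = []
    last_def_of_vreg = None   # used by the whole-register model
    last_def_of_lane = {}     # used by the lane-tracking model
    for i, (lane, read_undef) in enumerate(defs):
        if track_lanes:
            # Only an earlier write to the same lane orders this def.
            if lane in last_def_of_lane:
                edges.append((last_def_of_lane[lane], i))
        elif not read_undef and last_def_of_vreg is not None:
            # The partial write implicitly reads the previous value of
            # the whole register, so it must follow the previous def.
            edges.append((last_def_of_vreg, i))
        last_def_of_vreg = i
        last_def_of_lane[lane] = i
    return edges


print(dependency_edges(DEFS, track_lanes=False))  # [(0, 1), (1, 2), (2, 3)]
print(dependency_edges(DEFS, track_lanes=True))   # []
```

With whole-register tracking the four defs form a chain; with per-lane tracking the model finds no ordering constraints between them.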
>
>
> Subregister liveness tracking
> =============================
>
> I developed a set of patches which enable liveness tracking on the
> subregister level, to overcome the problems mentioned above. After these
> changes you can have separate live ranges for subregisters of a virtual
> register. With these patches the following code:
>
> 16B %vreg0:ssub_0<def,read-undef> = ...
> 32B %vreg0:ssub_1<def> = ...
> 48B = %vreg0
> 64B = %vreg0:ssub_0
> 80B %vreg0 = ...
> 96B = %vreg0:ssub_1
>
> will be represented as the following live range(s):
>
> Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
> SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
> SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
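
The two SubRange lines above can be reproduced with a short Python sketch. It is a simplified model for straight-line code only, not the algorithm in the patches: the function names are hypothetical, the lane mask values are copied from the listing, and the common live range is deliberately not computed.

```python
# Lane mask values as shown in the listing above.
SSUB_0 = 0x0004
SSUB_1 = 0x0008
WHOLE = SSUB_0 | SSUB_1

# (slot index, lanes touched, "def" or "use") for the six instructions
EVENTS = [
    (16, SSUB_0, "def"),
    (32, SSUB_1, "def"),
    (48, WHOLE, "use"),
    (64, SSUB_0, "use"),
    (80, WHOLE, "def"),
    (96, SSUB_1, "use"),
]


def subrange_segments(events, lane_mask):
    """Return (start, end, dead) segments for one lane mask.

    A segment runs from a def to the last use before the next def of
    that lane; a def without any use yields a dead segment.
    """
    segments = []
    start = end = None
    for slot, mask, kind in events:
        if not mask & lane_mask:
            continue  # instruction does not touch this lane
        if kind == "def":
            if start is not None:
                segments.append((start, end, end == start))
            start = end = slot
        elif start is not None:
            end = slot
    if start is not None:
        segments.append((start, end, end == start))
    return segments


def render(segments):
    """Format segments in the notation used in the listing above."""
    return "".join(
        "[%dr,%d%s)" % (s, e, "d" if dead else "r") for s, e, dead in segments
    )


print(render(subrange_segments(EVENTS, SSUB_0)))  # [16r,64r)[80r,80d)
print(render(subrange_segments(EVENTS, SSUB_1)))  # [32r,48r)[80r,96r)
```
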
>
> Patches/Changes:
> * Move live range management code in the LiveInterval class to a new
>   class LiveRange, and move the previous LiveRange class (which was just a
>   single interval inside a live range) to LiveRange::Segment.
>   LiveInterval is made a subclass of LiveRange; other code paths like
>   register units liveness use LiveRange instead of LiveInterval now.
> * Introduce a linked list of SubRange objects to the LiveInterval class.
>   A SubRange is a subclass of LiveRange and contains a LaneMask indicating
>   which subregisters are represented.
> * Various algorithms have been adapted to calculate/preserve subregister
>   liveness.
> * The register allocator has been adapted to track interference at the
>   subregister level (LaneMasks are mapped to register units).
>
> Note that subregister liveness tracking has to be explicitly enabled by
> the target architecture, as it does not provide enough benefits for the
> costs on some targets (e.g. having subregister liveness for the lower/upper
> 8bit regs on x86 provided nearly no benefits in the llvm-testsuite, so you
> can't justify more computations/memory usage for that).