thr3ads.net - llvm dev - [llvm-dev] Reg units for unaddressable register parts? [Sep 2016]

Krzysztof Parzyszek via llvm-dev

2016-Sep-29 14:45 UTC

[llvm-dev] Reg units for unaddressable register parts?

On 9/28/2016 7:30 PM, Quentin Colombet wrote:
> Out of curiosity, could you describe why it is useful to have such precision
> in the liveness tracking?
RDF is meant to allow optimizations across the whole function. As a
result, the set of registers that are live into a basic block may change,
and there is code to recalculate it. Accuracy is required to avoid
unnecessary block live-ins.
For example, consider calculating the live-ins of BB#1:
BB#1:
  R0 = ...    // Does not affect R1
  ... = D0    // D0 is the pair R1:R0
Here we want R1 to be the live-in, but not the whole D0 or R0.
At the same time, on x86-64,
BB#1:
  EAX = ...
  ... = RAX
RAX would not be a live-in (since EAX=... overwrites all bits in RAX).
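The live-in calculation above can be sketched in Python under a toy model in which every register is just a set of named reg units; the unit names, the tables, and `live_in_units` are hypothetical illustrations, not RDF's code. A unit is live-in when an instruction reads it before any earlier instruction in the block has written it. For the x86-64 case the sketch assumes EAX and RAX cover the same units, since EAX = ... writes every bit of RAX.

```python
# Toy model (hypothetical): each register is the set of reg units it covers.
HEXAGON_UNITS = {
    "R0": {"R0"},
    "R1": {"R1"},
    "D0": {"R0", "R1"},  # D0 is the pair R1:R0
}

# Assumption: EAX = ... writes all of RAX, so both cover the same units.
# "HI" is a hypothetical unit standing for bits 16 and up.
X86_UNITS = {
    "EAX": {"AL", "AH", "HI"},
    "RAX": {"AL", "AH", "HI"},
}


def live_in_units(block, units):
    """Return the reg units read in `block` before being written in `block`.

    `block` is a list of (defs, uses) pairs of register names in program
    order; the uses of an instruction are processed before its defs.
    """
    defined = set()
    live_in = set()
    for defs, uses in block:
        for reg in uses:
            live_in |= units[reg] - defined
        for reg in defs:
            defined |= units[reg]
    return live_in


# BB#1:  R0 = ... ;  ... = D0
print(sorted(live_in_units([(["R0"], []), ([], ["D0"])], HEXAGON_UNITS)))  # ['R1']
# BB#1:  EAX = ... ;  ... = RAX
print(sorted(live_in_units([(["EAX"], []), ([], ["RAX"])], X86_UNITS)))  # []
```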

One potential target optimization (for Hexagon) would deal with register
renaming. To rename registers, we would have to isolate their live ranges
very accurately.

> Indeed, we do not necessarily describe the exact semantics of an
> instruction. For instance, on x86 it is probably right to assume most
> instructions do not touch the high bits, but on AArch64 it is the opposite.
That's not necessary. In the x86-64 case, if EAX had an extra reg unit
that it would share with RAX (for the unaddressable part extending from
bit 16 upwards), then none of AL=, AH=, or AX= would invalidate the rest
of EAX and RAX, while EAX= would, since it would store into the "hidden"
reg unit.
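A minimal Python sketch of the proposed layout, assuming one hypothetical hidden unit `HX` for bits 16 and up that EAX shares with RAX (all names are illustrative, not LLVM's). Whatever units of the super-register a def does not write survive it:

```python
# Hypothetical reg units for the x86-64 A register family, with one extra
# "hidden" unit HX covering the unaddressable bits 16 and up.
UNITS = {
    "AL": {"AL"},
    "AH": {"AH"},
    "AX": {"AL", "AH"},
    "EAX": {"AL", "AH", "HX"},
    "RAX": {"AL", "AH", "HX"},  # EAX = ... writes all of RAX, so same units
}


def surviving_units(super_reg, defined_reg, units=UNITS):
    """Units of `super_reg` that a def of `defined_reg` does not overwrite."""
    return units[super_reg] - units[defined_reg]


for d in ("AL", "AH", "AX", "EAX"):
    print(f"{d} = ... leaves of RAX: {sorted(surviving_units('RAX', d))}")
```

With `HX` in place, AL=, AH=, and AX= each leave part of RAX intact, while EAX= leaves nothing.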

The fact that RAX ends up with 0s in the high part would not be
exploited by any target-independent code.

The problem is that at the moment, the last instruction in
  EAX = ...
  AX = ...
  ... = EAX
would seem to only use the value from the second one, since AX= defines
all lanes/units that EAX has. This kind of inaccuracy is not just
suboptimal, it would lead to an incorrect conclusion. Currently, only
x86-specific knowledge would tell us that the first instruction is still
live, and I'd like to be able to tell by examining lane masks/reg units.
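The problem with this sequence can be shown with a per-unit reaching-definitions walk. This is a hypothetical Python sketch, not RDF's implementation, and `HX` is an assumed hidden unit for bits 16 and up. Without it, AX covers every unit of EAX, so only the second def seems to reach the use; with it, the first def still reaches the use through `HX`:

```python
def reaching_defs(block, use_reg, units):
    """Return the indices of defs in `block` that reach a final use of `use_reg`.

    `block` lists the defined register names in program order. For each
    reg unit, the most recent def that writes it is the one that reaches.
    """
    last_def = {}  # reg unit -> index of the most recent def writing it
    for i, reg in enumerate(block):
        for unit in units[reg]:
            last_def[unit] = i
    return {last_def[u] for u in units[use_reg] if u in last_def}


WITHOUT_HIDDEN = {"AX": {"AL", "AH"}, "EAX": {"AL", "AH"}}
WITH_HIDDEN = {"AX": {"AL", "AH"}, "EAX": {"AL", "AH", "HX"}}

block = ["EAX", "AX"]  # EAX = ... ;  AX = ... ;  then  ... = EAX
print(sorted(reaching_defs(block, "EAX", WITHOUT_HIDDEN)))  # [1]
print(sorted(reaching_defs(block, "EAX", WITH_HIDDEN)))     # [0, 1]
```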

> What I am saying is that even if we had the infrastructure for the
> unaddressable reg units, we would probably need a lot of work to be able
> to use it.
Maybe I have overstated the degree of complexity of what I'm looking
for. The information I'm interested in is: "what part of the
super-register survives a definition of a subregister". And the "what
part" does not have to be precise in terms of exact bits, but just some
identification like a bit in a lane mask.
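Phrased in lane-mask terms, the query reduces to a single mask operation. The Python sketch below assumes one hypothetical bit per part, including a hidden bit for bits 16 and up of EAX; these are not the `LaneBitmask` values TableGen actually generates:

```python
# Hypothetical lane-mask bits, one per addressable or hidden part.
LANE_AL = 0b001
LANE_AH = 0b010
LANE_HX = 0b100  # hidden part: bits 16 and up

LANES = {
    "AL": LANE_AL,
    "AH": LANE_AH,
    "AX": LANE_AL | LANE_AH,
    "EAX": LANE_AL | LANE_AH | LANE_HX,
}


def surviving_lanes(super_reg, sub_reg):
    """Lanes of `super_reg` that survive a definition of `sub_reg`."""
    return LANES[super_reg] & ~LANES[sub_reg]


print(bin(surviving_lanes("EAX", "AX")))  # 0b100: only the hidden part survives
print(bin(surviving_lanes("EAX", "AL")))  # 0b110
```

A non-zero result means an earlier def of the super-register is still (partly) live after the subregister def.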

> Side question: have you checked how the scheduler checks dependencies in
> post-RA mode? I wonder if it is already possible to build the information
> you want from the existing APIs.
It checks register aliasing. If two registers are aliased, there will
be a dependency between them.
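To illustrate why an alias check alone is not enough, here is a Python sketch that models aliasing as sharing at least one reg unit, using hypothetical unit sets without any hidden unit (this is not LLVM's `MCRegisterInfo` API). The check correctly orders AX= before a read of EAX, but the units themselves report that nothing of EAX survives AX=:

```python
# Hypothetical unit sets with no hidden unit: EAX and AX cover the same units.
UNITS = {
    "AL": {"AL"},
    "AH": {"AH"},
    "AX": {"AL", "AH"},
    "EAX": {"AL", "AH"},
}


def aliases(a, b):
    """True if registers `a` and `b` share at least one reg unit."""
    return bool(UNITS[a] & UNITS[b])


print(aliases("AX", "EAX"))        # True: a dependency is added
print(aliases("AL", "AH"))         # False: no dependency
print(UNITS["EAX"] - UNITS["AX"])  # set(): nothing of EAX appears to survive AX = ...
```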

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Bruce Hoult via llvm-dev

2016-Sep-29 22:42 UTC

On Fri, Sep 30, 2016 at 3:45 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> The problem is that at the moment, the last instruction in
>   EAX = ...
>   AX = ...
>   ... = EAX
> would seem to only use the value from the second one, since AX= defines
> all lanes/units that EAX has. This kind of inaccuracy is not just
> suboptimal, it would lead to an incorrect conclusion. Currently, only
> x86-specific knowledge would tell us that the first instruction is still
> live, and I'd like to be able to tell by examining lane masks/reg units.

Code like this does work correctly to merge the remaining bits of EAX with
the new value inserted in AX (or AL, AH), but on many CPUs it is very slow:
slower than using proper machine-independent masking operations.

This is because the CPUs *themselves* track EAX and AX separately in the
register renaming machinery, and have to wait until the write to AX has
actually retired before EAX can be read again.

On the Pentium Pro, P2, and P3 this caused a stall of about half a dozen
cycles. On Core2 it was reduced to 2 or 3 cycles. I'm not sure about the P4;
I think it was not good :-) Sometime around Nehalem or Sandy Bridge it was
finally eliminated.
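For comparison with the partial-register writes discussed above, the merges they perform can be written out as explicit masking on full 32-bit values. A minimal Python sketch (the `write_*` helpers are hypothetical names) of what AX=, AL=, and AH= do to EAX:

```python
MASK32 = 0xFFFFFFFF


def write_ax(eax, ax):
    """Value of EAX after AX = ax: bits 16-31 of EAX are kept."""
    return (eax & 0xFFFF0000) | (ax & 0xFFFF)


def write_al(eax, al):
    """Value of EAX after AL = al: bits 8-31 of EAX are kept."""
    return (eax & (MASK32 & ~0xFF)) | (al & 0xFF)


def write_ah(eax, ah):
    """Value of EAX after AH = ah: bits 0-7 and 16-31 of EAX are kept."""
    return (eax & (MASK32 & ~0xFF00)) | ((ah & 0xFF) << 8)


print(hex(write_ax(0x12345678, 0xBEEF)))  # 0x1234beef
print(hex(write_al(0x12345678, 0xAB)))    # 0x123456ab
print(hex(write_ah(0x12345678, 0xAB)))    # 0x1234ab78
```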

Krzysztof Parzyszek via llvm-dev

2016-Sep-30 00:23 UTC

On 9/29/2016 9:45 AM, Krzysztof Parzyszek via llvm-dev wrote:
> In the x86-64 case, if EAX had an extra reg unit that it would share
> with RAX (for the unaddressable part extending from bit 16 upwards),
> then none of AL=, AH=, or AX= would invalidate the rest of EAX and RAX,
> while EAX= would, since it would store into the "hidden" reg unit.
Quentin,
If such units were something that targets could explicitly request via 
some construct in a .td file, would you find that acceptable?

-Krzysztof

Quentin Colombet via llvm-dev

2016-Sep-30 00:48 UTC

> On Sep 29, 2016, at 5:23 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> 
> On 9/29/2016 9:45 AM, Krzysztof Parzyszek via llvm-dev wrote:
>> In the x86-64 case, if EAX had an extra reg unit that it would share
>> with RAX (for the unaddressable part extending from bit 16 upwards),
>> then none of AL=, AH=, or AX= would invalidate the rest of EAX and RAX,
>> while EAX= would, since it would store into the "hidden" reg unit.
> 
> Quentin,
> If such units were something that targets could explicitly request via some
> construct in a .td file, would you find that acceptable?
Quick thought (I haven’t had time to look closely at your other emails).

If we add some explicit construct in the .td files, how could we use them?

Basically, I am wondering whether, say in RDF, we would have some mode that
checks whether that property is on and supports both modes, or whether
everything would work on RegUnits, so that if a target does not use the
unaddressable units in its td files, we simply get conservative answers (due
to the nature of RegUnits).

I am asking because I believe it may already be possible to add
unaddressable register units by hand.
One would need to create additional subregs in the td file to fill the holes,
then mark all the registers mapping to those subregs as unallocatable.

E.g.,
let SubRegIndices = [sub_16bit] in {
def EAX : X86Reg<"eax", 0, [AX]>, DwarfRegNum<[-2, 0, 0]>;

->
let SubRegIndices = [dummysubIdx_16bit, sub_16bit] in {
def EAX : X86Reg<"eax", 0, [ADummyXH, AX]>, DwarfRegNum<[-2, 0, 0]>;
[…]
def DummyRegClass : RegisterClass[…]/*list all dummy regs*/ {
  let isAllocatable = 0;
}

In other words, you may be able to explore if that would solve your problem or
if we have to come up with something smarter.
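A rough Python model of why the dummy subregister would help, assuming TableGen's usual scheme of giving each leaf register its own reg unit and giving every other register the union of its subregisters' units (ad hoc aliases are ignored; `reg_units` is a hypothetical helper and `ADummyXH` is the hypothetical filler from the snippet above):

```python
# Subregister lists mirroring the .td snippets. ADummyXH is the hypothetical
# unallocatable filler for bits 16-31 of EAX.
BEFORE = {
    "AL": [], "AH": [],
    "AX": ["AL", "AH"],
    "EAX": ["AX"],
}
AFTER = {
    "AL": [], "AH": [], "ADummyXH": [],
    "AX": ["AL", "AH"],
    "EAX": ["ADummyXH", "AX"],
}


def reg_units(reg, subregs):
    """Leaf registers get their own unit; others take the union of their subregs' units."""
    if not subregs[reg]:
        return {reg}
    units = set()
    for sub in subregs[reg]:
        units |= reg_units(sub, subregs)
    return units


for table in (BEFORE, AFTER):
    eax, ax = reg_units("EAX", table), reg_units("AX", table)
    print(sorted(eax), "left after AX =", sorted(eax - ax))
```

Before the change, EAX has exactly the units of AX, so AX= looks like a full redefinition of EAX; after it, `ADummyXH` is left over.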

Cheers,
-Quentin