thr3ads.net - llvm dev - [LLVMdev] Register class proliferation [Jun 2011]

If this information is useful, please help other people find it:
Share via:

Jakob Stoklund Olesen

2011-Jun-22 05:39 UTC

[LLVMdev] Register class proliferation

On Jun 21, 2011, at 9:15 PM, Andrew Trick wrote:> We should make it clear that the performance problem is with register
aliases, not classes.
Yes, those are orthogonal issues.
> Register aliasing is a serious problem with the current implementation. The
postRA scheduler is *cubic* in the size of the alias list. You dramatically
improved compile time with this simple fix:
> 
> ------------------------------------------------------------------------
> r130801 | stoklund | 2011-05-03 15:31:24 -0700 (Tue, 03 May 2011) | 10
lines
> Mark ultra-super-registers QQQQ as call-clobbered instead of the D
sub-registers.
Completely unintended, I might add.

Our representation of call clobbered registers is bad enough that it deserves
special attention. Every call instruction uses about 1500 bytes for the clobber
list, and there are performance issues as well.

We also need to address the problem with register aliases. Currently, we
don't model register sequence constraints properly for ARM NEON registers. I
want to expand the current model of using QQ and QQQQ registers, but it would
create more register aliases by adding pseudo super-registers.

The RegisterTuples I added to TableGen recently are meant for this, but I
don't want to add them to the ARM target until we have compile time under
control.
> And the new regalloc is not yet designed to scale with the number of
register aliases. (But that could be fixed under some limited definition of
aliasing).
I think I can make it work while supporting the current aliasing model which
consists of sub/super registers and arbitrary pair aliasing.

The idea is to divide registers into atoms such that two registers alias iff
they share an atom. In a normal architecture with sub-register structure, the
atoms are simply the smallest sub-registers. With arbitrary aliasing, atoms may
not correspond to real registers, but they can still be computed.

On x86, for example, the %rax, %eax, and %ax registers all consist of the atoms
(%ah, %al).

The register allocators will check interference using atoms instead of aliases.
That will be faster since every register has fewer atoms than aliases. It also
scales well when adding support for register sequence constraints since new
super-registers don't add any atoms.

In the greedy and basic allocators, it means that we will have one
LiveIntervalUnion per atom instead of one per physical register as we do it
today.
> Whereas, I don't see a reason that proliferating register classes would
be a problem now or in the future.
> 
> Is it easy to distinguish a a couple classes that make up a disjoint
coverage set (e.g. int regs, float regs, ...)? Some algorithms partition the
problem this way.
Initially, I just want the set of register classes to be closed under
intersections and sub-register operations. That's what the coalescer needs.
It is about having getCommonSubClass and getMatchingSuperRegClass return maximal
results. Today, they sometimes return NULL or a class that is smaller than
necessary.

That doesn't give you a disjoint covering, you would need to close under set
differences as well to get that. We could do that, but I think such a covering
would be too fine-grained to be useful. The classes would be much smaller than
'int regs' and  'float regs'.

The x86 GPRs would be divided into nearly single-register classes. More regular
architectures would do better.

/jakob

Andrew Trick

2011-Jun-23 00:01 UTC

head link

[LLVMdev] Register class proliferation

On Jun 21, 2011, at 10:39 PM, Jakob Stoklund Olesen
wrote:> The register allocators will check interference using atoms instead of
aliases. That will be faster since every register has fewer atoms than aliases.
It also scales well when adding support for register sequence constraints since
new super-registers don't add any atoms.
> 
> In the greedy and basic allocators, it means that we will have one
LiveIntervalUnion per atom instead of one per physical register as we do it
today.
That's one way to make an algorithm scale well--just make the best case
expensive enough. Actually, I like this approach very much, I just wonder if we
need to 8x liveintervals for pure 64-bit code.

 -Andy

Jakob Stoklund Olesen

2011-Jun-23 00:54 UTC

head link

[LLVMdev] Register class proliferation

On Jun 22, 2011, at 5:01 PM, Andrew Trick wrote:
> On Jun 21, 2011, at 10:39 PM, Jakob Stoklund Olesen wrote:
>> The register allocators will check interference using atoms instead of
aliases. That will be faster since every register has fewer atoms than aliases.
It also scales well when adding support for register sequence constraints since
new super-registers don't add any atoms.
>> 
>> In the greedy and basic allocators, it means that we will have one
LiveIntervalUnion per atom instead of one per physical register as we do it
today.
> 
> That's one way to make an algorithm scale well--just make the best case
expensive enough. Actually, I like this approach very much, I just wonder if we
need to 8x liveintervals for pure 64-bit code.
8x is not necessary, atoms are not bytes or bits.

The x86 integer registers would each map to two atoms, except for the 8-bit
registers which map to a single atom.

Today, we have live interval unions for %rax, %eax, %ax, %ah, and %al. Using
atoms, we would only have live interval unions for %ah and %al. This probably
makes interference checking cheaper than it is today.

When a virtual register is assigned to a physical register, we only need to
insert it into one live interval union. With atoms, the virtual register live
range would be inserted into multiple live interval unions - one per atom. On
x86, that means assignment will be twice as expensive as it is today.

We perform many more interference checks than we do register assignments, so I
think it will be a good tradeoff.

The x86 reg --> atoms map would look like this:

%rax --> {1, 2}
%eax --> {1, 2}
%ax --> {1, 2}
%ah --> {1}
%al --> {2}

%rbx --> {3, 4}
...

Registers overlap iff they share an atom, that's all you need. X86 would get
60% less live interval unions.

/jakob

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Jun 2011 - [LLVMdev] Register class proliferation

[LLVMdev] Register class proliferation

[LLVMdev] Register class proliferation

[LLVMdev] Register class proliferation

Possibly Parallel Threads