On Jun 21, 2011, at 9:15 PM, Andrew Trick wrote:> We should make it clear that the performance problem is with register
aliases, not classes.
Yes, those are orthogonal issues.
> Register aliasing is a serious problem with the current implementation. The
postRA scheduler is *cubic* in the size of the alias list. You dramatically
improved compile time with this simple fix:
>
> ------------------------------------------------------------------------
> r130801 | stoklund | 2011-05-03 15:31:24 -0700 (Tue, 03 May 2011) | 10
lines
> Mark ultra-super-registers QQQQ as call-clobbered instead of the D
sub-registers.
Completely unintended, I might add.
Our representation of call clobbered registers is bad enough that it deserves
special attention. Every call instruction uses about 1500 bytes for the clobber
list, and there are performance issues as well.
We also need to address the problem with register aliases. Currently, we
don't model register sequence constraints properly for ARM NEON registers. I
want to expand the current model of using QQ and QQQQ registers, but it would
create more register aliases by adding pseudo super-registers.
The RegisterTuples I added to TableGen recently are meant for this, but I
don't want to add them to the ARM target until we have compile time under
control.
> And the new regalloc is not yet designed to scale with the number of
register aliases. (But that could be fixed under some limited definition of
aliasing).
I think I can make it work while supporting the current aliasing model which
consists of sub/super registers and arbitrary pair aliasing.
The idea is to divide registers into atoms such that two registers alias iff
they share an atom. In a normal architecture with sub-register structure, the
atoms are simply the smallest sub-registers. With arbitrary aliasing, atoms may
not correspond to real registers, but they can still be computed.
On x86, for example, the %rax, %eax, and %ax registers all consist of the atoms
(%ah, %al).
The register allocators will check interference using atoms instead of aliases.
That will be faster since every register has fewer atoms than aliases. It also
scales well when adding support for register sequence constraints since new
super-registers don't add any atoms.
In the greedy and basic allocators, it means that we will have one
LiveIntervalUnion per atom instead of one per physical register as we do it
today.
> Whereas, I don't see a reason that proliferating register classes would
be a problem now or in the future.
>
> Is it easy to distinguish a a couple classes that make up a disjoint
coverage set (e.g. int regs, float regs, ...)? Some algorithms partition the
problem this way.
Initially, I just want the set of register classes to be closed under
intersections and sub-register operations. That's what the coalescer needs.
It is about having getCommonSubClass and getMatchingSuperRegClass return maximal
results. Today, they sometimes return NULL or a class that is smaller than
necessary.
That doesn't give you a disjoint covering, you would need to close under set
differences as well to get that. We could do that, but I think such a covering
would be too fine-grained to be useful. The classes would be much smaller than
'int regs' and 'float regs'.
The x86 GPRs would be divided into nearly single-register classes. More regular
architectures would do better.
/jakob