N. E. C. via llvm-dev
2016-Feb-13 22:54 UTC
[llvm-dev] Register spilling fix for experimental 6502 backend
So I've been designing an experimental 6502 backend for LLVM. Link: <https://github.com/beholdnec/llvm-m6502> The 6502 only has a few registers, one accumulator and two indexes, none of which are large enough to hold an absolute pointer. Because of this, the backend really tests the power of LLVM's register allocator (RA). I've made a change to the RA that might be of interest to devs of other backends, read on. Other devs suggested reserving the zero page of memory (addresses $00-$FF) and treating it as a giant set of registers. And I don't know, because doesn't that sort of defeat the purpose of register allocation? It might be inevitable that we have to lie to LLVM about the 6502's registers, but, for now, I want to see how far I can get using the allocator as intended. The registers are A, X, and Y. They belong to a hierarchy of Register Classes: Acc class: A Index class: X, Y General class: Acc, Index There's a fictional instruction "SUBidx Acc, Index". It subtracts an Index from an Acc. Bear with me. What if we want to subtract A from X? We can't say "SUBidx X, A". The operands must be Acc, Index. The DAG looks something like this: %vreg2 = COPY %X ; %vreg2 is Index %vreg0 = COPY %A ; %vreg0 is Acc SUBidx %vreg2, %vreg0 To make this work, we first have to swap A and X. For technical reasons, we can't use Y as temporary storage. Instead, we must spill to the stack. When spilling a reg to the stack, the RA needs two virtual regs, one to store and one to load. Currently, the loaded reg always has the same class as the stored reg. Because of this limitation, I ran into a problem: the RA would try to assign two Accs to physical regs, but only one is available, and the RA would fail. Line 1290 of InlineSpiller.cpp, inside the function spillAroundUses, says: "// FIXME: Infer regclass from instruction alone" (As opposed to the current behavior of always using the same class as the reg being spilled.) The line refers to this very limitation. It was written in 2010, so we've known about the limitation since the beginning. My RA patch can be found in this commit: <https://github.com/beholdnec/llvm-m6502/commit/a4b492f63> This patch allows spills to use a more general reg class, so a spilled Acc can be reloaded as a General. Previously, the RA would fail because it would spill X, but it couldn't reload X into an Acc. Now, with the patch, the DAG compiles into this: # store X to stack STX_stack [0] # transfer A to X TAX # load A from stack LDA_stack [0] # now that A and X are swapped, we can subtract! SUBidx A, X It works, but I'm not sure if I used the best approach. If you're familiar with the RA, feel free to look at the patch and leave feedback. - Nolan
David Chisnall via llvm-dev
2016-Feb-14 10:07 UTC
[llvm-dev] Register spilling fix for experimental 6502 backend
On 13 Feb 2016, at 22:54, N. E. C. via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Other devs suggested reserving the zero page of memory (addresses $00-$FF) and > treating it as a giant set of registers. And I don't know, because doesn't that > sort of defeat the purpose of register allocation? It might be inevitable that > we have to lie to LLVM about the 6502's registers, but, for now, I want to see > how far I can get using the allocator as intended.I don’t think that it contradicts the purpose of register allocation. Register allocation is really shorthand for operand location selection (though registers do have the nice property that, on the compiler’s side of the instruction decoder, they never alias). On a RISC-like register-register architecture, these are precisely the same. On modern x86 processors (at least when running in 32-bit mode), if you don’t move the stack pointer then the top few stack slots are stored in rename registers so register-memory instructions that operate on them can use the same rename logic as register-register instructions, so it would be entirely valid to treat a few esp-relative addresses that the compiler guarantees are not address taken as registers. For the 6502, you might want to look at the papers describing compilers for the Xerox Alto as a good reference. As was common in that era, they defined a virtual instruction set implemented in microcode as the compiler target. The Alto’s instruction set was specifically designed to be used in this way, with compilers generating bytecode and shipping a bytecode interpreter. If you think that this is slow, remember that Smalltalk on the Alto ran an entire multitasking GUI on a processor not much faster than a 6502, though with a bit more RAM. Without long pipelines, a jump-threaded bytecode interpreter is very fast. See also P-code from the UCSD Pascal compiler. SWEET16 is an example of the same concept for the 6502, providing effectively a 16-bit virtual ISA for compilers and interpreters to target. When targeting a processor like the 6502 from a vaguely high-level language (which, given the amount of structure that it implies, includes LLVM IR in this context), you are going to find so many repeated instruction sequences that it will be very difficult to fit within the memory constraints if you expand them at every use. Even Clang and GCC targeting modern architectures end up doing this to some degree, for example providing software floating point routines in compiler-rt / libgcc. If you tried to inline them at every use, then you’d end up with code that would suffer from instruction cache footprint on larger processors and completely fail to fit on M-profile processors. If I were writing a 6502 back end, then I would define an SWEET16-derived virtual ISA, include the 6502 assembly for that code in compiler-rt, and target that ISA from the LLVM back end. I’d end up with much denser code at the end and not have to fight the LLVM infrastructure so much, David
Steve Montgomery via llvm-dev
2016-Feb-14 11:49 UTC
[llvm-dev] Register spilling fix for experimental 6502 backend
Have you tried implementing getLargestLegalSuperClass(RC) in M6502RegisterInfo? The default implementation returns RC but if you make it return the General class then live range splitting will use General registers. That might avoid the need for a patch to the register allocator. Steve> On 13 Feb 2016, at 22:54, N. E. C. via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > So I've been designing an experimental 6502 backend for LLVM. Link: > <https://github.com/beholdnec/llvm-m6502> > > The 6502 only has a few registers, one accumulator and two indexes, none of > which are large enough to hold an absolute pointer. Because of this, the > backend really tests the power of LLVM's register allocator (RA). > > I've made a change to the RA that might be of interest to devs of other > backends, read on. > > Other devs suggested reserving the zero page of memory (addresses $00-$FF) and > treating it as a giant set of registers. And I don't know, because doesn't that > sort of defeat the purpose of register allocation? It might be inevitable that > we have to lie to LLVM about the 6502's registers, but, for now, I want to see > how far I can get using the allocator as intended. > > The registers are A, X, and Y. They belong to a hierarchy of Register Classes: > Acc class: A > Index class: X, Y > General class: Acc, Index > > There's a fictional instruction "SUBidx Acc, Index". It subtracts an Index from > an Acc. Bear with me. > > What if we want to subtract A from X? > We can't say "SUBidx X, A". The operands must be Acc, Index. > > The DAG looks something like this: > > %vreg2 = COPY %X ; %vreg2 is Index > %vreg0 = COPY %A ; %vreg0 is Acc > SUBidx %vreg2, %vreg0 > > To make this work, we first have to swap A and X. > For technical reasons, we can't use Y as temporary storage. Instead, we must > spill to the stack. > > When spilling a reg to the stack, the RA needs two virtual regs, one to store > and one to load. Currently, the loaded reg always has the same class as the > stored reg. Because of this limitation, I ran into a problem: the RA would try > to assign two Accs to physical regs, but only one is available, and the RA > would fail. > > Line 1290 of InlineSpiller.cpp, inside the function spillAroundUses, says: > "// FIXME: Infer regclass from instruction alone" > (As opposed to the current behavior of always using the same class as the reg > being spilled.) > The line refers to this very limitation. It was written in 2010, so we've known > about the limitation since the beginning. > > My RA patch can be found in this commit: > <https://github.com/beholdnec/llvm-m6502/commit/a4b492f63> > > This patch allows spills to use a more general reg class, so a spilled Acc can > be reloaded as a General. > > Previously, the RA would fail because it would spill X, but it couldn't reload > X into an Acc. > > Now, with the patch, the DAG compiles into this: > > # store X to stack > STX_stack [0] > # transfer A to X > TAX > # load A from stack > LDA_stack [0] > # now that A and X are swapped, we can subtract! > SUBidx A, X > > It works, but I'm not sure if I used the best approach. If you're familiar with > the RA, feel free to look at the patch and leave feedback. > > - Nolan > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Quentin Colombet via llvm-dev
2016-Feb-15 18:36 UTC
[llvm-dev] Register spilling fix for experimental 6502 backend
Hi Nolan, One thing you could try is to define a register class that is the union of X and Acc. That register class would be used for spill and reload instructions. In other words, both store and load would use the “same" register class. By doing so, you shouldn’t need to patch any of the RA pipeline. Cheers, -Quentin> On Feb 13, 2016, at 2:54 PM, N. E. C. via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > So I've been designing an experimental 6502 backend for LLVM. Link: > <https://github.com/beholdnec/llvm-m6502> > > The 6502 only has a few registers, one accumulator and two indexes, none of > which are large enough to hold an absolute pointer. Because of this, the > backend really tests the power of LLVM's register allocator (RA). > > I've made a change to the RA that might be of interest to devs of other > backends, read on. > > Other devs suggested reserving the zero page of memory (addresses $00-$FF) and > treating it as a giant set of registers. And I don't know, because doesn't that > sort of defeat the purpose of register allocation? It might be inevitable that > we have to lie to LLVM about the 6502's registers, but, for now, I want to see > how far I can get using the allocator as intended. > > The registers are A, X, and Y. They belong to a hierarchy of Register Classes: > Acc class: A > Index class: X, Y > General class: Acc, Index > > There's a fictional instruction "SUBidx Acc, Index". It subtracts an Index from > an Acc. Bear with me. > > What if we want to subtract A from X? > We can't say "SUBidx X, A". The operands must be Acc, Index. > > The DAG looks something like this: > > %vreg2 = COPY %X ; %vreg2 is Index > %vreg0 = COPY %A ; %vreg0 is Acc > SUBidx %vreg2, %vreg0 > > To make this work, we first have to swap A and X. > For technical reasons, we can't use Y as temporary storage. Instead, we must > spill to the stack. > > When spilling a reg to the stack, the RA needs two virtual regs, one to store > and one to load. Currently, the loaded reg always has the same class as the > stored reg. Because of this limitation, I ran into a problem: the RA would try > to assign two Accs to physical regs, but only one is available, and the RA > would fail. > > Line 1290 of InlineSpiller.cpp, inside the function spillAroundUses, says: > "// FIXME: Infer regclass from instruction alone" > (As opposed to the current behavior of always using the same class as the reg > being spilled.) > The line refers to this very limitation. It was written in 2010, so we've known > about the limitation since the beginning. > > My RA patch can be found in this commit: > <https://github.com/beholdnec/llvm-m6502/commit/a4b492f63> > > This patch allows spills to use a more general reg class, so a spilled Acc can > be reloaded as a General. > > Previously, the RA would fail because it would spill X, but it couldn't reload > X into an Acc. > > Now, with the patch, the DAG compiles into this: > > # store X to stack > STX_stack [0] > # transfer A to X > TAX > # load A from stack > LDA_stack [0] > # now that A and X are swapped, we can subtract! > SUBidx A, X > > It works, but I'm not sure if I used the best approach. If you're familiar with > the RA, feel free to look at the patch and leave feedback. > > - Nolan > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
N. E. C. via llvm-dev
2016-Feb-16 07:09 UTC
[llvm-dev] Register spilling fix for experimental 6502 backend
Thank you for pointing me to getLargestLegalSuperClass, implementing this is adequate to fix the reg allocation problem without patching LLVM's register allocator. It is understandable that I should avoid touching the Reg Allocator, as doing so might potentially break other backends. Nolan Check On Mon, Feb 15, 2016 at 10:36 AM, Quentin Colombet <qcolombet at apple.com> wrote:> Hi Nolan, > > One thing you could try is to define a register class that is the union of X and Acc. > That register class would be used for spill and reload instructions. > In other words, both store and load would use the “same" register class. > > By doing so, you shouldn’t need to patch any of the RA pipeline. > > Cheers, > -Quentin >> On Feb 13, 2016, at 2:54 PM, N. E. C. via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> So I've been designing an experimental 6502 backend for LLVM. Link: >> <https://github.com/beholdnec/llvm-m6502> >> >> The 6502 only has a few registers, one accumulator and two indexes, none of >> which are large enough to hold an absolute pointer. Because of this, the >> backend really tests the power of LLVM's register allocator (RA). >> >> I've made a change to the RA that might be of interest to devs of other >> backends, read on. >> >> Other devs suggested reserving the zero page of memory (addresses $00-$FF) and >> treating it as a giant set of registers. And I don't know, because doesn't that >> sort of defeat the purpose of register allocation? It might be inevitable that >> we have to lie to LLVM about the 6502's registers, but, for now, I want to see >> how far I can get using the allocator as intended. >> >> The registers are A, X, and Y. They belong to a hierarchy of Register Classes: >> Acc class: A >> Index class: X, Y >> General class: Acc, Index >> >> There's a fictional instruction "SUBidx Acc, Index". It subtracts an Index from >> an Acc. Bear with me. >> >> What if we want to subtract A from X? >> We can't say "SUBidx X, A". The operands must be Acc, Index. >> >> The DAG looks something like this: >> >> %vreg2 = COPY %X ; %vreg2 is Index >> %vreg0 = COPY %A ; %vreg0 is Acc >> SUBidx %vreg2, %vreg0 >> >> To make this work, we first have to swap A and X. >> For technical reasons, we can't use Y as temporary storage. Instead, we must >> spill to the stack. >> >> When spilling a reg to the stack, the RA needs two virtual regs, one to store >> and one to load. Currently, the loaded reg always has the same class as the >> stored reg. Because of this limitation, I ran into a problem: the RA would try >> to assign two Accs to physical regs, but only one is available, and the RA >> would fail. >> >> Line 1290 of InlineSpiller.cpp, inside the function spillAroundUses, says: >> "// FIXME: Infer regclass from instruction alone" >> (As opposed to the current behavior of always using the same class as the reg >> being spilled.) >> The line refers to this very limitation. It was written in 2010, so we've known >> about the limitation since the beginning. >> >> My RA patch can be found in this commit: >> <https://github.com/beholdnec/llvm-m6502/commit/a4b492f63> >> >> This patch allows spills to use a more general reg class, so a spilled Acc can >> be reloaded as a General. >> >> Previously, the RA would fail because it would spill X, but it couldn't reload >> X into an Acc. >> >> Now, with the patch, the DAG compiles into this: >> >> # store X to stack >> STX_stack [0] >> # transfer A to X >> TAX >> # load A from stack >> LDA_stack [0] >> # now that A and X are swapped, we can subtract! >> SUBidx A, X >> >> It works, but I'm not sure if I used the best approach. If you're familiar with >> the RA, feel free to look at the patch and leave feedback. >> >> - Nolan >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >