thr3ads.net - llvm dev - [llvm-dev] Question about target instruction optimization [Jul 2018]

Michael Stellmann via llvm-dev

2018-Jul-25 17:42 UTC

[llvm-dev] Question about target instruction optimization

This is a question about optimizing the code generation in a (new) Z80 
backend:

The CPU has a couple of 8 bit physical registers, e.g. H, L, D and E, 
which are overlaid in 16 bit register pairs named HL and DE.

It also has a native instruction to load a 16 bit immediate value into a 
16 bit register pair (HL or DE), e.g.:

     LD HL,<imm16>

Now, when two 16 bit register pairs are loaded in sequence with the 
*same* immediate value, the simple approach is:

     LD HL,<imm16>
     LD DE,<imm16>

However, the second line can be shortened (in opcode bytes and cycles) 
to load the overlaid 8 bit registers of HL (H and L) into the overlaid 8 
bit registers of DE (D and E), so the desired result is:

     ; optimized version: saves 1 byte and 2 cycles
     LD D,H    (sets the high 8 bits of DE from the high 8 bits of HL)
     LD E,L    (same for lower 8 bits)
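
The saving quoted above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the standard documented Z80 costs of 3 bytes / 10 T-states for `LD rr,nn` and 1 byte / 4 T-states for `LD r,r'`, and the names `COST` and `total_cost` are made up for this example.

```python
# (bytes, T-states) per instruction form, assuming standard Z80 timings.
COST = {
    "LD rr,nn": (3, 10),  # 16 bit immediate load into BC, DE or HL
    "LD r,r'": (1, 4),    # 8 bit register-to-register copy
}

def total_cost(sequence):
    """Sum (bytes, T-states) over a list of instruction-form names."""
    size = sum(COST[form][0] for form in sequence)
    cycles = sum(COST[form][1] for form in sequence)
    return size, cycles

naive = total_cost(["LD rr,nn"])                 # LD DE,<imm16>
optimized = total_cost(["LD r,r'", "LD r,r'"])   # LD D,H / LD E,L

saved_bytes = naive[0] - optimized[0]
saved_cycles = naive[1] - optimized[1]
print(saved_bytes, saved_cycles)
```

Under these assumptions the two copies cost 2 bytes and 8 T-states against 3 bytes and 10 T-states for the immediate load, which matches the "1 byte and 2 cycles" figure.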


Another example: If reg pair DE needs to be loaded with imm16 = 0, and 
another physical(!) register is known to be 0 (from a previous immediate 
load, directly or indirectly) - assuming that L = 0 (H might be 
something else) - the following code:

     LD DE,0x0000

should become:

     LD D,L
     LD E,L

I would expect that this needs to be done in a peephole optimizer pass, 
as during the lowering process, the physical registers are not yet assigned.

Now my questions:
1. Is that correct (peephole instead of lowering)? Should the lowering 
always emit the generic, not always optimal "LD DE,<imm16>"? Or should 
the lowering process always split the 16 bit immediate load into two 8 bit 
immediate loads (via two new virtual 8 bit registers), which would be 
eliminated later automatically?
2. And if peephole is the better choice, which of these is recommended: 
the SSA-based Machine Code Optimizations, or the Late Machine Code 
Optimizations? Both places in the LLVM code generator docs say "To be 
written", so I don't really know which one to choose... or should I even 
write a custom pass?

...and more importantly, how would I check whether any physical register 
contains a specific fixed value at a certain point (in which case the 
optimization can be done)?
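
One way to picture the "known register value" question is a forward scan over a single basic block, after physical registers have been assigned, that records which 8 bit registers currently hold a known constant. The following is a minimal sketch of that idea in plain Python, not LLVM code: the tuple encoding of instructions (`LD16I`, `LD8I`, `LD8`, `CLOBBER`) and the helper names are hypothetical, and a real pass would have to treat calls, memory loads and block boundaries as clobbers.

```python
# Hypothetical mini-IR: ("LD16I", pair, imm), ("LD8I", reg, imm),
# ("LD8", dst, src), ("CLOBBER", [regs]) for anything else that writes regs.
PAIRS = {"BC": ("B", "C"), "DE": ("D", "E"), "HL": ("H", "L")}

def find_source(known, value, exclude):
    """Return a register known to hold `value`, ignoring `exclude`."""
    for reg in sorted(known):
        if reg not in exclude and known[reg] == value:
            return reg
    return None

def peephole(block):
    """Rewrite 16 bit immediate loads into 8 bit copies where possible."""
    known = {}  # 8 bit register name -> constant it is known to hold
    out = []
    for ins in block:
        op = ins[0]
        if op == "LD16I":
            _, pair, imm = ins
            hi_reg, lo_reg = PAIRS[pair]
            hi_val, lo_val = (imm >> 8) & 0xFF, imm & 0xFF
            # Only use sources outside the destination pair, so the first
            # copy cannot destroy the source of the second one.
            src_hi = find_source(known, hi_val, (hi_reg, lo_reg))
            src_lo = find_source(known, lo_val, (hi_reg, lo_reg))
            if src_hi is not None and src_lo is not None:
                out.append(("LD8", hi_reg, src_hi))
                out.append(("LD8", lo_reg, src_lo))
            else:
                out.append(ins)
            known[hi_reg], known[lo_reg] = hi_val, lo_val
        elif op == "LD8I":
            _, reg, imm = ins
            out.append(ins)
            known[reg] = imm & 0xFF
        elif op == "LD8":
            _, dst, src = ins
            out.append(ins)
            if src in known:
                known[dst] = known[src]
            else:
                known.pop(dst, None)
        else:  # "CLOBBER": the listed registers no longer hold known values
            _, regs = ins
            out.append(ins)
            for reg in regs:
                known.pop(reg, None)
    return out

same_imm = peephole([("LD16I", "HL", 0x1234), ("LD16I", "DE", 0x1234)])
zero_from_l = peephole([("LD8I", "L", 0), ("LD16I", "DE", 0)])
clobbered = peephole([("LD8I", "L", 0), ("CLOBBER", ["L"]), ("LD16I", "DE", 0)])
```

`same_imm` and `zero_from_l` reproduce the two examples from this message; in `clobbered`, the immediate load is kept because L is no longer known to be 0.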

Michael

Bruce Hoult via llvm-dev

2018-Jul-25 21:33 UTC

This is so far down the list of problems you'll have (and the difference so
trivial to program size and speed) that I think you should ignore it until
you have a working compiler.

As far as two registers getting the same value, that should be picked up by
common subexpression elimination in the optimiser anyway.

You might want to consider having a pseudo-instruction for LD
{BC,DE,HL,IX,IY},{BC,DE,HL,IX,IY} (all combinations are valid except those
containing two of HL,IX,IY). You could expand this very late in the
assembler, or during legalisation.
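
As a rough illustration of such a pseudo-instruction, the sketch below expands a 16 bit register-pair copy into two 8 bit moves and rejects the combinations excluded above. It is plain Python rather than backend code; the function name `expand_ld16` is hypothetical, the IXH/IXL/IYH/IYL half registers are the undocumented Z80 ones and are assumed to be usable on the target, and treating a copy of a pair onto itself as a no-op is an assumption of this sketch.

```python
# High and low halves of each 16 bit register (IX/IY halves are undocumented).
HALVES = {
    "BC": ("B", "C"),
    "DE": ("D", "E"),
    "HL": ("H", "L"),
    "IX": ("IXH", "IXL"),
    "IY": ("IYH", "IYL"),
}
# A copy may involve at most one of these registers.
EXCLUSIVE = {"HL", "IX", "IY"}

def expand_ld16(dst, src):
    """Expand the pseudo 'LD dst,src' into 8 bit register copies."""
    if dst == src:
        return []  # assumption: copying a pair onto itself needs no code
    if dst in EXCLUSIVE and src in EXCLUSIVE:
        raise ValueError(f"LD {dst},{src} cannot be expanded to 8 bit copies")
    (dst_hi, dst_lo), (src_hi, src_lo) = HALVES[dst], HALVES[src]
    return [f"LD {dst_hi},{src_hi}", f"LD {dst_lo},{src_lo}"]

print(expand_ld16("DE", "HL"))
```

A backend would need a different strategy (for example going through the stack or another pair) for the rejected combinations.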


Michael Stellmann via llvm-dev

2018-Jul-25 22:21 UTC

Yes, such optimizations are something for the "last 20%" of the project, 
nice-to-haves.

As of now, I have yet to get a feeling for what LLVM can do on its own, 
depending on what it gets from the instruction tables, and where it needs 
help, and how much, in other processing stages.
As this affects how the instruction info table will be set up, I 
appreciate your suggestions very much!

Now that you mentioned using a pseudo-instruction for the possible 
16 bit LD command combinations:

Given the heavily overlapped register structure and the asymmetric 
instruction set of the Z80, would you recommend mapping more 
instructions as generic pseudos that expand to multiple 
instructions during legalisation (leading to more custom lowering code), 
or mapping as many instructions and variations as possible 1:1, 
according to their allowed / limited operands, leading to simpler 
lowering code (not sure if I am using the right words here)?

Thanks,
Michael

