thr3ads.net - llvm dev - [LLVMdev] Pseudo load and store instructions for AArch64 [Aug 2014]


Sergey Dmitrouk

2014-Aug-22 12:44 UTC

[LLVMdev] Pseudo load and store instructions for AArch64

Hi Renato,
> > I'm trying to add pseudo 64-bit load and store instructions for AArch64, which
> > should have latencies set to "1" while being otherwise exactly the same as
> > normal load and store instructions.
> 
> Can I ask why would you need that?
This is the only way I found to stop Machine Instruction Scheduler from
reordering load and store instructions.  I asked on this specific topic
several times before, but no one answered.  The following approaches
didn't work in this case:

 - different kinds of chaining;
 - gluing;
 - a single pseudo instruction for load and store, as it needs a temporary
   register, but such pseudos are expanded after RA.

It's needed to make code of inlined memcpy() more efficient.
> Looks like there's specific knowledge about the types and instructions
> codes in switches midway through that is not recognizing your new
> pseudos.
>
> One way to find them out is to grep for the instruction codes yours is
> similar to, and then see if you need to add your pseudos
Thanks, I'll try that.

Regards,
Sergey

Renato Golin

2014-Aug-22 13:01 UTC


[LLVMdev] Pseudo load and store instructions for AArch64

On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> This is the only way I found to stop Machine Instruction Scheduler from
> reordering load and store instructions.
I see. Saleem (cc'd) worked on a similar thing for ARM's movw/movt for
Windows, which also didn't like the reordering. Maybe he can help you.

cheers,
--renato

Renato Golin

2014-Aug-26 22:45 UTC


[LLVMdev] Pseudo load and store instructions for AArch64

On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> It's needed to make code of inlined memcpy() more efficient.
Hi Sergey,

I was thinking about this and I remember seeing a similar problem to
yours in ARM. Something like:

  ldr r1, [sp, #20]
  ldr r2, [sp, #24]
  ldr r3, [sp, #28]

being reordered to:

  ldr r2, [sp, #24]
  ldr r1, [sp, #20]
  ldr r3, [sp, #28]

and having a big hit on performance.

The ARM back-end has the ARMLoadStoreOptimizer class, which deals with
similar problems and it's generally passed at the right time for
fixing loads and stores, maybe you could add a similar thing to
AArch64?

That'd have the benefit of not polluting the table-gen files, and
could be turned on via a flag, on demand, that only after heavily
tested, could be turned on by default.

James (cc'd) implemented the optimizer, maybe he could hint on some of
the issues for your particular case.

cheers,
--renato

Saleem Abdulrasool

2014-Aug-27 02:29 UTC


[LLVMdev] Pseudo load and store instructions for AArch64

On Fri, Aug 22, 2014 at 6:01 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> > This is the only way I found to stop Machine Instruction Scheduler from
> > reordering load and store instructions.
>
> I see. Saleem (cc'd) worked on a similar thing for ARM's movw/movt for
> Windows, which also didn't like the reordering. Maybe he can help you.
>
Sorry, I've been a bit busy at work :-(.

For Windows on ARM, the movw/movt relocations need to be contiguous.  In
order to accommodate that, we generate a bundle (similar to the VLIW
concept) to treat the pair as a single scheduling entity.

Although that approach could work, it feels like updating the
LoadStoreOptimizer to deal with the particular case may be a cleaner
approach.
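
The bundling idea described above can be illustrated with a toy model. This is only a sketch and does not use LLVM's machine-instruction bundle support: `Unit`, `scheduleUnits` and the `priority` field are hypothetical names invented for the illustration. The point it shows is that once a pair is a single scheduling unit, the scheduler can move the pair as a whole but cannot place anything between its members.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical scheduling unit (not an LLVM type): either a single
// instruction or a bundle of instructions that must stay adjacent.
struct Unit {
    std::vector<std::string> insts;  // assembly text of the member(s)
    int priority;                    // made-up heuristic: higher goes first
};

// Toy "scheduler": orders whole units by priority, then flattens them.
// Because it only ever moves whole units, bundle members stay contiguous.
std::vector<std::string> scheduleUnits(std::vector<Unit> units) {
    std::stable_sort(units.begin(), units.end(),
                     [](const Unit& a, const Unit& b) {
                         return a.priority > b.priority;
                     });
    std::vector<std::string> out;
    for (const Unit& u : units)
        out.insert(out.end(), u.insts.begin(), u.insts.end());
    return out;
}
```

Passing a two-instruction unit such as a movw/movt pair together with some single-instruction units yields a flat sequence in which the pair may end up anywhere, but always back to back.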

> cheers,
> --renato

-- 
Saleem Abdulrasool
compnerd (at) compnerd (dot) org

James Molloy

2014-Aug-27 08:00 UTC


[LLVMdev] Pseudo load and store instructions for AArch64

On 26/08/2014 23:45, "Renato Golin" <renato.golin at linaro.org>
wrote:
>James (cc'd) implemented the optimizer, maybe he could hint on some of
>the issues for your particular case.
>
I think you give me too much credit - I didn’t write that pass!

Cheers,

James



Sergey Dmitrouk

2014-Aug-27 10:10 UTC


[LLVMdev] Pseudo load and store instructions for AArch64

Hi Renato,
> I was thinking about this and I remember seeing a similar problem to
> yours in ARM. Something like:
> 
>   ldr r1, [sp, #20]
>   ldr r2, [sp, #24]
>   ldr r3, [sp, #28]
> 
> being reordered to:
> 
>   ldr r2, [sp, #24]
>   ldr r1, [sp, #20]
>   ldr r3, [sp, #28]
Well, it's a bit different.  What I'm trying to do is to turn

    ldp  x10, x11, [x9]       // load
    ldp  x12, x9, [x9, #16]   // load
    stp  x10, x11, [x8]       // store
    mov  w0, wzr
    stp  x12, x9, [x8, #16]   // store

into

    ldp  x10, x11, [x9]       // load
    stp  x10, x11, [x8]       // store
    ldp  x12, x9, [x9, #16]   // load
    stp  x12, x9, [x8, #16]   // store
    mov  w0, wzr

So "load" + "load" and "store" + "store" are already fine; what I need is
for each paired load and its matching paired store to be interleaved and
adjacent.  It should result in better performance even though machine
instruction scheduler thinks differently.
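
The reordering described above can be sketched as a small standalone model. This is not LLVM code and not an actual AArch64LoadStoreOptimizer change: `Kind`, `Inst` and `interleaveCopies` are hypothetical names made up for the illustration. The sketch assumes that loads and stores never alias (as in a non-overlapping memcpy) and that moving a store earlier does not cross a redefinition of its address register.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for machine instructions; these are not LLVM types.
enum class Kind { Load, Store, Other };

struct Inst {
    Kind kind;
    std::vector<std::string> regs;  // data registers loaded or stored
    std::string text;               // assembly text, for display only
};

// Moves every store up so that it directly follows the nearest preceding
// load of the same data registers. Assumption: no aliasing between the
// loads and stores, and no address-register hazards on the way up.
std::vector<Inst> interleaveCopies(std::vector<Inst> insts) {
    for (std::size_t s = 0; s < insts.size(); ++s) {
        if (insts[s].kind != Kind::Store) continue;
        for (std::size_t l = s; l-- > 0;) {
            if (insts[l].kind == Kind::Load && insts[l].regs == insts[s].regs) {
                // Rotate [l+1, s] so the store lands at l+1 and everything
                // that was between the load and the store shifts down by one.
                auto first = insts.begin() + static_cast<std::ptrdiff_t>(l + 1);
                auto store = insts.begin() + static_cast<std::ptrdiff_t>(s);
                std::rotate(first, store, store + 1);
                break;
            }
        }
    }
    return insts;
}
```

Fed the five instructions from the first listing, this model produces the order of the second listing: each stp follows its ldp and the mov ends up last. A real pass would also have to check register and memory dependencies before moving anything.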
> fixing loads and stores, maybe you could add a similar thing to
> AArch64?
AArch64LoadStoreOptimizer already exists, but I'll try to add
instruction reordering to it; I saw some code for moving instructions in
ARMLoadStoreOptimizer.  Saleem suggested something similar.
> That'd have the benefit of not polluting the table-gen files, and
> could be turned on via a flag, on demand, that only after heavily
> tested, could be turned on by default.
I'd prefer that as well.  Pseudo instructions were the last resort as I
was out of options.

Thanks for your help.

Regards,
Sergey
