Sergey Dmitrouk
2014-Aug-22 12:44 UTC
[LLVMdev] Pseudo load and store instructions for AArch64
Hi Renato,

> > I'm trying to add pseudo 64-bit load and store instructions for AArch64, which
> > should have latencies set to "1" while being otherwise exactly the same as
> > normal load and store instructions.
>
> Can I ask why would you need that?

This is the only way I found to stop Machine Instruction Scheduler from
reordering load and store instructions. I asked on this specific topic
several times before, but no one answered. The following approaches didn't
work in this case:
 - different kinds of chaining;
 - gluing;
 - a single pseudo instruction for load and store: it needs a temporary
   register, but such pseudos are expanded after RA.

It's needed to make code of inlined memcpy() more efficient.

> Looks like there's specific knowledge about the types and instructions
> codes in switches midway through that is not recognizing your new
> pseudos.
>
> One way to find them out is to grep for the instruction codes yours is
> similar to, and then see if you need to add your pseudos

Thanks, I'll try that.

Regards,
Sergey
Renato Golin
2014-Aug-22 13:01 UTC
[LLVMdev] Pseudo load and store instructions for AArch64
On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> This is the only way I found to stop Machine Instruction Scheduler from
> reordering load and store instructions.

I see. Saleem (cc'd) worked on a similar thing for ARM's movh/movt for
Windows, which also didn't like the reordering. Maybe he can help you.

cheers,
--renato
Renato Golin
2014-Aug-26 22:45 UTC
[LLVMdev] Pseudo load and store instructions for AArch64
On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> It's needed to make code of inlined memcpy() more efficient.

Hi Sergey,

I was thinking about this and I remember seeing a similar problem to
yours in ARM. Something like:

    ldr r1, [sp, #20]
    ldr r2, [sp, #24]
    ldr r3, [sp, #28]

being reordered to:

    ldr r2, [sp, #24]
    ldr r1, [sp, #20]
    ldr r3, [sp, #28]

and having a big hit on performance.

The ARM back-end has the ARMLoadStoreOptimizer class, which deals with
similar problems and it's generally passed at the right time for
fixing loads and stores, maybe you could add a similar thing to
AArch64?

That'd have the benefit of not polluting the table-gen files, and
could be turned on via a flag, on demand, that only after heavily
tested, could be turned on by default.

James (cc'd) implemented the optimizer, maybe he could hint on some of
the issues for your particular case.

cheers,
--renato
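As a rough illustration of the kind of clean-up a post-scheduling pass could do for the ldr example above, here is a toy Python sketch that puts a run of adjacent loads from the same base register back into ascending-offset order. It works on assembly text, not on LLVM's machine IR, and the function name `sort_load_runs` is made up for this example; it is not how ARMLoadStoreOptimizer is implemented.

```python
import re

# Matches only the simple form "ldr rD, [rB, #imm]".
LDR = re.compile(r"ldr\s+(\w+),\s*\[(\w+),\s*#(-?\d+)\]")


def sort_load_runs(lines):
    """Sort each run of adjacent loads from one base register by offset."""
    out, run = [], []  # run holds (offset, dest, text) for the current base
    base = None

    def flush():
        dests = [d for _, d, _ in run]
        # Only reorder when every load writes a different register;
        # otherwise the order decides which value survives.
        if len(set(dests)) == len(dests):
            run.sort(key=lambda t: t[0])
        out.extend(t for _, _, t in run)
        run.clear()

    for line in lines:
        m = LDR.fullmatch(line.strip())
        # A load that overwrites its own base register ends the run.
        if m and m.group(1) != m.group(2):
            dest, b, off = m.group(1), m.group(2), int(m.group(3))
            if run and b != base:
                flush()
            base = b
            run.append((off, dest, line))
        else:
            flush()
            out.append(line)
    flush()
    return out


reordered = ["ldr r2, [sp, #24]", "ldr r1, [sp, #20]", "ldr r3, [sp, #28]"]
for text in sort_load_runs(reordered):
    print(text)
```

The sketch assumes plain loads with no side effects (no volatile or device memory), which is what makes swapping them safe.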
Saleem Abdulrasool
2014-Aug-27 02:29 UTC
[LLVMdev] Pseudo load and store instructions for AArch64
On Fri, Aug 22, 2014 at 6:01 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com>
> wrote:
> > This is the only way I found to stop Machine Instruction Scheduler from
> > reordering load and store instructions.
>
> I see. Saleem (cc'd) worked on a similar thing for ARM's movh/movt for
> Windows, which also didn't like the reordering. Maybe he can help you.
>

Sorry, I've been a bit busy at work :-(.

For Windows on ARM, the movw/movt relocations need to be contiguous. In
order to accommodate that, we generate a bundle (similar to the VLIW
concept) to treat the pair as a single scheduling entity. Although that
approach could work, it feels like updating the LoadStoreOptimizer to deal
with the particular case may be a cleaner approach.

> cheers,
> --renato

--
Saleem Abdulrasool
compnerd (at) compnerd (dot) org
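The bundling idea mentioned above can be shown with a toy model: if the scheduler only ever reorders whole units, a pair placed in one unit cannot be split. The Python sketch below is not LLVM code; the `schedule` function, the priority numbers and the instruction strings are all invented for the example.

```python
def schedule(units, priority):
    """Order units by descending priority; members of a unit never separate."""
    ordered = sorted(units,
                     key=lambda u: max(priority[i] for i in u),
                     reverse=True)
    return [inst for unit in ordered for inst in unit]


MOVW = "movw r0, #:lower16:sym"
MOVT = "movt r0, #:upper16:sym"
LDR = "ldr r1, [sp]"

# Hypothetical priorities: the scheduler would like the load ahead of movt.
priority = {MOVW: 3, LDR: 2, MOVT: 1}

# Each instruction is its own unit: the load lands between movw and movt.
unbundled = schedule([(MOVW,), (MOVT,), (LDR,)], priority)

# movw/movt form one unit: they stay adjacent wherever the unit ends up.
bundled = schedule([(MOVW, MOVT), (LDR,)], priority)

print(unbundled)
print(bundled)
```

The same trick would keep an ldp/stp pair together, at the cost of hiding the individual instructions from the scheduler.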
James Molloy
2014-Aug-27 08:00 UTC
[LLVMdev] Pseudo load and store instructions for AArch64
On 26/08/2014 23:45, "Renato Golin" <renato.golin at linaro.org> wrote:
>James (cc'd) implemented the optimizer, maybe he could hint on some of
>the issues for your particular case.
>

I think you give me too much credit - I didn’t write that pass!

Cheers,
James
Sergey Dmitrouk
2014-Aug-27 10:10 UTC
[LLVMdev] Pseudo load and store instructions for AArch64
Hi Renato,

> I was thinking about this and I remember seeing a similar problem to
> yours in ARM. Something like:
>
> ldr r1, [sp, #20]
> ldr r2, [sp, #24]
> ldr r3, [sp, #28]
>
> being reordered to:
>
> ldr r2, [sp, #24]
> ldr r1, [sp, #20]
> ldr r3, [sp, #28]

Well, it's a bit different. What I'm trying to do is to turn

    ldp x10, x11, [x9]       // load
    ldp x12, x9, [x9, #16]   // load
    stp x10, x11, [x8]       // store
    mov w0, wzr
    stp x12, x9, [x8, #16]   // store

into

    ldp x10, x11, [x9]       // load
    stp x10, x11, [x8]       // store
    ldp x12, x9, [x9, #16]   // load
    stp x12, x9, [x8, #16]   // store
    mov w0, wzr

So "load" + "load" and "store" + "store" are already fine; I need paired
operations to be properly interleaved and adjacent. It should result in
better performance even though the machine instruction scheduler thinks
differently.

> fixing loads and stores, maybe you could add a similar thing to
> AArch64?

AArch64LoadStoreOptimizer already exists, but I'll try to add instruction
reordering to it; I saw some code for moving instructions in
ARMLoadStoreOptimizer. Saleem suggested something similar.

> That'd have the benefit of not polluting the table-gen files, and
> could be turned on via a flag, on demand, that only after heavily
> tested, could be turned on by default.

I'd prefer that as well. Pseudo instructions were the last resort as I was
out of options.

Thanks for your help.

Regards,
Sergey
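The reordering described above can be sketched outside LLVM as a small Python model: each store is moved up to sit directly after the load that produces its data, provided nothing in between redefines a register the store reads. The `Inst` class and `interleave` function are hypothetical names for this sketch, registers are compared by name only (so w/x aliasing is ignored), and the source and destination buffers are assumed not to overlap, as in a memcpy() expansion. A real pass would need proper alias and liveness checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    text: str
    kind: str                      # "load", "store" or "other"
    defs: frozenset = frozenset()  # register names written
    uses: frozenset = frozenset()  # register names read


def interleave(block):
    """Move each store up to sit right after the load that feeds it."""
    out = list(block)
    for store in [i for i in block if i.kind == "store"]:
        s = next(k for k, i in enumerate(out) if i is store)
        # Nearest earlier load that writes a register the store reads.
        ld = next((j for j in range(s - 1, -1, -1)
                   if out[j].kind == "load" and out[j].defs & store.uses),
                  None)
        if ld is None:
            continue
        # A store writes no registers, so the only register hazard is an
        # instruction in between that redefines something the store reads.
        if any(i.defs & store.uses for i in out[ld + 1:s]):
            continue
        # ASSUMPTION: no memory hazard, i.e. the stored-to buffer does not
        # overlap anything the skipped instructions load from.
        out.insert(ld + 1, out.pop(s))
    return out


R = frozenset
block = [
    Inst("ldp x10, x11, [x9]", "load", R({"x10", "x11"}), R({"x9"})),
    Inst("ldp x12, x9, [x9, #16]", "load", R({"x12", "x9"}), R({"x9"})),
    Inst("stp x10, x11, [x8]", "store", R(), R({"x10", "x11", "x8"})),
    Inst("mov w0, wzr", "other", R({"w0"}), R()),
    Inst("stp x12, x9, [x8, #16]", "store", R(), R({"x12", "x9", "x8"})),
]

for inst in interleave(block):
    print(inst.text)
```

On the five-instruction block from the message this yields the interleaved ldp/stp/ldp/stp order with the mov last.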