Kosov Pavel via llvm-dev
2019-Nov-22 11:08 UTC
[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
Ok, thank you, I will implement it then. As far as I can see, this optimization should be done in AArch64LoadStoreOptimizer; is that right?

From: Eli Friedman [mailto:efriedma at quicinc.com]
Sent: Thursday, November 21, 2019 11:55 PM
To: Kosov Pavel <kosov.pavel at huawei.com>; LLVM Dev <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )

That transform is legal; it's a missed optimization.

-Eli

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Kosov Pavel via llvm-dev
Sent: Thursday, November 21, 2019 2:00 AM
To: llvm-dev at lists.llvm.org
Subject: [EXT] [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )

Hello!

I noticed that in some cases clang generates a sequence of AND+TST instructions. For example:

AND x3, x2, x1
TST x2, x1

I think these instructions should be merged into one:

ANDS x3, x2, x1

(because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm> - https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf )

Is it a missed optimization, or could there be some negative effect from such a merge?

Best regards
Pavel

PS: Code sample (though it may be significantly reduced):
(clang -target aarch64 sample.c -S -O2 -o sample.S )
========================================================================
#define NULL ((void*)0)

typedef struct {
    unsigned long *res_in;
    unsigned long *proc;
} fd_set_bits;

fd_set_bits *gv_fds;
int g_max_i;
int LOOP_ITERS_COUNT;
unsigned DEF_MASK;

__attribute__((noinline))
int do_test(const int max_iters_count,
            const unsigned long in, const unsigned long out,
            const unsigned long ex, const unsigned long bit_init_val,
            const unsigned long mask)
{
    int retval = 0;
    for (int k = 0; k < max_iters_count; k++) {
        fd_set_bits *fds = gv_fds;
        for (int j = 0; j < LOOP_ITERS_COUNT; ++j) {
            if (in) {
                retval++;
                fds->proc = NULL;
            }
            if (mask & DEF_MASK) {
                fds->proc = NULL;
            }
        }
    }
    return retval;
}
========================================================================
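The equivalence behind the proposed AND+TST merge can be checked with a small model. The following is only a sketch in C, not LLVM code: the names flags_t, logical_flags, and_then_tst, ands and forms_agree are hypothetical, and the model assumes that a flag-setting logical instruction sets N and Z from the result and clears C and V.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the A64 NZCV condition flags. */
typedef struct {
    bool n, z, c, v;
} flags_t;

/* Flags produced by a flag-setting logical instruction (ANDS/TST):
   N = top bit of the result, Z = result is zero, C and V cleared. */
static flags_t logical_flags(uint64_t result) {
    flags_t f = { (result >> 63) != 0, result == 0, false, false };
    return f;
}

/* Original sequence: AND x3, x2, x1 ; TST x2, x1 */
static uint64_t and_then_tst(uint64_t x2, uint64_t x1, flags_t *nzcv) {
    uint64_t x3 = x2 & x1;          /* AND: result only, flags untouched */
    *nzcv = logical_flags(x2 & x1); /* TST: flags only, result discarded */
    return x3;
}

/* Merged form: ANDS x3, x2, x1 */
static uint64_t ands(uint64_t x2, uint64_t x1, flags_t *nzcv) {
    uint64_t x3 = x2 & x1;
    *nzcv = logical_flags(x3);      /* one instruction yields both */
    return x3;
}

/* True when both forms agree on the result and on every flag. */
static bool forms_agree(uint64_t x2, uint64_t x1) {
    flags_t a, b;
    uint64_t ra = and_then_tst(x2, x1, &a);
    uint64_t rb = ands(x2, x1, &b);
    return ra == rb && a.n == b.n && a.z == b.z && a.c == b.c && a.v == b.v;
}
```

Calling forms_agree on any pair of operands should return true, since both forms compute the same value and derive the flags from it. The model says nothing about instructions placed between the AND and the TST, which a real implementation would have to check.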
Eli Friedman via llvm-dev
2019-Nov-22 18:53 UTC
[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
You probably want to do this some time before register allocation, so you don't have to worry about physical register definitions. Maybe take a look at what ARM does in ARMBaseInstrInfo::optimizeCompareInstr?

-Eli
Kosov Pavel via llvm-dev
2019-Nov-26 06:51 UTC
[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
Thank you! I took a look at this method (ARMBaseInstrInfo::optimizeCompareInstr) and how it is used.

So, if I understood correctly, I need to add a new method to TargetInstrInfo (similar to optimizeCompareInstr, e.g. optimizeAddInstr) and implement it in AArch64InstrInfo. This method should be able to transform code like this:

%47:gpr64 = ANDXrr %46:gpr64, %32:gpr64
%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common
%49:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv

to this form:

%47:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv
%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common

Is everything correct?
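The MIR rewrite asked about above can be illustrated with a toy peephole over a flat instruction array. This is a minimal sketch in C under stated assumptions, not the LLVM API: insn_t, opcode_t, NO_REG and merge_and_with_ands are hypothetical names, the code models a single basic block of virtual registers that are each defined once, and it gives up whenever anything between the AND and the ANDS reads or writes the flags.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_REG (-1)

/* Hypothetical, heavily simplified stand-in for machine instructions. */
typedef enum { OP_AND, OP_ANDS, OP_ORR, OP_OTHER, OP_DELETED } opcode_t;

typedef struct {
    opcode_t op;
    int def;          /* virtual register defined by the instruction */
    int src[2];       /* virtual registers read by the instruction */
    bool reads_nzcv;  /* instruction uses the condition flags */
    bool writes_nzcv; /* instruction defines the condition flags */
} insn_t;

/* AND is commutative for register operands, so accept either order. */
static bool same_operands(const insn_t *a, const insn_t *b) {
    return (a->src[0] == b->src[0] && a->src[1] == b->src[1]) ||
           (a->src[0] == b->src[1] && a->src[1] == b->src[0]);
}

/* Folds a later ANDS into an earlier AND with the same operands.
   Returns the number of ANDS instructions removed. */
size_t merge_and_with_ands(insn_t *code, size_t n) {
    size_t merged = 0;
    for (size_t i = 0; i < n; ++i) {
        if (code[i].op != OP_AND)
            continue;
        for (size_t j = i + 1; j < n; ++j) {
            if (code[j].op == OP_ANDS && same_operands(&code[i], &code[j])) {
                /* The earlier AND now also defines the flags. */
                code[i].op = OP_ANDS;
                code[i].writes_nzcv = true;
                /* Later readers of the removed result use the kept one. */
                for (size_t k = j + 1; k < n; ++k)
                    for (int s = 0; s < 2; ++s)
                        if (code[k].src[s] == code[j].def)
                            code[k].src[s] = code[i].def;
                code[j].op = OP_DELETED;
                code[j].def = code[j].src[0] = code[j].src[1] = NO_REG;
                code[j].writes_nzcv = false;
                ++merged;
                break;
            }
            /* Moving the flag definition across another flag reader or
               writer would change behaviour, so stop searching. */
            if (code[j].reads_nzcv || code[j].writes_nzcv)
                break;
        }
    }
    return merged;
}
```

Fed the three instructions from the message (AND defining %47, ORR defining %48, ANDS defining %49), the sketch turns the first into an ANDS and marks the third as deleted. Doing this on virtual registers is what keeps the bookkeeping small: there are no physical register redefinitions to track, only the flags.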