thr3ads.net - llvm dev - [llvm-dev] [ARM] Peephole optimization ( instructions tst + add ) [Nov 2019]


Kosov Pavel via llvm-dev

2019-Nov-21 10:00 UTC

[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )

Hello!

I noticed that in some cases clang generates a sequence of AND+TST instructions:



For example:

         AND        x3, x2, x1
         TST        x2, x1



I think these instructions should be merged into one:

         ANDS       x3, x2, x1



(because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm>; see
https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)



Is this a missed optimization, or could such a merge have some negative
effect?





Best regards

Pavel



PS: Code sample (though it may be significantly reduced):

(clang -target aarch64 sample.c -S -O2 -o sample.S )



========================================================================
#define NULL ((void*)0)

typedef struct {
    unsigned long *res_in;
    unsigned long *proc;
} fd_set_bits;

fd_set_bits *gv_fds;
int g_max_i;
int LOOP_ITERS_COUNT;
unsigned DEF_MASK;

__attribute__((noinline)) int do_test(const int max_iters_count,
                                      const unsigned long in,
                                      const unsigned long out,
                                      const unsigned long ex,
                                      const unsigned long bit_init_val,
                                      const unsigned long mask) {
    int retval = 0;
    for (int k = 0; k < max_iters_count; k++)
    {
        fd_set_bits *fds = gv_fds;

        for (int j = 0; j < LOOP_ITERS_COUNT; ++j)
        {
            if (in) {
                retval++;
                fds->proc = NULL;
            }

            if (mask & DEF_MASK) {
                fds->proc = NULL;
            }
        }
    }
    return retval;
}

========================================================================

Eli Friedman via llvm-dev

2019-Nov-21 20:55 UTC


[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )

That transform is legal; it's a missed optimization.

-Eli


Kosov Pavel via llvm-dev

2019-Nov-22 11:08 UTC


[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )

OK, thank you, I will implement it then.
As far as I can see, this optimization should be done in
AArch64LoadStoreOptimizer; is that right?


