thr3ads.net - llvm dev - [llvm-dev] assembly code for array iteration generated by llvm is much slower than gcc [Apr 2020]


Lori Yao Yu via llvm-dev

2020-Apr-26 03:37 UTC

[llvm-dev] assembly code for array iteration generated by llvm is much slower than gcc

Hi all,

I'm switching from gcc to llvm on a RISC-V target, but I found that in some
cases llvm generates considerably more assembly code than gcc. This reduces my
program's performance by about 40%.

The following is a simple test that shows the problem. We can see that gcc
prefers to iterate over the array with a pointer, whereas llvm prefers to
iterate with an index, so llvm generates extra code to compute the memory
address of each array element from the index.

My question: is there an llvm option that makes it iterate over arrays by
pointer? Or is this an llvm bug that has not been fixed yet?
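To make the index-versus-pointer distinction concrete, here is a minimal sketch in C of the two iteration styles. It is a hand-written illustration, not the output of either compiler; the function names `sum_by_index` and `sum_by_pointer` and their parameters are hypothetical and do not appear in the original test case. Both functions assume a non-negative `start`, a positive `stride`, and an array `c` of at least `end * stride` elements.

```c
#include <assert.h>

/* Index form: the element address is derived from j on every iteration
 * (conceptually a multiply, a scale by sizeof(int), and an add). */
int sum_by_index(const int *c, int start, int end, int stride) {
    int sum = 0;
    for (int j = start; j < end; j++) {
        sum += c[j * stride];
    }
    return sum;
}

/* Pointer form: the address is computed once before the loop and then
 * advanced by a constant step, so the loop body needs only an add.
 * Assumes c holds at least end * stride elements, so the final value of p
 * is at most one past the end of the array. */
int sum_by_pointer(const int *c, int start, int end, int stride) {
    int sum = 0;
    assert(start >= 0 && stride > 0);
    if (start >= end) {
        return 0;                       /* empty range: touch nothing */
    }
    const int *p = c + start * stride;  /* address of the first element */
    for (int n = end - start; n > 0; n--) {
        sum += *p;
        p += stride;                    /* step to the next element */
    }
    return sum;
}
```

Both functions return the same result for the same arguments; the difference is only in how much address arithmetic sits inside the loop. Whether a compiler rewrites the first form into the second is decided by its loop optimizations, so the sketch shows the shape of the code rather than what any particular compiler version emits.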



Test C code:

int func(int w1, int w2, int *b, int *c) {
   int wstart = 0;
   int i = 0;
   int j = 0;
   int sum = 0;
   int wend = 0;
   int dst_idx = 0;
   int dst_idx2 = 0;
   for (i = 0; i < w2; i++) {
        wstart = i * w1;
        wend = i / w1;
        sum = c[wstart];
        for (j = wstart + 1; j < wend; j++) {
           sum += c[j * w2];
           sum += c[j * w1];
        }
       dst_idx = w1 * i + w2;
       dst_idx2 = w2 * i + w1;
       b[dst_idx] = sum;
       b[dst_idx2] = sum/2;
    }
}



Compile command:

riscv32-unknown-elf-g++ -nostartfiles -nostdlib -O2 -march=rv32imf -mabi=ilp32f
-fno-builtin -S perf.c -o perf.g++

clang++ -O2 --target=riscv32 -march=rv32img -mabi=ilp32f -nostdlib -fno-builtin
-S perf.c -o perf.lang



the gcc version is 7.2.0

the llvm version is 10.0.0



Assembly code of the loop generated by gcc and llvm:



[Attached image image002.jpg: assembly listings of the loop from gcc and llvm; not preserved as text in this archive]

   Thank you all for your time and any help you can provide.

Lori


-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 117391 bytes
Desc: image002.jpg
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200426/e8d2c45f/attachment-0001.jpg>

Sam Elliott via llvm-dev

2020-Apr-27 13:36 UTC


[llvm-dev] assembly code for array iteration generated by llvm is much slower than gcc

Hi,

Am I right in thinking that this is RISC-V assembly?

Please can you provide a testcase (a C file, or LLVM IR) that we can use to
diagnose this issue further? It would also be useful to know what architecture
(including extensions) and other compiler flags you are using.

We know that the assembly that LLVM generates for RISC-V is not always the most
efficient, and we're working on this issue at the moment. We would welcome
more testcases.

Sam
> On 26 Apr 2020, at 4:37 am, Lori Yao Yu via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> <image002.jpg>
--
Sam Elliott
Software Team Lead
Senior Software Developer - LLVM and OpenTitan
lowRISC CIC

Lori Yao Yu via llvm-dev

2020-Apr-28 03:00 UTC


[llvm-dev] Re: assembly code for array iteration generated by llvm is much slower than gcc

Hi Sam,

Yes, it is RISC-V assembly code. The test code is shown below. You can copy it
into a C file named perf.c and compile it with the commands below.
We can see that gcc prefers to iterate over the array with a pointer, whereas
llvm prefers to iterate with an index, so llvm generates extra code to compute
the memory address of each array element from the index.

Test C code:

//perf.c

int func(int w1, int w2, int *b, int *c) {
   int wstart = 0;
   int i = 0;
   int j = 0;
   int sum = 0;
   int wend = 0;
   int dst_idx = 0;
   int dst_idx2 = 0;
   for (i = 0; i < w2; i++) {
        wstart = i * w1;
        wend = i / w1;
        sum = c[wstart];
        for (j = wstart + 1; j < wend; j++) {
           sum += c[j * w2];
           sum += c[j * w1];
        }
       dst_idx = w1 * i + w2;
       dst_idx2 = w2 * i + w1;
       b[dst_idx] = sum;
       b[dst_idx2] = sum/2;
    }
}

Compile command:
riscv32-unknown-elf-g++ -nostartfiles -nostdlib -O2 -march=rv32imf -mabi=ilp32f
-fno-builtin -S perf.c -o perf.g++
clang++ -O2 --target=riscv32 -march=rv32img -mabi=ilp32f -nostdlib -fno-builtin
-S perf.c -o perf.lang

the gcc version is 7.2.0
the llvm version is 10.0.0

thanks!~
Lori

-----Original Message-----
From: Sam Elliott <selliott at lowrisc.org>
Sent: 27 April 2020 21:36
To: Lori Yao Yu <loriyu at panyi.ai>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] assembly code for array iteration generated by llvm is much
slower than gcc

Hi,

Am I right in thinking that this is RISC-V assembly?

Please can you provide a testcase (a C file, or LLVM IR) that we can use to
diagnose this issue further? It would also be useful to know what architecture
(including extensions) and other compiler flags you are using.

We know that the assembly that LLVM generates for RISC-V is not always the most
efficient, and we're working on this issue at the moment. We would welcome
more testcases.

Sam
> On 26 Apr 2020, at 4:37 am, Lori Yao Yu via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> <image002.jpg>
--
Sam Elliott
Software Team Lead
Senior Software Developer - LLVM and OpenTitan
lowRISC CIC
