thr3ads.net - llvm dev - [llvm-dev] AVX 512 Assembly Code Generation [Jun 2017]


hameeza ahmed via llvm-dev

2017-Jun-21 00:21 UTC

[llvm-dev] AVX 512 Assembly Code Generation

Hello,

I am using LLVM on my Core i7 laptop, which has no AVX support.

My goal is to generate AVX-512 code (loop vectorization) for Knights
Landing/Skylake.

My .c code is:

int a[256], b[256], c[256];
foo () {
int i;
for (i=0; i<256; i++) {
a[i] = b[i] + c[i];
}
}

I first generated its .ll file via clang:

clang -S -emit-llvm test.c -o test.ll

Then I optimized it:

opt -S -O3 test.ll -o test_o3.ll

Then I used llc for code generation:

llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s

llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s

Here is my generated code:

.text
.file "filer_o3.ll"
.globl foo
.p2align 4, 0x90
.type foo,@function
foo:                                    # @foo
.cfi_startproc
# BB#0:                                 # %min.iters.checked
pushq %rbp
.Ltmp0:
.cfi_def_cfa_offset 16
.Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp2:
.cfi_def_cfa_register %rbp
movq $-1024, %rax            # imm = 0xFC00
.p2align 4, 0x90
.LBB0_1:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
vmovdqa32 c+1024(%rax), %xmm0
vmovdqa32 c+1040(%rax), %xmm1
vpaddd b+1024(%rax), %xmm0, %xmm0
vpaddd b+1040(%rax), %xmm1, %xmm1
vmovdqa32 %xmm0, a+1024(%rax)
vmovdqa32 %xmm1, a+1040(%rax)
vmovdqa32 c+1056(%rax), %xmm0
vmovdqa32 c+1072(%rax), %xmm1
vpaddd b+1056(%rax), %xmm0, %xmm0
vpaddd b+1072(%rax), %xmm1, %xmm1
vmovdqa32 %xmm0, a+1056(%rax)
vmovdqa32 %xmm1, a+1072(%rax)
addq $64, %rax
jne .LBB0_1
# BB#2:                                 # %middle.block
popq %rbp
retq
.Lfunc_end0:
.size foo, .Lfunc_end0-foo
.cfi_endproc

.type b,@object               # @b
.comm b,1024,16
.type c,@object               # @c
.comm c,1024,16
.type a,@object               # @a
.comm a,1024,16

.ident "clang version 3.9.0 (tags/RELEASE_390/final)"
.section ".note.GNU-stack","",@progbits

The generated code does use vmov... instructions, but no zmm registers,
only xmm registers.
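For reference, the vector.body loop above can be read as the following C sketch, written with SSE2 intrinsics from `<emmintrin.h>`. It assumes an x86 compiler that defines `__SSE2__` (with a scalar fallback otherwise), and `foo_xmm` and `check_foo_xmm` are hypothetical names, not part of the thread. %rax runs from -1024 to 0 in steps of 64 bytes, so the loop makes 16 iterations, each handling 16 ints as four 128-bit additions. This is the 128-bit vector width that opt chose because it was not told about AVX-512. The sketch uses unaligned loads for safety; the compiler could emit aligned moves because the `.comm` directives place the arrays on 16-byte boundaries.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadu_si128, _mm_add_epi32 */
#endif

int a[256], b[256], c[256];

/* Hypothetical name: mirrors the vector.body loop emitted by llc. */
void foo_xmm(void) {
#if defined(__SSE2__)
    /* 256 / 16 = 16 iterations, like %rax stepping from -1024 to 0 by 64 bytes. */
    for (int i = 0; i < 256; i += 16) {
        /* Four 128-bit vectors of 4 ints each: the four vpaddd per iteration. */
        for (int j = i; j < i + 16; j += 4) {
            __m128i vc = _mm_loadu_si128((const __m128i *)&c[j]);
            __m128i vb = _mm_loadu_si128((const __m128i *)&b[j]);
            _mm_storeu_si128((__m128i *)&a[j], _mm_add_epi32(vc, vb));
        }
    }
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 256; i++)
        a[i] = b[i] + c[i];
#endif
}

/* Hypothetical self-check: fills b and c, runs the loop, verifies a. */
void check_foo_xmm(void) {
    for (int i = 0; i < 256; i++) {
        b[i] = i;
        c[i] = 2 * i;
    }
    foo_xmm();
    for (int i = 0; i < 256; i++)
        assert(a[i] == 3 * i);
}
```

With AVX-512 enabled in the optimizer, the same 16 ints per iteration would fit in a single 512-bit zmm register instead of four xmm registers.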


Can you please tell me where I am going wrong? I have tried several times
with different parameters but always get xmm registers.

Thank you

Hal Finkel via llvm-dev

2017-Jun-21 00:39 UTC


[llvm-dev] AVX 512 Assembly Code Generation

On 06/20/2017 07:21 PM, hameeza ahmed via llvm-dev wrote:
> Hello,
>
> I am using LLVM on my Core i7 laptop, which has no AVX support.
>
> My goal is to generate AVX-512 code (loop vectorization) for Knights
> Landing/Skylake.
>
> My .c code is:
>
> int a[256], b[256], c[256];
> foo () {
void foo() {
> int i;
> for (i=0; i<256; i++) {
> a[i] = b[i] + c[i];
> }
> }
>
> I first generated its .ll file via clang:
>
> clang -S -emit-llvm test.c -o test.ll
Your problem is that vectorization happens in opt, not in llc. Telling 
llc that you wish to enable AVX-512 is not sufficient. In fact, if you run:

clang -S -o - test.c -march=knl -O3

you'll see AVX-512 vectorized code. If you want to run opt separately to 
generate the vectorized code, you need to tell it that it is targeting 
the KNL. Clang can add the necessary function attributes to do this. 
You'll also want to run clang with optimizations enabled so that it will
generate IR that is intended to be optimized, even if you then disable
the actual optimizations to get the pre-opt IR.

clang -S -emit-llvm test.c -march=knl -O3 -Xclang -disable-llvm-optzns

Then running opt as you have it below should produce the desired result.

Finally, I recommend upgrading to Clang/LLVM 4.0. It produces better 
AVX-512 code than 3.9 did.
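The key point above is that the optimizer must know the target, which clang conveys through per-function attributes. As a hedged, source-level illustration of the same idea (not what the commands above do), GCC and Clang accept `__attribute__((target("avx512f")))` on an individual function. The sketch assumes an x86 GCC or Clang toolchain; `add_generic`, `add_avx512` and `check_foo` are hypothetical names. Because the original poster's machine has no AVX-512, `foo()` checks `__builtin_cpu_supports("avx512f")` at run time before calling the AVX-512 version. Whether zmm registers actually appear depends on the compiler version, optimization level and tuning, so inspecting the output of `clang -O3 -S` is the way to confirm.

```c
#include <assert.h>

int a[256], b[256], c[256];

/* Baseline loop: compiled for whatever target the file is built for. */
static void add_generic(void) {
    for (int i = 0; i < 256; i++)
        a[i] = b[i] + c[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the optimizer is told AVX-512F is available in this
   function only, so at -O3 it may vectorize it with zmm registers. */
__attribute__((target("avx512f")))
static void add_avx512(void) {
    for (int i = 0; i < 256; i++)
        a[i] = b[i] + c[i];
}
#endif

void foo(void) {
#if defined(__x86_64__) || defined(__i386__)
    /* Only take the AVX-512 path if the running CPU supports it. */
    if (__builtin_cpu_supports("avx512f")) {
        add_avx512();
        return;
    }
#endif
    add_generic();
}

/* Hypothetical self-check: every a[i] should equal i + (1000 - i). */
void check_foo(void) {
    for (int i = 0; i < 256; i++) {
        b[i] = i;
        c[i] = 1000 - i;
    }
    foo();
    for (int i = 0; i < 256; i++)
        assert(a[i] == 1000);
}
```

Compiled without any `-march` flag, only `add_avx512` is eligible for AVX-512 instructions; the rest of the file stays at the baseline target, which is why the run-time check is needed.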

  -Hal
>
> Then I optimized it:
>
> opt -S -O3 test.ll -o test_o3.ll
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

