hameeza ahmed via llvm-dev
2017-Jun-21 00:21 UTC
[llvm-dev] AVX 512 Assembly Code Generation
Hello,
I am using LLVM on my Core i7 laptop, which has no AVX support.
My goal is to generate AVX-512 code (loop vectorization) for Knights
Landing/Skylake.
My .c code is:
int a[256], b[256], c[256];
foo () {
    int i;
    for (i=0; i<256; i++) {
        a[i] = b[i] + c[i];
    }
}
I first generated its .ll file via clang:
clang -S -emit-llvm test.c -o test.ll
Then I optimized it:
opt -S -O3 test.ll -o test_o3.ll
Then I used llc for code generation (I tried both of these):
llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s
Here is my generated code:
        .text
        .file   "filer_o3.ll"
        .globl  foo
        .p2align 4, 0x90
        .type   foo,@function
foo:                                    # @foo
        .cfi_startproc
# BB#0:                                 # %min.iters.checked
        pushq   %rbp
.Ltmp0:
        .cfi_def_cfa_offset 16
.Ltmp1:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
.Ltmp2:
        .cfi_def_cfa_register %rbp
        movq    $-1024, %rax            # imm = 0xFC00
        .p2align 4, 0x90
.LBB0_1:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vmovdqa32 c+1024(%rax), %xmm0
        vmovdqa32 c+1040(%rax), %xmm1
        vpaddd  b+1024(%rax), %xmm0, %xmm0
        vpaddd  b+1040(%rax), %xmm1, %xmm1
        vmovdqa32 %xmm0, a+1024(%rax)
        vmovdqa32 %xmm1, a+1040(%rax)
        vmovdqa32 c+1056(%rax), %xmm0
        vmovdqa32 c+1072(%rax), %xmm1
        vpaddd  b+1056(%rax), %xmm0, %xmm0
        vpaddd  b+1072(%rax), %xmm1, %xmm1
        vmovdqa32 %xmm0, a+1056(%rax)
        vmovdqa32 %xmm1, a+1072(%rax)
        addq    $64, %rax
        jne     .LBB0_1
# BB#2:                                 # %middle.block
        popq    %rbp
        retq
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc
        .type   b,@object               # @b
        .comm   b,1024,16
        .type   c,@object               # @c
        .comm   c,1024,16
        .type   a,@object               # @a
        .comm   a,1024,16
        .ident  "clang version 3.9.0 (tags/RELEASE_390/final)"
        .section        ".note.GNU-stack","",@progbits
The generated code uses vmov... instructions, but only with xmm
registers; there are no zmm registers.
Can you please tell me where I am going wrong? I have tried several times
with different parameters, but I always get xmm registers.
Thank you
On 06/20/2017 07:21 PM, hameeza ahmed via llvm-dev wrote:
> Hello,
>
> I am using llvm on my core i7 laptop which has no avx support.
>
> my goal is to generate avx512 code (loop vectorization) for Knight
> landing/skylake .
>
> my .c code is;
>
> int a[256], b[256], c[256];
> foo () {

void foo() {

> int i;
> for (i=0; i<256; i++) {
> a[i] = b[i] + c[i];
> }
> }
>
> i first generated its .ll file via clang
>
> clang -S -emit-llvm test.c -o test.ll

Your problem is that vectorization happens in opt, not in llc. Telling llc
that you wish to enable AVX-512 is not sufficient. In fact, if you run:

clang -S -o - test.c -march=knl -O3

you'll see AVX-512 vectorized code. If you want to run opt separately to
generate the vectorized code, you need to tell it that it is targeting the
KNL. Clang can add the necessary function attributes to do this. You'll
also want to run clang with optimizations enabled so that it will generate
IR that is intended to be optimized, even if you then disable the actual
optimizations to get the pre-opt IR:

clang -S -emit-llvm test.c -march=knl -O3 -mllvm -disable-llvm-optzns

Then running opt as you have it below should produce the desired result.

Finally, I recommend upgrading to Clang/LLVM 4.0. It produces better
AVX-512 code than 3.9 did.

-Hal

> then i optimized it;
>
> opt -S -O3 test.ll -o test_o3.ll
>
> then i used llc for code generation
>
> llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
>
> llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
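
For readers following the thread, the register-width arithmetic behind the question can be sketched in plain C. This is only an illustrative model, not compiler output: the helper names lanes and vector_adds are hypothetical, and it assumes 32-bit int elements and the 256-element arrays from the posted code.

```c
/* Illustrative model only: counts how many vector adds are needed to
 * cover the posted 256-element int arrays at a given register width.
 * The names N, lanes and vector_adds are hypothetical helpers, not
 * part of LLVM or of the original post. */
#define N 256

/* Number of 32-bit int lanes that fit in one vector register
 * (128 bits for xmm, 256 for ymm, 512 for zmm). */
int lanes(int register_bits) {
    return register_bits / 32;
}

/* Number of vector add instructions needed to process all N ints,
 * assuming every add uses a full register of the given width. */
int vector_adds(int register_bits) {
    return N / lanes(register_bits);
}
```

Under these assumptions, vector_adds(128) is 64, which matches the posted listing (16 loop iterations with 4 vpaddd instructions each), while vector_adds(512) is 16, the count one would expect if each add used a full zmm register.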