hameeza ahmed via llvm-dev
2017-Jun-21 00:21 UTC
[llvm-dev] AVX 512 Assembly Code Generation
Hello,

I am using LLVM on my Core i7 laptop, which has no AVX support. My goal is to generate AVX-512 code (loop vectorization) for Knights Landing/Skylake.

My .c code is:

    int a[256], b[256], c[256];
    void foo () {
        int i;
        for (i=0; i<256; i++) {
            a[i] = b[i] + c[i];
        }
    }

I first generated its .ll file via clang:

    clang -S -emit-llvm test.c -o test.ll

Then I optimized it:

    opt -S -O3 test.ll -o test_o3.ll

Then I used llc for code generation:

    llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
    llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s

Here is my generated code:

        .text
        .file   "filer_o3.ll"
        .globl  foo
        .p2align 4, 0x90
        .type   foo,@function
    foo:                                    # @foo
        .cfi_startproc
    # BB#0:                                 # %min.iters.checked
        pushq   %rbp
    .Ltmp0:
        .cfi_def_cfa_offset 16
    .Ltmp1:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    .Ltmp2:
        .cfi_def_cfa_register %rbp
        movq    $-1024, %rax                # imm = 0xFC00
        .p2align 4, 0x90
    .LBB0_1:                                # %vector.body
                                            # =>This Inner Loop Header: Depth=1
        vmovdqa32 c+1024(%rax), %xmm0
        vmovdqa32 c+1040(%rax), %xmm1
        vpaddd  b+1024(%rax), %xmm0, %xmm0
        vpaddd  b+1040(%rax), %xmm1, %xmm1
        vmovdqa32 %xmm0, a+1024(%rax)
        vmovdqa32 %xmm1, a+1040(%rax)
        vmovdqa32 c+1056(%rax), %xmm0
        vmovdqa32 c+1072(%rax), %xmm1
        vpaddd  b+1056(%rax), %xmm0, %xmm0
        vpaddd  b+1072(%rax), %xmm1, %xmm1
        vmovdqa32 %xmm0, a+1056(%rax)
        vmovdqa32 %xmm1, a+1072(%rax)
        addq    $64, %rax
        jne     .LBB0_1
    # BB#2:                                 # %middle.block
        popq    %rbp
        retq
    .Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc

        .type   b,@object                   # @b
        .comm   b,1024,16
        .type   c,@object                   # @c
        .comm   c,1024,16
        .type   a,@object                   # @a
        .comm   a,1024,16

        .ident  "clang version 3.9.0 (tags/RELEASE_390/final)"
        .section ".note.GNU-stack","",@progbits

The generated code uses vmov... instructions, but there are no zmm registers, only xmm registers.

Can you please point out where I am going wrong? I have tried several times with different parameters but always get xmm registers.
Thank You
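The listing in the question can be read arithmetically: the loop counter advances by 64 bytes per iteration, and each iteration issues four 128-bit vpaddd instructions. The C sketch below works out how many 32-bit lanes fit in an xmm, ymm, or zmm register and how many vector adds each width would need for the 256-element arrays. It assumes a 4-byte int, and the helper names (lanes_per_register, vector_adds_needed, print_width_table) are hypothetical, chosen for illustration; they are not part of LLVM.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many elements of elem_bytes each fit in one
 * vector register of reg_bits bits (128 = xmm, 256 = ymm, 512 = zmm). */
int lanes_per_register(int reg_bits, int elem_bytes) {
    return reg_bits / (8 * elem_bytes);
}

/* Hypothetical helper: vector add instructions needed to cover n elements,
 * assuming n is an exact multiple of the lane count (true for n = 256). */
int vector_adds_needed(int n, int reg_bits, int elem_bytes) {
    int lanes = lanes_per_register(reg_bits, elem_bytes);
    assert(lanes > 0 && n % lanes == 0);
    return n / lanes;
}

/* Print lanes and add counts for the three x86 vector register widths. */
void print_width_table(int n, int elem_bytes) {
    const int widths[] = {128, 256, 512};
    const char *names[] = {"xmm", "ymm", "zmm"};
    for (int i = 0; i < 3; i++) {
        printf("%s: %d lanes, %d vector adds for %d elements\n",
               names[i],
               lanes_per_register(widths[i], elem_bytes),
               vector_adds_needed(n, widths[i], elem_bytes),
               n);
    }
}
```

Calling print_width_table(256, 4) from a main function reports 64 adds for xmm, which matches the listing (1024 bytes / 64 bytes per iteration = 16 iterations, each with 4 vpaddd), 32 for ymm, and 16 for zmm.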
On 06/20/2017 07:21 PM, hameeza ahmed via llvm-dev wrote:
> Hello,
>
> I am using llvm on my core i7 laptop which has no avx support.
>
> my goal is to generate avx512 code (loop vectorization) for Knight
> landing/skylake .
>
> my .c code is;
>
> int a[256], b[256], c[256];
> void foo() {
> int i;
> for (i=0; i<256; i++) {
> a[i] = b[i] + c[i];
> }
> }
>
> i first generated its .ll file via clang
>
> clang -S -emit-llvm test.c -o test.ll

Your problem is that vectorization happens in opt, not in llc. Telling llc that you wish to enable AVX-512 is not sufficient. In fact, if you run:

    clang -S -o - test.c -march=knl -O3

you'll see AVX-512 vectorized code.

If you want to run opt separately to generate the vectorized code, you need to tell it that it is targeting the KNL. Clang can add the necessary function attributes to do this. You'll also want to run clang with optimizations enabled so that it will generate IR that is intended to be optimized, even if you then disable the actual optimizations to get the pre-opt IR.

    clang -S -emit-llvm test.c -march=knl -O3 -mllvm -disable-llvm-optzns

then running opt as you have it below should produce the desired result.

Finally, I recommend upgrading to Clang/LLVM 4.0. It produces better AVX-512 code than 3.9 did.
-Hal

> then i optimized it;
>
> opt -S -O3 test.ll -o test_o3.ll
>
> then i used llc for code generation
>
> llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s
>
> llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
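As a rough picture of what a 512-bit vector body computes once opt is told it is targeting KNL, the C sketch below strip-mines the loop by a factor of 16 (16 x 32-bit lanes, the capacity of one zmm register) and checks the result against the plain scalar loop. This is a scalar model for illustration only, not LLVM's output. VF, add_scalar, add_strip_mined, and models_agree are hypothetical names, and the vectorization factor and unrolling the vectorizer actually picks may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N  256
#define VF 16   /* assumed vectorization factor: 16 x 32-bit lanes = 512 bits */

/* Reference: the scalar loop from test.c, written over explicit pointers. */
void add_scalar(int32_t *dst, const int32_t *x, const int32_t *y, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}

/* Scalar model of a vectorized loop: the outer loop advances VF elements at a
 * time, and the inner loop stands in for one 16-lane vector load/add/store. */
void add_strip_mined(int32_t *dst, const int32_t *x, const int32_t *y, int n) {
    assert(n >= 0);
    int i = 0;
    for (; i + VF <= n; i += VF) {
        for (int lane = 0; lane < VF; lane++)
            dst[i + lane] = x[i + lane] + y[i + lane];
    }
    /* Scalar remainder; does no work when n is a multiple of VF (as for 256). */
    for (; i < n; i++)
        dst[i] = x[i] + y[i];
}

/* Returns 1 if both versions produce identical output on sample inputs. */
int models_agree(void) {
    static int32_t b[N], c[N], ref[N], out[N];
    for (int i = 0; i < N; i++) {
        b[i] = i;
        c[i] = 2 * i + 1;
    }
    add_scalar(ref, b, c, N);
    add_strip_mined(out, b, c, N);
    return memcmp(ref, out, sizeof ref) == 0;
}
```

With N = 256 and VF = 16 the outer loop runs 16 times and the remainder loop does nothing, which is why a zmm version of the loop would need far fewer vector instructions than the xmm listing in the original message.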