Hello -

I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly but given that -fvectorize and -fslp-vectorize are on by default at 3.4 I would have thought that if the code is AVX-able by icc that clang / llvm would be able to do the same… The code is basic matrix multiplication written a number of ways (with and without transposition and such) as a performance measurement exercise.

The environments I’ve tried are:
  Intel Ivy Bridge-EX (pre-release hardware) running Red Hat Linux 6.5
  Generic desktop with Haswell processor running Fedora 18

If you have a moment to point me to the appropriate docs I’m happy to go learn on my own, but I’ve now googled for the better part of 3 days trying to find what invocation parameters I should use to get the desired use of packed AVX instructions and the YMM registers and I just can’t seem to get it right. I’m also grateful if you just send the correct invocation.

I’ve actually started digging through the code as well - but since I am starting from zero it could take me a while to find an answer this way - just didn’t want you to think I’m not willing to try to find the answer on my own :-)

Thank you,
Ken
It probably does not pick the right processor architecture.

You could try "clang -mavx" or "clang -march=corei7-avx" for Ivy Bridge and "clang -march=core-avx2" or "clang -mavx2" for Haswell.

$ clang -march=core-avx2 -O3 -S -o - test.c

        .section        __TEXT,__text,regular,pure_instructions
        .globl  _f
        .align  4, 0x90
_f:                                     ## @f
        .cfi_startproc
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        xorl    %eax, %eax
        .align  4, 0x90
LBB0_1:                                 ## %vector.body
                                        ## =>This Inner Loop Header: Depth=1
        vmovups (%rdx,%rax,4), %ymm0
        vmulps  (%rsi,%rax,4), %ymm0, %ymm0
        vaddps  (%rdi,%rax,4), %ymm0, %ymm0
        vmovups %ymm0, (%rdi,%rax,4)
        addq    $8, %rax
        cmpq    $256, %rax              ## imm = 0x100
        jne     LBB0_1
## BB#2:                                ## %for.end
        popq    %rbp
        vzeroupper
        ret
        .cfi_endproc

$ cat test.c
void f(float * restrict A, float * restrict B, float * restrict C) {
  for (int i = 0; i < 256; ++i)
    A[i] += C[i] * B[i];
}

$ clang -v
clang version 3.5 (trunk 195376) (llvm/trunk 195372)

Best,
Arnold

On Dec 11, 2013, at 2:59 PM, Ken Gahagan <ken.gahagan at gmail.com> wrote:
> [...]
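Since the original question was about matrix multiplication, here is a minimal sketch of a kernel written in the same spirit as test.c above. It is not Ken's code (which was not posted); the function name matmul_ikj, the square row-major layout, and the i-k-j loop order are all assumptions made for illustration. The loop order is chosen so that the innermost loop walks B and C with unit stride, and restrict tells the compiler the arrays do not overlap, as in test.c. Whether a given clang version actually vectorizes it still has to be checked in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical example: C = A * B for square n x n row-major float matrices.
 * The i-k-j loop order keeps the innermost loop (over j) at unit stride in
 * both B and C, which is the access pattern a loop vectorizer handles most
 * easily. restrict promises that A, B and C do not alias. */
void matmul_ikj(size_t n,
                const float *restrict A,
                const float *restrict B,
                float *restrict C)
{
    for (size_t i = 0; i < n; ++i) {
        /* Clear row i of the result before accumulating into it. */
        for (size_t j = 0; j < n; ++j)
            C[i * n + j] = 0.0f;

        for (size_t k = 0; k < n; ++k) {
            const float a = A[i * n + k];   /* scalar reused across the row */
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

Saved as, say, matmul.c (a made-up file name), it can be compiled the same way as test.c, for example "clang -march=core-avx2 -O3 -S -o - matmul.c". Instructions such as vmulps or vaddps operating on %ymm registers in the inner loop would indicate that the full 256-bit width is being used.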
For the list of CPUs and features for the current target, you can try:

$ llvm-as < /dev/null | llc -mcpu=help

On Thu, Dec 12, 2013 at 8:57 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> [...]
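A complementary check from the C side, sketched under the assumption that the compiler follows the usual clang/gcc convention of predefining __AVX__ and __AVX2__ when those features are enabled for the build. The helper names built_with_avx, built_with_avx2 and report_target_features are made up for this example. It only reports what the compiler was told to target, not what the CPU running the program supports.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with AVX enabled
 * (for example via -mavx or a -march value that implies it), else 0. */
int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if this translation unit was compiled with AVX2 enabled, else 0. */
int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the compile-time feature state to 'out'. */
void report_target_features(FILE *out)
{
    fprintf(out, "AVX enabled at compile time:  %d\n", built_with_avx());
    fprintf(out, "AVX2 enabled at compile time: %d\n", built_with_avx2());
}
```

Calling report_target_features(stdout) from main in a build made with "clang -march=core-avx2" should report both as enabled. If both come back 0, the vectorizer is most likely being limited to the baseline SSE feature set, which would match the symptom described at the start of the thread.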