similar to: [LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions"

2013 Dec 19
0
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On 19 December 2013 14:31, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote:
> Hi all,
>
> I would like to find out whether anyone will find it useful to add an
> x86-specific calling convention for reducing emission of vzeroupper
> instructions.
>
> Current implementation:
> vzeroupper is inserted to any functions that use AVX
2013 Dec 24
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> In general, I'm not too keen on adding more calling conventions unless
> there's a really powerful need for one from an ABI perspective. This
> sounds more like an optimization than an ABI need.

I think that is the case.

> What's more, I
> worry (a little bit) about confusion that could be caused with the
> __vectorcall calling convention (which we do not
2013 Dec 19
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On Thu, Dec 19, 2013 at 12:14 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> On 19 December 2013 14:31, Gao, Yunzhong
> <yunzhong_gao at playstation.sony.com> wrote:
> > Hi all,
> >
> > I would like to find out whether anyone will find it useful to add an
> > x86-specific calling convention for reducing
2013 Sep 19
5
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Hi all,

I would like to make a proposal about changing the optimization strategy regarding when to insert a vzeroupper instruction in the x86 backend.

Current implementation:
vzeroupper is inserted to any functions that use AVX instructions. The insertion points are:
1) before a call instruction;
2) before a return instruction;

Rationale:
vzeroupper is an AVX instruction; it is inserted to
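The insertion rule described in this message can be illustrated with a small toy model. This is only a sketch and not the LLVM x86 backend pass: the function names `uses_avx` and `insert_vzeroupper` are hypothetical, instructions are modelled as plain assembly strings, and "uses AVX" is approximated (as an assumption) by any reference to a `ymm` register.

```python
from typing import List

# Opcodes treated as calls or returns in this toy model (AT&T-style
# suffixed forms included). This set is an assumption for illustration.
CALL_OR_RETURN = {"call", "callq", "ret", "retq"}


def uses_avx(instrs: List[str]) -> bool:
    """Assumption: a function 'uses AVX' if any instruction mentions a ymm register."""
    return any("ymm" in ins for ins in instrs)


def insert_vzeroupper(instrs: List[str]) -> List[str]:
    """Return a copy of instrs with 'vzeroupper' placed before every call
    and every return, but only when the function uses AVX at all."""
    if not uses_avx(instrs):
        return list(instrs)
    out: List[str] = []
    for ins in instrs:
        parts = ins.split()
        opcode = parts[0] if parts else ""
        if opcode in CALL_OR_RETURN:
            out.append("vzeroupper")
        out.append(ins)
    return out


if __name__ == "__main__":
    body = ["vaddps %ymm0, %ymm1, %ymm2", "call foo", "ret"]
    for line in insert_vzeroupper(body):
        print(line)
```

A function with no `ymm` references is returned unchanged, which mirrors the "any functions that use AVX instructions" condition quoted above.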
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Great idea. I reported on this problem before and glad to see someone trying to tackle this.

cheers.

From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Gao, Yunzhong [yunzhong_gao at playstation.sony.com]
Sent: Thursday, September 19, 2013 11:53 AM
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] Proposal to improve
2013 Sep 20
3
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Hi Eli,

Thanks for the feedback. Please see below.
- Gao.

From: Eli Friedman [mailto:eli.friedman at gmail.com]
Sent: Thursday, September 19, 2013 12:31 PM
To: Gao, Yunzhong
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy

> This is essentially equivalent to "don't insert vzeroupper anywhere", as
> far as I can tell. (The
2013 Sep 20
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote:
> Hi Eli,
>
> Thanks for the feedback. Please see below.
> - Gao.
>
> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> Sent: Thursday, September 19, 2013 12:31 PM
> To: Gao, Yunzhong
> Cc: llvmdev at
2013 Sep 21
1
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Is it realistic to worry about performance of vectorized code that does PIC calls into a non-vectorized sin() in libc? Maybe there's an example other than sin() that is more realistic?

-- Sean Silva

On Fri, Sep 20, 2013 at 7:11 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong <
> yunzhong_gao at playstation.sony.com>
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Thu, Sep 19, 2013 at 11:53 AM, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote:
> Hi all,
>
> I would like to make a proposal about changing the optimization strategy
> regarding when to insert a vzeroupper instruction in the x86 backend.
>
> Current implementation:
> vzeroupper is inserted to any functions that use AVX instructions. The
> insertion
2013 Dec 19
0
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> Maybe a target-specific attribute instead? It would still apply to all CCs,
> but would never be dropped.

That would work too, yes. I proposed metadata because it looks like it can be dropped, but that is not a big issue. I would be OK with an attribute too if that is more convenient or we want to make sure it is kept.

Cheers,
Rafael
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture. You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell.

$ clang -march=core-avx2 -O3 -S -o - test.c
    .section __TEXT,__text,regular,pure_instructions
    .globl _f
    .align 4, 0x90
_f: ## @f
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2014 Sep 17
2
[LLVMdev] VEX prefixes for JIT in llvm 3.5
Hi guys, I just upgraded our JIT system to use llvm 3.5 and noticed one big change in our generated code: we don't see any non-destructive VEX prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) etc. It's long been on my list of things to investigate anyway as I noticed llvm didn't emit VZEROUPPER calls either, so I supposed it might not be a bad thing to disable
2012 May 24
4
[LLVMdev] use AVX automatically if present
I wonder why AVX is not used automatically if available at the host machine. In contrast to that, SSE41 instructions (like pmulld) are automatically used if the host machine supports SSE41. E.g.

$ cat avx.ll
define void @_fun1(<8 x float>*, <8 x float>*) {
_L1:
  %x = load <8 x float>* %0
  %y = load <8 x float>* %1
  %z = fadd <8 x float> %x, %y
  store
2012 May 24
0
[LLVMdev] use AVX automatically if present
On Thu, 24 May 2012, Pan, Wei wrote:
> Very likely AVX is not enabled in your llc. This feature was enabled
> just recently (late of April).

I forgot to mention that I am using recent LLVM-3.1 and in principle my llc knows about avx as I have shown in the second example. But avx does not seem to be used by default.

On Thu, 24 May 2012, Henning Thielemann wrote:
> $ llc -o - -mattr
2014 Sep 17
3
[LLVMdev] VEX prefixes for JIT in llvm 3.5
Hi Jim,

Thanks for a very quick reply! That indeed does the trick! Presumably the default has changed in 3.5 to be a "generic" CPU instead of the native one? If that's the case I wonder why: especially when JITting it really only makes sense to target the actual CPU - unless I'm missing something? :)

Thanks again,
Matt

On Wed, Sep 17, 2014 at 2:16 PM, Jim Grosbach
2012 Nov 07
1
[LLVMdev] AVX support
We have been using LLVM 3.1 to support JITing of AVX. From dumping the MC generated by the MCJIT I noticed it always emits 'VZEROUPPER' to clear the high 128 bits before calling another function. In some cases I know the called function either only uses AVX or does not use SSE. I would like to inform the backend that it is safe not to emit that instruction. Have not been able to figure out how
2013 Sep 05
1
[LLVMdev] AVX calling convention?
I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually. I tracked this down to the following. The call site looks like

    vmovdqa 24064(%rsp), %ymm0
    vmovdqa
2014 Sep 17
2
[LLVMdev] VEX prefixes for JIT in llvm 3.5
Great stuff; thanks both! I'm also looking to turn my MCJIT conversion spike into our main use case. The only thing I'm missing is the ability to get a post-linked copy of the generated assembly. In JIT I used JITEventListener's NotifyFunctionEmitted and used a MCDisassembler to disassemble the stream (with my own custom annotators), and redirected the output to the relevant place
2011 Aug 25
0
[LLVMdev] Trouble using the MCJIT: "Target does not support MC emission" error
Hi Matt,

I am unsure about MCJIT, but I guess the problem is the same. Just like when invoking llc, you need to pass the information to use AVX (llc -mattr=+avx). I guess the corresponding code should look like this:

llvm::EngineBuilder engineBuilder(module);
engineBuilder.setErrorStr(&eeError);
engineBuilder.setEngineKind(llvm::EngineKind::JIT);