thr3ads.net - similar to: "[LLVMdev] AVX support"

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] AVX support"

[LLVMdev] Proposal to improve vzeroupper optimization strategy

2013 Sep 19

Great idea. I reported on this problem before and glad to see someone trying to tackle this. cheers. From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Gao, Yunzhong [yunzhong_gao at playstation.sony.com] Sent: Thursday, September 19, 2013 11:53 AM To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Proposal to improve

[LLVMdev] Proposal to improve vzeroupper optimization strategy

2013 Sep 19

Hi all, I would like to make a proposal about changing the optimization strategy regarding when to insert a vzeroupper instruction in the x86 backend. Current implementation: vzeroupper is inserted to any functions that use AVX instructions. The insertion points are: 1) before a call instruction; 2) before a return instruction; Rationale: vzeroupper is an AVX instruction; it is inserted to
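The two insertion points described in this snippet can be illustrated with a toy model. This is only a sketch and not the LLVM x86 backend pass: the function name `insert_vzeroupper` is hypothetical, instructions are plain AT&T-syntax strings, and "uses AVX" is approximated (an assumption) by any mention of a `%ymm` register.

```python
def insert_vzeroupper(instructions):
    """Toy model of the strategy quoted above (not LLVM's real pass).

    If the function touches a ymm register, put a 'vzeroupper' in front of
    every call and every return; otherwise leave the function unchanged.
    """
    # Assumption: "uses AVX instructions" is approximated by ymm register use.
    uses_avx = any("%ymm" in ins for ins in instructions)
    if not uses_avx:
        return list(instructions)

    result = []
    for ins in instructions:
        parts = ins.split()
        mnemonic = parts[0] if parts else ""
        # Insertion points 1) and 2): before a call, before a return.
        if mnemonic.startswith("call") or mnemonic.startswith("ret"):
            result.append("vzeroupper")
        result.append(ins)
    return result


if __name__ == "__main__":
    body = ["vmovdqa (%rdi), %ymm0", "callq foo", "retq"]
    for line in insert_vzeroupper(body):
        print(line)
```

In this model a function that never mentions `%ymm` is returned unchanged, which matches the "any functions that use AVX instructions" condition in the quoted text.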

[LLVMdev] AVX calling convention?

2013 Sep 05

I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually. I tracked this down to the following. The call site looks like vmovdqa 24064(%rsp), %ymm0 vmovdqa

[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions

2013 Dec 19

Hi all, I would like to find out whether anyone will find it useful to add an x86- specific calling convention for reducing emission of vzeroupper instructions. Current implementation: vzeroupper is inserted to any functions that use AVX instructions. The insertion points are: 1) before a call instruction; 2) before a return instruction; Background: vzeroupper is an AVX instruction; it is

[LLVMdev] use AVX automatically if present

2012 May 24

I wonder why AVX is not used automatically if available at the host machine. In contrast to that, SSE41 instructions (like pmulld) are automatically used if the host machine supports SSE41. E.g. $ cat avx.ll define void @_fun1(<8 x float>*, <8 x float>*) { _L1: %x = load <8 x float>* %0 %y = load <8 x float>* %1 %z = fadd <8 x float> %x, %y store

[LLVMdev] use AVX automatically if present

2012 May 24

On Thu, 24 May 2012, Pan, Wei wrote: > Very likely AVX is not enabled in your llc. This feature was enabled > just recently (late of April). I forgot to mention that I am using recent LLVM-3.1 and in principle my llc knows about avx as I have shown in the second example. But avx does not seem to be used by default. On Thu, 24 May 2012, Henning Thielemann wrote: > $ llc -o - -mattr

[LLVMdev] Proposal to improve vzeroupper optimization strategy

2013 Sep 20

Hi Eli, Thanks for the feedback. Please see below. - Gao. From: Eli Friedman [mailto:eli.friedman at gmail.com] Sent: Thursday, September 19, 2013 12:31 PM To: Gao, Yunzhong Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy > This is essentially equivalent to "don't insert vzeroupper anywhere", as > far as I can tell. (The

[LLVMdev] Proposal to improve vzeroupper optimization strategy

2013 Sep 20

On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong < yunzhong_gao at playstation.sony.com> wrote: > Hi Eli, > > Thanks for the feedback. Please see below. > - Gao. > > From: Eli Friedman [mailto:eli.friedman at gmail.com] > > Sent: Thursday, September 19, 2013 12:31 PM > > To: Gao, Yunzhong > > Cc: llvmdev at

[LLVMdev] Proposal to improve vzeroupper optimization strategy

2013 Sep 19

On Thu, Sep 19, 2013 at 11:53 AM, Gao, Yunzhong < yunzhong_gao at playstation.sony.com> wrote: > Hi all, > > I would like to make a proposal about changing the optimization strategy > regarding when to insert a vzeroupper instruction in the x86 backend. > > Current implementation: > vzeroupper is inserted to any functions that use AVX instructions. The > insertion

[LLVMdev] Proposal to improve vzeroupper optimization strategy

2013 Sep 21

Is it realistic to worry about performance of vectorized code that does PIC calls into a non-vectorized sin() in libc? Maybe there's an example other than sin() that is more realistic? -- Sean Silva On Fri, Sep 20, 2013 at 7:11 PM, Eli Friedman <eli.friedman at gmail.com>wrote: > On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong < > yunzhong_gao at playstation.sony.com>

[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions

2013 Dec 19

On 19 December 2013 14:31, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote: > Hi all, > > > > I would like to find out whether anyone will find it useful to add an x86- > > specific calling convention for reducing emission of vzeroupper > instructions. > > > > Current implementation: > > vzeroupper is inserted to any functions that use AVX

[LLVMdev] VEX prefixes for JIT in llvm 3.5

2014 Sep 17

Hi Jim, Thanks for a very quick reply! That indeed does the trick! Presumably the default has changed in 3.5 to be a "generic" CPU instead of the native one? If that's the case I wonder why: especially when JITting it really only makes sense to target the actual CPU - unless I'm missing something? :) Thanks again, Matt On Wed, Sep 17, 2014 at 2:16 PM, Jim Grosbach

[LLVMdev] VEX prefixes for JIT in llvm 3.5

2014 Sep 17

Hi guys, I just upgraded our JIT system to use llvm 3.5 and noticed one big change in our generated code: we don't see any non-destructive VEX prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) etc. It's long been on my list of things to investigate anyway as I noticed llvm didn't emit VZEROUPPER calls either, so I supposed it might not be a bad thing to disable

[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions

2013 Dec 24

> In general, I'm not too keen on adding more calling conventions unless > there's a really powerful need for one from an ABI perspective. This > sounds more like an optimization than an ABI need. I think that is the case. > What's more, I > worry (a little bit) about confusion that could be caused with the > __vectorcall calling convention (which we do not

[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions

2013 Dec 19

On Thu, Dec 19, 2013 at 12:14 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > On 19 December 2013 14:31, Gao, Yunzhong > <yunzhong_gao at playstation.sony.com> wrote: > > Hi all, > > > > > > > > I would like to find out whether anyone will find it useful to add an > x86- > > > > specific calling convention for reducing

[LLVMdev] AVX code gen

2013 Dec 12

It probably does not pick the right processor architecture. You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell. $ clang -march=core-avx2 -O3 -S -o - test.c .section __TEXT,__text,regular,pure_instructions .globl _f .align 4, 0x90 _f: ## @f

[LLVMdev] VEX prefixes for JIT in llvm 3.5

2014 Sep 17

Great stuff; thanks both! I'm also looking to turn my MCJIT conversion spike into our main use case. The only thing I'm missing is the ability to get a post-linked copy of the generated assembly. In JIT I used JITEventListener's NotifyFunctionEmitted and used a MCDisassembler to disassemble the stream (with my own custom annotators), and redirected the output to the relevant place

[LLVMdev] use AVX automatically if present

2012 May 24

Henning, I believe the code that is supposed to do this is in: lib/Target/X86/X86Subtarget.cpp in X86Subtarget::AutoDetectSubtargetFeatures() Is there a bug in that function? -Hal On Thu, 24 May 2012 23:56:48 +0200 (CEST) Henning Thielemann <llvm at henning-thielemann.de> wrote: > > On Thu, 24 May 2012, Pan, Wei wrote: > > > Very likely AVX is not enabled in your llc.
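For context on what host AVX detection generally has to check, here is a minimal sketch. It is not the code in `X86Subtarget::AutoDetectSubtargetFeatures()`; it only models the commonly documented x86 rule that AVX is usable when CPUID leaf 1 reports both the AVX and OSXSAVE bits in ECX and the OS has enabled XMM and YMM state in XCR0. The function name `avx_usable` is hypothetical, and the register values are passed in as integers rather than read from hardware.

```python
# Bit positions in CPUID leaf 1, register ECX.
OSXSAVE_BIT = 1 << 27   # OS has enabled XSAVE/XGETBV
AVX_BIT = 1 << 28       # CPU implements AVX

# XCR0 bits 1 (SSE/XMM state) and 2 (AVX/YMM state) must both be set.
XCR0_XMM_YMM = 0x6


def avx_usable(cpuid1_ecx, xcr0):
    """Return True if AVX can be used, given raw register values.

    cpuid1_ecx: value of ECX after executing CPUID with EAX=1.
    xcr0:       value returned by XGETBV with ECX=0 (only meaningful
                when the OSXSAVE bit is set).
    """
    if not (cpuid1_ecx & AVX_BIT):
        return False      # CPU has no AVX at all
    if not (cpuid1_ecx & OSXSAVE_BIT):
        return False      # OS has not enabled extended state management
    # The OS must save and restore both XMM and YMM registers.
    return (xcr0 & XCR0_XMM_YMM) == XCR0_XMM_YMM


if __name__ == "__main__":
    print(avx_usable(AVX_BIT | OSXSAVE_BIT, 0x7))
```

A detection routine that checks only the CPU feature bit and skips the OS-support check (or the reverse) would report AVX differently from one that checks both, which is one place such a bug could hide.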

[LLVMdev] AVX code gen

2013 Dec 11

Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such

[LLVMdev] VEX prefixes for JIT in llvm 3.5

2014 Sep 18

Hi Matt, Philip, You could get the data you want by recording the addresses returned by the allocateCodeSection and allocateDataSection methods on your RTDyldMemoryManager, then disassembling those sections after you've called resolveRelocations. That's a little unsatisfying though. For one thing, unless you very carefully maintain the association with the original object via
