similar to: [LLVMdev] Proposal to improve vzeroupper optimization strategy


2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Great idea. I reported on this problem before and am glad to see someone trying to tackle it. Cheers. ________________________________________ From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Gao, Yunzhong [yunzhong_gao at playstation.sony.com] Sent: Thursday, September 19, 2013 11:53 AM To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Proposal to improve
2013 Sep 20
3
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Hi Eli, Thanks for the feedback. Please see below. - Gao. From: Eli Friedman [mailto:eli.friedman at gmail.com] Sent: Thursday, September 19, 2013 12:31 PM To: Gao, Yunzhong Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy > This is essentially equivalent to "don't insert vzeroupper anywhere", as > far as I can tell. (The
2013 Dec 19
4
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
Hi all, I would like to find out whether anyone will find it useful to add an x86-specific calling convention for reducing emission of vzeroupper instructions. Current implementation: vzeroupper is inserted to any functions that use AVX instructions. The insertion points are: 1) before a call instruction; 2) before a return instruction; Background: vzeroupper is an AVX instruction; it is
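The insertion rule described in this snippet (a vzeroupper before every call and every return in a function that uses AVX) can be illustrated with a toy model. This is only a sketch under stated assumptions, not LLVM's actual pass: instructions are modeled as bare mnemonic strings, and the names `Function`, `usesAVX`, and `insertVZeroUpper` are hypothetical, as is the rule that any mnemonic starting with `v` counts as AVX.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: an "instruction" is just its mnemonic as a string.
// This is not LLVM's pass or its data structures.
using Function = std::vector<std::string>;

// Hypothetical helper: treat any mnemonic starting with 'v' (other than
// vzeroupper itself) as an AVX instruction.
bool usesAVX(const Function &fn) {
    for (const std::string &inst : fn)
        if (!inst.empty() && inst[0] == 'v' && inst != "vzeroupper")
            return true;
    return false;
}

// Return a copy of fn with "vzeroupper" placed before every "call" and
// every "ret", but only if the function uses AVX at all.
Function insertVZeroUpper(const Function &fn) {
    if (!usesAVX(fn))
        return fn;
    Function out;
    for (const std::string &inst : fn) {
        if (inst == "call" || inst == "ret")
            out.push_back("vzeroupper");
        out.push_back(inst);
    }
    return out;
}
```

In this model, a function made of `vaddps`, `call`, `vmulps`, `ret` gains two vzeroupper entries, while a function with no AVX mnemonics is returned unchanged.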
2013 Sep 20
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong < yunzhong_gao at playstation.sony.com> wrote: > Hi Eli, > > Thanks for the feedback. Please see below. > - Gao. > > From: Eli Friedman [mailto:eli.friedman at gmail.com] > > Sent: Thursday, September 19, 2013 12:31 PM > > To: Gao, Yunzhong > > Cc: llvmdev at
2013 Sep 21
1
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Is it realistic to worry about performance of vectorized code that does PIC calls into a non-vectorized sin() in libc? Maybe there's an example other than sin() that is more realistic? -- Sean Silva On Fri, Sep 20, 2013 at 7:11 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong < > yunzhong_gao at playstation.sony.com>
2013 Dec 19
0
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On 19 December 2013 14:31, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote: > Hi all, > > > > I would like to find out whether anyone will find it useful to add an x86- > > specific calling convention for reducing emission of vzeroupper > instructions. > > > > Current implementation: > > vzeroupper is inserted to any functions that use AVX
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Thu, Sep 19, 2013 at 11:53 AM, Gao, Yunzhong < yunzhong_gao at playstation.sony.com> wrote: > Hi all, > > I would like to make a proposal about changing the optimization strategy > regarding when to insert a vzeroupper instruction in the x86 backend. > > Current implementation: > vzeroupper is inserted to any functions that use AVX instructions. The > insertion
2013 Dec 19
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On Thu, Dec 19, 2013 at 12:14 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > On 19 December 2013 14:31, Gao, Yunzhong > <yunzhong_gao at playstation.sony.com> wrote: > > Hi all, > > > > > > > > I would like to find out whether anyone will find it useful to add an > x86- > > > > specific calling convention for reducing
2013 Dec 24
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> In general, I'm not too keen on adding more calling conventions unless > there's a really powerful need for one from an ABI perspective. This > sounds more like an optimization than an ABI need. I think that is the case. > What's more, I > worry (a little bit) about confusion that could be caused with the > __vectorcall calling convention (which we do not
2012 Nov 07
1
[LLVMdev] AVX support
We have been using LLVM 3.1 to support JITing of AVX. From dumping the MC generated by the MCJIT, I noticed it always emits 'VZEROUPPER' to clear the high 128 bits before calling another function. In some cases I know the called function either only uses AVX or does not use SSE. I would like to inform the backend that it is safe not to emit that instruction. Have not been able to figure out how
2013 Dec 19
0
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> Maybe a target-specific attribute instead? It would still apply to all CCs, > but would never be dropped. That would work too, yes. I proposed metadata because it looks like it can be dropped, but that is not a big issue. I would be OK with an attribute too if that is more convenient or we want to make sure it is kept. Cheers, Rafael
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael, Thanks for the legwork! It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. As I am sure you are aware, we cannot use SSE (movaps)
2015 Dec 01
2
Endianness for multi-word types
On Mon, Nov 30, 2015 at 7:24 PM Gao, Yunzhong < yunzhong_gao at playstation.sony.com> wrote: > According to > http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html, > "The high-order double-precision value (the one that comes first in > storage) must have the larger magnitude." > > So the order of the two doubles in your fp128 is not affected by the
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory. vmovaps - load takes 1 load mu op, 3 latency, with a reciprocal throughput of 0.5. vmovaps - store takes 1 store mu op, 1 load mu op for address calculation, 3 latency, with a reciprocal throughput of 1. He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture. You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell. $ clang -march=core-avx2 -O3 -S -o - test.c .section __TEXT,__text,regular,pure_instructions .globl _f .align 4, 0x90 _f: ## @f
2015 Dec 01
3
Endianness for multi-word types
Hi, I have recently been investigating a ppc_fp128-related problem. Here is a minimal C++ test case that seems to be miscompiled: long double id(long double a) { return a; } bool f(long double x) { return id(__builtin_fabsl(x)) >= 0; } int main() { if (f(-123.l)) { return 0; } return 1; } The program was compiled with the command: clang++ -static -target powerpc64le-linux-gnu bad.cc
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2014 Aug 22
3
[LLVMdev] [RFC] Raising minimum required Visual Studio version to 2013 for trunk
On Fri, Aug 22, 2014 at 8:58 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 22 August 2014 13:43, Aaron Ballman <aaron at aaronballman.com> wrote: >> My opposition to this switch was the timing. When we researched "what >> minimum can we live with for C++11" nine months ago, we determined >> what versions would make sense, which included MSVC
2015 Dec 01
3
Endianness for multi-word types
On 1 December 2015 at 13:41, Tim Shen via llvm-dev <llvm-dev at lists.llvm.org> wrote: > As a simple solution, when we see an LLVM IR bitcast, instead of generating > (ISD::BITCAST x), can we generate (exchange_hi_lo (ISD::BITCAST x)) instead? An LLVM bitcast is defined to be equivalent to a store/load pair. Changing that for ISD::BITCAST would be very surprising, and I wouldn't
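The statement that a bitcast is equivalent to a store/load pair can be illustrated at the C++ level. The sketch below is only an analogy and not LLVM code: it stores a `double` to memory and loads the same bytes back as a 64-bit integer via `std::memcpy`. The function name `bitsOfDouble` is hypothetical, and the example assumes a 64-bit IEEE-754 `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit integer by storing the value
// to memory and loading it back with a different type. The bytes are copied
// unchanged, so the result follows the target's in-memory byte order.
std::uint64_t bitsOfDouble(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // the "store/load pair"
    return bits;
}
```

Because the bytes pass through memory untouched, any reordering of halves (such as the `exchange_hi_lo` suggested in the quoted message) would no longer match this store/load behavior.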
2014 Aug 25
2
[LLVMdev] [RFC] Raising minimum required Visual Studio version to 2013 for trunk
On Mon, Aug 25, 2014 at 12:04 PM, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote: > Hi, > Sorry for the delay in responding, we have been discussing this internally > and have not had time to do a proper investigation. > >> We absolutely have to ship a set of DLLs that run hosted in VS2012. Is >> there any sort of runtime incompatibility that would happen