Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Proposal to improve vzeroupper optimization strategy"
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Great idea. I reported this problem before and am glad to see someone trying to tackle it.
cheers.
________________________________________
From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Gao, Yunzhong [yunzhong_gao at playstation.sony.com]
Sent: Thursday, September 19, 2013 11:53 AM
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] Proposal to improve
2013 Sep 20
3
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Hi Eli,
Thanks for the feedback. Please see below.
- Gao.
From: Eli Friedman [mailto:eli.friedman at gmail.com]
Sent: Thursday, September 19, 2013 12:31 PM
To: Gao, Yunzhong
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy
> This is essentially equivalent to "don't insert vzeroupper anywhere", as
> far as I can tell. (The
2013 Dec 19
4
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
Hi all,
I would like to find out whether anyone would find it useful to add an
x86-specific calling convention for reducing emission of vzeroupper instructions.
Current implementation:
vzeroupper is inserted into any function that uses AVX instructions. The
insertion points are:
1) before a call instruction;
2) before a return instruction;
Background:
vzeroupper is an AVX instruction; it is
2013 Sep 20
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong <
yunzhong_gao at playstation.sony.com> wrote:
> Hi Eli,
>
> Thanks for the feedback. Please see below.
> - Gao.
>
> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> Sent: Thursday, September 19, 2013 12:31 PM
> To: Gao, Yunzhong
> Cc: llvmdev at
2013 Sep 21
1
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Is it realistic to worry about performance of vectorized code that does PIC
calls into a non-vectorized sin() in libc? Maybe there's an example other
than sin() that is more realistic?
-- Sean Silva
On Fri, Sep 20, 2013 at 7:11 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Sep 20, 2013 at 2:58 PM, Gao, Yunzhong <
> yunzhong_gao at playstation.sony.com>
2013 Dec 19
0
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On 19 December 2013 14:31, Gao, Yunzhong
<yunzhong_gao at playstation.sony.com> wrote:
> Hi all,
>
> I would like to find out whether anyone would find it useful to add an
> x86-specific calling convention for reducing emission of vzeroupper
> instructions.
>
> Current implementation:
> vzeroupper is inserted into any function that uses AVX
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Thu, Sep 19, 2013 at 11:53 AM, Gao, Yunzhong <
yunzhong_gao at playstation.sony.com> wrote:
> Hi all,
>
> I would like to make a proposal about changing the optimization strategy
> regarding when to insert a vzeroupper instruction in the x86 backend.
>
> Current implementation:
> vzeroupper is inserted into any function that uses AVX instructions. The
> insertion
2013 Dec 19
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On Thu, Dec 19, 2013 at 12:14 PM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:
> On 19 December 2013 14:31, Gao, Yunzhong
> <yunzhong_gao at playstation.sony.com> wrote:
> > Hi all,
> >
> > I would like to find out whether anyone would find it useful to add an
> > x86-specific calling convention for reducing
2013 Dec 24
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> In general, I'm not too keen on adding more calling conventions unless
> there's a really powerful need for one from an ABI perspective. This
> sounds more like an optimization than an ABI need.
I think that is the case.
> What's more, I
> worry (a little bit) about confusion that could be caused with the
> __vectorcall calling convention (which we do not
2012 Nov 07
1
[LLVMdev] AVX support
We have been using LLVM 3.1 to support JITing of AVX code. From dumping the MC generated by MCJIT, I noticed that it always emits 'VZEROUPPER' to clear the high 128 bits before calling another function. In some cases I know that the called function either uses only AVX or does not use SSE. I would like to inform the backend that it is safe not to emit that instruction.
Have not been able to figure out how
2013 Dec 19
0
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> Maybe a target-specific attribute instead? It would still apply to all CCs,
> but would never be dropped.
That would work too, yes. I proposed metadata because it looks like it
can be dropped, but that is not a big issue. I would be OK with an
attribute too if that is more convenient or we want to make sure it is
kept.
Cheers,
Rafael
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael,
Thanks for the legwork!
It appears that the stats you listed are for movaps [SSE], not vmovaps
[AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256),
since they are both AVX instructions. Although, yes, I agree that this is
not clear from Agner's report. Please correct me if I am misunderstanding.
As I am sure you are aware, we cannot use SSE (movaps)
2015 Dec 01
2
Endianness for multi-word types
On Mon, Nov 30, 2015 at 7:24 PM Gao, Yunzhong <
yunzhong_gao at playstation.sony.com> wrote:
> According to
> http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html,
> "The high-order double-precision value (the one that comes first in
> storage) must have the larger magnitude."
>
> So the order of the two doubles in your fp128 is not affected by the
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory.
vmovaps - load takes 1 load µop, latency 3, with a reciprocal throughput of 0.5.
vmovaps - store takes 1 store µop plus 1 load µop for address calculation, latency 3, with a reciprocal throughput of 1.
He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture.
You could try “clang -mavx” or “clang -march=corei7-avx” for Ivy Bridge and “clang -march=core-avx2” or “clang -mavx2” for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f: ## @f
2015 Dec 01
3
Endianness for multi-word types
Hi,
I have recently been investigating a ppc_fp128-related problem. Here is a
minimal C++ test case that seems to be miscompiled:
long double id(long double a) {
  return a;
}
bool f(long double x) {
  return id(__builtin_fabsl(x)) >= 0;
}
int main() {
  if (f(-123.l)) {
    return 0;
  }
  return 1;
}
The program was compiled with the command:
clang++ -static -target powerpc64le-linux-gnu bad.cc
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello -
I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2014 Aug 22
3
[LLVMdev] [RFC] Raising minimum required Visual Studio version to 2013 for trunk
On Fri, Aug 22, 2014 at 8:58 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 22 August 2014 13:43, Aaron Ballman <aaron at aaronballman.com> wrote:
>> My opposition to this switch was the timing. When we researched "what
>> minimum can we live with for C++11" nine months ago, we determined
>> what versions would make sense, which included MSVC
2015 Dec 01
3
Endianness for multi-word types
On 1 December 2015 at 13:41, Tim Shen via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> As a simple solution, when we see an LLVM IR bitcast, instead of generating
> (ISD::BITCAST x), can we generate (exchange_hi_lo (ISD::BITCAST x))?
An LLVM bitcast is defined to be equivalent to a store/load pair.
Changing that for ISD::BITCAST would be very surprising, and I
wouldn't
2014 Aug 25
2
[LLVMdev] [RFC] Raising minimum required Visual Studio version to 2013 for trunk
On Mon, Aug 25, 2014 at 12:04 PM, Gao, Yunzhong
<yunzhong_gao at playstation.sony.com> wrote:
> Hi,
> Sorry for the delay in responding, we have been discussing this internally
> and have not had time to do a proper investigation.
>
>> We absolutely have to ship a set of DLLs that run hosted in VS2012. Is
>> there any sort of runtime incompatibility that would happen