search for: vzeroupp

Displaying 20 results from an estimated 43 matches for "vzeroupp".

2013 Sep 19
5
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Hi all, I would like to propose changing the optimization strategy for when the x86 backend inserts a vzeroupper instruction. Current implementation: vzeroupper is inserted into any function that uses AVX instructions. The insertion points are: 1) before a call instruction; 2) before a return instruction. Rationale: vzeroupper is an AVX instruction; it is inserted to avoid performance pen...
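The insertion rule described in this post can be sketched as a toy model. This is only an illustration, not LLVM's actual pass: the `Function` alias and the helpers `usesYmm`, `isCallOrReturn`, and `insertVZeroUpper` are hypothetical names, instructions are modeled as plain assembly strings, and "uses AVX" is approximated by "mentions a `%ymm` register".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption): a function is a list of assembly text lines.
using Function = std::vector<std::string>;

// Approximation: treat any mention of a YMM register as "uses AVX".
static bool usesYmm(const std::string &inst) {
  return inst.find("%ymm") != std::string::npos;
}

// True if the instruction text starts with "call" or "ret".
static bool isCallOrReturn(const std::string &inst) {
  return inst.rfind("call", 0) == 0 || inst.rfind("ret", 0) == 0;
}

// Strategy from the post: if the function uses AVX anywhere, emit
// vzeroupper before every call and before every return.
Function insertVZeroUpper(const Function &fn) {
  bool usesAvx = false;
  for (const auto &inst : fn)
    usesAvx = usesAvx || usesYmm(inst);
  if (!usesAvx)
    return fn; // no AVX use: leave the function untouched

  Function out;
  for (const auto &inst : fn) {
    if (isCallOrReturn(inst))
      out.push_back("vzeroupper");
    out.push_back(inst);
  }
  assert(out.size() >= fn.size()); // instructions are only ever added
  return out;
}
```

Under these assumptions, a function consisting of `vmovaps (%rdi), %ymm0`, `callq foo`, `ret` gains a `vzeroupper` before both the call and the return, while a function with no `%ymm` use is returned unchanged.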
2013 Sep 20
3
[LLVMdev] Proposal to improve vzeroupper optimization strategy
Hi Eli, Thanks for the feedback. Please see below. - Gao. From: Eli Friedman [mailto:eli.friedman at gmail.com] Sent: Thursday, September 19, 2013 12:31 PM To: Gao, Yunzhong Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy > This is essentially equivalent to "don't insert vzeroupper anywhere", as > far as I can tell. (The case of SSE instructions without a v- prefixed > equivalent is rare enough we can separate it from this discussion.) So will you be interested in a p...
2013 Sep 21
1
[LLVMdev] Proposal to improve vzeroupper optimization strategy
... >> From: Eli Friedman [mailto:eli.friedman at gmail.com] >> Sent: Thursday, September 19, 2013 12:31 PM >> To: Gao, Yunzhong >> Cc: llvmdev at cs.uiuc.edu >> Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy >> >> > This is essentially equivalent to "don't insert vzeroupper anywhere", as >> > far as I can tell. (The case of SSE instructions without a v- prefixed...
2013 Sep 20
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
...ase see below. > - Gao. > > From: Eli Friedman [mailto:eli.friedman at gmail.com] > Sent: Thursday, September 19, 2013 12:31 PM > To: Gao, Yunzhong > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Proposal to improve vzeroupper optimization strategy > > > This is essentially equivalent to "don't insert vzeroupper anywhere", as > > far as I can tell. (The case of SSE instructions without a v- prefixed > > equivalent is rare enough...
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
On Thu, Sep 19, 2013 at 11:53 AM, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote: > Hi all, > > I would like to make a proposal about changing the optimization strategy > regarding when to insert a vzeroupper instruction in the x86 backend. > > Current implementation: > vzeroupper is inserted into any function that uses AVX instructions. The > insertion points are: > 1) before a call instruction; > 2) before a return instruction; > > Rationale: > vzeroupper is an AVX instructi...
2013 Sep 19
0
[LLVMdev] Proposal to improve vzeroupper optimization strategy
...s. cheers. ________________________________________ From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Gao, Yunzhong [yunzhong_gao at playstation.sony.com] Sent: Thursday, September 19, 2013 11:53 AM To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Proposal to improve vzeroupper optimization strategy Hi all, I would like to make a proposal about changing the optimization strategy regarding when to insert a vzeroupper instruction in the x86 backend. Current implementation: vzeroupper is inserted into any function that uses AVX instructions. The insertion points are: 1) b...
2013 Dec 19
4
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
Hi all, I would like to find out whether anyone would find it useful to add an x86-specific calling convention for reducing emission of vzeroupper instructions. Current implementation: vzeroupper is inserted into any function that uses AVX instructions. The insertion points are: 1) before a call instruction; 2) before a return instruction. Background: vzeroupper is an AVX instruction; it is inserted to avoid performance penalty when transit...
2013 Dec 19
0
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On 19 December 2013 14:31, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote: > Hi all, > > I would like to find out whether anyone will find it useful to add an x86-specific > calling convention for reducing emission of vzeroupper instructions. > > Current implementation: > vzeroupper is inserted into any function that uses AVX instructions. The > insertion points are: > 1) before a call instruction; > 2) before a return instruction; > > Background: ...
2013 Dec 19
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
...ember 2013 14:31, Gao, Yunzhong > <yunzhong_gao at playstation.sony.com> wrote: > > Hi all, > > > > I would like to find out whether anyone will find it useful to add an x86-specific > > calling convention for reducing emission of vzeroupper instructions. > > > > Current implementation: > > vzeroupper is inserted into any function that uses AVX instructions. The > > insertion points are: > > 1) before a call instruction; > > 2) bef...
2013 Dec 24
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
...> What should happen with this code? > > int foo() __attribute__((avx)); > > void bar(int (*fp)()) { > int i = fp(); > } > > void baz(void) { > bar(foo); > } > > Based on your description, this code is valid, but not as performant > as it could be. The vzeroupper would be inserted before fp() is > called, but there's no incompatibility happening. So I guess this > feels more like a regular function attribute than a calling > convention. It is not a calling convention. The issue is more if it is a type or a decl attribute. Given that putting...
2013 Dec 19
0
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> Maybe a target-specific attribute instead? It would still apply to all CCs, > but would never be dropped. That would work too, yes. I proposed metadata because it looks like it can be dropped, but that is not a big issue. I would be OK with an attribute too if that is more convenient or we want to make sure it is kept. Cheers, Rafael
2014 Sep 17
2
[LLVMdev] VEX prefixes for JIT in llvm 3.5
...graded our JIT system to use llvm 3.5 and noticed one big change in our generated code: we don't see any non-destructive VEX prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) etc. It's long been on my list of things to investigate anyway as I noticed llvm didn't emit VZEROUPPER calls either, so I supposed it might not be a bad thing to disable vex. That being said, try as I might I can't force avx on (builder.setMCPU("core-avx-i") and/or builder.setMAttrs(vector<string>{"+avx"});). We're still using the old JIT but I just spiked out a...
2014 Sep 17
3
[LLVMdev] VEX prefixes for JIT in llvm 3.5
...big >> change in our generated code: we don't see any non-destructive VEX >> prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) >> etc. >> >> It's long been on my list of things to investigate anyway as I noticed >> llvm didn't emit VZEROUPPER calls either, so I supposed it might not >> be a bad thing to disable vex. >> >> That being said, try as I might I can't force avx on >> (builder.setMCPU("core-avx-i") and/or >> builder.setMAttrs(vector<string>{"+avx"});). We're stil...
2013 Sep 05
1
[LLVMdev] AVX calling convention?
...tion is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually. I tracked this down to the following. The call site looks like vmovdqa 24064(%rsp), %ymm0 vmovdqa %ymm0, (%rsp) vzeroupper callq __Z14convert_char16Dv16_s which passes the argument on the stack. The callee, however, begins with __Z14convert_char16Dv16_s: ## @_Z14convert_char16Dv16_s .cfi_startproc ## BB#0: ## %entry pushq %rbp Ltmp2: .cfi_def_cfa_offset 16 Ltmp3: .cf...
2012 May 24
4
[LLVMdev] use AVX automatically if present
...pushq %rbp .Ltmp2: .cfi_def_cfa_offset 16 .Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp .Ltmp4: .cfi_def_cfa_register %rbp vmovaps (%rdi), %ymm0 vaddps (%rsi), %ymm0, %ymm0 vmovaps %ymm0, (%rdi) popq %rbp vzeroupper ret .Ltmp5: .size _fun1, .Ltmp5-_fun1 .cfi_endproc .section ".note.GNU-stack","",@progbits I guess your answer is that I did not specify a target triple. However why is SSE41 automatically detected and AVX is not?
2014 Sep 17
2
[LLVMdev] VEX prefixes for JIT in llvm 3.5
...e don't see any non-destructive VEX >> >> prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) >> >> etc. >> >> >> >> It's long been on my list of things to investigate anyway as I noticed >> >> llvm didn't emit VZEROUPPER calls either, so I supposed it might not >> >> be a bad thing to disable vex. >> >> >> >> That being said, try as I might I can't force avx on >> >> (builder.setMCPU("core-avx-i") and/or >> >> builder.setMAttrs(vector<st...
2012 May 24
0
[LLVMdev] use AVX automatically if present
....cfi_def_cfa_offset 16 > .Ltmp3: > .cfi_offset %rbp, -16 > movq %rsp, %rbp > .Ltmp4: > .cfi_def_cfa_register %rbp > vmovaps (%rdi), %ymm0 > vaddps (%rsi), %ymm0, %ymm0 > vmovaps %ymm0, (%rdi) > popq %rbp > vzeroupper > ret > .Ltmp5: > .size _fun1, .Ltmp5-_fun1 > .cfi_endproc > > > .section ".note.GNU-stack","",@progbits > > > > > I guess your answer is that I did not specify a target triple. However why is > SS...
2016 Mar 24
3
Open Project : Inter-procedural Register Allocation [GSoC 2016]
...the function output different. If we care about the order, which we may do, then we’d need to cache the data in the AsmPrinter and reorder it there somehow. Some bonus features that come from codegen on the call graph, and specifically having accurate regmasks and similar information: - The X86 VZeroUpper pass should insert fewer VZeroUpper instructions before calls, and could possibly even learn that after the call the state of vzeroupper is known. - Values in registers can be used by the callee instead of loading them. The second one here is fun. Imagine this pseudo code: foo: r0 = 1000 … ret...
2014 Sep 18
5
[LLVMdev] VEX prefixes for JIT in llvm 3.5
...VEX >>>>>> prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) >>>>>> etc. >>>>>> >>>>>> It's long been on my list of things to investigate anyway as I noticed >>>>>> llvm didn't emit VZEROUPPER calls either, so I supposed it might not >>>>>> be a bad thing to disable vex. >>>>>> >>>>>> That being said, try as I might I can't force avx on >>>>>> (builder.setMCPU("core-avx-i") and/or >>>>>...
2016 Mar 24
0
Open Project : Inter-procedural Register Allocation [GSoC 2016]
...utput different. If we care about the order, which we may do, then we’d need to cache the data in the AsmPrinter and reorder it there somehow. > > Some bonus features that come from codegen on the call graph, and specifically having accurate regmasks and similar information: > - The X86 VZeroUpper pass should insert fewer VZeroUpper instructions before calls, and could possibly even learn that after the call the state of vzeroupper is known. > - Values in registers can be used by the callee instead of loading them. > > The second one here is fun. Imagine this pseudo code: >...