thr3ads.net - search: "vbroadcastss"

Displaying 20 results from an estimated 25 matches for "vbroadcastss".

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 30

...hat. I'll fix those.

On Fri, Sep 26, 2014 at 3:39 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi Chandler,
>
> Here is another test.
>
> When looking at the AVX codegen, I noticed that, when using the new
> shuffle lowering, we no longer emit a single vbroadcastss in the case
> where the shuffle performs a splat of a scalar float loaded from
> memory.
>
> For example:
> (with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
>   vmovss (%rdi), %xmm0
>   vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>
> Instead of: ...
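The equivalence this message relies on (a scalar load followed by an in-register permute with immediate 0 yields the same vector as a single broadcast from memory) can be modelled lane by lane. The following is only a rough sketch: the function names are hypothetical, vectors are plain 4-element lists, and memory is a Python list indexed by element rather than by byte address.

```python
def movss_load(memory, index):
    """Model of a scalar load: lane 0 gets the value, upper lanes are zeroed."""
    return [memory[index], 0.0, 0.0, 0.0]


def permilps(src, imm):
    """Model of a 4-lane permute: result lane i takes the source lane
    selected by the 2-bit field at bits 2*i+1..2*i of imm."""
    return [src[(imm >> (2 * i)) & 0b11] for i in range(4)]


def broadcastss(memory, index):
    """Model of a broadcast from memory: every lane gets the same scalar."""
    return [memory[index]] * 4


memory = [3.5, -1.0, 2.25]
two_step = permilps(movss_load(memory, 0), 0)  # load, then splat lane 0
one_step = broadcastss(memory, 0)              # single broadcast
print(two_step == one_step)
```

Both paths produce the same lanes, so the difference discussed in the thread is about instruction count, not about the result.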

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 23

On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.

While I'm happy to work on it, I'm even more happy to have patches. =D

Should llvm optimize 1.0 / x ?

2020 Aug 31

...__attribute__((__vector_size__(16)));

v4f32 fct1(v4f32 x) { return 1.0 / x; }
v4f32 fct2(v4f32 x) { return __builtin_ia32_rcpps(x); }

Which is compiled to:

vec.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_Z4fct1Dv4_f>:
   0: c4 e2 79 18 0d 00 00    vbroadcastss 0x0(%rip),%xmm1  # 9 <_Z4fct1Dv4_f+0x9>
   7: 00 00
   9: c5 f0 5e c0             vdivps %xmm0,%xmm1,%xmm0
   d: c3                      retq
   e: 66 90                   xchg %ax,%ax

0000000000000010 <_Z4fct2Dv4_f>:
  10: c5 f8 53 c0             vrcpps %xmm0,%xmm0
  14: c3...

Should llvm optimize 1.0 / x ?

2020 Sep 01

...e \
  -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize \
  -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
  -c -o vec.o vec.cc

0000000000000140 <_Z4fct4Dv4_f>:
 140: c5 f8 53 c8             vrcpps %xmm0,%xmm1
 144: c4 e2 79 18 15 00 00    vbroadcastss 0x0(%rip),%xmm2  # 14d <_Z4fct4Dv4_f+0xd>
 14b: 00 00
 14d: c4 e2 71 ac c2          vfnmadd213ps %xmm2,%xmm1,%xmm0
 152: c4 e2 71 98 c1          vfmadd132ps %xmm1,%xmm1,%xmm0
 157: c3                      retq
 158: 0f 1f 84 00 00 00 00    nopl 0x0(%rax,%rax,1)
 15f: 00

0000000000000160 <_...
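The sequence in this listing (vrcpps, then vfnmadd213ps, then vfmadd132ps) reads like a reciprocal estimate followed by one Newton-Raphson refinement step, est + est * (1 - x * est). The sketch below models that arithmetic in scalar form. It assumes the broadcast constant is 1.0, which the excerpt does not show, and it uses a hypothetical `rough_reciprocal` helper (1/x truncated to about 12 bits) as a stand-in for the hardware estimate.

```python
import math


def rough_reciprocal(x, bits=12):
    """Stand-in for a hardware estimate: 1/x kept to roughly `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / x)  # 1/x == mantissa * 2**exponent
    scale = 1 << bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)


def refine(x, est):
    """One Newton-Raphson step for the reciprocal of x."""
    residual = 1.0 - x * est     # role of the fnmadd step (constant assumed 1.0)
    return residual * est + est  # role of the fmadd step


def relative_error(x, approx):
    """How far approx * x is from 1."""
    return abs(approx * x - 1.0)


x = 3.0
est = rough_reciprocal(x)
print(relative_error(x, est), relative_error(x, refine(x, est)))
```

Each refinement step roughly squares the relative error, which is why a single step after a coarse estimate can already be close to single-precision accuracy.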

Vector evolution?

2020 Sep 01

...pass-analysis=loop-vectorize,slp-vectorize \
  -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast -mrecip=all:0 \
  -c -o vec.o vec.cc

I get the following codegen:

0000000000000160 <_Z4fct6PDv4_f>:
 160: 31 c0                   xor %eax,%eax
 162: c4 e2 79 18 05 00 00    vbroadcastss 0x0(%rip),%xmm0  # 16b <_Z4fct6PDv4_f+0xb>
 169: 00 00
 16b: 0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
 170: c5 f8 59 0c 07          vmulps (%rdi,%rax,1),%xmm0,%xmm1
 175: c5 f8 29 0c 07          vmovaps %xmm1,(%rdi,%rax,1)
 17a: c5 f8 59 4c 07 10       vmulps 0x10(%rdi,%rax,1),%xmm0,%xmm1...

AVX2 codegen - question reg. FMA generation

2019 Sep 02

...on't see the x86-64 code generator (with cpu set to haswell or later types) turning it into AVX2 FMA instructions. Here's the snippet in the output it generates:

$ llc -O3 -mcpu=skylake

---------------------
.LBB0_2: # =>This Inner Loop Header: Depth=1
  vbroadcastss (%rsi,%rdx,4), %ymm0
  vmulps (%rdi,%rcx), %ymm0, %ymm0
  vaddps (%rax,%rcx), %ymm0, %ymm0
  vmovups %ymm0, (%rax,%rcx)
  incq %rdx
  addq $32, %rcx
  cmpq $15, %rdx
  jle .LBB0_2
-----------------------

$ llc --version
LLVM (http://llvm.org/):
  LLVM version 8.0.0
  Optimized build.
  Default target: x86_64-un...
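The loop body above multiplies a broadcast scalar by a vector and then adds an accumulator as two separate instructions. A fused multiply-add rounds once where the separate multiply and add round twice, so the two forms are not interchangeable in general; that is one reason a compiler typically needs permission (such as a contraction or fast-math setting) before fusing. Whether that is what happened in this thread is not shown in the excerpt. The sketch below only illustrates the rounding difference. Its function names are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware FMA.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Emulated fused multiply-add: compute a*b + c exactly, round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def loop_body(scalar, row, acc, op):
    """Lane-wise model of one iteration: acc[i] = scalar * row[i] + acc[i]."""
    return [op(scalar, r, a) for r, a in zip(row, acc)]


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
print(mul_then_add(a, b, -1.0), fused_mul_add(a, b, -1.0))
```

With these inputs the exact product is 1 - 2**-60, which rounds to 1.0 in double precision, so the unfused form returns 0.0 while the fused form keeps the small negative remainder.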

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

...sm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..46feaea52632 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
   /* \
    * S-function with AES subbytes \
    */ \
-  vmovdqa .Linv_shift_row, t4; \
-  vbroadcastss .L0f0f0f0f, t7; \
-  vmovdqa .Lpre_tf_lo_s1, t0; \
-  vmovdqa .Lpre_tf_hi_s1, t1; \
+  vmovdqa .Linv_shift_row(%rip), t4; \
+  vbroadcastss .L0f0f0f0f(%rip), t7; \
+  vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+  vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
   \
   /* AES inverse shift rows */ \
   vpshufb t4, x0, x0; \
@@...

[LLVMdev] Long-Term ISel Design

2011 Mar 17

...y this. I mean that in legalize/lowering we're massaging the DAG to get it into a state where table-driven isel can match it. There is a lot of code like this:

  if (shuffle_is_MOVL)
    do_nothing_and_return

It's duplicating exactly the checks that the table-driven isel does later. In the VBROADCASTSS/D case, it's doing an entire DAG match to check whether it's implementable with VBROADCASTSS/D.

Why not just let table-driven isel run first and take care of these checks just once? If something doesn't match, we then know it needs manual lowering and selection.

>>...

[LLVMdev] Long-Term ISel Design

2011 Mar 16

...the final stage we already know the shuffle mask isn't implementable manually so there's no need to check for legality. We simply need to implement whatever X86ISelLowering would have done in those cases previously.

This also helps example 2. In the memory-operand case we will match to a VBROADCASTSS/D. If we don't match we'll fall through to manual lowering and we'll implement the reg-reg broadcast via some other combination of shuffles. So we more gracefully handle situations where sometimes things are legal and sometimes they aren't depending on the context.

Perhaps I'...
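The flow proposed in this message (let table-driven matching run first, and fall through to manual lowering only when nothing matches) might be sketched as follows. This is a toy model, not LLVM code: the pattern table, the function names and the placeholder shuffle names are all hypothetical.

```python
# Hypothetical pattern table: (operation, operand kind) -> instruction name.
PATTERNS = {
    ("broadcast", "memory"): "VBROADCASTSS",
}


def manual_lowering(op, operand_kind):
    """Fallback used only when the table has no entry (illustrative only)."""
    if op == "broadcast" and operand_kind == "register":
        # Placeholder names standing in for "some other combination of shuffles".
        return ["SHUFFLE_A", "SHUFFLE_B"]
    raise NotImplementedError((op, operand_kind))


def select(op, operand_kind):
    """Try the table first; lower manually only on a miss."""
    matched = PATTERNS.get((op, operand_kind))
    if matched is not None:
        return [matched]
    return manual_lowering(op, operand_kind)


print(select("broadcast", "memory"))
print(select("broadcast", "register"))
```

In this arrangement the legality check lives in one place, the table lookup, and the manual path never has to repeat it.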

[LLVMdev] Long-Term ISel Design

2011 Mar 17

On Mar 16, 2011, at 1:44 PM, David Greene wrote:
> All,
>
> As I've done more integrating of AVX work upstream and more tuning here,
> I've run across several things which are clunky in the current isel
> design. A couple examples I can remember offhand:
>
> 1. We have special target-specific operators for certain shuffles in X86,
> such as X86unpckl. I

VBROADCAST Implementation Issues

2017 Aug 06

...t;>>>>> > wrote: >>>>>>>> >>>>>>>>> in x86 it is; >>>>>>>>> >>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>> >>>>>>>>> mine is >>>>>>>>> >>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>> (BROADCAST_256B addr:$src)>; >>...

VBROADCAST Implementation Issues

2017 Aug 07

...>>>>>>>>>
>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>
>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> mine is
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>...

Lets do a 1.3.2 release

2016 Jan 18

Dave Yeo wrote:
> Seems that the default binutils on OS/2 is too old to support AVX2,
> attached patch works around this. Not the best solution as best would be
> configure tests, but simple.

Are you sure that these binutils support AVX and FMA? (Currently libFLAC doesn't contain AVX and FMA instructions). If they aren't supported then it's better to include them too into

Lets do a 1.3.2 release

2016 Jan 18

...dif. The nature of the error implies AVX2 support that is missing but I'm not much up on assembly,

make[4]: Entering directory `K:/usr/local/src/flac/src/libFLAC'
  CC lpc_intrin_avx2.lo
R:/tmp/ccwvrScM.s: Assembler messages:
R:/tmp/ccwvrScM.s:495: Error: operand type mismatch for `vbroadcastss'
...
R:/tmp/ccwvrScM.s:8773: Error: operand type mismatch for `vpsrlq'
R:/tmp/ccwvrScM.s:8778: Error: no such instruction: `vpermd %ymm1,%ymm5,%ymm0'
R:/tmp/ccwvrScM.s:8859: Error: operand type mismatch for `vpmovzxdq'
...

Best to be safe so updated patch attached. I've also o...

VBROADCAST Implementation Issues

2017 Aug 07

...>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroa...

[LLVMdev] Long-Term ISel Design

2011 Mar 18

...ng we're massaging the DAG to get it into
> a state where table-driven isel can match it. There is a lot of code
> like this:
>
> if (shuffle_is_MOVL)
> do_nothing_and_return
>
> It's duplicating exactly the checks that the table-driven isel does
> later. In the VBROADCASTSS/D case, it's doing an entire DAG match
> to check whether it's implementable with VBROADCASTSS/D.
>
> Why not just let table-driven isel run first and take care of these
> checks just once? If something doesn't match, we then know it needs
> manual lowering and selectio...

AVX2 codegen - question reg. FMA generation

2019 Sep 02

...ater types) turning it into
> > AVX2 FMA instructions. Here's the snippet in the output it generates:
> >
> > $ llc -O3 -mcpu=skylake
> >
> > ---------------------
> > .LBB0_2: # =>This Inner Loop Header: Depth=1
> > vbroadcastss (%rsi,%rdx,4), %ymm0
> > vmulps (%rdi,%rcx), %ymm0, %ymm0
> > vaddps (%rax,%rcx), %ymm0, %ymm0
> > vmovups %ymm0, (%rax,%rcx)
> > incq %rdx
> > addq $32, %rcx
> > cmpq $15, %rdx
> > jle .LBB0_2
> > -----------------------
> >
> > $ llc --v...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 10

On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

Changes:
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13
