thr3ads.net - search: "vf16"

Displaying 5 results from an estimated 5 matches for "vf16".

2017 Jan 23

Changes to TableGen in v4.0?

I am trying to upgrade to the LLVM v4.0 branch, but I am seeing failures in my TableGen descriptions for conversion from FP32 to FP16 (scalar and vector). The patterns I have are along the lines of:

    [(set (f16 RF16:$dst), (fround (f32 RF32:$src)))]

or:

    [(set (v2f16 VF16:$dst), (fround (v2f32 VF32:$src)))]

and these now produce the errors:

    error: In CONV_f32_f16: Type inference contradiction found, merging 'f32' into 'f16'

or:

    error: In CONV_v2f32_v2f16: Type inference contradiction found, merging 'v2f32' into 'v2f16'

F...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

On 07/23/2018 06:23 PM, Hal Finkel via llvm-dev wrote:
>
> On 07/23/2018 05:22 PM, Craig Topper wrote:
>> Hello all,
>>
>> This code https://godbolt.org/g/tTyxpf is a dot product reduction loop multiplying sign-extended 16-bit values to produce a 32-bit accumulated result. The x86 backend is currently not able to optimize it as well as gcc and icc.

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

> ...rvation where instead of forcing vector factor to 16 (by using pragma) tried option “vectorizer-maximize-bandwidth”.
>
> “vectorizer-maximize-bandwidth” considers the smallest data type size in the loop body and allows the possible VF up to 16, but LV selects the VF as 8 though VF16 has the same cost.
>
> LV: Vector loop of width 8 costs: 1.
> LV: Vector loop of width 16 costs: 1.
>
> It’s because of below check in LV:
>
> LoopVectorizationCostModel::selectVectorizationFactor() {
> …
> if (VectorCost < Cost) {
> ...
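The tie-breaking effect described in this snippet can be illustrated with a small standalone sketch. This is not the LoopVectorizer source: pickWidth, the Candidate type, the PreferWiderOnTie flag and the scalar cost of 4 are all hypothetical names and values chosen for illustration. Only the strict `VectorCost < Cost` comparison and the two reported costs of 1 (for widths 8 and 16) come from the message above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cost model's candidates:
// each entry is (vectorization factor, cost), listed from narrow to wide.
using Candidate = std::pair<unsigned, float>;

// Returns the selected width. With the strict '<' comparison, a wider
// candidate that only ties the current best cost is never selected, so the
// narrower width wins. PreferWiderOnTie (an assumption, not an LLVM option)
// switches the comparison to '<=' so that ties go to the wider candidate.
unsigned pickWidth(const std::vector<Candidate> &Candidates, float ScalarCost,
                   bool PreferWiderOnTie) {
  unsigned Width = 1;      // start from the scalar loop
  float Cost = ScalarCost; // best cost seen so far
  for (const Candidate &C : Candidates) {
    const float VectorCost = C.second;
    const bool Better =
        PreferWiderOnTie ? (VectorCost <= Cost) : (VectorCost < Cost);
    if (Better) {
      Cost = VectorCost;
      Width = C.first;
    }
  }
  return Width;
}

// Usage: widths 8 and 16 both cost 1, as in the quoted debug output.
void checkTieBreaking() {
  const std::vector<Candidate> Candidates = {{8u, 1.0f}, {16u, 1.0f}};
  assert(pickWidth(Candidates, 4.0f, /*PreferWiderOnTie=*/false) == 8u);
  assert(pickWidth(Candidates, 4.0f, /*PreferWiderOnTie=*/true) == 16u);
}
```

Whether a wider factor should win a tie is a design question for the cost model; the sketch only shows why equal costs leave the narrower width selected under a strict comparison.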

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 12

<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive compared with SSE2. I feel this number should be the same as, or close to, the cost given for the same operation in SSE2ConversionTbl. The patch below from Cong Hou reduces the cost for the same operation in SSE2

[Proposal][RFC] Epilog loop vectorization

2017 Feb 27

Thanks for looking into this. 1) Issues with rerunning the vectorizer: the vectorizer might generate redundant alias checks while vectorizing the epilog loop. Redundant alias checks are expensive, so we would like to reuse the results of already computed alias checks. With metadata we can limit the width of the epilog loop, but I am not sure about reusing the alias check results. Any thoughts on rerunning vectorizer with reusing
