thr3ads.net - search: "vlog10f"

Displaying 2 results from an estimated 2 matches for "vlog10f".

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 17

...For example, the Accelerate library has a vector version of log10. Running `opt -vector-library=Accelerate -inject-tli-mappings` on the IR below will add the following attribute to the llvm.log10 call-site, indicating that there’s a <4 x float> version of log10 called vlog10f.

{ "vector-function-abi-variant"="_ZGV_LLVM_N4v_llvm.log10.f32(vlog10f)" }

To double-check, if running -inject-tli-mappings on your example does not add the vector-function-abi-variant attribute for `pow`, the vectorisers won’t know about them....
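To make the attribute string in the snippet above easier to read, here is a minimal Python sketch that splits it into its parts. It assumes the layout `_ZGV<isa><mask><vlen><parameters>_<scalar_name>(<vector_name>)`, which is how I understand the Vector Function ABI naming scheme that LLVM reuses. The helper `parse_variant` and the regular expression are hypothetical, written only for illustration, and are not part of LLVM.

```python
import re

# Assumed layout of a "vector-function-abi-variant" entry:
#   _ZGV<isa><mask><vlen><parameters>_<scalar_name>(<vector_name>)
# This regex is an illustrative approximation, not LLVM's own demangler.
VARIANT_RE = re.compile(
    r"^_ZGV"
    r"(?P<isa>_LLVM_|[a-z])"      # "_LLVM_" = LLVM-internal ISA, else a one-letter ISA code
    r"(?P<mask>[NM])"             # N = unmasked, M = masked
    r"(?P<vlen>\d+|x)"            # number of lanes ("x" assumed to mean scalable)
    r"(?P<params>[A-Za-z0-9]+?)"  # parameter tokens, e.g. "v" for a vector argument
    r"_(?P<scalar>[^()]+)"        # name of the scalar function being mapped
    r"\((?P<vector>[^()]+)\)$"    # name of the vector function to call instead
)


def parse_variant(mangled):
    """Hypothetical helper: split one variant string into named fields."""
    match = VARIANT_RE.match(mangled)
    if match is None:
        raise ValueError(f"not a recognised variant string: {mangled!r}")
    fields = match.groupdict()
    if fields["vlen"] != "x":
        fields["vlen"] = int(fields["vlen"])
    fields["masked"] = fields.pop("mask") == "M"
    return fields


info = parse_variant("_ZGV_LLVM_N4v_llvm.log10.f32(vlog10f)")
for key, value in info.items():
    print(f"{key}: {value}")
```

Read this way, the string from the snippet says that the scalar `llvm.log10.f32` has an unmasked, 4-lane variant named `vlog10f`, which matches the "<4 x float> version of log10" described in the message.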

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16

So for us, we use SLEEF to actually implement the libcalls (LLVM intrinsics) that LLVM would generate by default, and since SLEEF has a highly optimized 8-wide pow for AVX and AVX2, we really want to use that. So we would not see 4/8 libcalls and would instead see 1 call to something that lights up the ymm registers. I guess the problem then is that the default expectation is that pow would be
