Neil Henning via llvm-dev
2020-Jul-17 11:51 UTC
[llvm-dev] LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Oh interesting - I hadn't even considered registering vector descriptors for the LLVM intrinsics, but right enough when I just registered that pow has a vector variant (itself of a bigger size) I got the correct 8-wide variants like I was expecting - nice!

Thanks for the help!

Cheers,
-Neil.

On Fri, Jul 17, 2020 at 12:09 PM Florian Hahn <florian_hahn at apple.com> wrote:

> On 16 Jul 2020, at 19:54, Neil Henning via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> So for us we use SLEEF to actually implement the libcalls (LLVM
>> intrinsics) that LLVM by default would generate - and since SLEEF has
>> highly optimal 8-wide pow, optimized for AVX and AVX2, we really want to
>> use that.
>
> Right, the way vector versions of library functions are accessed by the
> vectoriser has changed since the last release. I think the initial patch
> was https://reviews.llvm.org/D70107.
>
> Vector functions now must be annotated with a vector-function-abi-variant
> function attribute. There’s the -inject-tli-mappings pass, that is supposed
> to add the attributes for vector functions from TLI. It seems like this is
> currently not happening for your custom TLI mappings for some reason.
>
> For example, the Accelerate library has a vector version of log10. Running
> `opt -vector-library=Accelerate -inject-tli-mappings` on the IR below will
> add the following attribute to the llvm.log10 call-site, indicating that
> there’s a <4 x float> version of log10 called vlog10f.
>
> { "vector-function-abi-variant"="_ZGV_LLVM_N4v_llvm.log10.f32(vlog10f)" }
>
> To double-check, if running -inject-tli-mappings on your example does not
> add the vector-function-abi-variant attribute for `pow`, the vectorisers
> won’t know about them. If the vector-function-abi-variant attribute is
> actually created, but the vector version is not used nonetheless, it would
> be great if you could share the IR with the attributes, as they depend on
> the downstream TLI.
>
> I am also CC’ing Francesco, who might be able to help you pinning down
> where exactly things go wrong with the mapping.
>
> Cheers,
> Florian
>
> ---
>
> define float @call_llvm.log10.f32(float %in) {
>   %call = tail call float @llvm.log10.f32(float %in)
>   ret float %call
> }
>
> declare float @llvm.log10.f32(float)

--
Neil Henning
Senior Software Engineer Compiler
unity.com
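For readers following the thread, the attribute string quoted above can be read field by field. The Python sketch below is not part of LLVM and is not how the vectoriser parses mappings; it is a simplified illustration, assuming only the basic `_ZGV<isa><mask><vlen><params>_<scalar>(<vector>)` layout seen in the log10 example. The names `VariantMapping`, `decode_variant`, `encode_variant`, and the vector function name `my_powf8` are hypothetical and chosen only for this example.

```python
import re
from typing import NamedTuple


class VariantMapping(NamedTuple):
    isa: str          # ISA token; "_LLVM_" marks an LLVM-internal mapping
    masked: bool      # True for "M" (masked), False for "N" (unmasked)
    vlen: str         # lane count as text, or "x" for scalable vectors
    params: str       # parameter tokens, e.g. "v" per vector parameter
    scalar_name: str  # scalar function or intrinsic being replaced
    vector_name: str  # vector function to call instead


# Simplified layout: _ZGV <isa> <mask> <vlen> <params> _ <scalar> ( <vector> )
_PATTERN = re.compile(
    r"^_ZGV(?P<isa>_LLVM_|[A-Za-z])"
    r"(?P<mask>[NM])"
    r"(?P<vlen>\d+|x)"
    r"(?P<params>[A-Za-z0-9]+)"
    r"_(?P<scalar>[^()]+)"
    r"\((?P<vector>[^()]+)\)$"
)


def decode_variant(mapping: str) -> VariantMapping:
    """Split one mapping string into its fields (simplified parser)."""
    match = _PATTERN.match(mapping)
    if match is None:
        raise ValueError(f"not a recognised mapping: {mapping!r}")
    return VariantMapping(
        isa=match["isa"],
        masked=match["mask"] == "M",
        vlen=match["vlen"],
        params=match["params"],
        scalar_name=match["scalar"],
        vector_name=match["vector"],
    )


def encode_variant(scalar_name: str, vector_name: str, vlen: int,
                   num_params: int, masked: bool = False) -> str:
    """Build an LLVM-internal mapping where every parameter is a vector."""
    mask = "M" if masked else "N"
    return f"_ZGV_LLVM_{mask}{vlen}{'v' * num_params}_{scalar_name}({vector_name})"


if __name__ == "__main__":
    # The Accelerate example from the thread: 4 lanes, one vector parameter.
    print(decode_variant("_ZGV_LLVM_N4v_llvm.log10.f32(vlog10f)"))
    # A hypothetical 8-wide pow mapping; "my_powf8" is a made-up name.
    print(encode_variant("llvm.pow.f32", "my_powf8", vlen=8, num_params=2))
```

Read this way, the `N4v` in the log10 example says "unmasked, 4 lanes, one vector parameter". An 8-wide two-argument function such as pow would carry `N8vv` under the same layout, which is the kind of mapping the 8-wide result in this thread depended on.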
Florian Hahn via llvm-dev
2020-Jul-17 14:19 UTC
[llvm-dev] LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
> On Jul 17, 2020, at 12:51, Neil Henning <neil.henning at unity3d.com> wrote:
>
> Oh interesting - I hadn't even considered registering vector descriptors for the LLVM intrinsics, but right enough when I just registered that pow has a vector variant (itself of a bigger size) I got the correct 8-wide variants like I was expecting - nice!
>
> Thanks for the help!

No worries. It might be worth calling this new behavior out in the release notes for 11.0.
Francesco Petrogalli via llvm-dev
2020-Jul-17 14:23 UTC
[llvm-dev] LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Hi all,

Thank you Florian for pointing Neil in the right direction! I am glad this has been working for you Neil. Feel free to contact me directly if you need anything related to function vectorization again.

> It might be worth calling this new behavior out in the release notes for 11.0

How can we make sure this happens? Is there a process where we could write into release notes?

Kind regards,
Francesco

From: Florian Hahn <florian_hahn at apple.com>
Date: Friday, July 17, 2020 at 9:19 AM
To: Neil Henning <neil.henning at unity3d.com>
Cc: Sanjay Patel <spatel at rotateright.com>, llvm-dev <llvm-dev at lists.llvm.org>, Francesco Petrogalli <Francesco.Petrogalli at arm.com>
Subject: Re: [llvm-dev] LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

> On Jul 17, 2020, at 12:51, Neil Henning <neil.henning at unity3d.com> wrote:
>
> Oh interesting - I hadn't even considered registering vector descriptors for the LLVM intrinsics, but right enough when I just registered that pow has a vector variant (itself of a bigger size) I got the correct 8-wide variants like I was expecting - nice!
>
> Thanks for the help!

No worries. It might be worth calling this new behavior out in the release notes for 11.0