Renato Golin via llvm-dev
2016-Aug-05 21:03 UTC
[llvm-dev] enabling interleaved access loop vectorization
On 5 August 2016 at 21:00, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> As far as I remember (maybe I'm wrong), the vectorizer does not generate
> shuffles for interleaved access. It generates a bunch of extracts and
> inserts that ought to be combined into shuffles afterwards.

That's my understanding as well.

Whatever strategy we take, it will be a mix of telling the cost model
to avoid some pathological cases as well as improving the detection of
the patterns in the x86 back-end.

The work to benchmark this properly looks harder than enabling the
right flags and patterns. :)

cheers,
--renato
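For concreteness, here is a minimal sketch of the kind of factor-2
interleaved access under discussion; the function and variable names are
illustrative, not taken from the thread. Each iteration touches two
adjacent elements, so a vectorized version has to split one wide load
into two narrower member vectors, which the vectorizer currently
expresses as extractelement/insertelement chains rather than as
shufflevectors.

  /* Illustrative stride-2 (factor-2 interleaved) access: the loop reads
   * interleaved {even, odd} elements, so vectorizing it means splitting a
   * wide load of in[2*i .. 2*i+7] into an "even" vector and an "odd"
   * vector before the two reductions can proceed in parallel. */
  void sum_even_odd(const float *restrict in, float *even_sum,
                    float *odd_sum, int n) {
    float even = 0.0f, odd = 0.0f;
    for (int i = 0; i < n; ++i) {
      even += in[2 * i];     /* member 0 of the interleaved group */
      odd  += in[2 * i + 1]; /* member 1 of the interleaved group */
    }
    *even_sum = even;
    *odd_sum = odd;
  }

For experimenting with loops like this, there is (if I recall correctly)
an -enable-interleaved-mem-accesses option on the loop vectorizer that
turns the feature on without waiting for the target hook.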
Michael Kuperstein via llvm-dev
2016-Aug-05 23:18 UTC
[llvm-dev] enabling interleaved access loop vectorization
As Ashutosh wrote, the BasicTTI cost model evaluates this as the cost of
using extracts and inserts. So even if we end up generating inserts and
extracts (and I believe we actually manage to get the right shuffles, more
or less, courtesy of InstCombine and the shuffle lowering code), we should
be seeing improvements with the current cost model.

I agree that we can get *more* improvement with better cost modeling, but
I'd expect to be able to get *some* improvement the way things are right
now. That's why I'm curious about where we saw regressions - I'm wondering
whether there's really a significant cost-modeling issue I'm missing, or
whether it's something that's easy to fix so that we can make forward
progress while Ashutosh works on the longer-term solution.

On Fri, Aug 5, 2016 at 2:03 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 5 August 2016 at 21:00, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
> > As far as I remember (maybe I'm wrong), the vectorizer does not generate
> > shuffles for interleaved access. It generates a bunch of extracts and
> > inserts that ought to be combined into shuffles afterwards.
>
> That's my understanding as well.
>
> Whatever strategy we take, it will be a mix of telling the cost model
> to avoid some pathological cases as well as improving the detection of
> the patterns in the x86 back-end.
>
> The work to benchmark this properly looks harder than enabling the
> right flags and patterns. :)
>
> cheers,
> --renato
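As a rough sketch of what that fallback costing amounts to (this is a
paraphrase, not the actual BasicTTIImpl code, and the names and exact
formula are mine): an interleaved load group is priced as the wide memory
operation plus an extract and an insert per scalar element.

  /* Rough sketch of the fallback costing for an interleaved load group:
   * the wide load, plus one extractelement from the wide vector and one
   * insertelement into a member vector for each scalar element.  The real
   * BasicTTI implementation queries per-target costs for each of these
   * operations; the formula and names here are illustrative only. */
  unsigned interleaved_load_cost_sketch(unsigned wide_load_cost,
                                        unsigned extract_cost,
                                        unsigned insert_cost,
                                        unsigned factor,
                                        unsigned lanes_per_member) {
    unsigned num_elts = factor * lanes_per_member;
    return wide_load_cost + num_elts * (extract_cost + insert_cost);
  }

Even under that conservative count the wide operations can still beat the
scalar loop, which is the point above about expecting *some* improvement
with the model as it stands.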
Renato Golin via llvm-dev
2016-Aug-05 23:37 UTC
[llvm-dev] enabling interleaved access loop vectorization
On 6 August 2016 at 00:18, Michael Kuperstein <mkuper at google.com> wrote:
> I agree that we can get *more* improvement with better cost modeling, but
> I'd expect to be able to get *some* improvement the way things are right
> now.

Elena said she saw "some" improvements. :)

> That's why I'm curious about where we saw regressions - I'm wondering
> whether there's really a significant cost-modeling issue I'm missing, or
> whether it's something that's easy to fix so that we can make forward
> progress while Ashutosh works on the longer-term solution.

Sounds like a task of trying a few patterns and fiddling with the cost
model. Arnold did a lot of that during the first months of the vectorizer,
so it might just be a matter of finding the right heuristics, at least for
the low-hanging fruit.

Of course, that would also involve benchmarking everything else, to make
sure the new heuristics don't introduce regressions in non-interleaved
vectorisation.

cheers,
--renato