Renato Golin via llvm-dev
2016-Aug-05 21:03 UTC
[llvm-dev] enabling interleaved access loop vectorization
On 5 August 2016 at 21:00, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> As far as I remember (maybe I'm wrong), the vectorizer does not generate
> shuffles for interleaved access. It generates a bunch of extracts and
> inserts that ought to be combined into shuffles afterwards.

That's my understanding as well.

Whatever strategy we take, it will be a mix of telling the cost model
to avoid some pathological cases as well as improving the detection of
the patterns in the x86 back-end.

The work to benchmark this properly looks harder than enabling the
right flags and patterns. :)

cheers,
--renato
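For concreteness, here is a minimal sketch of the kind of factor-2
interleaved access under discussion; the function and variable names are
illustrative, not taken from the thread. Each iteration touches two
adjacent elements, so a vectorized version has to split one wide load
into two narrower member vectors, which the vectorizer currently
expresses as extractelement/insertelement chains rather than as
shufflevectors.

  /* Illustrative stride-2 (factor-2 interleaved) access: the loop reads
   * interleaved {even, odd} elements, so vectorizing it means splitting a
   * wide load of in[2*i .. 2*i+7] into an "even" vector and an "odd"
   * vector before the two reductions can proceed in parallel. */
  void sum_even_odd(const float *restrict in, float *even_sum,
                    float *odd_sum, int n) {
    float even = 0.0f, odd = 0.0f;
    for (int i = 0; i < n; ++i) {
      even += in[2 * i];     /* member 0 of the interleaved group */
      odd  += in[2 * i + 1]; /* member 1 of the interleaved group */
    }
    *even_sum = even;
    *odd_sum = odd;
  }

For experimenting with loops like this, there is (if I recall correctly)
an -enable-interleaved-mem-accesses option on the loop vectorizer that
turns the feature on without waiting for the target hook.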
Michael Kuperstein via llvm-dev
2016-Aug-05 23:18 UTC
[llvm-dev] enabling interleaved access loop vectorization
As Ashutosh wrote, the BasicTTI cost model evaluates this as the cost of
using extracts and inserts. So even if we end up generating inserts and
extracts (and I believe we actually manage to get the right shuffles, more
or less, courtesy of InstCombine and the shuffle lowering code), we should
be seeing improvements with the current cost model.

I agree that we can get *more* improvement with better cost modeling, but
I'd expect to be able to get *some* improvement the way things are right
now. That's why I'm curious about where we saw regressions - I'm wondering
whether there's really a significant cost-modeling issue I'm missing, or
whether it's something that's easy to fix so that we can make forward
progress while Ashutosh works on the longer-term solution.

On Fri, Aug 5, 2016 at 2:03 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 5 August 2016 at 21:00, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
> > As far as I remember (maybe I'm wrong), the vectorizer does not generate
> > shuffles for interleaved access. It generates a bunch of extracts and
> > inserts that ought to be combined into shuffles afterwards.
>
> That's my understanding as well.
>
> Whatever strategy we take, it will be a mix of telling the cost model
> to avoid some pathological cases as well as improving the detection of
> the patterns in the x86 back-end.
>
> The work to benchmark this properly looks harder than enabling the
> right flags and patterns. :)
>
> cheers,
> --renato
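As a rough sketch of what that fallback costing amounts to (this is a
paraphrase, not the actual BasicTTIImpl code, and the names and exact
formula are mine): an interleaved load group is priced as the wide memory
operation plus an extract and an insert per scalar element.

  /* Rough sketch of the fallback costing for an interleaved load group:
   * the wide load, plus one extractelement from the wide vector and one
   * insertelement into a member vector for each scalar element.  The real
   * BasicTTI implementation queries per-target costs for each of these
   * operations; the formula and names here are illustrative only. */
  unsigned interleaved_load_cost_sketch(unsigned wide_load_cost,
                                        unsigned extract_cost,
                                        unsigned insert_cost,
                                        unsigned factor,
                                        unsigned lanes_per_member) {
    unsigned num_elts = factor * lanes_per_member;
    return wide_load_cost + num_elts * (extract_cost + insert_cost);
  }

Even under that conservative count the wide operations can still beat the
scalar loop, which is the point above about expecting *some* improvement
with the model as it stands.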
Renato Golin via llvm-dev
2016-Aug-05 23:37 UTC
[llvm-dev] enabling interleaved access loop vectorization
On 6 August 2016 at 00:18, Michael Kuperstein <mkuper at google.com> wrote:
> I agree that we can get *more* improvement with better cost modeling, but
> I'd expect to be able to get *some* improvement the way things are right
> now.

Elena said she saw "some" improvements. :)

> That's why I'm curious about where we saw regressions - I'm wondering
> whether there's really a significant cost-modeling issue I'm missing, or
> whether it's something that's easy to fix so that we can make forward
> progress while Ashutosh works on the longer-term solution.

Sounds like a task of trying a few patterns and fiddling with the cost
model. Arnold did a lot of that during the first months of the vectorizer,
so it might just be a matter of finding the right heuristics, at least for
the low-hanging fruit.

Of course, that would also involve benchmarking everything else, to make
sure the new heuristics don't introduce regressions in non-interleaved
vectorisation.

cheers,
--renato