Saito, Hideki via llvm-dev
2018-Sep-05 01:58 UTC
[llvm-dev] LoopVectorizer: shufflevectors
>> To me, this looks like something the LoopVectorizer is neglecting and
>> should be combining.
>
> It's not up to the vectoriser to combine code.
>
> But it could be up to the vectoriser to generate less bloated code,
> given it's a small change.
>
> That's my point above.

We should note that:

1) The Loop Vectorizer is not the only place that generates vectorized IR. For example, a programmer's intrinsic vector code, after inlining etc., might show the same problem. Any optimization added within LV won't be applied when other parts of the compiler generate vectorized IR.

2) The vectorizer's main job is generating widened vector code that is easier to optimize later on, not necessarily generating highly optimized vector code on its own.

3) Modeling the cost correctly (and, as a result, choosing a good VF) is a more important problem than performing the optimization within the vectorizer itself.

4) If the cost modeling takes the optimization into account, LV has a chance of generating optimized code. That doesn't necessarily mean LV should -- which brings us back to 1). The last thing we want is to make LV a gigantic monolithic optimizer that is too hard to maintain.

I think we should talk about how much complexity we would be adding for general "vectorized load/store optimization", and whether we should have a separate post-vectorizer optimizer doing it (while LV still needs to understand the cost modeling aspect of that optimization, in order to choose the right VF). This should include a discussion about moving the interleaved-memory-access optimization from LV to there. Adding a small new optimization here and there to LV can have a snowball effect.
Thanks,
Hideki

=============================

Date: Tue, 4 Sep 2018 18:57:17 +0100
From: Renato Golin via llvm-dev <llvm-dev at lists.llvm.org>
To:
Cc: LLVM Dev <llvm-dev at lists.llvm.org>, Ulrich Weigand <ulrich.weigand at de.ibm.com>
Subject: Re: [llvm-dev] LoopVectorizer: shufflevectors
Message-ID: <CAMSE1kcHuN4a-a1VTUdsyyVD_9aThZ6p_N8ZbPhW1H8KoxAJtg at mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Tue, 4 Sep 2018 at 17:35, Jonas Paulsson <paulsson at linux.vnet.ibm.com> wrote:
> > It's probably a lot simpler to improve the SystemZ model to "know" /
> > have the same arch flags / cost model completeness as the other
> > targets.
> I thought they were - anything particular in mind?

I have no idea about SystemZ, sorry. :)

From your post and response, it seems that both improving the target
info and the cost model are opening new ways to vectorise on SystemZ.
That's what I was referring to.

> This then made many more cases of interleaving happen (~450 cases on
> spec IIRC). Only problem was... the SystemZ backend could not handle
> those shuffles as well in all the cases. To me that looked like
> something to be fixed on the I/R level, and after discussions with
> Sanjay I got the impression that this was the case...

Right. Being fixed at IR level and that being done in the vectoriser
are two different things.

Our current implementation is too monolithic to be trying out branching
off the beaten path, and we're in the process of moving out (which can
still take years), so I don't recommend big refactorings on the code.

You could probably find a number of simplifications, taking target info
into consideration, that can later be ported to VPlan, but that will
require testing the vectorisation on the supported targets.
We don't need to re-benchmark everything again, just make sure the code
doesn't change for them, or if it does, to know why.

> To me, this looks like something the LoopVectorizer is neglecting and
> should be combining.

It's not up to the vectoriser to combine code.

But it could be up to the vectoriser to generate less bloated code,
given it's a small change.

That's my point above.

> I suppose with my patch for the Load -> Store
> groups, I could add also the handling of recomputed indices so that the
> load group produces a vector that fits the store group directly. But if
> I understand you correctly, even this is not so wise?

It will depend on how much that changes other targets, because what
looks less bloated can also mean patterns are not recognised any more
by other back-ends.

> And if so, then indeed improving the SystemZ DAGCombiner is the only
> alternative left, I guess...

You'll probably have to do that anyway, but I wouldn't try it unless I
had no other choice. :)

> But having the cost functions available is not enough to drive a later
> I/R pass to optimize the generated vector code? I mean if the target
> indicated which shuffles were expensive, that could then easily be avoided.

Sure, but "expensive" is a relative term and it's intimately linked to
what the back-end can combine.

If you're lucky enough that a mid-end change just happens to unbloat
shuffles and be correctly lowered, without breaking other targets, then
that's a big win.

--
cheers,
--renato
On Wed, 5 Sep 2018 at 02:58, Saito, Hideki <hideki.saito at intel.com> wrote:
> I think we should talk about how much complexity we would be adding
> for general "vectorized load/store optimization", and whether we
> should have a separate post-vectorizer optimizer doing it (while LV
> still needs to understand the cost modeling aspect of that
> optimization, in order to choose the right VF).

I imagine it would be a lot easier to plug loop-vectorisation-specific
clean-up passes into a VPlan model than it is today. But as you said,
this is only part of the vectorised code the middle end generates.

While LV could (potentially) generate less bloated code, which would
also help the clean-up passes do their jobs better, it will have to be
very conservative and extensively tested.

> This should include a discussion about moving interleave memory
> access optimization from LV to there. Adding a small new optimization
> here and there to LV can have a snowball effect.

I agree that interleaved access is not exclusive to loop vectorisation
and that it should be moved to a higher position (some of your patches
earlier this year come to mind).

But, as I said back then, before we do so, we need to understand
exactly where to put it. That will depend on which other passes will
actually use it, and on whether we want it to be a utility class, an
analysis pass, or both.

Have you compiled a list of passes that could benefit from such a move?

cheers,
--renato