----- Original Message ----->
> On Oct 31, 2013, at 6:06 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > Hello,
> >
> > We currently generate suboptimal code for this loop:
> >
> > for (int i = 1; i < LEN; i++) {
> > a[i] = a[0] + b[i];
> > }
> >
> > The largest problem is that we won't vectorize the loop because
the
> > loop vectorizer does not grok the independence of a[i] and a[0]
> > (note that i starts at 1 in this loop). While we can probably fix
> > the vectorizer to handle this, I don't think that is the right
> > place for the fix, because this is really a LICM problem. LICM
> > could move a[0] out of the loop, but it currently doesn't (which
> > is pretty suboptimial in itself). The reason is that LICM does not
> > use SE (either directly or indirectly), and BasicAA does not
> > recognize that a[0] and a[i] will never overlap in this loop.
> >
> > I see several possible solutions:
> >
> > 1. Add yet-another hack to BasicAA to catch this case.
> >
> > 2. Let LICM use SE directly. Instead of bailing out on hoisting a
> > loop-invariant value in an alias set with non-invariant values,
> > use SE to check independence with the other values in the alias
> > set.
> >
> > 3. Use ScalarEvolutionAliasAnalysis; scheduling SCEV AA prior to
> > running LICM fixes this problem.
> >
> > Using #3 seems like the simplest, and most natural, way to fix this
> > issue. I recall someone tell me (maybe at the developers' meeting)
> > that SCEV AA was 'fundamentally flawed', and now I'm
wondering
> > what was meant by that. The pass manager does not know how to
> > reschedule it unless it is specifically requested, but if that's
> > the entirety of the problem, then I think that we could manually
> > schedule it in a few key places and get a lot of benefit (there is
> > a particular use case for prior to LICM, at least).
> >
> > FWIW, I don't think that adding an SCEV dependency before LICM
> > would be particularly problematic: Currently, LICM lives in the
> > same loop pass manager instance as LoopSimplify, LoopRotate and
> > LoopUnswitch, all of which preserve SE.
> >
> > Suggestions?
>
> I would be scared to enable SCEV AA because it's not well tested
This seems like a solvable problem. SCEV AA itself is something like 30 lines of
non-boilerplate code, and essentially does one thing (calls getMinusSCEV in both
directions and uses getUnsignedRange on the result). We should look it over.
> and
> updating SCEV across transforms can be problematic. LICM would
> require another run of SCEV (bad for compile time).
Why? 'Running' SE does not really do anything because we compute
everything lazily. SE depends on DT, which I agree we don't want to
constantly recompute, but all of those transformations preserve DT and SE (and
already use DT), so I don't see why there would be any problem here.
Using SE itself on every AA query might certainly add extra expense, but that
just seems like an argument for judicious use.
> As for other
> reasons it's fundamentally flawed, Dan G. may remember.
>
> I would be tempted to go with #1 just to fix LICM, since LICM has no
> need for generic dependence analysis.
I was afraid that you'd say that ;)
>
> Of course, the vectorizer already uses SCEV to determine array
> element dependence. It should have a strong method for dependence
> analysis. Why does it miss this case?
It has been improving, but I think it is still fairly ad hoc. I think that we
should switch to using a DependenceAnalysis-based solution (although maybe not
until after 3.4 branches).
Nevertheless, it would still be better to have the scalar load outside the loop,
instead of loaded and splatted inside the loop.
-Hal
>
> -Andy
>
> >
> > Thanks again,
> > Hal
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory