> > Again, optimizations could break it, violating a possible contract
> > with the user.
>
> AFAIK, all these annotations are more hints than contracts. Adding
> #pragma omp simd on a loop that has forward dependencies that cannot
> be solved should not make the loop vectorize (incorrectly). Same goes
> for standard OMP, threads and other types of parallel computation.
I am actually not completely familiar with the semantics of all OpenMP
directives, but in the case of Cilk, for instance, a spawn means "execute
in parallel if the runtime deems it beneficial". In my understanding, this
does not cover sequential execution caused by other (static) optimizations
breaking the parallelism. Other parallel source languages might be even
stricter in this respect.
> > The headaches that this approach causes me are that basic analyses
> > like dominance, reachability and the like are broken
>
> Yes, intrinsics are heavy handed and will stop the compiler from being
> smart in many ways.
To be clear, code like this:
parallel.task.start(1)
some_fun(n/2)
parallel.task.end(1)
...
parallel.task.start(2)
other_fun(n/2)
parallel.task.end(2)
would be transformed into:
parallel.task.start(1)
tmp = n/2
some_fun(tmp)
parallel.task.end(1)
...
parallel.task.start(2)
other_fun(tmp)
parallel.task.end(2)
Current optimizations consider this transformation valid, even if the
parallel.task intrinsics are marked as possibly throwing. The problem is
that dominance analysis concludes that the new location of the division
dominates all uses, which is wrong: task 2 may start before task 1 has
finished, so tmp may not have been computed yet when other_fun(tmp) runs.
This is not just "not smart", it is wrong, and it is the result of
integrating parallelism into a sequential IR only insufficiently.
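To make the dominance argument concrete, here is a toy sketch in Python (not LLVM code; the node names and both graphs are hypothetical illustrations of the example above). It computes dominators with the textbook iterative data-flow algorithm over two CFGs: the sequential view the current IR has, where the task intrinsics are ordinary calls on one straight-line path, and an assumed parallel-aware view in which task.start(1) forks, so the continuation leading to task 2 does not wait for task 1's body.

```python
# Toy model, not LLVM: nodes are labeled statements from the example above.
# All names are hypothetical; the graphs only illustrate the dominance argument.

def dominators(succ, entry):
    """Iterative dominators: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    pred = {n: set() for n in nodes}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            pred_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Sequential view: the task intrinsics are just calls on one straight-line path.
SEQUENTIAL = {
    "task1.start": ["tmp = n/2"],
    "tmp = n/2": ["some_fun(tmp)"],
    "some_fun(tmp)": ["task1.end"],
    "task1.end": ["..."],
    "...": ["task2.start"],
    "task2.start": ["other_fun(tmp)"],
    "other_fun(tmp)": ["task2.end"],
    "task2.end": ["sync"],
}

# Parallel-aware view (assumption): task1.start forks; the task body and the
# continuation (which leads to task 2) are independent paths joined at "sync".
PARALLEL = {
    "task1.start": ["tmp = n/2", "..."],
    "tmp = n/2": ["some_fun(tmp)"],
    "some_fun(tmp)": ["task1.end"],
    "task1.end": ["sync"],
    "...": ["task2.start"],
    "task2.start": ["other_fun(tmp)"],
    "other_fun(tmp)": ["task2.end"],
    "task2.end": ["sync"],
}

def def_dominates_use(cfg):
    """Does the definition of tmp dominate its use in task 2?"""
    dom = dominators(cfg, "task1.start")
    return "tmp = n/2" in dom["other_fun(tmp)"]

print("sequential IR:", def_dominates_use(SEQUENTIAL))  # True
print("parallel-aware:", def_dominates_use(PARALLEL))   # False
```

In the sequential graph the definition dominates the use, so reusing tmp in task 2 looks legal; once the fork is modeled, it no longer does. Modeling the fork as a plain two-way branch is a simplification: it captures "task 1 may not have finished yet", not the concurrency itself.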
One possibility would be to add a "barrier" property to the intrinsics,
with semantics that forbid moving anything across them. But this again
would prohibit other useful optimizations.
Basically, the parallelism and the different dominance relation resulting
from it might affect most flow analyses. Of course, we could make them all
aware of the intrinsics, but then how far would we be from integrating
parallelism into the IR itself, or into a different IR?
> > One possibility to do this gradually might also be to have a separate,
> > parallel IR, say PIR, that will be lowered to regular IR at some point
> > (however this point is chosen).
>
> So, we have Polly that does that, and it's not a trivial issue. Adding
> yet another representation is worrying. The IR Language Reference is
> comprehensive and authoritative, and is a great resource for building
> IR from ASTs or adding optimisations. For every new representation, we
> have to add a similar document, plus another to tell how that is
> represented in other IRs, so that transformations are well defined. I
> don't think we have that for Polly, and adding PIR would double that
> problem.
You are completely right about that, and I fully agree. That's why, from
the perspective of LLVM, I would definitely not take such a big, risky and
error-prone step. I think defining a parallel IR (with first-class
parallelism) is a research topic, just as Polly was in the beginning (and
still is, for that matter).
> I'd be much more open to slow moving changes to the current IR
> (especially if they benefit Polly, too). This path worked very well
> for exception handling (which is an equally complex case), and should
> work for parallelism.
>
> But that's just my opinion. :)
As I said, I completely agree with you on that. It's just that, to play
parallelism's advocate here, parallelism is invasive and consequently has,
or should have, a significant impact on most analyses and transformations.
I am not sure how much effort and how many headaches it will take in the
long run to gradually move over to a parallelism-aware toolchain.
Cheers,
--
Kevin Streit
streit at mailbox.org · http://www.kevinstreit.de