On Wed, Jul 8, 2015 at 1:19 AM, Pekka Jääskeläinen
<pekka.jaaskelainen at tut.fi> wrote:
> On 07/07/2015 01:32 PM, Renato Golin wrote:
>>
>> Wouldn't OpenMP account for some of that? At least on a single
>> machine, could you have both parallel and simd optimisations done on
>> the same loop?
>
>
> The point of autovectorizing an SPMD program description (e.g.
> CUDA or OpenCL C) is to produce something like OpenMP parallel
> loops or SIMD pragmas automatically from the single thread/WI
> description, while adhering to its barrier synchronization
> semantics etc.
>
> That is, the output of this pass could also be converted to
> OpenMP SIMD constructs, if wanted. In pocl's case the output
> is simply a new kernel function (we call it a "work group
> function") that executes all WIs using parallel loops (which
> can be autovectorized more easily, or even multithreaded if
> seen fit, or both).
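
For concreteness, here's a rough sketch of what such a work-group
function could look like for a kernel with a single barrier. This
is illustrative C, not pocl's actual output; the point is that the
barrier splits the kernel body into two work-item loops:

    /* Original per-work-item kernel (for reference):
     *
     *   __kernel void k(__global float *a, __local float *tmp) {
     *       size_t i = get_local_id(0);
     *       tmp[i] = a[i] * 2.0f;
     *       barrier(CLK_LOCAL_MEM_FENCE);
     *       a[i] = tmp[i] + tmp[(i + 1) % get_local_size(0)];
     *   }
     */
    void k_workgroup(float *a, float *tmp, size_t local_size)
    {
        /* Region before the barrier, executed for all WIs. */
        for (size_t i = 0; i < local_size; ++i)
            tmp[i] = a[i] * 2.0f;

        /* The barrier is now implicit: the first loop finishes
         * for every WI before the second loop starts. */

        /* Region after the barrier. */
        for (size_t i = 0; i < local_size; ++i)
            a[i] = tmp[i] + tmp[(i + 1) % local_size];
    }

Each of those loops is a straightforward target for the vectorizer,
and could just as well be parallelized across cores.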
If you're going to "autopar" (turn a loop into threads which run on
many cores or something), then please don't add a dependency on OMP.
While it may seem enticing, that would just add a layer of runtime
overhead that in the end you won't need (and probably won't want).
Just lower to pthreads on Linux and whatever the equivalent is on
Windows.
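
Something like this is all it takes (a minimal sketch; the static
chunking policy and all the names are made up for illustration):

    #include <pthread.h>
    #include <stddef.h>

    #define NTHREADS 4

    struct chunk { float *a; size_t begin, end; };

    static void *run_chunk(void *arg)
    {
        struct chunk *c = arg;
        for (size_t i = c->begin; i < c->end; ++i)
            c->a[i] *= 2.0f;   /* the parallel loop body */
        return NULL;
    }

    /* Split the iteration space into contiguous chunks, one per
     * thread. No OpenMP runtime involved, just libpthread. */
    static void parallel_for(float *a, size_t n)
    {
        pthread_t threads[NTHREADS];
        struct chunk chunks[NTHREADS];
        size_t per = (n + NTHREADS - 1) / NTHREADS;

        for (int t = 0; t < NTHREADS; ++t) {
            size_t b = t * per;
            chunks[t] = (struct chunk){ a, b < n ? b : n,
                                        b + per < n ? b + per : n };
            pthread_create(&threads[t], NULL, run_chunk, &chunks[t]);
        }
        for (int t = 0; t < NTHREADS; ++t)
            pthread_join(threads[t], NULL);
    }

A real lowering would probably want a thread pool and smarter
scheduling, but the point stands: there's no dependency beyond
libpthread.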