thr3ads.net - llvm dev - [LLVMdev] Optimization on Atomics (and the OpenMP memory model) [Apr 2015]

Hal Finkel

2015-Apr-10 17:12 UTC

[LLVMdev] Optimization on Atomics (and the OpenMP memory model)

Hi everyone,

The OpenMP standards committee has begun work to formalize their memory model,
and define its relationship to the C/C++ memory models. A questionnaire has been
put together (pasted below), and I'd like everyone's help in composing
detailed answers to inform their decision-making process. While our OpenMP
support is still in active development, many of these questions apply equally to
C/C++ atomics, and a lot of work has certainly been done here on that front.

* Which processor architectures does your compiler target (e.g. x86, Power, ARM,
ARM v8, Xeon Phi, Nvidia GPUs, etc.)?
    [I'll just answer "yes" for that one ;)]
* What is a flush lowered to in assembly for each of the supported
architectures? For instance, a flush might be implemented as an MFENCE on the
x86 architecture in some compilers.
* What are non-seq_cst atomic read, write, update and capture lowered to for
each of your targets?
* What are seq_cst atomic read, write, update and capture lowered to for each of
your targets?
* What is the taskwait construct lowered to for each of your targets?
* What are omp_set_lock and omp_unset_lock lowered to for each of your targets?
* What is a barrier lowered to for each of your targets?
* Are any optimisations allowed to reorder, change or remove code that uses any
of the synchronisation constructs above, or any of the other synchronisation
constructs in section 2.12 of the OpenMP 4.0 specification?
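
For anyone who wants to check what their target actually emits, the C++11 operations below are rough analogues of the constructs in the questionnaire; compiling them to assembly for each target should show the lowering. This is only a sketch: the function names are made up for illustration, and treating a seq_cst fence as equivalent to an OpenMP flush is an assumption, not something the OpenMP specification states. The x86 instructions named in the comments are the commonly documented mappings and may differ between compilers.

```cpp
#include <atomic>

// Hypothetical names; these are C++11 analogues of the operations the
// questionnaire asks about, not the OpenMP constructs themselves.
std::atomic<int> counter{0};

// Non-seq_cst atomic read: typically an ordinary load on x86.
int read_relaxed() { return counter.load(std::memory_order_relaxed); }

// seq_cst atomic write: on x86 commonly an XCHG, or a MOV followed by MFENCE.
void write_seq_cst(int v) { counter.store(v, std::memory_order_seq_cst); }

// Update with capture: an atomic read-modify-write that returns the old value.
int update_capture(int delta) {
  return counter.fetch_add(delta, std::memory_order_seq_cst);
}

// Rough analogue of a flush (assumption): a full fence, which some compilers
// emit as MFENCE on x86.
void full_fence() { std::atomic_thread_fence(std::memory_order_seq_cst); }
```

Compiling this file with the compiler's assembly output option for each target of interest gives one concrete data point per question.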

I'll be happy to collate answers to send back to the committee; please
provide as much feedback as you can.

Thanks in advance,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

JF Bastien

2015-Apr-10 17:22 UTC

Architecture-specific NaCl basically uses whatever the underlying ISA
expects (so answers will be "same as the above"), save for any OS
interactions, which go through NaCl's own syscalls (which then sometimes
punt to the host OS). The most interesting of these is futex, which is a
restricted version of Linux's futex and gets emulated on non-Linux guests.

For PNaCl, none of the C++ constructs get lowered past their C++ semantics
(they look like function calls). futex is still a syscall for PNaCl. The C++
constructs only get lowered once the actual ISA is known (at translation
time), at which point we're usually generating NaCl code and the same as the
above applies.

The same as PNaCl would apply to JavaScript once it supports
SharedArrayBuffer if the current proposal goes forward. One small change
would be to rely on synchronic instead of futex.

I'm not familiar enough with taskwait, omp_set_lock / omp_unset_lock to
provide a useful answer. I assume that we can implement them with atomics
and futex if QOI isn't an issue?
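
To make the "atomics and futex" idea concrete, here is a minimal sketch of a lock with set/unset operations built from a single atomic. It is not how any OpenMP runtime implements omp_set_lock / omp_unset_lock; the type and method names are hypothetical, and the futex wait is replaced by a plain yield so the example stays portable.

```cpp
#include <atomic>
#include <thread>

// Hypothetical type for illustration; not the OpenMP runtime's omp_lock_t.
struct SketchLock {
  std::atomic<bool> locked{false};

  // Single attempt to take the lock; returns true on success.
  bool try_set() {
    return !locked.exchange(true, std::memory_order_acquire);
  }

  // Analogue of omp_set_lock: retry until the lock is acquired.
  void set() {
    while (!try_set()) {
      // A futex-based lock would sleep in the kernel here. This sketch
      // only yields, which is a simplification and costs quality under
      // contention.
      std::this_thread::yield();
    }
  }

  // Analogue of omp_unset_lock: the release store publishes the writes
  // made inside the critical section to the next thread that acquires.
  void unset() { locked.store(false, std::memory_order_release); }
};
```

The acquire on the exchange and the release on the store are the minimum orderings that make the critical section behave as expected; a production lock would add a sleeping path for contended cases.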

On reordering: yes, as much as C++ allows. Good timing for this:

https://github.com/jfbastien/papers/blob/master/source/N4455.rst
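
As an illustration of the kind of transformation under discussion, consider two adjacent atomic increments of the same variable. The sketch below is my own reduction with hypothetical names, not code from any compiler: no thread is guaranteed to observe the intermediate value, so replacing the pair with a single increment by two only removes one of the executions the original already allowed. Whether a particular compiler performs this today is a separate question.

```cpp
#include <atomic>

std::atomic<int> x{0};  // hypothetical shared counter

// As written in the source: two separate atomic read-modify-write operations.
void two_increments() {
  x.fetch_add(1, std::memory_order_relaxed);
  x.fetch_add(1, std::memory_order_relaxed);
}

// What a compiler could plausibly emit instead: one read-modify-write.
// Every result this produces is also a possible result of two_increments().
void fused_increment() { x.fetch_add(2, std::memory_order_relaxed); }
```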

Robin Morisset

2015-Apr-10 17:56 UTC

Hello,

For how the C++ atomics are lowered, I suggest looking at:
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
Please note that the ARMv7 Cmpxchg SeqCst mapping on that page is incorrect:
a dmb is required after it instead of an isb (the page will be updated soon,
and LLVM already gets this right).
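
For reference, this is the sort of source-level operation that mapping covers. The helper below is a hypothetical example, and the lowering described in the comment is the commonly documented ARMv7 scheme rather than a guarantee about any particular compiler version.

```cpp
#include <atomic>

// Hypothetical helper: claim a slot by moving it from 0 to `id`.
bool try_claim(std::atomic<int>& slot, int id) {
  int expected = 0;
  // Both the success and failure orderings are seq_cst. On ARMv7 this is
  // commonly lowered to an ldrex/strex retry loop bracketed by dmb barriers;
  // the barrier after the loop is the one the correction above is about.
  return slot.compare_exchange_strong(expected, id,
                                      std::memory_order_seq_cst,
                                      std::memory_order_seq_cst);
}
```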

I don't know enough about OpenMP to answer about OpenMP-specific constructs.

Best regards,
Robin Morisset

Michael Wong

2015-Apr-10 18:20 UTC

Thanks. I have also forwarded your N4455 document and the associated
discussion to the OpenMP memory model group, including Mark Batty.


_________________________________________________________
Regards, Michael
ISOCPP.org Director, VP http://isocpp.org/wiki/faq/wg21#michael-wong
OpenMP CEO: http://openmp.org/wp/about-openmp/
My Blogs: http://ibm.co/pCvPHR
C++11 status: http://tinyurl.com/43y8xgf
Boost test results
http://www.ibm.com/support/docview.wss?rs=2239&context=SSJT9L&uid=swg27006911

C/C++ Compilers Feature Request Page
http://www.ibm.com/developerworks/rfe/?PROD_ID=700
Chair of WG21 SG5 Transactional Memory:
https://groups.google.com/a/isocpp.org/forum/?hl=en&fromgroups#!forum/tm

IBM Corporation
XL C++ Compiler kernel Development
IBM z Systems Software,IBM Systems Unit
IBM Canada Ltd., C2/KD2/8200/MKM
8200 Warden Avenue
Markham, Ontario L6G 1C7
W:905-413-3283 F:905-413-4839
OpenMPCon 2015





Bill Schmidt

2015-Apr-10 19:15 UTC

Hi Hal,

Paul McKenney and Raul Silvera wrote a detailed document describing the
POWER implementation of the C/C++ memory model. It can be downloaded at
http://www.rdrop.com/~paulmck/scalability/paper/N2745r.2011.03.04a.html

Hope this helps,
Bill
