thr3ads.net - llvm dev - [llvm-dev] Nontemporal memory accesses and fences [Apr 2020]

Finkel, Hal J. via llvm-dev

2020-Apr-29 21:13 UTC

[llvm-dev] Nontemporal memory accesses and fences

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of JF
Bastien via llvm-dev <llvm-dev at lists.llvm.org>
Sent: Tuesday, April 28, 2020 4:54 PM
To: Cranmer, Joshua <joshua.cranmer at intel.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Nontemporal memory accesses and fences

I see nontemporals as a nice builtin so you don’t have to hand-write the
assembly. I don’t think existing hardware is consistent enough to allow us to
expose nice semantics at a language or IR level, in the sense that we probably
don’t want to define what happens if you step out of the comfortable uses for
nontemporals (i.e. what happens when you start having temporality). A builtin
pre / post fence seems fine to me, but that still leaves a gaping semantic hole…
which I’m also fine with.

FWIW you have the same problem, somewhat worse, with some vector load / store
instructions on some ISAs.

I was under the impression that we had decided to move forward with the fence,
and it continues to make sense to me.

 -Hal

On Apr 28, 2020, at 2:42 PM, Cranmer, Joshua <joshua.cranmer at intel.com> wrote:

The current specification of the behavior of the !nontemporal attribute in LLVM,
and the __builtin_nontemporal_* functions in Clang, is rather spartan and
underspecified. In effect, it says the following things:

  *   Atomic !nontemporal has no defined semantics
  *   !nontemporal may use special instructions to save cache bandwidth, such as
“MOVNT” on x86.

What is crucially lacking from this specification is any description of how
!nontemporal interacts with other memory ordering constructs, namely fences.
Absent text to the contrary, one could reasonably conclude that this code
should be sufficient to generate correctly working code:

store i64 %x, i64* %addr, !nontemporal !100
fence release

But on x86 today, it does not: the store is lowered to a MOVNT, while the
fence release is lowered to a compiler-only memory barrier instead of an SFENCE
(since regular load/store/atomicrmw operations already have the required
ordering built into their execution semantics). So the fence instructions appear
to be lowered with the expectation that nontemporal instructions do not exist,
which is not implied by the language reference.
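
For illustration only, here is a minimal C sketch of the manual workaround this
implies on x86: issue the store fence by hand after the nontemporal store
rather than relying on a release fence to produce one. The names `payload`,
`ready`, `publish`, `consume`, and `roundtrip` are hypothetical and not part of
LLVM or Clang. `_mm_stream_si64` and `_mm_sfence` are the standard x86
intrinsics for MOVNTI and SFENCE; the non-x86 branch is just a plain store so
the sketch still compiles elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

static long long payload;   /* data written with a nontemporal store */
static atomic_int ready;    /* flag that publishes the data (zero-initialized) */

/* Writer: nontemporal store, explicit store fence, then a release store. */
void publish(long long value) {
#if defined(__x86_64__)
    _mm_stream_si64(&payload, value); /* MOVNTI: weakly ordered store */
    _mm_sfence();                     /* SFENCE: order it before later stores */
#else
    payload = value;                  /* assumption: plain store on other targets */
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: wait for the flag with acquire ordering, then read the data. */
long long consume(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the writer has published */
    }
    return payload;
}

/* Single-threaded smoke check; real use would call consume() on another thread. */
long long roundtrip(long long v) {
    publish(v);
    long long got = consume();
    assert(got == v);
    return got;
}
```

Dropping the `_mm_sfence()` call would leave the x86 path with only the
compiler-level ordering described above, which is the gap the thread is about.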

I’ve been pointed to an earlier thread
(https://lists.llvm.org/pipermail/llvm-dev/2016-January/093912.html) that
suggests introducing nontemporal-specific fences, but there appears to
have been no effort to actually push this into LLVM. What’s the current status
of this work, and what is the right way to move forward and clarify the
semantics of !nontemporal with respect to threads?

--
Joshua Cranmer

JF Bastien via llvm-dev

2020-Apr-29 21:24 UTC

> On Apr 29, 2020, at 2:13 PM, Finkel, Hal J. <hfinkel at anl.gov> wrote:
> 
> I was under the impression that we had decided to move forward with the
> fence, and it continues to make sense to me.

I probably missed a prior discussion, so a link would be great! That being said,
fences sound fine to me as long as they’re “nice builtins instead of inline asm”
and don’t try to offer more than “whatever the hardware instruction, if any,
means”.

Finkel, Hal J. via llvm-dev

2020-Apr-29 21:40 UTC

________________________________
From: JF Bastien <jfbastien at apple.com>
Sent: Wednesday, April 29, 2020 4:24 PM
To: Finkel, Hal J. <hfinkel at anl.gov>
Cc: Cranmer, Joshua <joshua.cranmer at intel.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Nontemporal memory accesses and fences

I probably missed a prior discussion, so a link would be great! That being said,
fences sound fine to me as long as they’re “nice builtins instead of inline asm”
and don’t try to offer more than “whatever the hardware instruction, if any,
means”.

I was referring to your RFC linked below (from 2016). We might also have talked
about this offline. The fences seemed like a sensible solution. Reviewing the
RFC thread, it seemed that 1) we needed separate load and store fences, and 2)
there may have been some uncertainty about how to map these onto ARM.

 -Hal
