Cranmer, Joshua via llvm-dev
2020-Apr-28 21:42 UTC
[llvm-dev] Nontemporal memory accesses and fences
The current specification of the behavior of the !nontemporal attribute in LLVM, and of the __builtin_nontemporal_* functions in Clang, is rather spartan and underspecified. In effect, it says the following things:

* Atomic !nontemporal has no defined semantics.
* !nontemporal may use special instructions to save cache bandwidth, such as "MOVNT" on x86.

What is crucially lacking from this specification is its effect in relation to other memory ordering constructs, namely fences. In the absence of text to the contrary, one could reasonably conclude that this code should be sufficient to generate correctly working code:

    store i64 %x, i64* %addr, !nontemporal !100
    fence release

But on x86 today, it is not: the store is lowered to a MOVNT, while the fence release is lowered to a compiler-only memory barrier instead of an SFENCE (since regular load/store/atomicrmw operations already have this ordering built into their execution semantics). So the fence instructions appear to be lowered with the expectation that nontemporal instructions do not exist, which is not implied by the language reference.

I've been pointed to an earlier thread (https://lists.llvm.org/pipermail/llvm-dev/2016-January/093912.html) that suggests introducing nontemporal-specific fences, but there appears to have been no effort to actually push this into LLVM. What's the current status of this work, and what is the right way to move forward and clarify the semantics of !nontemporal with respect to threads?

--
Joshua Cranmer
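For illustration only, here is a minimal C sketch of the pattern the message describes, written with the x86-64 intrinsics rather than LLVM IR. It assumes that on x86-64 the nontemporal store is emitted as MOVNTI and that an explicit SFENCE is what orders it before the later release store. The names payload, ready, publish and try_consume are hypothetical and are not taken from LLVM or Clang; on other targets the sketch falls back to an ordinary store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <immintrin.h> /* _mm_stream_si64, _mm_sfence */
#endif

/* Hypothetical shared state: a payload slot and a "ready" flag. */
static long long payload;
static atomic_int ready;

/* Producer: nontemporal store, explicit SFENCE, then the release store. */
static void publish(long long value) {
#if defined(__x86_64__)
    _mm_stream_si64(&payload, value); /* MOVNTI: weakly ordered store */
    _mm_sfence();                     /* orders the MOVNTI before later stores */
#else
    payload = value;                  /* assumption: plain store elsewhere */
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out once the flag is visible, else 0. */
static int try_consume(long long *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

Without the _mm_sfence() call, the release store alone would not be expected to order the MOVNTI on x86, which is the gap the message points out for "fence release".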
JF Bastien via llvm-dev
2020-Apr-28 21:54 UTC
[llvm-dev] Nontemporal memory accesses and fences
I see nontemporals as a nice builtin so you don’t have to hand-write the assembly. I don’t think existing hardware is consistent enough to allow us to expose nice semantics at a language or IR level, in the sense that we probably don’t want to define what happens if you step out of the comfortable uses for nontemporals (i.e. what happens when you start having temporality).

A builtin pre / post fence seems fine to me, but that still leaves a gaping semantic hole… which I’m also fine with.

FWIW you have the same problem, somewhat worse, with some vector load / store instructions on some ISAs.
Finkel, Hal J. via llvm-dev
2020-Apr-29 21:13 UTC
[llvm-dev] Nontemporal memory accesses and fences
JF Bastien wrote:
> A builtin pre / post fence seems fine to me, but that still leaves a gaping semantic hole… which I’m also fine with.

I was under the impression that we had decided to move forward with the fence, and it continues to make sense to me.

-Hal