On Fri, Aug 14, 2020 at 10:42:02AM -0700, JF Bastien via llvm-dev wrote:> We (C, C++, and LLVM) are generally moving towards supporting FP as a > first-class thing with all atomic operations †, including cmpxchg. It’s > indeed *usually* specified as a bitwise comparison, not a floating-point > one, although IIRC AMD has an FP cmpxchg. Similarly, some of the > operations are allowed to have separate FP state (say, atomic add won’t > necessarily affect the scalar FP execution’s exception state, might > have a different rounding mode, etc).We don't really FP cmpxchg in hardware to implement it, do we? It can be lowered as load, FP compare, if not equal cmpxchg load? Joerg
On Tue, Aug 18, 2020 at 1:27 AM Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:> On Fri, Aug 14, 2020 at 10:42:02AM -0700, JF Bastien via llvm-dev wrote: > > We (C, C++, and LLVM) are generally moving towards supporting FP as a > > first-class thing with all atomic operations †, including cmpxchg. It’s > > indeed *usually* specified as a bitwise comparison, not a floating-point > > one, although IIRC AMD has an FP cmpxchg. Similarly, some of the > > operations are allowed to have separate FP state (say, atomic add won’t > > necessarily affect the scalar FP execution’s exception state, might > > have a different rounding mode, etc). > > We don't really FP cmpxchg in hardware to implement it, do we? It can be > lowered as load, FP compare, if not equal cmpxchg load?Two points here: 1. Hardware with native fcmpxchg already exists. 2. It's incorrect even if I replace your "if not equal" by "if equal" (which I assume is what you meant). On the latter, assume your float in memory is initially -0.0, thread 1 does cmpxchg(-0.0, +0.0) and thread 2 does fcmpxchg(+0.0, 1.0). The memory location is guaranteed to be 1.0 after both threads have run, but this is no longer true with your replacement, because the following ordering of operations is possible: - Thread 2 loads -0.0, compares to +0.0 => comparison is equal - Thread 1 does cmpxchg, memory value is now changed to +0.0 - Thread 2 does cmpxchg(-0.0, 1.0) now, testing whether the memory location is unchanged --> this fails, so the memory location stays +0.0 Cheers, Nicolai> > Joerg > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein sollte.
> On Aug 21, 2020, at 2:51 PM, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On Tue, Aug 18, 2020 at 1:27 AM Joerg Sonnenberger via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> On Fri, Aug 14, 2020 at 10:42:02AM -0700, JF Bastien via llvm-dev wrote: >>> We (C, C++, and LLVM) are generally moving towards supporting FP as a >>> first-class thing with all atomic operations †, including cmpxchg. It’s >>> indeed *usually* specified as a bitwise comparison, not a floating-point >>> one, although IIRC AMD has an FP cmpxchg. Similarly, some of the >>> operations are allowed to have separate FP state (say, atomic add won’t >>> necessarily affect the scalar FP execution’s exception state, might >>> have a different rounding mode, etc). >> >> We don't really FP cmpxchg in hardware to implement it, do we? It can be >> lowered as load, FP compare, if not equal cmpxchg load? > > Two points here: > > 1. Hardware with native fcmpxchg already exists. > 2. It's incorrect even if I replace your "if not equal" by "if equal" > (which I assume is what you meant). > > On the latter, assume your float in memory is initially -0.0, thread 1 > does cmpxchg(-0.0, +0.0) and thread 2 does fcmpxchg(+0.0, 1.0). The > memory location is guaranteed to be 1.0 after both threads have run, > but this is no longer true with your replacement, because the > following ordering of operations is possible: > > - Thread 2 loads -0.0, compares to +0.0 => comparison is equal > - Thread 1 does cmpxchg, memory value is now changed to +0.0 > - Thread 2 does cmpxchg(-0.0, 1.0) now, testing whether the memory > location is unchanged --> this fails, so the memory location stays > +0.0Right, I agree. I think this argues for this being a separate ‘fcmpxchg’ instruction, because the condition code is different. -Chris
On Fri, Aug 21, 2020 at 11:51:18PM +0200, Nicolai Hähnle wrote:> On Tue, Aug 18, 2020 at 1:27 AM Joerg Sonnenberger via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > On Fri, Aug 14, 2020 at 10:42:02AM -0700, JF Bastien via llvm-dev wrote: > > > We (C, C++, and LLVM) are generally moving towards supporting FP as a > > > first-class thing with all atomic operations †, including cmpxchg. It’s > > > indeed *usually* specified as a bitwise comparison, not a floating-point > > > one, although IIRC AMD has an FP cmpxchg. Similarly, some of the > > > operations are allowed to have separate FP state (say, atomic add won’t > > > necessarily affect the scalar FP execution’s exception state, might > > > have a different rounding mode, etc). > > > > We don't really FP cmpxchg in hardware to implement it, do we? It can be > > lowered as load, FP compare, if not equal cmpxchg load? > > Two points here: > > 1. Hardware with native fcmpxchg already exists. > 2. It's incorrect even if I replace your "if not equal" by "if equal" > (which I assume is what you meant). > > On the latter, assume your float in memory is initially -0.0, thread 1 > does cmpxchg(-0.0, +0.0) and thread 2 does fcmpxchg(+0.0, 1.0). The > memory location is guaranteed to be 1.0 after both threads have run, > but this is no longer true with your replacement, because the > following ordering of operations is possible: > > - Thread 2 loads -0.0, compares to +0.0 => comparison is equal > - Thread 1 does cmpxchg, memory value is now changed to +0.0 > - Thread 2 does cmpxchg(-0.0, 1.0) now, testing whether the memory > location is unchanged --> this fails, so the memory location stays > +0.0Thread 2 does the cmpxchg with the loaded value, not the value it is tested for. So thread 2 would be using +0.0 as well. Joerg
> On Aug 17, 2020, at 4:27 PM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On Fri, Aug 14, 2020 at 10:42:02AM -0700, JF Bastien via llvm-dev wrote: >> We (C, C++, and LLVM) are generally moving towards supporting FP as a >> first-class thing with all atomic operations †, including cmpxchg. It’s >> indeed *usually* specified as a bitwise comparison, not a floating-point >> one, although IIRC AMD has an FP cmpxchg. Similarly, some of the >> operations are allowed to have separate FP state (say, atomic add won’t >> necessarily affect the scalar FP execution’s exception state, might >> have a different rounding mode, etc). > > We don't really FP cmpxchg in hardware to implement it, do we? It can be > lowered as load, FP compare, if not equal cmpxchg load?That’s correct, but I’m mainly interested in bitwise comparison. That’s what C, C++, and (IMO) LLVM IR mean when an FP value is passed to cmpxchg. Separately, there are operations such as atomic fadd and atomic fsub which make senses, can be supported directly by HW, and can have separate FP state. I bring this up because I believe the discussion which started this thread can benefit from a bit wider perspective than “the LangRef says exactly *this*”.