On Fri, Aug 21, 2020 at 11:51:18PM +0200, Nicolai Hähnle wrote:
> On Tue, Aug 18, 2020 at 1:27 AM Joerg Sonnenberger via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > On Fri, Aug 14, 2020 at 10:42:02AM -0700, JF Bastien via llvm-dev wrote:
> > > We (C, C++, and LLVM) are generally moving towards supporting FP as a
> > > first-class thing with all atomic operations †, including cmpxchg. It’s
> > > indeed *usually* specified as a bitwise comparison, not a floating-point
> > > one, although IIRC AMD has an FP cmpxchg. Similarly, some of the
> > > operations are allowed to have separate FP state (say, atomic add won’t
> > > necessarily affect the scalar FP execution’s exception state, might
> > > have a different rounding mode, etc).
> >
> > We don't really need FP cmpxchg in hardware to implement it, do we? It can be
> > lowered as load, FP compare, if not equal cmpxchg load?
>
> Two points here:
>
> 1. Hardware with native fcmpxchg already exists.
> 2. It's incorrect even if I replace your "if not equal" by "if equal"
>    (which I assume is what you meant).
>
> On the latter, assume your float in memory is initially -0.0, thread 1
> does cmpxchg(-0.0, +0.0) and thread 2 does fcmpxchg(+0.0, 1.0). The
> memory location is guaranteed to be 1.0 after both threads have run,
> but this is no longer true with your replacement, because the
> following ordering of operations is possible:
>
> - Thread 2 loads -0.0, compares to +0.0 => comparison is equal
> - Thread 1 does cmpxchg, memory value is now changed to +0.0
> - Thread 2 does cmpxchg(-0.0, 1.0) now, testing whether the memory
>   location is unchanged --> this fails, so the memory location stays
>   +0.0

Thread 2 does the cmpxchg with the loaded value, not the value it is
tested for. So thread 2 would be using +0.0 as well.

Joerg
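
[Editorial sketch] The interleaving quoted above can be replayed
deterministically on a single thread. What follows is a minimal C++ model,
assuming 32-bit IEEE 754 floats stored in a std::atomic<std::uint32_t>.
The names Bits, Attempt, load_and_compare, finish and replay_interleaving
are hypothetical, not LLVM or standard-library API. It models exactly one
attempt of the proposed lowering (load, FP compare, bitwise cmpxchg against
the loaded bits) with no retry, and makes no claim about how any backend
actually lowers the operation.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

using Bits = std::uint32_t;  // assumed: float is 32-bit IEEE 754

inline Bits to_bits(float f) {
    Bits b;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

inline float to_float(Bits b) {
    float f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// State carried between the two halves of one lowered attempt.
struct Attempt {
    Bits loaded;    // bit pattern seen by the initial load
    bool fp_equal;  // result of the floating-point comparison
};

// Step 1 of the lowering: plain load followed by an FP compare.
inline Attempt load_and_compare(const std::atomic<Bits>& mem, float expected) {
    Bits loaded = mem.load();
    return {loaded, to_float(loaded) == expected};
}

// Step 2: bitwise cmpxchg that expects the *loaded* bits, not the FP operand.
inline bool finish(std::atomic<Bits>& mem, Attempt a, float desired) {
    if (!a.fp_equal) return false;
    Bits expected = a.loaded;
    return mem.compare_exchange_strong(expected, to_bits(desired));
}

// Replays the three-step ordering from the message above.
inline float replay_interleaving() {
    std::atomic<Bits> mem{to_bits(-0.0f)};

    // Thread 2 loads -0.0 and compares to +0.0: FP comparison is equal.
    Attempt t2 = load_and_compare(mem, +0.0f);

    // Thread 1 does a bitwise cmpxchg(-0.0, +0.0), which succeeds.
    Bits t1_expected = to_bits(-0.0f);
    mem.compare_exchange_strong(t1_expected, to_bits(+0.0f));

    // Thread 2 now does cmpxchg(bits of -0.0, 1.0); memory holds +0.0 bits.
    finish(mem, t2, 1.0f);

    return to_float(mem.load());
}
```

In this model the final bitwise cmpxchg does not match, so a single attempt
leaves +0.0 in memory. Whether that counts as an acceptable failure depends
on whether the operation is expected to behave like a strong or a weak
cmpxchg.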
On Sat, Aug 22, 2020 at 2:52 AM Joerg Sonnenberger via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On Fri, Aug 21, 2020 at 11:51:18PM +0200, Nicolai Hähnle wrote:
> > - Thread 2 loads -0.0, compares to +0.0 => comparison is equal
> > - Thread 1 does cmpxchg, memory value is now changed to +0.0
> > - Thread 2 does cmpxchg(-0.0, 1.0) now, testing whether the memory
> >   location is unchanged --> this fails, so the memory location stays
> >   +0.0
>
> Thread 2 does the cmpxchg with the loaded value, not the value it is
> tested for. So thread 2 would be using +0.0 as well.

Please re-read the sequence of events carefully. Thread 2 did read
-0.0, so that's the value it's using.

Cheers,
Nicolai

--
Learn how the world really is, but never forget how it should be.
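
[Editorial sketch] The point in dispute rests on -0.0 and +0.0 comparing
equal as floating-point values while having different bit patterns, so a
bitwise cmpxchg that expects the loaded -0.0 bits will not match +0.0 in
memory. A small illustration, assuming a 32-bit IEEE 754 float; the helper
names to_bits, fp_equal and bit_equal are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw 32-bit pattern (assumes IEEE 754 binary32).
inline std::uint32_t to_bits(float f) {
    std::uint32_t b;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

// Comparison as a floating-point compare would do it.
inline bool fp_equal(float a, float b) { return a == b; }

// Comparison as a bitwise cmpxchg would do it.
inline bool bit_equal(float a, float b) { return to_bits(a) == to_bits(b); }
```

With these helpers, fp_equal(-0.0f, +0.0f) holds while
bit_equal(-0.0f, +0.0f) does not: the two values differ only in the sign bit.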
On Sat, Aug 22, 2020 at 10:59:51AM +0200, Nicolai Hähnle wrote:
> On Sat, Aug 22, 2020 at 2:52 AM Joerg Sonnenberger via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Thread 2 does the cmpxchg with the loaded value, not the value it is
> > tested for. So thread 2 would be using +0.0 as well.
>
> Please re-read the sequence of events carefully. Thread 2 did read
> -0.0, so that's the value it's using.

Right, and so it can fail, as a weak cmpxchg does. It will pick up the
correct value in the next iteration. This is not really that different
from spurious failures of LL/SC etc.

Joerg
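
[Editorial sketch] The "next iteration" mentioned above suggests a retry
loop around the bitwise cmpxchg. A minimal C++ sketch of that idea follows,
assuming 32-bit IEEE 754 floats stored in a std::atomic<std::uint32_t>.
The name fcmpxchg_emulated and the helpers are hypothetical; this is one
possible reading of the proposal, not an existing LLVM lowering.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

using Bits = std::uint32_t;  // assumed: float is 32-bit IEEE 754

inline Bits to_bits(float f) {
    Bits b;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

inline float to_float(Bits b) {
    float f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// Emulates an FP-comparing cmpxchg with a bitwise one plus a retry loop.
// Returns true if `desired` was stored, false if the value in memory did
// not compare equal (as a float) to `expected`.
inline bool fcmpxchg_emulated(std::atomic<Bits>& mem, float expected,
                              float desired) {
    Bits loaded = mem.load();
    // Keep trying as long as the most recently observed value still
    // compares equal under floating-point rules.
    while (to_float(loaded) == expected) {
        // On failure (spurious or real), compare_exchange_weak writes the
        // current memory contents back into `loaded`, so the next
        // iteration re-tests the fresh value.
        if (mem.compare_exchange_weak(loaded, to_bits(desired))) {
            return true;
        }
    }
    return false;
}
```

In the scenario discussed in this thread, a failed bitwise attempt against
stale -0.0 bits would reload +0.0, which still compares equal to the
expected +0.0, and the following iteration could then store 1.0.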