Cranmer, Joshua via llvm-dev
2021-May-06 15:35 UTC
[llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?
The semantics of `volatile` in the C11/C++11 memory model are emphatically orthogonal to requirements for atomic (non-tearing) loads/stores, so you cannot and should not rely on any assumption that volatile will guarantee non-tearing if it can be done.

The reason why the `volatile` causes load tearing is that the x86 backend does not accept i128 as a legal type. Consequently, loads and stores for i128 are always broken up into two i64 loads/stores instead. However, there is a DAG combine that will merge two adjacent i64 loads/stores into an i128 load/store, which doesn't kick in for volatile loads/stores because that means optimizing a volatile load/store.

Note that if you change the i128 type to one that is legal--say <2 x double>, you indeed do get both the volatile and non-volatile version implemented as an xmm mov instruction.

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Itay Bookstein via llvm-dev
> Sent: Thursday, May 6, 2021 4:46
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?
>
> Hey all,
>
> I've encountered a codegen peculiarity on both X86-64 and PPC64LE on top of trunk:
>
> void foo(__uint128_t *p, __uint128_t *q) { *p = *q; }
> void bar(volatile __uint128_t *p, __uint128_t *q) { *p = *q; }
>
> On gcc trunk x86-64 -O3, both of these compile to movdqa, movaps, ret (https://clang.godbolt.org/z/xvs8x646T).
> On clang trunk x86-64 -O3, the first compiles to movaps, movaps, ret, and the second tears into 4 mov-s (https://clang.godbolt.org/z/zfM9MMrbM).
> On clang trunk power64le, the first compiles to lxvd2x, stxvd2x, blr, and the second tears into 2x ld, 2x std, blr (https://clang.godbolt.org/z/7E7zG4Yfz).
>
> I'm a bit surprised by this, since I'd expect volatile to at least "nudge the compiler along" in the direction of not tearing, rather than the other way around (e.g. how Linux uses volatile to implement READ_ONCE/WRITE_ONCE).
>
> I realize that the semantics of volatile might be a bit fuzzier when applied to non-standard types such as __uint128_t (at the level of clang), but as far as I can tell at the IR level these two just compile to load/store (volatile) i128.
> Would this be considered a CodeGen issue?
>
> Thanks,
> ~Itay
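The legal-type observation in the reply above can be tried from C. What follows is only a minimal sketch, assuming the GCC/Clang `vector_size` extension (which Clang lowers to `<2 x double>`); the names `v2df` and `copy_volatile_vec` are hypothetical and not from the thread. Which instructions come out depends on the compiler version and target, and even a single xmm move carries no atomicity guarantee from the language.

```c
#include <assert.h>

/* Assumption: GCC/Clang vector extension; Clang represents this as <2 x double>. */
typedef double v2df __attribute__((vector_size(16)));

static_assert(sizeof(v2df) == 16, "v2df is expected to be 16 bytes wide");

/* Same shape as bar() in the original post, but using a 128-bit type that
 * the x86 backend treats as legal, so the volatile store is not split
 * during type legalization. */
void copy_volatile_vec(volatile v2df *p, const v2df *q) {
    *p = *q;
}
```

Comparing the output of this function against `bar` at -O3 (for example on Compiler Explorer) should show the difference the reply describes.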
Steve (Numerics) Canon via llvm-dev
2021-May-06 16:24 UTC
[llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?
Everything Joshua said, but also please note that there’s no “direction of not tearing” w.r.t. int128 on x86_64.

The architecture guarantees that 1, 2, 4, and 8 byte accesses to normal memory that do not cross a cache line are [single-copy] atomic, but makes no mention of 16 byte or wider accesses (section 8.1.1 in volume 3A of the SDM). The only architecturally guaranteed atomic 16B access in x86_64 is CMPXCHG16B.

– Steve
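To make the CMPXCHG16B point concrete, here is a rough sketch of a 16-byte atomic load built on that instruction, using GCC/Clang inline assembly. It is x86-64 only, and the function name `load128_cmpxchg16b` is hypothetical. It assumes the CPU supports CMPXCHG16B, that the address is 16-byte aligned, and that the memory is writable, since the instruction always performs a locked write cycle.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__x86_64__)
#error "This sketch is x86-64 only."
#endif

/* Atomically read 16 bytes via LOCK CMPXCHG16B.
 * The instruction compares RDX:RAX with the memory operand. With an
 * expected value of 0 and a replacement of 0, memory is never changed:
 *   - if memory == 0, it is rewritten with 0 and RDX:RAX stays 0;
 *   - otherwise the current contents are loaded into RDX:RAX.
 * Either way RDX:RAX ends up holding the value in memory. */
__uint128_t load128_cmpxchg16b(__uint128_t *p) {
    uint64_t lo = 0, hi = 0;
    /* CMPXCHG16B faults on operands that are not 16-byte aligned. */
    assert(((uintptr_t)p & 15u) == 0);
    __asm__ volatile("lock cmpxchg16b %[mem]"
                     : [mem] "+m"(*p), "+a"(lo), "+d"(hi)
                     : "b"((uint64_t)0), "c"((uint64_t)0)
                     : "cc", "memory");
    return ((__uint128_t)hi << 64) | lo;
}
```

Because this is a read-modify-write, it cannot be used on read-only mappings and is considerably more expensive than a plain load; it is shown only to illustrate the one 16-byte access the architecture documents as atomic.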