Itay Bookstein via llvm-dev
2021-May-06 08:46 UTC
[llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?
Hey all,

I've encountered a codegen peculiarity on both X86-64 and PPC64LE on top of trunk:

    void foo(__uint128_t *p, __uint128_t *q) { *p = *q; }
    void bar(volatile __uint128_t *p, __uint128_t *q) { *p = *q; }

On gcc trunk x86-64 -O3, both of these compile to movdqa, movaps, ret (https://clang.godbolt.org/z/xvs8x646T).
On clang trunk x86-64 -O3, the first compiles to movaps, movaps, ret, and the second tears into 4 mov-s (https://clang.godbolt.org/z/zfM9MMrbM).
On clang trunk power64le, the first compiles to lxvd2x, stxvd2x, blr, and the second tears into 2x ld, 2x std, blr (https://clang.godbolt.org/z/7E7zG4Yfz).

I'm a bit surprised by this, since I'd expect volatile to at least "nudge the compiler along" in the direction of not tearing, rather than the other way around (e.g. how Linux uses volatile to implement READ_ONCE/WRITE_ONCE).

I realize that the semantics of volatile might be a bit fuzzier when applied to non-standard types such as __uint128_t (at the level of clang), but as far as I can tell at the IR level these two just compile to load/store (volatile) i128. Would this be considered a CodeGen issue?

Thanks,
~Itay
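For readers unfamiliar with the READ_ONCE/WRITE_ONCE idiom mentioned above, here is a minimal sketch of the volatile-cast pattern it is built on. The macro names with the _SKETCH suffix, the shared_flag variable, and the publish/observe helpers are hypothetical and exist only for illustration; the actual kernel macros are more elaborate, and this sketch deliberately uses a 64-bit value, a width that is a legal type on the targets discussed. __typeof__ is a GNU extension supported by gcc and clang.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel macros: each access is
 * routed through a volatile-qualified lvalue so the compiler must emit it. */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical shared variable, 64 bits wide. */
static uint64_t shared_flag;

/* Store through a volatile lvalue. */
void publish(uint64_t v) { WRITE_ONCE_SKETCH(shared_flag, v); }

/* Load through a volatile lvalue. */
uint64_t observe(void) { return READ_ONCE_SKETCH(shared_flag); }
```

Nothing in this pattern asks the compiler for a single-instruction access; any non-tearing behaviour comes from the access width being natively supported, which is the crux of the question for i128.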
Cranmer, Joshua via llvm-dev
2021-May-06 15:35 UTC
[llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?
The semantics of `volatile` in the C11/C++11 memory model are emphatically orthogonal to requirements for atomic (non-tearing) loads/stores, so you cannot and should not rely on any assumption that volatile will guarantee non-tearing if it can be done.

The reason why the `volatile` causes load tearing is that the x86 backend does not accept i128 as a legal type. Consequently, loads and stores for i128 are always broken up into two i64 loads/stores instead. However, there is a DAG combine that will merge two adjacent i64 loads/stores into an i128 load/store, which doesn't kick in for volatile loads/stores because that would mean optimizing a volatile load/store.

Note that if you change the i128 type to one that is legal, say <2 x double>, you do indeed get both the volatile and non-volatile versions implemented as an xmm mov instruction.
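The legal-type variant described above can be sketched in C as follows. This is an illustration under assumptions, not code from the thread: the v2df typedef and the function names are hypothetical, and the sketch relies on the vector_size attribute (a GCC/Clang extension), which is assumed here to produce the <2 x double> IR type mentioned in the reply. The resulting instructions should be checked in a compiler explorer rather than taken from this sketch.

```c
/* Hypothetical 16-byte vector type; GCC/Clang vector extension. */
typedef double v2df __attribute__((vector_size(16)));

/* Non-volatile 128-bit copy through a legal vector type. */
void foo_vec(v2df *p, const v2df *q) { *p = *q; }

/* Same copy with a volatile destination. Because the type is legal, no
 * split into two 64-bit halves is needed in the first place. */
void bar_vec(volatile v2df *p, const v2df *q) { *p = *q; }
```

As the reply stresses, this only changes which instructions are selected; it is still not a guarantee of atomicity under the C11/C++11 memory model.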