On 01/12/2016 09:20 AM, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
>> #ifdef xchgrz
>> /* same as xchg but poking at gcc red zone */
>> #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0)
>> #endif
>
> That's not safe in general. gcc might be using its redzone, so doing
> xchg into it is unsafe.
>
> But..
>
>> Is this a good way to test it?
>
> .. it's fine for some basic testing. It doesn't show any subtle
> interactions (ie some operations may have different dynamic behavior
> when the write buffers are busy etc), but as a baseline for "how fast
> can things go" the stupid raw loop is fine. And while the xchg into
> the redzone wouldn't be acceptable as a real implementation, for
> timing testing it's likely fine (ie you aren't hitting the problem it
> can cause).

I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even
64) was better because it avoided stomping on very-likely-to-be-hot
write buffers.

--Andy

On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>
> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
> was better because it avoided stomping on very-likely-to-be-hot write
> buffers.

I suspect it could go either way. You want a small constant (for the
instruction size), but any small constant is likely to be within the
current stack frame anyway. I don't think 0(%rsp) is particularly
likely to have a spill on it right then and there, but who knows..

And 64(%rsp) is possibly going to be cold in the L1 cache, especially
if it's just after a deep function call. Which it might be. So it
might work the other way.

So my guess would be that you wouldn't be able to measure the
difference. It might be there, but probably too small to really see in
any noise.

But numbers talk, bullshit walks. It would be interesting to be proven wrong.

Linus

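The "numbers talk" point can be tried with a raw loop of the kind discussed earlier in the thread. What follows is only a minimal user-space sketch, assuming x86-64 and GCC or Clang. The names BARRIER_AT, DEFINE_BARRIER_LOOP and time_barrier_off* are made up for illustration and come from neither the thread nor the kernel. On other architectures the macro falls back to __sync_synchronize() purely so the file still builds.

```c
#include <time.h>

#if defined(__x86_64__)
/* Hypothetical helper: locked no-op add at a fixed offset from %rsp.
 * Adding 0 leaves the memory contents unchanged; flags are clobbered. */
#define BARRIER_AT(off) \
    __asm__ __volatile__("lock; addl $0, " #off "(%%rsp)" ::: "memory", "cc")
#else
/* Assumption: generic full barrier so the sketch compiles elsewhere. */
#define BARRIER_AT(off) __sync_synchronize()
#endif

/* Defines name(iterations): runs the barrier in a raw loop and returns
 * the elapsed CPU time in seconds, as reported by clock(). */
#define DEFINE_BARRIER_LOOP(name, off)                      \
static double name(unsigned long iterations)                \
{                                                           \
    clock_t start = clock();                                \
    for (unsigned long i = 0; i < iterations; i++)          \
        BARRIER_AT(off);                                    \
    clock_t end = clock();                                  \
    return (double)(end - start) / CLOCKS_PER_SEC;          \
}

/* The three offsets mentioned in the thread. */
DEFINE_BARRIER_LOOP(time_barrier_off0, 0)
DEFINE_BARRIER_LOOP(time_barrier_off32, 32)
DEFINE_BARRIER_LOOP(time_barrier_off64, 64)
```

Calling, say, time_barrier_off0(10000000) and time_barrier_off64(10000000) a few times each and dividing by the iteration count gives a rough per-barrier cost. A raw loop like this will not show interactions with busy write buffers or a cold L1 line, so it is a baseline at best.
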
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds <torvalds at linux-foundation.org> wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on very-likely-to-be-hot write
>> buffers.
>
> I suspect it could go either way. You want a small constant (for the
> instruction size), but any small constant is likely to be within the
> current stack frame anyway. I don't think 0(%rsp) is particularly
> likely to have a spill on it right then and there, but who knows..
>
> And 64(%rsp) is possibly going to be cold in the L1 cache, especially
> if it's just after a deep function call. Which it might be. So it
> might work the other way.
>
> So my guess would be that you wouldn't be able to measure the
> difference. It might be there, but probably too small to really see in
> any noise.
>
> But numbers talk, bullshit walks. It would be interesting to be proven wrong.

Here's an article with numbers:

http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

I think they're suggesting using a negative offset, which is safe as
long as it doesn't page fault, even though we have the redzone
disabled.

--Andy
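
For reference, a negative-offset variant might look roughly like the sketch below. It assumes x86-64 with GCC or Clang. The offset -4 is chosen only to mirror the xchg test macro quoted earlier in the thread, not taken from the article, and BARRIER_BELOW_SP and store_then_load are hypothetical names. Unlike xchg, a locked add of 0 writes back the value it read, so whatever lives at that address keeps its contents. The remaining requirement is the one stated above: the access must not page fault.

```c
#if defined(__x86_64__)
/* Hypothetical helper: locked no-op add just below the stack pointer.
 * The memory word is rewritten with its own value; flags are clobbered. */
#define BARRIER_BELOW_SP() \
    __asm__ __volatile__("lock; addl $0, -4(%%rsp)" ::: "memory", "cc")
#else
/* Assumption: generic full barrier so the sketch compiles elsewhere. */
#define BARRIER_BELOW_SP() __sync_synchronize()
#endif

/* Store to one location, fence, then load another: the store-load
 * ordering that a full barrier is typically needed for. */
static int store_then_load(volatile int *mine, volatile int *other)
{
    *mine = 1;
    BARRIER_BELOW_SP();
    return *other;
}
```
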