thr3ads.net - Linux Virtualization - [PATCH 3/4] x86,asm: Re-work smp_store

Andy Lutomirski

2016-Jan-12 20:30 UTC

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On 01/12/2016 09:20 AM, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
>> #ifdef xchgrz
>> /* same as xchg but poking at gcc red zone */
>> #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0)
>> #endif
>
> That's not safe in general. gcc might be using its redzone, so doing
> xchg into it is unsafe.
>
> But..
>
>> Is this a good way to test it?
>
> .. it's fine for some basic testing. It doesn't show any subtle
> interactions (ie some operations may have different dynamic behavior
> when the write buffers are busy etc), but as a baseline for "how fast
> can things go" the stupid raw loop is fine. And while the xchg into
> the redzone wouldn't be acceptable as a real implementation, for
> timing testing it's likely fine (ie you aren't hitting the problem it
> can cause).

I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even
64) was better because it avoided stomping on very-likely-to-be-hot
write buffers.

--Andy
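
For readers following along, the instruction under discussion can be sketched in GNU C roughly as below. This is a sketch under stated assumptions, not code from the patch: the names `STACK_FENCE`, `fence_rsp0`, `fence_rsp32`, `fence_rsp64` and `store_then_fence` are hypothetical, the target is x86-64 user space with GCC or Clang, and the non-x86 fallback exists only so the file builds elsewhere.

```c
/* Hypothetical helper; the offset must be a literal constant because it is
 * pasted into the instruction text. */
#if defined(__x86_64__)
#define STACK_FENCE(off) \
    __asm__ __volatile__("lock addl $0, " #off "(%%rsp)" ::: "memory", "cc")
#else
/* Fallback for other architectures: a generic full barrier, NOT the
 * instruction discussed in the thread. */
#define STACK_FENCE(off) __sync_synchronize()
#endif

/* Adding zero leaves the stack word unchanged; the locked read-modify-write
 * is what acts as the full barrier. Only the target address differs. */
static inline void fence_rsp0(void)  { STACK_FENCE(0); }
static inline void fence_rsp32(void) { STACK_FENCE(32); }
static inline void fence_rsp64(void) { STACK_FENCE(64); }

/* A store followed by a full barrier, in the spirit of smp_store_mb(). */
static inline void store_then_fence(volatile int *p, int v)
{
    *p = v;
    fence_rsp0();
}
```

Swapping `fence_rsp0()` for `fence_rsp32()` or `fence_rsp64()` inside `store_then_fence()` changes only which stack word the locked add touches, which is the variable in question here.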

Linus Torvalds

2016-Jan-12 20:54 UTC

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>
> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
> was better because it avoided stomping on very-likely-to-be-hot write
> buffers.

I suspect it could go either way. You want a small constant (for the
instruction size), but any small constant is likely to be within the
current stack frame anyway. I don't think 0(%rsp) is particularly
likely to have a spill on it right then and there, but who knows..

And 64(%rsp) is possibly going to be cold in the L1 cache, especially
if it's just after a deep function call. Which it might be. So it
might work the other way.

So my guess would be that you wouldn't be able to measure the
difference. It might be there, but probably too small to really see in
any noise.

But numbers talk, bullshit walks. It would be interesting to be proven wrong.

                 Linus
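
One way to get such numbers is a raw timing loop of the kind mentioned earlier in the thread. The harness below is only a minimal sketch: `STACK_FENCE`, `DEFINE_BENCH`, `now_ns`, `bench_rsp0` and `bench_rsp64` are hypothetical names, it assumes x86-64 user space with a POSIX `clock_gettime`, and a tight loop like this says nothing about behavior when the write buffers are busy with other work.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std= */
#endif
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#define STACK_FENCE(off) \
    __asm__ __volatile__("lock addl $0, " #off "(%%rsp)" ::: "memory", "cc")
#else
/* Fallback so the sketch builds elsewhere; not the instruction under test. */
#define STACK_FENCE(off) __sync_synchronize()
#endif

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Generates one timing function per stack offset. Each iteration does a
 * store and then the fence, so there is something for the fence to order. */
#define DEFINE_BENCH(name, off)                                   \
    static uint64_t name(unsigned long iters, volatile int *slot) \
    {                                                             \
        uint64_t start = now_ns();                                \
        for (unsigned long i = 0; i < iters; i++) {               \
            *slot = (int)i;                                       \
            STACK_FENCE(off);                                     \
        }                                                         \
        return now_ns() - start;                                  \
    }

DEFINE_BENCH(bench_rsp0, 0)
DEFINE_BENCH(bench_rsp64, 64)
```

Calling `bench_rsp0(n, &slot)` and `bench_rsp64(n, &slot)` with the same large `n`, several times each and in alternating order, gives elapsed nanoseconds to compare. If the difference is as small as predicted above, it will likely disappear into run-to-run noise.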

Andy Lutomirski

2016-Jan-12 20:59 UTC

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds <torvalds at linux-foundation.org> wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on very-likely-to-be-hot write
>> buffers.
>
> I suspect it could go either way. You want a small constant (for the
> instruction size), but any small constant is likely to be within the
> current stack frame anyway. I don't think 0(%rsp) is particularly
> likely to have a spill on it right then and there, but who knows..
>
> And 64(%rsp) is possibly going to be cold in the L1 cache, especially
> if it's just after a deep function call. Which it might be. So it
> might work the other way.
>
> So my guess would be that you wouldn't be able to measure the
> difference. It might be there, but probably too small to really see in
> any noise.
>
> But numbers talk, bullshit walks. It would be interesting to be proven wrong.

Here's an article with numbers:

http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

I think they're suggesting using a negative offset, which is safe as
long as it doesn't page fault, even though we have the redzone
disabled.

--Andy
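
The negative-offset variant can be sketched the same way. This is an illustration under assumptions rather than the article's or the kernel's code: `FENCE_BELOW_RSP` and `sum_with_fences` are hypothetical names, and the target is x86-64 with GCC or Clang. Unlike the xchg version quoted at the top of the thread, the locked add of zero writes back the value it read, so the word just below the stack pointer keeps its contents.

```c
#if defined(__x86_64__)
/* Locked add of zero to the word just below the stack pointer. The value
 * there is preserved; the access must not page fault. */
#define FENCE_BELOW_RSP() \
    __asm__ __volatile__("lock addl $0, -4(%%rsp)" ::: "memory", "cc")
#else
/* Fallback so the sketch builds elsewhere; not the instruction discussed. */
#define FENCE_BELOW_RSP() __sync_synchronize()
#endif

/* Sums an array with a fence after each element, to show that the locals
 * survive the fence even if the compiler keeps them near the stack pointer. */
static int sum_with_fences(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += v[i];
        FENCE_BELOW_RSP();
    }
    return total;
}
```

For example, `sum_with_fences` over the array `{1, 2, 3, 4}` returns 10, the same as without the fences.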
