search for: shipilev

Displaying 15 results from an estimated 15 matches for "shipilev".

2016 Jan 12
5
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
... > So my guess would be that you wouldn't be able to measure the > difference. It might be there, but probably too small to really see in > any noise. > > But numbers talk, bullshit walks. It would be interesting to be proven wrong. Here's an article with numbers: http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ I think they're suggesting using a negative offset, which is safe as long as it doesn't page fault, even though we have the redzone disabled. --Andy
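For reference, a minimal sketch of the negative-offset barrier being discussed, assuming a kernel-style x86-64 build with the red zone disabled (-mno-red-zone); the macro name is illustrative, not the kernel's:

/* Lock-prefixed add to a dummy word just below the stack pointer.  With the
 * red zone disabled this slot is unused scratch, so the store cannot clobber
 * live data and, unlike poking (%rsp) itself, creates no false dependency on
 * whatever sits at the top of the stack. */
#define barrier_lock_addl() \
    asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")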
2016 Jan 12
0
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 12:59 PM, Andy Lutomirski <luto at amacapital.net> wrote: > > Here's an article with numbers: > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ Well, that's with the busy loop and one set of code generation. It doesn't show the "oops, deeper stack isn't even in the cache any more due to call chains" issue. But yes: > I think they're suggesting using a negative of...
2016 Jan 12
0
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...ouldn't be able to measure the > > difference. It might be there, but probably too small to really see in > > any noise. > > > > But numbers talk, bullshit walks. It would be interesting to be proven wrong. > > Here's an article with numbers: > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ > > I think they're suggesting using a negative offset, which is safe as > long as it doesn't page fault, even though we have the redzone > disabled. > > --Andy OK so I'll have to tweak the test to put something on stack...
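One hypothetical shape for the tweak mentioned above (names are made up): keep a live value in a stack slot and read it back right after the barrier, so a variant that does its locked add on (%rsp) can pick up a false dependency, while the -4(%rsp) variant does not. Assumes -mno-red-zone, as in the thread's test.

/* Illustrative loop body for the benchmark: 'slot' lives on the stack and is
 * reloaded right after the fence. */
static __attribute__((noinline)) int fenced_load(int x)
{
    volatile int slot = x;        /* something on the stack */
    asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
    return slot;                  /* read it back after the barrier */
}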
2016 Jan 27
0
[PATCH v4 5/5] x86: drop mfence in favor of lock+addl
...ppears to be way slower than a locked instruction - let's use lock+add unconditionally, as we always did on old 32-bit. Just poking at SP would be the most natural, but if we then read the value from SP, we get a false dependency which will slow us down. This was noted in this article: http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ And is easy to reproduce by sticking a barrier in a small non-inline function. So let's use a negative offset - which avoids this problem since we build with the red zone disabled. Unfortunately there's some code that wants to order clflush i...
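The reproduction described in that commit message could look like the sketch below (illustrative only): in a tiny non-inline function the word at (%rsp) is the return address, so a barrier that does its locked add there makes the final ret wait on the read-modify-write, while the -4(%rsp) form does not.

/* Small non-inline function containing only the barrier; change the offset
 * to 0 and the locked add hits the return-address slot, which is enough to
 * see the slowdown when this is called in a tight loop. */
static __attribute__((noinline)) void tiny_barrier(void)
{
    asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}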
2016 Jan 12
2
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On 01/12/2016 09:20 AM, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote: >> #ifdef xchgrz >> /* same as xchg but poking at gcc red zone */ >> #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0) >>
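For context, a self-contained version of the quoted xchgrz variant; the SP definition is an assumption about what the test harness used, not something shown in this snippet.

#ifdef __x86_64__
#define SP "rsp"
#else
#define SP "esp"
#endif

/* xchg with the word just below the stack pointer, i.e. inside the gcc red
 * zone; xchg with a memory operand is implicitly locked, so it acts as a
 * full barrier. */
#define barrier_xchgrz() do { \
    int ret; \
    asm volatile("xchgl %0, -4(%%" SP ")" : "=r"(ret) :: "memory", "cc"); \
} while (0)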
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
...( +- 1.15% ) After: 0.578667024 seconds time elapsed ( +- 1.21% ) Just poking at SP would be the most natural, but if we then read the value from SP, we get a false dependency which will slow us down. This was noted in this article: http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ And is easy to reproduce by sticking a barrier in a small non-inline function. So let's use a negative offset - which avoids this problem since we build with the red zone disabled. For userspace, use an address just below the redzone. The one di...
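The userspace variant described in that commit message might look like the following; the exact offset is an assumption, chosen so the dummy locked add lands just below the 128-byte red zone that the SysV x86-64 ABI reserves under %rsp.

/* Userspace code may not scribble inside the red zone ([-128, 0) below %rsp),
 * so the dummy add targets the first word below it. */
#define user_smp_mb() \
    asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")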
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around
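An illustrative sketch of the clflush caveat (the helper name is made up): per the cover letter's reading of the Intel manual, only mfence is guaranteed to order clflush, so a flush path keeps a real mfence even if smp_mb() becomes a locked add.

/* Flush one cache line and make the flush globally ordered before returning.
 * A lock-prefixed add would not be sufficient here under the ordering rules
 * the cover letter cites, hence the explicit mfence. */
static inline void flush_line_ordered(void *addr)
{
    asm volatile("clflush %0" : "+m" (*(volatile char *)addr));
    asm volatile("mfence" ::: "memory");
}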
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl that we use on older CPUs. So let's use the locked variant everywhere. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h. The documentation fixes are included first - I verified that they do not change the generated code at all.
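A rough shape for the kind of micro-benchmark the cover letter refers to (entirely illustrative; the thread's actual harness is not shown in these snippets): time many iterations of each fence flavour and compare. Assumes -mno-red-zone so the -4(%rsp) slot is free scratch, as in the kernel build.

#include <stdio.h>
#include <time.h>

#define N 100000000UL

static void mb_mfence(void)  { asm volatile("mfence" ::: "memory"); }
static void mb_lockadd(void) { asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc"); }

/* wall-clock seconds to execute N back-to-back fences */
static double bench(void (*fence)(void))
{
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (unsigned long i = 0; i < N; i++)
        fence();
    clock_gettime(CLOCK_MONOTONIC, &b);
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    printf("mfence:    %.3f s\n", bench(mb_mfence));
    printf("lock addl: %.3f s\n", bench(mb_lockadd));
    return 0;
}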
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around