Displaying 20 results from an estimated 7000 matches similar to: "[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks"
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So we really should use the locked variant everywhere, except that intel manual
says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seems to assume sfence will
order it, so there could be existing bugs around
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So we really should use the locked variant everywhere, except that intel manual
says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seems to assume sfence will
order it, so there could be existing bugs around
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So let's use the locked variant everywhere.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
The documentation fixes are included first - I verified that
they do not change the generated code at all.
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So let's use the locked variant everywhere.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
The documentation fixes are included first - I verified that
they do not change the generated code at all.
2016 Jan 12
7
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
So let's use the locked variant everywhere - helps keep the code simple as
well.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
I hope I'm not splitting this up too much - the reason
2016 Jan 12
7
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
So let's use the locked variant everywhere - helps keep the code simple as
well.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
I hope I'm not splitting this up too much - the reason
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > constantly cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > constantly cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference
2016 Jan 27
0
[PATCH v4 5/5] x86: drop mfence in favor of lock+addl
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Just poking at SP would be the most natural, but if we
then read the value from SP, we get a false dependency
which will slow us down.
This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
And is easy to reproduce by sticking
2016 Jan 12
1
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > #ifdef xchgrz
> > /* same as xchg but poking at gcc red zone */
> > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }
2016 Jan 12
1
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > #ifdef xchgrz
> > /* same as xchg but poking at gcc red zone */
> > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }
2016 Jan 26
2
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> >
> > So let's use the locked variant everywhere - helps keep the code simple as
>
2016 Jan 26
2
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> >
> > So let's use the locked variant everywhere - helps keep the code simple as
>
2016 Jan 12
5
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on very-likely-to-be-hot write
>> buffers.
>
> I suspect it
2016 Jan 12
5
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on very-likely-to-be-hot write
>> buffers.
>
> I suspect it
2016 Jan 28
0
[PATCH v5 1/5] x86: add cc clobber for addl
addl clobbers flags (such as CF) but barrier.h didn't tell this to gcc.
Historically, gcc doesn't need one on x86, and always considers flags
clobbered. We are probably missing the cc clobber in a *lot* of places
for this reason.
But even if not necessary, it's probably a good thing to add for
documentation, and in case gcc semantcs ever change.
Reported-by: Borislav Petkov <bp at
2008 Jun 10
1
[PATCH] xen: Use wmb instead of rmb in xen_evtchn_do_upcall().
This patch is ported one from 534:77db69c38249 of linux-2.6.18-xen.hg.
Use wmb instead of rmb to enforce ordering between
evtchn_upcall_pending and evtchn_pending_sel stores
in xen_evtchn_do_upcall().
Cc: Samuel Thibault <samuel.thibault at eu.citrix.com>
Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp>
---
drivers/xen/events.c | 2 +-
1 files changed, 1 insertions(+), 1
2007 Apr 18
1
[RFC] [PATCH] Split host arch headers for UML's benefit
The patch below fixes the recent UML compilation failure in -rc5-mm1
without making the UML build reach further into the i386 headers. It
splits the i386 ptrace.h and system.h into UML-usable and UML-unusable
pieces.
The string "abi" is in there because I did ptrace.h first, and that
involves separating the ptrace ABI stuff from everything else (if
pt_regs is not considered part of