Michael S. Tsirkin
2016-Jan-26 08:20 UTC
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows
> > that it's 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use
> > on older CPUs.
> >
> > So let's use the locked variant everywhere - helps keep the code
> > simple as well.
> >
> > While I was at it, I found some inconsistencies in comments in
> > arch/x86/include/asm/barrier.h
> >
> > I hope I'm not splitting this up too much - the reason is I wanted to
> > isolate the code changes (that people might want to test for
> > performance) from the comment changes approved by Linus, and from the
> > (so far unreviewed) comment changes I came up with myself.
> >
> > Lightly tested on my system.
> >
> > Michael S. Tsirkin (3):
> >   x86: drop mfence in favor of lock+addl
> >   x86: drop a comment left over from X86_OOSTORE
> >   x86: tweak the comment about use of wmb for IO
>
> I would like to get feedback from the hardware team about the
> implications of this change, first.
>
> 	-hpa

Hi hpa,
Any luck getting some feedback on this one?

Thanks,

-- 
MST
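For reference, the change under discussion swaps one full-barrier
implementation for another. Below is a minimal sketch of the two
flavors in the spirit of arch/x86/include/asm/barrier.h - the macro
names are illustrative, not the patch's actual text:

/*
 * Sketch only: the two mb() flavors being compared.  The names are
 * illustrative, not from the actual patch.
 */

/* What mb() expands to on modern x86: a dedicated full fence. */
#define mb_mfence()	asm volatile("mfence" ::: "memory")

/*
 * The proposed replacement: a LOCK-prefixed read-modify-write of the
 * word at the top of the stack.  Adding 0 changes no data, but any
 * LOCK-prefixed instruction is a full barrier for ordinary loads and
 * stores on x86, and the cover letter's micro-benchmark measured it
 * 2-3x faster than MFENCE.  32-bit kernels use %%esp in place of
 * %%rsp.
 */
#define mb_lock_addl()	asm volatile("lock; addl $0,(%%rsp)" \
				     ::: "memory", "cc")

Targeting the stack presumably keeps the locked operation on a cache
line that is already hot and exclusive to the executing CPU.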
H. Peter Anvin
2016-Jan-26 21:37 UTC
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On 01/26/16 00:20, Michael S. Tsirkin wrote:
> On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
>
> Hi hpa,
> Any luck getting some feedback on this one?

Yes.  What we know so far is that in *most* cases it will work, but
there are apparently a few corner cases where MFENCE or a full-blown
serializing instruction is necessary.  We are trying to characterize
those corner cases and see if any of them affect the kernel.

Even if some do, we can probably make those barriers explicitly
different, but we don't want to go ahead with the change until we know
where we need to care.

	-hpa
Michael S. Tsirkin
2016-Jan-27 14:07 UTC
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 26, 2016 at 01:37:38PM -0800, H. Peter Anvin wrote:
> On 01/26/16 00:20, Michael S. Tsirkin wrote:
> > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> >
> > Hi hpa,
> > Any luck getting some feedback on this one?
>
> Yes.  What we know so far is that in *most* cases it will work, but
> there are apparently a few corner cases where MFENCE or a full-blown
> serializing instruction is necessary.  We are trying to characterize
> those corner cases and see if any of them affect the kernel.

It would be very interesting to know your findings.  Going over the
manual I found one such case, and then going over the kernel code I
found some questionable uses of barriers - it would be interesting to
find out what some of the other cases are.  So I think it's probably
useful to find out the full answer anyway.  Awaiting the answers with
interest.

> Even if they are, we can probably make those barriers explicitly
> different, but we don't want to go ahead with the change until we know
> where we need to care.
>
> 	-hpa

Thanks!  Now that you definitely said there are corner cases, I poked
some more at the manual and found one:

	CLFLUSH is only ordered by the MFENCE instruction.  It is not
	guaranteed to be ordered by any other fencing or serializing
	instructions or by another CLFLUSH instruction.  For example,
	software can use an MFENCE instruction to ensure that previous
	stores are included in the write-back.

There are instances of this in mwait_play_dead, clflush_cache_range,
mwait_idle_with_hints, mwait_idle ...

A comment near pcommit_sfence includes example flush_and_commit_buffer
code, which is interesting - it assumes sfence orders clflush.  So it
appears that pcommit_sfence in that file is wrong, then?  At least on
processors where it falls back on clflush.

mwait_idle is the only one that calls smp_mb() and not mb(); I couldn't
figure out why - the original patches did mb() there.

Outside the core kernel there are drm_cache_flush_clflush,
drm_clflush_sg and drm_clflush_virt_range, and then there's
gru_start_instruction in drivers/misc/sgi-gru/.  But otherwise
drivers/misc/sgi-gru/ calls clflush in gru_flush_cache without calling
mb() - this could be a bug.

Looking at all users, it seems that only mwait_idle calls smp_mb()
around clflush; the others call mb().  So, at least as a first step,
maybe it makes sense to scope this down somewhat by changing mwait_idle
to call mb() and then optimizing __smp_mb() instead of mb().  I'll post
a v3 that does this.

-- 
MST
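To make the CLFLUSH constraint concrete, here is a minimal sketch of
the pattern used by clflush_cache_range and the other call sites listed
above.  The helper name, the hard-coded 64-byte line size, and the loop
bounds are illustrative assumptions, not the kernel's actual code:

/*
 * Sketch only: flush a buffer's cache lines and order the flushes
 * against surrounding accesses.  Assumes mb() is the kernel's full
 * barrier and a 64-byte cache line.
 */
#define CLFLUSH_LINE	64UL	/* assumed line size */

static void flush_buffer(void *vaddr, unsigned int size)
{
	unsigned long p = (unsigned long)vaddr & ~(CLFLUSH_LINE - 1);
	unsigned long end = (unsigned long)vaddr + size;

	mb();	/* needs MFENCE: orders earlier stores before CLFLUSH */
	for (; p < end; p += CLFLUSH_LINE)
		asm volatile("clflush %0" : "+m" (*(volatile char *)p));
	mb();	/* needs MFENCE: orders CLFLUSH before later accesses */
}

If mb() were redefined to the lock; addl variant, the SDM text quoted
above implies nothing would order the CLFLUSHes here - which is exactly
the corner case that makes these call sites interesting.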