Michael S. Tsirkin
2016-Jan-26 08:20 UTC
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows
> > that it's 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use
> > on older CPUs.
> >
> > So let's use the locked variant everywhere - helps keep the code
> > simple as well.
> >
> > While I was at it, I found some inconsistencies in comments in
> > arch/x86/include/asm/barrier.h
> >
> > I hope I'm not splitting this up too much - the reason is I wanted to
> > isolate the code changes (that people might want to test for
> > performance) from the comment changes approved by Linus, and from the
> > (so far unreviewed) comment changes I came up with myself.
> >
> > Lightly tested on my system.
> >
> > Michael S. Tsirkin (3):
> >   x86: drop mfence in favor of lock+addl
> >   x86: drop a comment left over from X86_OOSTORE
> >   x86: tweak the comment about use of wmb for IO
>
> I would like to get feedback from the hardware team about the
> implications of this change, first.
>
> 	-hpa

Hi hpa,
Any luck getting some feedback on this one?

Thanks,

-- 
MST
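For reference, the change under discussion swaps one full-barrier
implementation for another. Below is a minimal sketch of the two
flavors in the spirit of arch/x86/include/asm/barrier.h - the macro
names are illustrative, not the patch's actual text:

/*
 * Sketch only: the two mb() flavors being compared.  The names are
 * illustrative, not from the actual patch.
 */

/* What mb() expands to on modern x86: a dedicated full fence. */
#define mb_mfence()	asm volatile("mfence" ::: "memory")

/*
 * The proposed replacement: a LOCK-prefixed read-modify-write of the
 * word at the top of the stack.  Adding 0 changes no data, but any
 * LOCK-prefixed instruction is a full barrier for ordinary loads and
 * stores on x86, and the cover letter's micro-benchmark measured it
 * 2-3x faster than MFENCE.  32-bit kernels use %%esp in place of
 * %%rsp.
 */
#define mb_lock_addl()	asm volatile("lock; addl $0,(%%rsp)" \
				     ::: "memory", "cc")

Targeting the stack presumably keeps the locked operation on a cache
line that is already hot and exclusive to the executing CPU.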
H. Peter Anvin
2016-Jan-26 21:37 UTC
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On 01/26/16 00:20, Michael S. Tsirkin wrote:
> On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
>
> Hi hpa,
> Any luck getting some feedback on this one?

Yes.  What we know so far is that in *most* cases it will work, but
there are apparently a few corner cases where MFENCE or a full-blown
serializing instruction is necessary.  We are trying to characterize
those corner cases and see if any of them affect the kernel.

Even if some do, we can probably make those barriers explicitly
different, but we don't want to go ahead with the change until we know
where we need to care.

	-hpa
Michael S. Tsirkin
2016-Jan-27 14:07 UTC
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 26, 2016 at 01:37:38PM -0800, H. Peter Anvin wrote:
> On 01/26/16 00:20, Michael S. Tsirkin wrote:
> > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> >
> > Hi hpa,
> > Any luck getting some feedback on this one?
>
> Yes.  What we know so far is that in *most* cases it will work, but
> there are apparently a few corner cases where MFENCE or a full-blown
> serializing instruction is necessary.  We are trying to characterize
> those corner cases and see if any of them affect the kernel.

It would be very interesting to know your findings.  Going over the
manual I found one such case, and then going over the kernel code I
found some questionable uses of barriers - it would be interesting to
find out what some of the other cases are.  So I think it's probably
useful to find out the full answer anyway.  Awaiting the answers with
interest.

> Even if they are, we can probably make those barriers explicitly
> different, but we don't want to go ahead with the change until we know
> where we need to care.
>
> 	-hpa

Thanks!  Now that you definitely said there are corner cases, I poked
some more at the manual and found one:

	CLFLUSH is only ordered by the MFENCE instruction.  It is not
	guaranteed to be ordered by any other fencing or serializing
	instructions or by another CLFLUSH instruction.  For example,
	software can use an MFENCE instruction to ensure that previous
	stores are included in the write-back.

There are instances of this in mwait_play_dead, clflush_cache_range,
mwait_idle_with_hints, mwait_idle ...

A comment near pcommit_sfence includes example flush_and_commit_buffer
code, which is interesting - it assumes sfence orders clflush.  So it
appears that pcommit_sfence in that file is wrong, then?  At least on
processors where it falls back on clflush.

mwait_idle is the only one that calls smp_mb() and not mb(); I couldn't
figure out why - the original patches did mb() there.

Outside the core kernel there are drm_cache_flush_clflush,
drm_clflush_sg and drm_clflush_virt_range, and then there's
gru_start_instruction in drivers/misc/sgi-gru/.  But otherwise
drivers/misc/sgi-gru/ calls clflush in gru_flush_cache without calling
mb() - this could be a bug.

Looking at all users, it seems that only mwait_idle calls smp_mb()
around clflush; the others call mb().  So, at least as a first step,
maybe it makes sense to scope this down somewhat by changing mwait_idle
to call mb() and then optimizing __smp_mb() instead of mb().  I'll post
a v3 that does this.

-- 
MST
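To make the CLFLUSH constraint concrete, here is a minimal sketch of
the pattern used by clflush_cache_range and the other call sites listed
above.  The helper name, the hard-coded 64-byte line size, and the loop
bounds are illustrative assumptions, not the kernel's actual code:

/*
 * Sketch only: flush a buffer's cache lines and order the flushes
 * against surrounding accesses.  Assumes mb() is the kernel's full
 * barrier and a 64-byte cache line.
 */
#define CLFLUSH_LINE	64UL	/* assumed line size */

static void flush_buffer(void *vaddr, unsigned int size)
{
	unsigned long p = (unsigned long)vaddr & ~(CLFLUSH_LINE - 1);
	unsigned long end = (unsigned long)vaddr + size;

	mb();	/* needs MFENCE: orders earlier stores before CLFLUSH */
	for (; p < end; p += CLFLUSH_LINE)
		asm volatile("clflush %0" : "+m" (*(volatile char *)p));
	mb();	/* needs MFENCE: orders CLFLUSH before later accesses */
}

If mb() were redefined to the lock; addl variant, the SDM text quoted
above implies nothing would order the CLFLUSHes here - which is exactly
the corner case that makes these call sites interesting.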