On 01/12/16 14:21, Michael S. Tsirkin wrote:
>
> OK so I'll have to tweak the test to put something
> on stack to measure the difference: my test tweaks a
> global variable instead.
> I'll try that by tomorrow.
>
> I couldn't measure any difference between mfence and lock+addl
> except in a micro-benchmark, but hey since we are tweaking this,
> let's do the optimal thing.
>

Be careful with this: if it only shows up in a microbenchmark, we may
introduce a hard-to-debug regression for no real benefit.

	-hpa

On Tue, Jan 12, 2016 at 2:55 PM, H. Peter Anvin <hpa at zytor.com> wrote:
>
> Be careful with this: if it only shows up in a microbenchmark, we may
> introduce a hard-to-debug regression for no real benefit.

So I can pretty much guarantee that it shouldn't regress from a
correctness angle, since we rely *heavily* on locked instructions
being barriers, in locking and in various other situations. Indeed,
much more so than we ever rely on "smp_mb()". The places that rely on
smp_mb() are pretty few in the end.

So I think the only issue is whether sometimes "mfence" might be
faster. So far, I've never actually heard of that being the case. The
fence instructions have always sucked when I've seen them.

But talking to the hw people about this is certainly a good idea
regardless.

	Linus

On Tue, Jan 12, 2016 at 03:24:05PM -0800, Linus Torvalds wrote:
> But talking to the hw people about this is certainly a good idea regardless.

I'm not seeing it in this thread but I might've missed it too.

Anyway, I'm being reminded that the ADD will change rFLAGS while MFENCE
doesn't touch them.

Do we care?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.