Eric Dumazet
2017-Mar-22 14:54 UTC
[Bridge] [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
On Wed, 2017-03-22 at 15:33 +0100, Peter Zijlstra wrote:
>
> But I would feel a whole lot better about the entire thing if we could
> measure their impact. It would also give us good precedent to whack
> other potential users of _nocheck over the head with -- show numbers.

I won't be able to measure the impact on real workloads; our production
kernels are based on 4.3 at this moment.

I guess someone could code a lib/test_refcount.c launching X threads
using either atomic_inc() or refcount_inc() in a loop.

That would give a rough estimate of the refcount_t overhead among
various platforms.
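A minimal userspace sketch of the kind of micro-benchmark being proposed
(the thread count, iteration count, and check layout are illustrative
assumptions, not code from this thread):

/*
 * Rough userspace analogue of the proposed lib/test_refcount.c:
 * N threads hammering either a plain atomic increment or a
 * refcount_inc()-style increment (with zero/saturation checks),
 * so the per-op overhead of the checks can be compared.
 */
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4
#define ITERS    10000000UL

static atomic_uint counter = 1;

/* Plain counterpart of atomic_inc(). */
static void plain_inc(void)
{
	atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

/* refcount_inc()-like: refuse to increment 0, saturate instead of wrapping. */
static void checked_inc(void)
{
	unsigned int old = atomic_load_explicit(&counter, memory_order_relaxed);
	unsigned int new;

	do {
		if (!old)            /* increment on 0: would be use-after-free */
			return;
		if (old == UINT_MAX) /* saturated: stop counting */
			return;
		new = old + 1;
	} while (!atomic_compare_exchange_weak_explicit(&counter, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
}

static void *worker(void *arg)
{
	int checked = *(int *)arg;

	for (unsigned long i = 0; i < ITERS; i++)
		checked ? checked_inc() : plain_inc();
	return NULL;
}

static double run(int checked)
{
	pthread_t tid[NTHREADS];
	struct timespec a, b;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, &checked);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &b);

	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
	printf("atomic_inc-style:   %.3fs\n", run(0));
	printf("refcount_inc-style: %.3fs\n", run(1));
	return 0;
}

Built with something like "gcc -O2 -pthread", the unchecked path compiles
down to a single locked add on x86 while the checked path becomes a cmpxchg
loop, which lines up with the "lock incl" vs "call refcount_inc" split in
the numbers further down the thread.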
Peter Zijlstra
2017-Mar-22 15:08 UTC
[Bridge] [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
On Wed, Mar 22, 2017 at 07:54:04AM -0700, Eric Dumazet wrote:
> On Wed, 2017-03-22 at 15:33 +0100, Peter Zijlstra wrote:
> >
> > But I would feel a whole lot better about the entire thing if we could
> > measure their impact. It would also give us good precedent to whack
> > other potential users of _nocheck over the head with -- show numbers.
>
> I won't be able to measure the impact on real workloads; our production
> kernels are based on 4.3 at this moment.

Is there really no micro bench that exercises the relevant network
paths? Do you really fully rely on Google production workloads?

> I guess someone could code a lib/test_refcount.c launching X threads
> using either atomic_inc or refcount_inc() in a loop.
>
> That would give a rough estimate of the refcount_t overhead among
> various platforms.

It's also a fairly meaningless number. It doesn't include any of the
other work the network path does.
Peter Zijlstra
2017-Mar-22 16:51 UTC
[Bridge] [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
On Wed, Mar 22, 2017 at 07:54:04AM -0700, Eric Dumazet wrote:
>
> I guess someone could code a lib/test_refcount.c launching X threads
> using either atomic_inc or refcount_inc() in a loop.
>
> That would give a rough estimate of the refcount_t overhead among
> various platforms.

Cycles spent on uncontended ops:

                          SKL     SNB     IVB-EP

atomic:
  lock incl               ~15     ~13     ~10

atomic-ref:
  call refcount_inc       ~31     ~37     ~31

atomic-ref2:
  $inlined                ~23     ~22     ~21

Contended numbers (E3-1245 v5):

root@skl:~/spinlocks# LOCK=./atomic ./test1.sh
1: 14.797240
2: 87.451230
4: 100.747790
8: 118.234010

root@skl:~/spinlocks# LOCK=./atomic-ref ./test1.sh
1: 30.627320
2: 91.866730
4: 111.029560
8: 141.922420

root@skl:~/spinlocks# LOCK=./atomic-ref2 ./test1.sh
1: 23.243930
2: 98.620250
4: 119.604240
8: 124.864380

The code includes the patches found here:

  https://lkml.kernel.org/r/20170317211918.393791494@infradead.org

and effectively does:

  #define REFCOUNT_WARN(cond, str) WARN_ON_ONCE(cond)

  s/WARN_ONCE/REFCOUNT_WARN/ on lib/refcount.c

Find the tarball of the userspace code used attached (it's a bit of a
mess; it has grown over time and needs a cleanup). I used:

  gcc (Debian 6.3.0-6) 6.3.0 20170205

So while it's about ~20 cycles worse, reducing contention is far more
effective than reducing straight-line instruction count (which is also
entirely possible, because GCC generates absolute shite in places).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: spinlocks.tar.bz2
Type: application/octet-stream
Size: 20469 bytes
Desc: not available
URL: <http://lists.linuxfoundation.org/pipermail/bridge/attachments/20170322/b8c11794/attachment-0001.obj>
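To make the substitution concrete, here is roughly what it looks like
applied to the increment paths of lib/refcount.c. The function bodies
below are a sketch of the 4.11-era code, not Peter's exact tree, so
treat the details as illustrative:

#include <linux/refcount.h>
#include <linux/bug.h>

/*
 * The substitution described above: drop the format string so the
 * warning path generates less straight-line code than WARN_ONCE().
 */
#define REFCOUNT_WARN(cond, str)	WARN_ON_ONCE(cond)

bool refcount_inc_not_zero(refcount_t *r)
{
	unsigned int old, new, val = atomic_read(&r->refs);

	for (;;) {
		new = val + 1;

		if (!val)		/* already zero: don't resurrect the object */
			return false;

		if (unlikely(!new))	/* would wrap: stay saturated at UINT_MAX */
			return true;

		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
		if (old == val)
			break;

		val = old;
	}

	REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");

	return true;
}

void refcount_inc(refcount_t *r)
{
	REFCOUNT_WARN(!refcount_inc_not_zero(r),
		      "refcount_t: increment on 0; use-after-free.\n");
}

The point of the swap is only to shrink the out-of-line warning code;
the cmpxchg loop and the zero/saturation checks, which account for the
~20-cycle delta above, are unchanged.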