thr3ads.net - Linux Ethernet Bridging - [Bridge] [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t


Herbert Xu

2017-Mar-20 13:27 UTC

[Bridge] [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
>
> So what bench/setup do you want ran?

You can start by counting how many cycles an atomic op takes
vs. how many cycles this new code takes.
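
A minimal user-space sketch of that measurement, using C11 <stdatomic.h> as a stand-in for the kernel's atomic_t API (an assumption: user-space numbers only approximate in-kernel behaviour). The names inc_direct, inc_cas_loop and ns_per_op are hypothetical. The harness reports nanoseconds per operation rather than cycles; multiply by the core clock frequency for a rough cycle count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Plain atomic increment; on x86 this usually compiles to a single
 * LOCK-prefixed instruction. */
static void inc_direct(atomic_uint *c)
{
    atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* The same increment done as a compare-and-swap retry loop. */
static void inc_cas_loop(atomic_uint *c)
{
    unsigned int old = atomic_load_explicit(c, memory_order_relaxed);
    /* On failure, old is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak_explicit(c, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average wall-clock nanoseconds per call of inc() over iters calls. */
static double ns_per_op(void (*inc)(atomic_uint *), atomic_uint *c,
                        unsigned int iters)
{
    assert(iters > 0);
    double start = now_ns();
    for (unsigned int i = 0; i < iters; i++)
        inc(c);
    return (now_ns() - start) / iters;
}
```

The indirect call through inc adds the same overhead to both variants, so the difference between the two results is the more meaningful number. This only covers the uncontended, single-threaded case.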

Cheers,
-- 
Email: Herbert Xu <herbert at gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Peter Zijlstra

2017-Mar-20 13:40 UTC

On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
> >
> > So what bench/setup do you want ran?
> 
> You can start by counting how many cycles an atomic op takes
> vs. how many cycles this new code takes.

On what uarch?

I think I tested a hand-coded asm version, and it ended up at about double
the cycles for a cmpxchg loop vs. the direct instruction on an IVB-EX
(until the memory bus saturated, at which point they took the same). Newer
parts will of course have different numbers.
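
The observation that the gap closes once the memory bus saturates can be probed with a contended variant: several threads hammer one shared counter, either with a plain atomic add or with a cmpxchg retry loop. This is a sketch using POSIX threads and C11 atomics; run_contended, worker and MAX_THREADS are hypothetical names, and it needs to be built with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 64

static atomic_uint counter;             /* the contended cache line */

struct worker_arg {
    int use_cas;                        /* 0: atomic add, 1: cmpxchg loop */
    unsigned int iters;
};

static void *worker(void *p)
{
    const struct worker_arg *a = p;
    for (unsigned int i = 0; i < a->iters; i++) {
        if (a->use_cas) {
            unsigned int old = atomic_load_explicit(&counter, memory_order_relaxed);
            /* Fails whenever another thread changed counter in between;
             * old is refreshed and we try again. */
            while (!atomic_compare_exchange_weak_explicit(&counter, &old, old + 1,
                                                          memory_order_relaxed,
                                                          memory_order_relaxed))
                ;
        } else {
            atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
        }
    }
    return NULL;
}

/* Runs nthreads workers against the shared counter and returns elapsed
 * wall-clock nanoseconds, or -1.0 on bad input or thread-creation failure. */
static double run_contended(int nthreads, unsigned int iters, int use_cas)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg arg = { use_cas, iters };
    struct timespec t0, t1;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;
    atomic_store(&counter, 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (started != nthreads)
        return -1.0;

    /* Both variants must yield an exact total; only the time differs. */
    assert(atomic_load(&counter) == (unsigned int)nthreads * iters);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}
```

Sweeping nthreads from 1 up to the number of cores shows, for a given machine, whether the cmpxchg loop's relative cost shrinks as contention grows; no particular result is implied here.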

Can't we run some iperf on a 40GbE fiber loop or something? It would be
very useful to have an actual workload we can run.

Eric Dumazet

2017-Mar-20 14:51 UTC

On Mon, 2017-03-20 at 14:40 +0100, Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
> > On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
> > >
> > > So what bench/setup do you want ran?
> > 
> > You can start by counting how many cycles an atomic op takes
> > vs. how many cycles this new code takes.
> 
> On what uarch?
> 
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
> 
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

If atomic ops are converted one by one, the results will likely be lost
in the noise.

Shouldn't we hold off on a global conversion until there is a way to
enable this debugging selectively?

With that in place, adopting this fine infrastructure would really not be
a problem.
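
One possible reading of "selective debugging" is a per-subsystem build switch: checked, cmpxchg-based increments when debugging is enabled for that subsystem, and a plain atomic add otherwise. The sketch below assumes a hypothetical NET_REFCOUNT_DEBUG macro and hypothetical my_ref_* helpers built on C11 atomics; it is not the kernel's refcount_t API. Because the checks use assert(), debug builds are assumed not to define NDEBUG.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcount wrapper; NET_REFCOUNT_DEBUG is a hypothetical
 * per-subsystem switch, e.g. passed as -DNET_REFCOUNT_DEBUG. */
typedef struct {
    atomic_uint refs;
} my_ref_t;

static inline void my_ref_init(my_ref_t *r)
{
    atomic_init(&r->refs, 1);
}

static inline void my_ref_get(my_ref_t *r)
{
#ifdef NET_REFCOUNT_DEBUG
    /* Checked path: load, test, cmpxchg, retry. */
    unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);
    do {
        assert(old != 0 && "get on an object whose count already hit zero");
        if (old == UINT_MAX)
            return;                     /* saturate instead of wrapping */
    } while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
#else
    /* Fast path: a single atomic add, no checks. */
    atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
#endif
}

/* Returns true when the caller dropped the last reference. */
static inline bool my_ref_put(my_ref_t *r)
{
    unsigned int old = atomic_fetch_sub_explicit(&r->refs, 1,
                                                 memory_order_acq_rel);
#ifdef NET_REFCOUNT_DEBUG
    assert(old != 0 && "put on an object whose count already hit zero");
#endif
    return old == 1;
}
```

Compiling one subsystem with -DNET_REFCOUNT_DEBUG turns the checks on there only; everything else keeps the single-instruction fast path.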

Some arches have an efficient atomic_inc() (no full barriers), while
"load + test + atomic_cmpxchg() + test + loop" is more expensive.
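
The "load + test + atomic_cmpxchg() + test + loop" sequence can be spelled out step by step. This is a sketch of a saturating "increment unless zero" in C11 atomics; ref_inc_not_zero is a hypothetical name, and the saturation policy (stick at UINT_MAX) is an assumption, not necessarily what the patch set does.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical saturating "increment unless zero", spelled out as
 * load + test + cmpxchg + test + loop. Returns false if the count was
 * already zero, i.e. the object is on its way to being freed. */
static bool ref_inc_not_zero(atomic_uint *refs)
{
    assert(refs != NULL);
    /* load */
    unsigned int old = atomic_load_explicit(refs, memory_order_relaxed);
    for (;;) {
        /* test: refuse to resurrect a dead object */
        if (old == 0)
            return false;
        /* test: saturated counts stay pinned (assumed policy) */
        if (old == UINT_MAX)
            return true;
        /* cmpxchg + test: succeeds only if nobody changed the count */
        if (atomic_compare_exchange_weak_explicit(refs, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
        /* loop: old now holds the current value; re-test it */
    }
}
```

Compared with a single atomic add, every increment now pays for a load, two comparisons and a branch around the compare-and-swap, plus a retry whenever another CPU changes the count in between.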

PowerPC has no efficient atomic_inc(), and this definitely shows on
network-intensive workloads involving concurrent cores/threads.

atomic_cmpxchg() on PowerPC is horribly more expensive because of the two
added SYNC instructions.
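
Part of that cost comes from ordering rather than from the retry loop itself. The sketch below contrasts a fully ordered (memory_order_seq_cst) compare-and-swap increment with a relaxed one in C11; both function names are hypothetical. On PowerPC, compilers typically emit barrier instructions around the lwarx/stwcx. loop for the sequentially consistent version and none for the relaxed one, though the exact sequence depends on the compiler. Whether a given refcount operation may use relaxed ordering is an assumption to check case by case.

```c
#include <assert.h>
#include <stdatomic.h>

/* Fully ordered CAS-loop increment (hypothetical name). On PowerPC,
 * compilers typically wrap the lwarx/stwcx. loop in barrier
 * instructions for seq_cst. Returns the new value. */
static unsigned int inc_cas_seq_cst(atomic_uint *c)
{
    unsigned int old = atomic_load_explicit(c, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(c, &old, old + 1,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
        ;
    return old + 1;
}

/* Same loop with relaxed ordering: atomicity only, typically a bare
 * lwarx/stwcx. loop on PowerPC. Returns the new value. */
static unsigned int inc_cas_relaxed(atomic_uint *c)
{
    unsigned int old = atomic_load_explicit(c, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(c, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
    return old + 1;
}
```

On x86 both functions typically compile to the same LOCK CMPXCHG loop, since locked instructions are already fully ordered there; that is one reason the cost of the conversion differs so much between architectures.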

Networking performance on PowerPC is quite poor as of today.
