Ian Campbell
2012-Apr-10 14:26 UTC
[PATCH 05/10] net: move destructor_arg to the front of sk_buff.
As of the previous patch we align the end (rather than the start) of the struct to a cache line and so, with 32 and 64 byte cache lines and the shinfo size increase from the next patch, the first 8 bytes of the struct end up on a different cache line to the rest of it so make sure it is something relatively unimportant to avoid hitting an extra cache line on hot operations such as kfree_skb. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> --- include/linux/skbuff.h | 15 ++++++++++----- net/core/skbuff.c | 5 ++++- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0ad6a46..f0ae39c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -265,6 +265,15 @@ struct ubuf_info { * the end of the header data, ie. at skb->end. */ struct skb_shared_info { + /* Intermediate layers must ensure that destructor_arg + * remains valid until skb destructor */ + void *destructor_arg; + + /* + * Warning: all fields from here until dataref are cleared in + * __alloc_skb() + * + */ unsigned char nr_frags; __u8 tx_flags; unsigned short gso_size; @@ -276,14 +285,10 @@ struct skb_shared_info { __be32 ip6_frag_id; /* - * Warning : all fields before dataref are cleared in __alloc_skb() + * Warning: all fields before dataref are cleared in __alloc_skb() */ atomic_t dataref; - /* Intermediate layers must ensure that destructor_arg - * remains valid until skb destructor */ - void * destructor_arg; - /* must be last field, see pskb_expand_head() */ skb_frag_t frags[MAX_SKB_FRAGS]; }; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d4e139e..b8a41d6 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, /* make sure we initialize shinfo sequentially */ shinfo = skb_shinfo(skb); - memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); + + memset(&shinfo->nr_frags, 0, + offsetof(struct skb_shared_info, dataref) + - offsetof(struct skb_shared_info, nr_frags)); atomic_set(&shinfo->dataref, 1); kmemcheck_annotate_variable(shinfo->destructor_arg); -- 1.7.2.5
Eric Dumazet
2012-Apr-10 15:05 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Tue, 2012-04-10 at 15:26 +0100, Ian Campbell wrote:> As of the previous patch we align the end (rather than the start) of the struct > to a cache line and so, with 32 and 64 byte cache lines and the shinfo size > increase from the next patch, the first 8 bytes of the struct end up on a > different cache line to the rest of it so make sure it is something relatively > unimportant to avoid hitting an extra cache line on hot operations such as > kfree_skb. > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > --- > include/linux/skbuff.h | 15 ++++++++++----- > net/core/skbuff.c | 5 ++++- > 2 files changed, 14 insertions(+), 6 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 0ad6a46..f0ae39c 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -265,6 +265,15 @@ struct ubuf_info { > * the end of the header data, ie. at skb->end. > */ > struct skb_shared_info { > + /* Intermediate layers must ensure that destructor_arg > + * remains valid until skb destructor */ > + void *destructor_arg; > + > + /* > + * Warning: all fields from here until dataref are cleared in > + * __alloc_skb() > + * > + */ > unsigned char nr_frags; > __u8 tx_flags; > unsigned short gso_size; > @@ -276,14 +285,10 @@ struct skb_shared_info { > __be32 ip6_frag_id; > > /* > - * Warning : all fields before dataref are cleared in __alloc_skb() > + * Warning: all fields before dataref are cleared in __alloc_skb() > */ > atomic_t dataref; > > - /* Intermediate layers must ensure that destructor_arg > - * remains valid until skb destructor */ > - void * destructor_arg; > - > /* must be last field, see pskb_expand_head() */ > skb_frag_t frags[MAX_SKB_FRAGS]; > }; > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index d4e139e..b8a41d6 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > /* make sure we initialize shinfo sequentially */ > shinfo = skb_shinfo(skb); > - memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > + > + memset(&shinfo->nr_frags, 0, > + offsetof(struct skb_shared_info, dataref) > + - offsetof(struct skb_shared_info, nr_frags)); > atomic_set(&shinfo->dataref, 1); > kmemcheck_annotate_variable(shinfo->destructor_arg); >Not sure if we can do the same in build_skb()
Ian Campbell
2012-Apr-10 15:19 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Tue, 2012-04-10 at 16:05 +0100, Eric Dumazet wrote:> On Tue, 2012-04-10 at 15:26 +0100, Ian Campbell wrote: > > As of the previous patch we align the end (rather than the start) of the struct > > to a cache line and so, with 32 and 64 byte cache lines and the shinfo size > > increase from the next patch, the first 8 bytes of the struct end up on a > > different cache line to the rest of it so make sure it is something relatively > > unimportant to avoid hitting an extra cache line on hot operations such as > > kfree_skb. > > > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > > Cc: "David S. Miller" <davem@davemloft.net> > > Cc: Eric Dumazet <eric.dumazet@gmail.com> > > --- > > include/linux/skbuff.h | 15 ++++++++++----- > > net/core/skbuff.c | 5 ++++- > > 2 files changed, 14 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index 0ad6a46..f0ae39c 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -265,6 +265,15 @@ struct ubuf_info { > > * the end of the header data, ie. at skb->end. > > */ > > struct skb_shared_info { > > + /* Intermediate layers must ensure that destructor_arg > > + * remains valid until skb destructor */ > > + void *destructor_arg; > > + > > + /* > > + * Warning: all fields from here until dataref are cleared in > > + * __alloc_skb() > > + * > > + */ > > unsigned char nr_frags; > > __u8 tx_flags; > > unsigned short gso_size; > > @@ -276,14 +285,10 @@ struct skb_shared_info { > > __be32 ip6_frag_id; > > > > /* > > - * Warning : all fields before dataref are cleared in __alloc_skb() > > + * Warning: all fields before dataref are cleared in __alloc_skb() > > */ > > atomic_t dataref; > > > > - /* Intermediate layers must ensure that destructor_arg > > - * remains valid until skb destructor */ > > - void * destructor_arg; > > - > > /* must be last field, see pskb_expand_head() */ > > skb_frag_t frags[MAX_SKB_FRAGS]; > > }; > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index d4e139e..b8a41d6 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > > > /* make sure we initialize shinfo sequentially */ > > shinfo = skb_shinfo(skb); > > - memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > > + > > + memset(&shinfo->nr_frags, 0, > > + offsetof(struct skb_shared_info, dataref) > > + - offsetof(struct skb_shared_info, nr_frags)); > > atomic_set(&shinfo->dataref, 1); > > kmemcheck_annotate_variable(shinfo->destructor_arg); > > > > Not sure if we can do the same in build_skb()I don''t think there''s any chance of there being a destructor_arg to preserve in that case? Regardless of that though I think for consistency it would be worth pulling the common shinfo init out into a helper and using it in both places. I''ll make that change. Ian.> >
Alexander Duyck
2012-Apr-10 18:33 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On 04/10/2012 07:26 AM, Ian Campbell wrote:> As of the previous patch we align the end (rather than the start) of the struct > to a cache line and so, with 32 and 64 byte cache lines and the shinfo size > increase from the next patch, the first 8 bytes of the struct end up on a > different cache line to the rest of it so make sure it is something relatively > unimportant to avoid hitting an extra cache line on hot operations such as > kfree_skb. > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > --- > include/linux/skbuff.h | 15 ++++++++++----- > net/core/skbuff.c | 5 ++++- > 2 files changed, 14 insertions(+), 6 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 0ad6a46..f0ae39c 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -265,6 +265,15 @@ struct ubuf_info { > * the end of the header data, ie. at skb->end. > */ > struct skb_shared_info { > + /* Intermediate layers must ensure that destructor_arg > + * remains valid until skb destructor */ > + void *destructor_arg; > + > + /* > + * Warning: all fields from here until dataref are cleared in > + * __alloc_skb() > + * > + */ > unsigned char nr_frags; > __u8 tx_flags; > unsigned short gso_size; > @@ -276,14 +285,10 @@ struct skb_shared_info { > __be32 ip6_frag_id; > > /* > - * Warning : all fields before dataref are cleared in __alloc_skb() > + * Warning: all fields before dataref are cleared in __alloc_skb() > */ > atomic_t dataref; > > - /* Intermediate layers must ensure that destructor_arg > - * remains valid until skb destructor */ > - void * destructor_arg; > - > /* must be last field, see pskb_expand_head() */ > skb_frag_t frags[MAX_SKB_FRAGS]; > }; > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index d4e139e..b8a41d6 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > /* make sure we initialize shinfo sequentially */ > shinfo = skb_shinfo(skb); > - memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > + > + memset(&shinfo->nr_frags, 0, > + offsetof(struct skb_shared_info, dataref) > + - offsetof(struct skb_shared_info, nr_frags)); > atomic_set(&shinfo->dataref, 1); > kmemcheck_annotate_variable(shinfo->destructor_arg); >Have you checked this for 32 bit as well as 64? Based on my math your next patch will still mess up the memset on 32 bit with the structure being split somewhere just in front of hwtstamps. Why not just take frags and move it to the start of the structure? It is already an unknown value because it can be either 16 or 17 depending on the value of PAGE_SIZE, and since you are making changes to frags the changes wouldn''t impact the alignment of the other values later on since you are aligning the end of the structure. That way you would be guaranteed that all of the fields that will be memset would be in the last 64 bytes. Thanks, Alex
Eric Dumazet
2012-Apr-10 18:41 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote:> Have you checked this for 32 bit as well as 64? Based on my math your > next patch will still mess up the memset on 32 bit with the structure > being split somewhere just in front of hwtstamps. > > Why not just take frags and move it to the start of the structure? It > is already an unknown value because it can be either 16 or 17 depending > on the value of PAGE_SIZE, and since you are making changes to frags the > changes wouldn''t impact the alignment of the other values later on since > you are aligning the end of the structure. That way you would be > guaranteed that all of the fields that will be memset would be in the > last 64 bytes. >Now when a fragmented packet is copied in pskb_expand_head(), you access two separate zones of memory to copy the shinfo. But its supposed to be slow path. Problem with this is that the offsets of often used fields will be big (instead of being < 127) and code will be bigger on x86.
Alexander Duyck
2012-Apr-10 19:15 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On 04/10/2012 11:41 AM, Eric Dumazet wrote:> On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote: > >> Have you checked this for 32 bit as well as 64? Based on my math your >> next patch will still mess up the memset on 32 bit with the structure >> being split somewhere just in front of hwtstamps. >> >> Why not just take frags and move it to the start of the structure? It >> is already an unknown value because it can be either 16 or 17 depending >> on the value of PAGE_SIZE, and since you are making changes to frags the >> changes wouldn''t impact the alignment of the other values later on since >> you are aligning the end of the structure. That way you would be >> guaranteed that all of the fields that will be memset would be in the >> last 64 bytes. >> > Now when a fragmented packet is copied in pskb_expand_head(), you access > two separate zones of memory to copy the shinfo. But its supposed to be > slow path. > > Problem with this is that the offsets of often used fields will be big > (instead of being < 127) and code will be bigger on x86.Actually now that I think about it my concerns go much further than the memset. I''m convinced that this is going to cause a pretty significant performance regression on multiple drivers, especially on non x86_64 architecture. What we have right now on most platforms is a skb_shared_info structure in which everything up to and including frag 0 is all in one cache line. This gives us pretty good performance for igb and ixgbe since that is our common case when jumbo frames are not enabled is to split the head and place the data in a page. However the change being recommend here only resolves the issue for one specific architecture, and that is what I don''t agree with. What we need is a solution that also works for 64K pages or 32 bit pointers and I am fairly certain this current solution does not. Thanks, Alex
Ian Campbell
2012-Apr-11 07:56 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Tue, 2012-04-10 at 19:33 +0100, Alexander Duyck wrote:> On 04/10/2012 07:26 AM, Ian Campbell wrote: > > As of the previous patch we align the end (rather than the start) of the struct > > to a cache line and so, with 32 and 64 byte cache lines and the shinfo size > > increase from the next patch, the first 8 bytes of the struct end up on a > > different cache line to the rest of it so make sure it is something relatively > > unimportant to avoid hitting an extra cache line on hot operations such as > > kfree_skb. > > > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > > Cc: "David S. Miller" <davem@davemloft.net> > > Cc: Eric Dumazet <eric.dumazet@gmail.com> > > --- > > include/linux/skbuff.h | 15 ++++++++++----- > > net/core/skbuff.c | 5 ++++- > > 2 files changed, 14 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index 0ad6a46..f0ae39c 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -265,6 +265,15 @@ struct ubuf_info { > > * the end of the header data, ie. at skb->end. > > */ > > struct skb_shared_info { > > + /* Intermediate layers must ensure that destructor_arg > > + * remains valid until skb destructor */ > > + void *destructor_arg; > > + > > + /* > > + * Warning: all fields from here until dataref are cleared in > > + * __alloc_skb() > > + * > > + */ > > unsigned char nr_frags; > > __u8 tx_flags; > > unsigned short gso_size; > > @@ -276,14 +285,10 @@ struct skb_shared_info { > > __be32 ip6_frag_id; > > > > /* > > - * Warning : all fields before dataref are cleared in __alloc_skb() > > + * Warning: all fields before dataref are cleared in __alloc_skb() > > */ > > atomic_t dataref; > > > > - /* Intermediate layers must ensure that destructor_arg > > - * remains valid until skb destructor */ > > - void * destructor_arg; > > - > > /* must be last field, see pskb_expand_head() */ > > skb_frag_t frags[MAX_SKB_FRAGS]; > > }; > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index d4e139e..b8a41d6 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > > > /* make sure we initialize shinfo sequentially */ > > shinfo = skb_shinfo(skb); > > - memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > > + > > + memset(&shinfo->nr_frags, 0, > > + offsetof(struct skb_shared_info, dataref) > > + - offsetof(struct skb_shared_info, nr_frags)); > > atomic_set(&shinfo->dataref, 1); > > kmemcheck_annotate_variable(shinfo->destructor_arg); > > > > Have you checked this for 32 bit as well as 64? Based on my math your > next patch will still mess up the memset on 32 bit with the structure > being split somewhere just in front of hwtstamps.You mean 32 byte cache lines? If so then yes there is a split half way through the structure in that case but there''s no way all this data could ever fit in a single 32 byte cache line. Not including the frags or destructor_arg the region nr_frags up to and including dataref is 36 bytes on 32 bit and 40 bytes on 64 bit. I''ve not changed anything in this respect. If you meant 64 byte cache lines with 32 bit structure sizes then by my calculations everything from destructor_arg (in fact a bit earlier, from 12 bytes before then) up to and including frag[0] is in the same 64 byte cache line. I find the easiest way to check is to use gdb and open code an offsetof macro. (gdb) print/d sizeof(struct skb_shared_info) - (unsigned long)&(((struct skb_shared_info *)0)->nr_frags) $3 = 240 (gdb) print/d sizeof(struct skb_shared_info) - (unsigned long)&(((struct skb_shared_info *)0)->frags[1]) $4 = 192 So given 64 byte cache lines the interesting area starts at 240/64=3.75 cache lines from the (aligned) end and it finishes just before 192/64=3 cache lines from the end, so nr_frags through to frags[0] are therefore on the same cache line. Ian.
Ian Campbell
2012-Apr-11 08:00 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Tue, 2012-04-10 at 20:15 +0100, Alexander Duyck wrote:> On 04/10/2012 11:41 AM, Eric Dumazet wrote: > > On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote: > > > >> Have you checked this for 32 bit as well as 64? Based on my math your > >> next patch will still mess up the memset on 32 bit with the structure > >> being split somewhere just in front of hwtstamps. > >> > >> Why not just take frags and move it to the start of the structure? It > >> is already an unknown value because it can be either 16 or 17 depending > >> on the value of PAGE_SIZE, and since you are making changes to frags the > >> changes wouldn''t impact the alignment of the other values later on since > >> you are aligning the end of the structure. That way you would be > >> guaranteed that all of the fields that will be memset would be in the > >> last 64 bytes. > >> > > Now when a fragmented packet is copied in pskb_expand_head(), you access > > two separate zones of memory to copy the shinfo. But its supposed to be > > slow path. > > > > Problem with this is that the offsets of often used fields will be big > > (instead of being < 127) and code will be bigger on x86. > > Actually now that I think about it my concerns go much further than the > memset. I''m convinced that this is going to cause a pretty significant > performance regression on multiple drivers, especially on non x86_64 > architecture. What we have right now on most platforms is a > skb_shared_info structure in which everything up to and including frag 0 > is all in one cache line. This gives us pretty good performance for igb > and ixgbe since that is our common case when jumbo frames are not > enabled is to split the head and place the data in a page.With all the changes in this series it is still possible to fit a maximum standard MTU frame and the shinfo on the same 4K page while also have the skb_shared_info up to and including frag [0] aligned to the same 64 byte cache line. The only exception is destructor_arg on 64 bit which is on the preceding cache line but that is not a field used in any hot path.> However the change being recommend here only resolves the issue for one > specific architecture, and that is what I don''t agree with. What we > need is a solution that also works for 64K pages or 32 bit pointers and > I am fairly certain this current solution does not.I think it does work for 32 bit pointers. What issue to do you see with 64K pages? Ian.
Eric Dumazet
2012-Apr-11 08:20 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Tue, 2012-04-10 at 12:15 -0700, Alexander Duyck wrote:> > Actually now that I think about it my concerns go much further than the > memset. I''m convinced that this is going to cause a pretty significant > performance regression on multiple drivers, especially on non x86_64 > architecture. What we have right now on most platforms is a > skb_shared_info structure in which everything up to and including frag 0 > is all in one cache line. This gives us pretty good performance for igb > and ixgbe since that is our common case when jumbo frames are not > enabled is to split the head and place the data in a page.I dont understand this split thing for MTU=1500 frames. Even using half a page per fragment, each skb : needs 2 allocations for sk_buff and skb->head, plus one page alloc / reference. skb->truesize = ksize(skb->head) + sizeof(*skb) + PAGE_SIZE/2 = 512 + 256 + 2048 = 2816 bytes With non split you have : 2 allocations for sk_buff and skb->head. skb->truesize = ksize(skb->head) + sizeof(*skb) = 2048 + 256 = 2304 bytes less overhead and less calls to page allocator... This only can benefit if GRO is on, since aggregation can use fragments and a single sk_buff, instead of a frag_list
Alexander Duyck
2012-Apr-11 16:05 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On 04/11/2012 01:20 AM, Eric Dumazet wrote:> On Tue, 2012-04-10 at 12:15 -0700, Alexander Duyck wrote: > >> Actually now that I think about it my concerns go much further than the >> memset. I''m convinced that this is going to cause a pretty significant >> performance regression on multiple drivers, especially on non x86_64 >> architecture. What we have right now on most platforms is a >> skb_shared_info structure in which everything up to and including frag 0 >> is all in one cache line. This gives us pretty good performance for igb >> and ixgbe since that is our common case when jumbo frames are not >> enabled is to split the head and place the data in a page. > I dont understand this split thing for MTU=1500 frames. > > Even using half a page per fragment, each skb : > > needs 2 allocations for sk_buff and skb->head, plus one page alloc / > reference. > > skb->truesize = ksize(skb->head) + sizeof(*skb) + PAGE_SIZE/2 = 512 + > 256 + 2048 = 2816 bytesThe number you provide for head is currently only available for 128 byte skb allocations. Anything larger than that will generate a 1K allocation. Also after all of these patches the smallest size you can allocate will be 1K for anything under 504 bytes. The size advantage is actually more for smaller frames. In the case of igb the behaviour is to place anything less than 512 bytes into just the header and to skip using the page. As such we get a much more ideal allocation for small packets. since the truesize is only 1280 in that case. In the case of ixgbe the advantage is more of a cache miss advantage. Ixgbe only receives the data into pages now. I can prefetch the first two cache lines of the page into memory while allocating the skb to receive it. As such we essentially cut the number of cache misses in half versus the old approach which had us generating cache misses on the sk_buff during allocation, and then generating more cache misses again once we received the buffer and can fill out the sk_buff fields. A similar size advantage exists as well, but only for frames 256 bytes or smaller.> With non split you have : > > 2 allocations for sk_buff and skb->head. > > skb->truesize = ksize(skb->head) + sizeof(*skb) = 2048 + 256 = 2304 > bytes > > less overhead and less calls to page allocator... > > This only can benefit if GRO is on, since aggregation can use fragments > and a single sk_buff, instead of a frag_listThere is much more than true size involved here. My main argument is that if we are going to align this modified skb_shared_info so that it is aligned on nr_frags we should do it on all architectures, not just x86_64. Thanks, Alex
Alexander Duyck
2012-Apr-11 16:31 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On 04/11/2012 01:00 AM, Ian Campbell wrote:> On Tue, 2012-04-10 at 20:15 +0100, Alexander Duyck wrote: >> On 04/10/2012 11:41 AM, Eric Dumazet wrote: >>> On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote: >>> >>>> Have you checked this for 32 bit as well as 64? Based on my math your >>>> next patch will still mess up the memset on 32 bit with the structure >>>> being split somewhere just in front of hwtstamps. >>>> >>>> Why not just take frags and move it to the start of the structure? It >>>> is already an unknown value because it can be either 16 or 17 depending >>>> on the value of PAGE_SIZE, and since you are making changes to frags the >>>> changes wouldn''t impact the alignment of the other values later on since >>>> you are aligning the end of the structure. That way you would be >>>> guaranteed that all of the fields that will be memset would be in the >>>> last 64 bytes. >>>> >>> Now when a fragmented packet is copied in pskb_expand_head(), you access >>> two separate zones of memory to copy the shinfo. But its supposed to be >>> slow path. >>> >>> Problem with this is that the offsets of often used fields will be big >>> (instead of being < 127) and code will be bigger on x86. >> Actually now that I think about it my concerns go much further than the >> memset. I''m convinced that this is going to cause a pretty significant >> performance regression on multiple drivers, especially on non x86_64 >> architecture. What we have right now on most platforms is a >> skb_shared_info structure in which everything up to and including frag 0 >> is all in one cache line. This gives us pretty good performance for igb >> and ixgbe since that is our common case when jumbo frames are not >> enabled is to split the head and place the data in a page. > With all the changes in this series it is still possible to fit a > maximum standard MTU frame and the shinfo on the same 4K page while also > have the skb_shared_info up to and including frag [0] aligned to the > same 64 byte cache line. > > The only exception is destructor_arg on 64 bit which is on the preceding > cache line but that is not a field used in any hot path.The problem I have is that this is only true on x86_64. Proper work hasn''t been done to guarantee this on any other architectures. I think what I would like to see is instead of just setting things up and hoping it comes out cache aligned on nr_frags why not take steps to guarantee it? You could do something like place and size the structure based on: SKB_DATA_ALIGN(sizeof(skb_shared_info) - offsetof(struct skb_shared_info, nr_frags)) + offsetof(struct skb_shared_info, nr_frags) That way you would have your alignment still guaranteed based off of the end of the structure, but anything placed before nr_frags would be placed on the end of the previous cache line.>> However the change being recommend here only resolves the issue for one >> specific architecture, and that is what I don''t agree with. What we >> need is a solution that also works for 64K pages or 32 bit pointers and >> I am fairly certain this current solution does not. > I think it does work for 32 bit pointers. What issue to do you see with > 64K pages? > > Ian.With 64K pages the MAX_SKB_FRAGS value drops from 17 to 16. That will undoubtedly mess up the alignment. Thanks, Alex
Ian Campbell
2012-Apr-11 17:00 UTC
Re: [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On Wed, 2012-04-11 at 17:31 +0100, Alexander Duyck wrote:> On 04/11/2012 01:00 AM, Ian Campbell wrote: > > On Tue, 2012-04-10 at 20:15 +0100, Alexander Duyck wrote: > >> On 04/10/2012 11:41 AM, Eric Dumazet wrote: > >>> On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote: > >>> > >>>> Have you checked this for 32 bit as well as 64? Based on my math your > >>>> next patch will still mess up the memset on 32 bit with the structure > >>>> being split somewhere just in front of hwtstamps. > >>>> > >>>> Why not just take frags and move it to the start of the structure? It > >>>> is already an unknown value because it can be either 16 or 17 depending > >>>> on the value of PAGE_SIZE, and since you are making changes to frags the > >>>> changes wouldn''t impact the alignment of the other values later on since > >>>> you are aligning the end of the structure. That way you would be > >>>> guaranteed that all of the fields that will be memset would be in the > >>>> last 64 bytes. > >>>> > >>> Now when a fragmented packet is copied in pskb_expand_head(), you access > >>> two separate zones of memory to copy the shinfo. But its supposed to be > >>> slow path. > >>> > >>> Problem with this is that the offsets of often used fields will be big > >>> (instead of being < 127) and code will be bigger on x86. > >> Actually now that I think about it my concerns go much further than the > >> memset. I''m convinced that this is going to cause a pretty significant > >> performance regression on multiple drivers, especially on non x86_64 > >> architecture. What we have right now on most platforms is a > >> skb_shared_info structure in which everything up to and including frag 0 > >> is all in one cache line. This gives us pretty good performance for igb > >> and ixgbe since that is our common case when jumbo frames are not > >> enabled is to split the head and place the data in a page. > > With all the changes in this series it is still possible to fit a > > maximum standard MTU frame and the shinfo on the same 4K page while also > > have the skb_shared_info up to and including frag [0] aligned to the > > same 64 byte cache line. > > > > The only exception is destructor_arg on 64 bit which is on the preceding > > cache line but that is not a field used in any hot path. > The problem I have is that this is only true on x86_64. Proper work > hasn''t been done to guarantee this on any other architectures.FWIW I did also explicitly cover i386 (see <1334130984.12209.195.camel@dagon.hellion.org.uk>)> I think what I would like to see is instead of just setting things up > and hoping it comes out cache aligned on nr_frags why not take steps to > guarantee it? You could do something like place and size the structure > based on: > SKB_DATA_ALIGN(sizeof(skb_shared_info) - offsetof(struct > skb_shared_info, nr_frags)) + offsetof(struct skb_shared_info, nr_frags) > > That way you would have your alignment still guaranteed based off of the > end of the structure, but anything placed before nr_frags would be > placed on the end of the previous cache line. > > >> However the change being recommend here only resolves the issue for one > >> specific architecture, and that is what I don''t agree with. What we > >> need is a solution that also works for 64K pages or 32 bit pointers and > >> I am fairly certain this current solution does not. > > I think it does work for 32 bit pointers. What issue to do you see with > > 64K pages? > > > > Ian. > With 64K pages the MAX_SKB_FRAGS value drops from 17 to 16. That will > undoubtedly mess up the alignment.Oh, I see. Need to think about this some more but your suggestion above is an interesting one, I''ll see what I can do with that. Ian.