Xuan Zhuo
2021-Jun-01 05:59 UTC
[PATCH net 2/2] virtio-net: get build_skb() buf by data ptr
On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang at redhat.com> wrote:> > ? 2021/6/1 ??11:08, Xuan Zhuo ??: > > On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang at redhat.com> wrote: > >> ? 2021/5/31 ??6:58, Xuan Zhuo ??: > >>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang at redhat.com> wrote: > >>>> ? 2021/5/14 ??11:16, Xuan Zhuo ??: > >>>>> In the case of merge, the page passed into page_to_skb() may be a head > >>>>> page, not the page where the current data is located. > >>>> I don't get how this can happen? > >>>> > >>>> Maybe you can explain a little bit more? > >>>> > >>>> receive_mergeable() call page_to_skb() in two places: > >>>> > >>>> 1) XDP_PASS for linearized page , in this case we use xdp_page > >>>> 2) page_to_skb() for "normal" page, in this case the page contains the data > >>> The offset may be greater than PAGE_SIZE, because page is obtained by > >>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset > >>> of buf relative to page. > >>> > >>> tailroom = truesize - len - offset; > >>> > >>> In this case, the tailroom must be less than 0. Although there may be enough > >>> content on this page to save skb_shared_info. > >> > >> Interesting, I think we don't use compound pages for virtio-net. (We > >> don't define SKB_FRAG_PAGE_ORDER). > >> > >> Am I wrong? > > > > It seems to me that it seems to be a fixed setting, not for us to configure > > independently > > > Looks like you are right. > > See comments below. > > > > > > Thanks. > > > > =========================================> > > > net/sock.c > > > > #define SKB_FRAG_PAGE_ORDER get_order(32768) > > DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); > > > > /** > > * skb_page_frag_refill - check that a page_frag contains enough room > > * @sz: minimum size of the fragment we want to get > > * @pfrag: pointer to page_frag > > * @gfp: priority for memory allocation > > * > > * Note: While this allocator tries to use high order pages, there is > > * no guarantee that allocations succeed. Therefore, @sz MUST be > > * less or equal than PAGE_SIZE. > > */ > > bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > { > > if (pfrag->page) { > > if (page_ref_count(pfrag->page) == 1) { > > pfrag->offset = 0; > > return true; > > } > > if (pfrag->offset + sz <= pfrag->size) > > return true; > > put_page(pfrag->page); > > } > > > > pfrag->offset = 0; > > if (SKB_FRAG_PAGE_ORDER && > > !static_branch_unlikely(&net_high_order_alloc_disable_key)) { > > /* Avoid direct reclaim but allow kswapd to wake */ > > pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | > > __GFP_COMP | __GFP_NOWARN | > > __GFP_NORETRY, > > SKB_FRAG_PAGE_ORDER); > > if (likely(pfrag->page)) { > > pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; > > return true; > > } > > } > > pfrag->page = alloc_page(gfp); > > if (likely(pfrag->page)) { > > pfrag->size = PAGE_SIZE; > > return true; > > } > > return false; > > } > > EXPORT_SYMBOL(skb_page_frag_refill); > > > > > >> Thanks > >> > >> > >>> Thanks. > >>> > >>>> Thanks > >>>> > >>>> > >>>>> So when trying to > >>>>> get the buf where the data is located, you should directly use the > >>>>> pointer(p) to get the address corresponding to the page. > >>>>> > >>>>> At the same time, the offset of the data in the page should also be > >>>>> obtained using offset_in_page(). > >>>>> > >>>>> This patch solves this problem. But if you don?t use this patch, the > >>>>> original code can also run, because if the page is not the page of the > >>>>> current data, the calculated tailroom will be less than 0, and will not > >>>>> enter the logic of build_skb() . The significance of this patch is to > >>>>> modify this logical problem, allowing more situations to use > >>>>> build_skb(). > >>>>> > >>>>> Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com> > >>>>> Acked-by: Michael S. Tsirkin <mst at redhat.com> > >>>>> --- > >>>>> drivers/net/virtio_net.c | 8 ++++++-- > >>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>>>> index 3e46c12dde08..073fec4c0df1 100644 > >>>>> --- a/drivers/net/virtio_net.c > >>>>> +++ b/drivers/net/virtio_net.c > >>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, > >>>>> * see add_recvbuf_mergeable() + get_mergeable_buf_len() > >>>>> */ > >>>>> truesize = PAGE_SIZE; > >>>>> - tailroom = truesize - len - offset; > >>>>> - buf = page_address(page); > >>>>> + > >>>>> + /* page maybe head page, so we should get the buf by p, not the > >>>>> + * page > >>>>> + */ > >>>>> + tailroom = truesize - len - offset_in_page(p); > > > I wonder why offset_in_page(p) is correct? I guess it should be: > > tailroom = truesize - len - headroom; > > The reason is that the buffer is not necessarily allocated at the page > boundary.In my understanding, offset_in_page(p) is the offset of p in the page where it is located. In this case, the two should be equal. This has nothing to do with which page is allocated. Of course I think using headroom is a good idea, and it is semantically better. Thanks.> > Thanks > > > >>>>> + buf = (char *)((unsigned long)p & PAGE_MASK); > >>>>> } else { > >>>>> tailroom = truesize - len; > >>>>> buf = p; >
Jason Wang
2021-Jun-01 06:17 UTC
[PATCH net 2/2] virtio-net: get build_skb() buf by data ptr
? 2021/6/1 ??1:59, Xuan Zhuo ??:> On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang at redhat.com> wrote: >> ? 2021/6/1 ??11:08, Xuan Zhuo ??: >>> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang at redhat.com> wrote: >>>> ? 2021/5/31 ??6:58, Xuan Zhuo ??: >>>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang at redhat.com> wrote: >>>>>> ? 2021/5/14 ??11:16, Xuan Zhuo ??: >>>>>>> In the case of merge, the page passed into page_to_skb() may be a head >>>>>>> page, not the page where the current data is located. >>>>>> I don't get how this can happen? >>>>>> >>>>>> Maybe you can explain a little bit more? >>>>>> >>>>>> receive_mergeable() call page_to_skb() in two places: >>>>>> >>>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page >>>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data >>>>> The offset may be greater than PAGE_SIZE, because page is obtained by >>>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset >>>>> of buf relative to page. >>>>> >>>>> tailroom = truesize - len - offset; >>>>> >>>>> In this case, the tailroom must be less than 0. Although there may be enough >>>>> content on this page to save skb_shared_info. >>>> Interesting, I think we don't use compound pages for virtio-net. (We >>>> don't define SKB_FRAG_PAGE_ORDER). >>>> >>>> Am I wrong? >>> It seems to me that it seems to be a fixed setting, not for us to configure >>> independently >> >> Looks like you are right. >> >> See comments below. >> >> >>> Thanks. >>> >>> =========================================>>> >>> net/sock.c >>> >>> #define SKB_FRAG_PAGE_ORDER get_order(32768) >>> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); >>> >>> /** >>> * skb_page_frag_refill - check that a page_frag contains enough room >>> * @sz: minimum size of the fragment we want to get >>> * @pfrag: pointer to page_frag >>> * @gfp: priority for memory allocation >>> * >>> * Note: While this allocator tries to use high order pages, there is >>> * no guarantee that allocations succeed. Therefore, @sz MUST be >>> * less or equal than PAGE_SIZE. >>> */ >>> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) >>> { >>> if (pfrag->page) { >>> if (page_ref_count(pfrag->page) == 1) { >>> pfrag->offset = 0; >>> return true; >>> } >>> if (pfrag->offset + sz <= pfrag->size) >>> return true; >>> put_page(pfrag->page); >>> } >>> >>> pfrag->offset = 0; >>> if (SKB_FRAG_PAGE_ORDER && >>> !static_branch_unlikely(&net_high_order_alloc_disable_key)) { >>> /* Avoid direct reclaim but allow kswapd to wake */ >>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | >>> __GFP_COMP | __GFP_NOWARN | >>> __GFP_NORETRY, >>> SKB_FRAG_PAGE_ORDER); >>> if (likely(pfrag->page)) { >>> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; >>> return true; >>> } >>> } >>> pfrag->page = alloc_page(gfp); >>> if (likely(pfrag->page)) { >>> pfrag->size = PAGE_SIZE; >>> return true; >>> } >>> return false; >>> } >>> EXPORT_SYMBOL(skb_page_frag_refill); >>> >>> >>>> Thanks >>>> >>>> >>>>> Thanks. >>>>> >>>>>> Thanks >>>>>> >>>>>> >>>>>>> So when trying to >>>>>>> get the buf where the data is located, you should directly use the >>>>>>> pointer(p) to get the address corresponding to the page. >>>>>>> >>>>>>> At the same time, the offset of the data in the page should also be >>>>>>> obtained using offset_in_page(). >>>>>>> >>>>>>> This patch solves this problem. But if you don?t use this patch, the >>>>>>> original code can also run, because if the page is not the page of the >>>>>>> current data, the calculated tailroom will be less than 0, and will not >>>>>>> enter the logic of build_skb() . The significance of this patch is to >>>>>>> modify this logical problem, allowing more situations to use >>>>>>> build_skb(). >>>>>>> >>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com> >>>>>>> Acked-by: Michael S. Tsirkin <mst at redhat.com> >>>>>>> --- >>>>>>> drivers/net/virtio_net.c | 8 ++++++-- >>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>>>> index 3e46c12dde08..073fec4c0df1 100644 >>>>>>> --- a/drivers/net/virtio_net.c >>>>>>> +++ b/drivers/net/virtio_net.c >>>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, >>>>>>> * see add_recvbuf_mergeable() + get_mergeable_buf_len() >>>>>>> */ >>>>>>> truesize = PAGE_SIZE; >>>>>>> - tailroom = truesize - len - offset; >>>>>>> - buf = page_address(page); >>>>>>> + >>>>>>> + /* page maybe head page, so we should get the buf by p, not the >>>>>>> + * page >>>>>>> + */ >>>>>>> + tailroom = truesize - len - offset_in_page(p); >> >> I wonder why offset_in_page(p) is correct? I guess it should be: >> >> tailroom = truesize - len - headroom; >> >> The reason is that the buffer is not necessarily allocated at the page >> boundary. > In my understanding, offset_in_page(p) is the offset of p in the page where it > is located. In this case, the two should be equal.I think not, if the frag is not page aligned. offset_in_page(p) doesn't equal to headroom. Consider the case that the frag start from page offset 1500.> This has nothing to do with > which page is allocated. > > Of course I think using headroom is a good idea, and it is semantically better. > > Thanks.Thanks> >> Thanks >> >> >>>>>>> + buf = (char *)((unsigned long)p & PAGE_MASK); >>>>>>> } else { >>>>>>> tailroom = truesize - len; >>>>>>> buf = p;