Michael S. Tsirkin
2013-Nov-19  20:49 UTC
[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > list, otherwise it will be leaked. The bug was introduced by commit
> > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > buffers to page frag allocators").
> >
> > Cc: Michael Dalton <mwdalton at google.com>
> > Cc: Eric Dumazet <edumazet at google.com>
> > Cc: Rusty Russell <rusty at rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst at redhat.com>
> > Signed-off-by: Jason Wang <jasowang at redhat.com>
> > ---
> > The patch was needed for 3.12 stable.
>
> Good catch, but if we return from receive_mergeable() in the 'middle'
> of the frags we would need for the current skb, who will
> call the virtqueue_get_buf() to flush the remaining frags ?
>
> Don't we also need to call virtqueue_get_buf() like
>
>     while (--num_buf) {
>         buf = virtqueue_get_buf(rq->vq, &len);
>         if (!buf)
>             break;
>         put_page(virt_to_head_page(buf));
>     }
>
> ?

Let me explain what worries me in your suggestion:

    struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
    if (unlikely(!nskb)) {
        head_skb->dev->stats.rx_dropped++;
        return -ENOMEM;
    }

is this the failure case we are talking about?

I think this is a symptom of a larger problem
introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
namely that we now need to allocate memory in the
middle of processing a packet.

I think discarding a completely valid and well-formed
packet from the receive queue because we are unable
to allocate new memory with GFP_ATOMIC
for future packets is not a good idea.

It certainly violates the principle of least surprise:
when one sees host pass packet to guest, one expects
the packet to get into the networking stack, not get
dropped by the driver internally.
Guest stack can do with the packet what it sees fit.

We actually wake up a thread if we can't fill up the queue,
that will fill it up in GFP_KERNEL context.

So I think we should find a way to pre-allocate if necessary and avoid
error paths where allocating new memory is required to avoid drops.

--
MST
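The refill fallback mentioned above (try to refill without sleeping, and hand the job to a worker that may sleep if that fails) can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the driver's code: `struct rx_queue`, `try_fill()`, `rx_poll_one()`, `refill_work()` and the `atomic_budget` counter are all hypothetical names, and the blocking allocation is assumed to always succeed eventually.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the driver. */
enum alloc_mode { ALLOC_ATOMIC, ALLOC_BLOCKING };

struct rx_queue {
    size_t capacity;        /* ring size */
    size_t filled;          /* buffers currently posted for the host */
    size_t atomic_budget;   /* allocations that can succeed without sleeping */
    bool refill_scheduled;  /* stands in for scheduling delayed work */
};

/* Post buffers until the ring is full or an allocation fails.
 * Returns true if the ring ended up full. */
static bool try_fill(struct rx_queue *q, enum alloc_mode mode)
{
    while (q->filled < q->capacity) {
        if (mode == ALLOC_ATOMIC) {
            if (q->atomic_budget == 0)
                return false;   /* would have to sleep: give up */
            q->atomic_budget--;
        }
        /* ALLOC_BLOCKING models a GFP_KERNEL-style allocation: it may
         * sleep, so in this model it always succeeds. */
        q->filled++;
    }
    return true;
}

/* Receive path: consume one buffer, then try a non-sleeping refill. */
static void rx_poll_one(struct rx_queue *q)
{
    assert(q->filled > 0);
    q->filled--;
    if (!try_fill(q, ALLOC_ATOMIC))
        q->refill_scheduled = true;
}

/* Deferred worker: runs in a context that is allowed to sleep. */
static void refill_work(struct rx_queue *q)
{
    if (!q->refill_scheduled)
        return;
    q->refill_scheduled = false;
    try_fill(q, ALLOC_BLOCKING);
}
```

In this model a failed atomic refill costs nothing immediately: the ring just runs one buffer short until `refill_work()` runs, which is the sense in which a large queue tolerates a single failure.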
Eric Dumazet
2013-Nov-19  21:36 UTC
[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > list, otherwise it will be leaked. The bug was introduced by commit
> > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > buffers to page frag allocators").
> > >
> > > Cc: Michael Dalton <mwdalton at google.com>
> > > Cc: Eric Dumazet <edumazet at google.com>
> > > Cc: Rusty Russell <rusty at rustcorp.com.au>
> > > Cc: Michael S. Tsirkin <mst at redhat.com>
> > > Signed-off-by: Jason Wang <jasowang at redhat.com>
> > > ---
> > > The patch was needed for 3.12 stable.
> >
> > Good catch, but if we return from receive_mergeable() in the 'middle'
> > of the frags we would need for the current skb, who will
> > call the virtqueue_get_buf() to flush the remaining frags ?
> >
> > Don't we also need to call virtqueue_get_buf() like
> >
> >     while (--num_buf) {
> >         buf = virtqueue_get_buf(rq->vq, &len);
> >         if (!buf)
> >             break;
> >         put_page(virt_to_head_page(buf));
> >     }
> >
> > ?
>
> Let me explain what worries me in your suggestion:
>
>     struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
>     if (unlikely(!nskb)) {
>         head_skb->dev->stats.rx_dropped++;
>         return -ENOMEM;
>     }
>
> is this the failure case we are talking about?

I thought Jason's patch was about this, no ?

> I think this is a symptom of a larger problem
> introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> namely that we now need to allocate memory in the
> middle of processing a packet.
>
> I think discarding a completely valid and well-formed
> packet from the receive queue because we are unable
> to allocate new memory with GFP_ATOMIC
> for future packets is not a good idea.

How is it different from NIC processing in the RX path ?

> It certainly violates the principle of least surprise:
> when one sees host pass packet to guest, one expects
> the packet to get into the networking stack, not get
> dropped by the driver internally.
> Guest stack can do with the packet what it sees fit.
>
> We actually wake up a thread if we can't fill up the queue,
> that will fill it up in GFP_KERNEL context.
>
> So I think we should find a way to pre-allocate if necessary and avoid
> error paths where allocating new memory is required to avoid drops.

Really, under ATOMIC context, there is no way you can avoid dropping
packets if you cannot allocate memory. If you cannot allocate sk_buff
(256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
payload of next packets anyway.

Same problem on a real NIC.

Under memory pressure we _do_ packet drops.
Nobody really complained.

Sure, you can add yet another cache of pre-allocated skbs and pay the
price of managing yet another cache layer, but you still need to drop
packets under stress.

Pre-allocating skb on real NIC has a performance cost, because we clear
sk_buff way ahead of time. By the time skb is finally received, cpu has
to bring the cache lines back into its cache.
Michael Dalton
2013-Nov-19  21:38 UTC
[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
Great catch Jason. I agree this now raises the larger issue of how to
handle a memory alloc failure in the middle of receive. As Eric mentioned,
we can drop the packet and free the remaining (num_buf) frags.

Michael, perhaps I'm missing something, but why would you prefer
pre-allocating buffers in this case? If the guest kernel is OOM'ing,
dropping packets should provide backpressure.

Also, we could just as easily fail the initial skb alloc in page_to_skb,
and I think that case also needs to be handled now in the same fashion as
a memory allocation failure in receive_mergeable.

Best,

Mike
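The "drop the packet and free the remaining frags" path can be sketched as a small userspace model. Everything here is an assumption made for illustration: `struct page_model`, `struct vq_model`, `vq_get_buf()`, `page_put()` and `drop_packet()` are hypothetical stand-ins for the page, the virtqueue, `virtqueue_get_buf()` and `put_page()`, and `remaining` means the number of buffers of the current packet that have not been dequeued yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a page and a virtqueue. */
struct page_model { int refcnt; };

struct vq_model {
    struct page_model **bufs;  /* buffers the host has filled, in order */
    size_t head;               /* index of the next buffer to hand out */
    size_t count;              /* total buffers available */
};

/* Models virtqueue_get_buf(): next used buffer, or NULL if none is left. */
static struct page_model *vq_get_buf(struct vq_model *vq)
{
    if (vq->head == vq->count)
        return NULL;
    return vq->bufs[vq->head++];
}

/* Models put_page(): drop one reference. */
static void page_put(struct page_model *p)
{
    assert(p->refcnt > 0);
    p->refcnt--;
}

/*
 * Error path for a multi-buffer packet. `cur` is the buffer that was
 * being processed when the allocation failed; `remaining` is how many
 * buffers of this packet are still sitting in the queue.
 * Releases `cur`, then dequeues and releases the rest so the next
 * packet starts at a clean boundary. Returns the number released.
 */
static size_t drop_packet(struct vq_model *vq, struct page_model *cur,
                          size_t remaining)
{
    size_t released = 1;

    page_put(cur);              /* the reference the original patch fixes */
    while (remaining--) {
        struct page_model *p = vq_get_buf(vq);
        if (!p)
            break;              /* queue ran dry earlier than expected */
        page_put(p);
        released++;
    }
    return released;
}
```

The same helper could serve both failure sites mentioned above (the initial skb allocation and the mid-packet one), with `remaining` computed by the caller at the point of failure.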
Michael S. Tsirkin
2013-Nov-19  21:53 UTC
[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > > list, otherwise it will be leaked. The bug was introduced by commit
> > > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > > buffers to page frag allocators").
> > > >
> > > > Cc: Michael Dalton <mwdalton at google.com>
> > > > Cc: Eric Dumazet <edumazet at google.com>
> > > > Cc: Rusty Russell <rusty at rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst at redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang at redhat.com>
> > > > ---
> > > > The patch was needed for 3.12 stable.
> > >
> > > Good catch, but if we return from receive_mergeable() in the 'middle'
> > > of the frags we would need for the current skb, who will
> > > call the virtqueue_get_buf() to flush the remaining frags ?
> > >
> > > Don't we also need to call virtqueue_get_buf() like
> > >
> > >     while (--num_buf) {
> > >         buf = virtqueue_get_buf(rq->vq, &len);
> > >         if (!buf)
> > >             break;
> > >         put_page(virt_to_head_page(buf));
> > >     }
> > >
> > > ?
> >
> > Let me explain what worries me in your suggestion:
> >
> >     struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> >     if (unlikely(!nskb)) {
> >         head_skb->dev->stats.rx_dropped++;
> >         return -ENOMEM;
> >     }
> >
> > is this the failure case we are talking about?
>
> I thought Jason's patch was about this, no ?
>
> > I think this is a symptom of a larger problem
> > introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> > namely that we now need to allocate memory in the
> > middle of processing a packet.
> >
> > I think discarding a completely valid and well-formed
> > packet from the receive queue because we are unable
> > to allocate new memory with GFP_ATOMIC
> > for future packets is not a good idea.
>
> How is it different from NIC processing in the RX path ?

Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.

> > It certainly violates the principle of least surprise:
> > when one sees host pass packet to guest, one expects
> > the packet to get into the networking stack, not get
> > dropped by the driver internally.
> > Guest stack can do with the packet what it sees fit.
> >
> > We actually wake up a thread if we can't fill up the queue,
> > that will fill it up in GFP_KERNEL context.
> >
> > So I think we should find a way to pre-allocate if necessary and avoid
> > error paths where allocating new memory is required to avoid drops.
>
> Really, under ATOMIC context, there is no way you can avoid dropping
> packets if you cannot allocate memory. If you cannot allocate sk_buff
> (256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
> payload of next packets anyway.

that's why we do:

    if (!try_fill_recv(rq, GFP_ATOMIC))
        schedule_delayed_work(&vi->refill, 0);

the queues are large enough for a single failure not to be an immediate
problem.

> Same problem on a real NIC.
>
> Under memory pressure we _do_ packet drops.
> Nobody really complained.
>
> Sure, you can add yet another cache of pre-allocated skbs and pay the
> price of managing yet another cache layer, but you still need to drop
> packets under stress.

We don't need a cache even. Just enough to avoid dropping packets if
allocation failed in the middle, so we don't dequeue a buffer and then
drop it. Once we use this reserved skb, we stop processing the queue
until refill gives it back.

> Pre-allocating skb on real NIC has a performance cost, because we clear
> sk_buff way ahead of time. By the time skb is finally received, cpu has
> to bring the cache lines back into its cache.

Alternatively we can pre-allocate the memory but avoid clearing it maybe?

--
MST
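The single-reserve idea described above (fall back to one spare skb when the atomic allocation fails, then stop dequeuing until the refill path hands the spare back) can be sketched as a tiny state machine. This is a hedged userspace model, not a proposed patch: `struct rx_state`, `get_frag_skb()`, `may_dequeue()` and `refill_reserve()` are hypothetical names, and the model assumes one spare is all that is kept, so a packet that needs a second spare still fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the skb for the next frag list came from (model only). */
enum skb_source { SKB_FRESH, SKB_RESERVE, SKB_NONE };

struct rx_state {
    size_t atomic_budget;    /* atomic allocations that will still succeed */
    bool reserve_available;  /* one pre-allocated spare is on hand */
    bool stalled;            /* true: do not dequeue until refill runs */
};

/* Try a fresh atomic allocation first; fall back to the spare.
 * Taking the spare stalls the queue so no further buffer is dequeued
 * while there is nothing left to fall back on. */
static enum skb_source get_frag_skb(struct rx_state *s)
{
    if (s->atomic_budget > 0) {
        s->atomic_budget--;
        return SKB_FRESH;
    }
    if (s->reserve_available) {
        s->reserve_available = false;
        s->stalled = true;
        return SKB_RESERVE;
    }
    return SKB_NONE;         /* spare already used: caller must drop */
}

/* The receive loop checks this before taking the next buffer. */
static bool may_dequeue(const struct rx_state *s)
{
    return !s->stalled;
}

/* Refill worker (sleeping context): replace the spare, resume the queue. */
static void refill_reserve(struct rx_state *s)
{
    assert(!s->reserve_available);  /* only called after the spare was used */
    s->reserve_available = true;
    s->stalled = false;
}
```

The `SKB_NONE` branch shows the limit of the scheme: it defers drops rather than ruling them out, unless the reserve is sized for the worst-case packet.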
Jason Wang
2013-Nov-20  03:05 UTC
[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On 11/20/2013 04:49 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
>> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
>>> We need to drop the refcnt of page when we fail to allocate an skb for frag
>>> list, otherwise it will be leaked. The bug was introduced by commit
>>> 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
>>> buffers to page frag allocators").
>>>
>>> Cc: Michael Dalton <mwdalton at google.com>
>>> Cc: Eric Dumazet <edumazet at google.com>
>>> Cc: Rusty Russell <rusty at rustcorp.com.au>
>>> Cc: Michael S. Tsirkin <mst at redhat.com>
>>> Signed-off-by: Jason Wang <jasowang at redhat.com>
>>> ---
>>> The patch was needed for 3.12 stable.
>> Good catch, but if we return from receive_mergeable() in the 'middle'
>> of the frags we would need for the current skb, who will
>> call the virtqueue_get_buf() to flush the remaining frags ?
>>
>> Don't we also need to call virtqueue_get_buf() like
>>
>>     while (--num_buf) {
>>         buf = virtqueue_get_buf(rq->vq, &len);
>>         if (!buf)
>>             break;
>>         put_page(virt_to_head_page(buf));
>>     }
>>
>> ?
>
> Let me explain what worries me in your suggestion:
>
>     struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
>     if (unlikely(!nskb)) {
>         head_skb->dev->stats.rx_dropped++;
>         return -ENOMEM;
>     }
>
> is this the failure case we are talking about?
>
> I think this is a symptom of a larger problem
> introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> namely that we now need to allocate memory in the
> middle of processing a packet.
>
> I think discarding a completely valid and well-formed
> packet from the receive queue because we are unable
> to allocate new memory with GFP_ATOMIC
> for future packets is not a good idea.
>
> It certainly violates the principle of least surprise:
> when one sees host pass packet to guest, one expects
> the packet to get into the networking stack, not get
> dropped by the driver internally.
> Guest stack can do with the packet what it sees fit.
>
> We actually wake up a thread if we can't fill up the queue,
> that will fill it up in GFP_KERNEL context.
>
> So I think we should find a way to pre-allocate if necessary and avoid
> error paths where allocating new memory is required to avoid drops.

The problem happens only under memory pressure, and this pre-allocation
may add more stress in exactly that situation.
Michael S. Tsirkin
2013-Nov-20  09:06 UTC
[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote:
> Great catch Jason. I agree this now raises the larger issue of how to
> handle a memory alloc failure in the middle of receive. As Eric mentioned,
> we can drop the packet and free the remaining (num_buf) frags.
>
> Michael, perhaps I'm missing something, but why would you prefer
> pre-allocating buffers in this case? If the guest kernel is OOM'ing,
> dropping packets should provide backpressure.
>
> Also, we could just as easily fail the initial skb alloc in page_to_skb,
> and I think that case also needs to be handled now in the same fashion as
> a memory allocation failure in receive_mergeable.
>
> Best,
>
> Mike

Yes I missed this last night. Thanks a lot Eric and Michael
for pointing this out.