thr3ads.net - Linux Virtualization - [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb [Nov 2013]

Michael S. Tsirkin

2013-Nov-19 20:49 UTC

[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > list, otherwise it will be leaked. The bug was introduced by commit
> > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > buffers to page frag allocators").
> > 
> > Cc: Michael Dalton <mwdalton at google.com>
> > Cc: Eric Dumazet <edumazet at google.com>
> > Cc: Rusty Russell <rusty at rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst at redhat.com>
> > Signed-off-by: Jason Wang <jasowang at redhat.com>
> > ---
> > The patch was needed for 3.12 stable.
> 
> Good catch, but if we return from receive_mergeable() in the 'middle'
> of the frags we would need for the current skb, who will
> call the virtqueue_get_buf() to flush the remaining frags ?
> 
> Don't we also need to call virtqueue_get_buf() like 
> 
> while (--num_buf) {
>     buf = virtqueue_get_buf(rq->vq, &len);
>     if (!buf)
>         break;
>     put_page(virt_to_head_page(buf));
> }
> 
> ?
> 
> 

Let me explain what worries me in your suggestion:

                        struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
                        if (unlikely(!nskb)) {
                                head_skb->dev->stats.rx_dropped++;
                                return -ENOMEM;
                        }

is this the failure case we are talking about?

I think this is a symptom of a larger problem
introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
namely that we now need to allocate memory in the
middle of processing a packet.

I think discarding a completely valid and well-formed
packet from the receive queue because we are unable
to allocate new memory with GFP_ATOMIC
for future packets is not a good idea.

It certainly violates the principle of least surprise:
when one sees host pass packet to guest, one expects
the packet to get into the networking stack, not get
dropped by the driver internally.
Guest stack can do with the packet what it sees fit.

We actually wake up a thread if we can't fill up the queue,
that will fill it up in GFP_KERNEL context.

So I think we should find a way to pre-allocate if necessary and avoid
error paths where allocating new memory is required to avoid drops.

-- 
MST

Eric Dumazet

2013-Nov-19 21:36 UTC

On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > list, otherwise it will be leaked. The bug was introduced by commit
> > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > buffers to page frag allocators").
> > > 
> > > Cc: Michael Dalton <mwdalton at google.com>
> > > Cc: Eric Dumazet <edumazet at google.com>
> > > Cc: Rusty Russell <rusty at rustcorp.com.au>
> > > Cc: Michael S. Tsirkin <mst at redhat.com>
> > > Signed-off-by: Jason Wang <jasowang at redhat.com>
> > > ---
> > > The patch was needed for 3.12 stable.
> > 
> > Good catch, but if we return from receive_mergeable() in the 'middle'
> > of the frags we would need for the current skb, who will
> > call the virtqueue_get_buf() to flush the remaining frags ?
> > 
> > Don't we also need to call virtqueue_get_buf() like 
> > 
> > while (--num_buf) {
> >     buf = virtqueue_get_buf(rq->vq, &len);
> >     if (!buf)
> >         break;
> >     put_page(virt_to_head_page(buf));
> > }
> > 
> > ?
> > 
> > 
> 
> 
> Let me explain what worries me in your suggestion:
> 
>                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
>                         if (unlikely(!nskb)) {
>                                 head_skb->dev->stats.rx_dropped++;
>                                 return -ENOMEM;
>                         }
> 
> is this the failure case we are talking about?

I thought Jason patch was about this, no ?

> 
> I think this is a symptom of a larger problem
> introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> namely that we now need to allocate memory in the
> middle of processing a packet.
> 
> 
> I think discarding a completely valid and well-formed
> packet from the receive queue because we are unable
> to allocate new memory with GFP_ATOMIC
> for future packets is not a good idea.

How is it different with NIC processing in RX path ?

> 
> It certainly violates the principle of least surprise:
> when one sees host pass packet to guest, one expects
> the packet to get into the networking stack, not get
> dropped by the driver internally.
> Guest stack can do with the packet what it sees fit.
> 
> We actually wake up a thread if we can't fill up the queue,
> that will fill it up in GFP_KERNEL context.
> 
> So I think we should find a way to pre-allocate if necessary and avoid
> error paths where allocating new memory is required to avoid drops.
> 

Really, under ATOMIC context, there is no way you can avoid dropping
packets if you cannot allocate memory. If you cannot allocate sk_buff
(256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
payload of next packets anyway.

Same problem on a real NIC.

Under memory pressure we _do_ packet drops.
Nobody really complained.

Sure, you can add yet another cache of pre-allocated skbs and pay the
price of managing yet another cache layer, but you still need to drop
packets under stress.

Pre-allocating skb on real NIC has a performance cost, because we clear
sk_buff way ahead of time. By the time skb is finally received, the CPU has
to bring its cache lines back into cache.

Michael Dalton

2013-Nov-19 21:38 UTC

Great catch Jason. I agree this now raises the larger issue of how to
handle a memory alloc failure in the middle of receive. As Eric mentioned,
we can drop the packet and free the remaining (num_buf) frags.

Michael, perhaps I'm missing something, but why would you prefer
pre-allocating buffers in this case? If the guest kernel is OOM'ing,
dropping packets should provide backpressure.

Also, we could just as easily fail the initial skb alloc in page_to_skb,
and I think that case also needs to be handled now in the same fashion as
a memory allocation failure in receive_mergeable.

Best,

Mike
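
The drop path described in this message (release the buffer in hand, then dequeue and release the rest of the packet's buffers) can be modelled outside the kernel. What follows is a minimal user-space sketch, not driver code. `fake_page`, `fake_vq`, `fake_get_buf`, `fake_put_page` and `drop_packet_buffers` are hypothetical stand-ins for struct page, the receive virtqueue, virtqueue_get_buf(), put_page() and the proposed cleanup loop. It assumes num_buf counts the buffer already in hand plus those still queued, which is how the `while (--num_buf)` loop quoted in the thread reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page carrying a reference count. */
struct fake_page {
    int refcnt;
};

/* Hypothetical stand-in for a receive virtqueue: a FIFO of used buffers. */
struct fake_vq {
    struct fake_page **bufs;
    size_t head;   /* index of the next buffer to hand out */
    size_t count;  /* total number of buffers in the FIFO */
};

/* Models virtqueue_get_buf(): pop the next used buffer, NULL when empty. */
static struct fake_page *fake_get_buf(struct fake_vq *vq)
{
    if (vq->head == vq->count)
        return NULL;
    return vq->bufs[vq->head++];
}

/* Models put_page(): drop one reference. */
static void fake_put_page(struct fake_page *page)
{
    assert(page->refcnt > 0);
    page->refcnt--;
}

/*
 * Drop a partially received packet: release the buffer already dequeued
 * (in_hand), then dequeue and release the remaining num_buf - 1 buffers.
 * Returns how many buffers were released.
 */
static unsigned int drop_packet_buffers(struct fake_vq *vq,
                                        struct fake_page *in_hand,
                                        unsigned int num_buf)
{
    unsigned int released = 0;

    if (num_buf == 0)
        return 0;

    fake_put_page(in_hand);
    released++;

    while (--num_buf) {
        struct fake_page *buf = fake_get_buf(vq);

        if (!buf)
            break;  /* device supplied fewer buffers than announced */
        fake_put_page(buf);
        released++;
    }
    return released;
}
```

Without the loop, the buffers still queued for the dropped packet would be read later as if they started a new packet, which is the concern raised earlier in the thread.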

Michael S. Tsirkin

2013-Nov-19 21:53 UTC

On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > > list, otherwise it will be leaked. The bug was introduced by commit
> > > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > > buffers to page frag allocators").
> > > > 
> > > > Cc: Michael Dalton <mwdalton at google.com>
> > > > Cc: Eric Dumazet <edumazet at google.com>
> > > > Cc: Rusty Russell <rusty at rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst at redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang at redhat.com>
> > > > ---
> > > > The patch was needed for 3.12 stable.
> > > 
> > > Good catch, but if we return from receive_mergeable() in the 'middle'
> > > of the frags we would need for the current skb, who will
> > > call the virtqueue_get_buf() to flush the remaining frags ?
> > > 
> > > Don't we also need to call virtqueue_get_buf() like 
> > > 
> > > while (--num_buf) {
> > >     buf = virtqueue_get_buf(rq->vq, &len);
> > >     if (!buf)
> > >         break;
> > >     put_page(virt_to_head_page(buf));
> > > }
> > > 
> > > ?
> > > 
> > > 
> > 
> > 
> > Let me explain what worries me in your suggestion:
> > 
> >                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> >                         if (unlikely(!nskb)) {
> >                                 head_skb->dev->stats.rx_dropped++;
> >                                 return -ENOMEM;
> >                         }
> > 
> > is this the failure case we are talking about?
> 
> I thought Jason patch was about this, no ?
> 
> > 
> > I think this is a symptom of a larger problem
> > introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> > namely that we now need to allocate memory in the
> > middle of processing a packet.
> > 
> > 
> > I think discarding a completely valid and well-formed
> > packet from the receive queue because we are unable
> > to allocate new memory with GFP_ATOMIC
> > for future packets is not a good idea.
> 
> How is it different with NIC processing in RX path ?

Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.
> > 
> > It certainly violates the principle of least surprise:
> > when one sees host pass packet to guest, one expects
> > the packet to get into the networking stack, not get
> > dropped by the driver internally.
> > Guest stack can do with the packet what it sees fit.
> > 
> > We actually wake up a thread if we can't fill up the queue,
> > that will fill it up in GFP_KERNEL context.
> > 
> > So I think we should find a way to pre-allocate if necessary and avoid
> > error paths where allocating new memory is required to avoid drops.
> > 
> 
> Really, under ATOMIC context, there is no way you can avoid dropping
> packets if you cannot allocate memory. If you cannot allocate sk_buff
> (256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
> payload of next packets anyway. 

That's why we do:

                if (!try_fill_recv(rq, GFP_ATOMIC))
                        schedule_delayed_work(&vi->refill, 0);


the queues are large enough for a single failure not to be
an immediate problem.
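
The fallback in the snippet above (try to refill with an atomic allocation, and defer to a worker if that fails) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver's implementation. `fake_rq`, `alloc_fn`, `fake_try_fill`, `fake_rx_refill`, `fake_refill_work`, `budget_alloc` and `always_alloc` are hypothetical names. A boolean flag stands in for schedule_delayed_work(), and `fake_try_fill` is assumed to return false on allocation failure, matching how try_fill_recv() is used in the snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a receive queue that wants `capacity` posted buffers. */
struct fake_rq {
    size_t posted;
    size_t capacity;
    bool refill_pending;  /* stands in for schedule_delayed_work() */
};

/* Hypothetical allocator hook: returns true if one buffer could be allocated. */
typedef bool (*alloc_fn)(void *ctx);

/* Post buffers until the queue is full; return false if an allocation failed. */
static bool fake_try_fill(struct fake_rq *rq, alloc_fn alloc, void *ctx)
{
    while (rq->posted < rq->capacity) {
        if (!alloc(ctx))
            return false;
        rq->posted++;
    }
    return true;
}

/* Receive path: the allocator may fail, so fall back to deferred work. */
static void fake_rx_refill(struct fake_rq *rq, alloc_fn atomic_alloc, void *ctx)
{
    if (!fake_try_fill(rq, atomic_alloc, ctx))
        rq->refill_pending = true;
}

/* Deferred worker: runs later with an allocator that is allowed to sleep. */
static void fake_refill_work(struct fake_rq *rq, alloc_fn sleeping_alloc, void *ctx)
{
    if (!rq->refill_pending)
        return;
    if (fake_try_fill(rq, sleeping_alloc, ctx))
        rq->refill_pending = false;
}

/* Example allocator that succeeds only while a budget (size_t) lasts. */
static bool budget_alloc(void *ctx)
{
    size_t *budget = ctx;

    if (*budget == 0)
        return false;
    (*budget)--;
    return true;
}

/* Example allocator that always succeeds. */
static bool always_alloc(void *ctx)
{
    (void)ctx;
    return true;
}
```

In this model a partial fill leaves some buffers posted, so reception can continue while the worker is pending, which is the point made about queue size.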

> Same problem on a real NIC.
> 
> Under memory pressure we _do_ packet drops.
> Nobody really complained.
>
> Sure, you can add yet another cache of pre-allocated skbs and pay the
> price of managing yet another cache layer, but still need to drop
> packets under stress.

We don't need a cache even. Just enough to avoid dropping packets
if allocation failed in the middle so we don't dequeue a buffer and then
drop it.

Once we use this reserved skb, we stop processing the queue until
refill gives it back.
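
One way to picture the reserved-skb idea is the user-space sketch below. It is only an illustration of the proposal as worded here, not code from any patch. `fake_skb`, `fake_rx`, `fake_get_skb`, `fake_can_receive` and `fake_refill` are hypothetical names, malloc() stands in for alloc_skb(), and the `fail_alloc` argument simulates a failed GFP_ATOMIC allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
    int unused;
};

/* Hypothetical per-queue state holding one spare skb. */
struct fake_rx {
    struct fake_skb *reserve;  /* NULL once the spare has been consumed */
    bool stalled;              /* true: stop dequeuing until refill runs */
};

/*
 * Get an skb for the packet in progress. If the normal allocation fails,
 * hand out the reserve and stall the queue so that no further packet is
 * dequeued while no spare is available.
 */
static struct fake_skb *fake_get_skb(struct fake_rx *rx, bool fail_alloc)
{
    struct fake_skb *skb = fail_alloc ? NULL : malloc(sizeof(*skb));

    if (skb)
        return skb;

    skb = rx->reserve;
    rx->reserve = NULL;
    rx->stalled = true;
    return skb;
}

/* The receive loop would check this before dequeuing the next buffer. */
static bool fake_can_receive(const struct fake_rx *rx)
{
    return !rx->stalled;
}

/* Refill path: restore the reserve, then let reception resume. */
static void fake_refill(struct fake_rx *rx)
{
    if (!rx->reserve)
        rx->reserve = malloc(sizeof(*rx->reserve));
    if (rx->reserve)
        rx->stalled = false;
}
```

The sketch keeps a single spare rather than a cache. Whether the added state is worth it under memory pressure is the open question in the thread.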

> Pre-allocating skb on real NIC has a performance cost, because we clear
> sk_buff way ahead of time. By the time skb is finally received, the CPU has
> to bring its cache lines back into cache.
> 

Alternatively we can pre-allocate the memory but avoid clearing it maybe?

-- 
MST

Jason Wang

2013-Nov-20 03:05 UTC

On 11/20/2013 04:49 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
>> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
>>> We need to drop the refcnt of page when we fail to allocate an skb for frag
>>> list, otherwise it will be leaked. The bug was introduced by commit
>>> 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
>>> buffers to page frag allocators").
>>>
>>> Cc: Michael Dalton <mwdalton at google.com>
>>> Cc: Eric Dumazet <edumazet at google.com>
>>> Cc: Rusty Russell <rusty at rustcorp.com.au>
>>> Cc: Michael S. Tsirkin <mst at redhat.com>
>>> Signed-off-by: Jason Wang <jasowang at redhat.com>
>>> ---
>>> The patch was needed for 3.12 stable.
>> Good catch, but if we return from receive_mergeable() in the 'middle'
>> of the frags we would need for the current skb, who will
>> call the virtqueue_get_buf() to flush the remaining frags ?
>>
>> Don't we also need to call virtqueue_get_buf() like 
>>
>> while (--num_buf) {
>>     buf = virtqueue_get_buf(rq->vq, &len);
>>     if (!buf)
>>         break;
>>     put_page(virt_to_head_page(buf));
>> }
>>
>> ?
>>
>>
>
> Let me explain what worries me in your suggestion:
>
>                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
>                         if (unlikely(!nskb)) {
>                                 head_skb->dev->stats.rx_dropped++;
>                                 return -ENOMEM;
>                         }
>
> is this the failure case we are talking about?
>
> I think this is a symptom of a larger problem
> introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> namely that we now need to allocate memory in the
> middle of processing a packet.
>
>
> I think discarding a completely valid and well-formed
> packet from the receive queue because we are unable
> to allocate new memory with GFP_ATOMIC
> for future packets is not a good idea.
>
> It certainly violates the principle of least surprise:
> when one sees host pass packet to guest, one expects
> the packet to get into the networking stack, not get
> dropped by the driver internally.
> Guest stack can do with the packet what it sees fit.
>
> We actually wake up a thread if we can't fill up the queue,
> that will fill it up in GFP_KERNEL context.
>
> So I think we should find a way to pre-allocate if necessary and avoid
> error paths where allocating new memory is required to avoid drops.
>

The problem happens only under memory pressure, and this pre-allocation
may add more stress in that situation.

Michael S. Tsirkin

2013-Nov-20 09:06 UTC

On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote:
> Great catch Jason. I agree this now raises the larger issue of how to
> handle a memory alloc failure in the middle of receive. As Eric mentioned,
> we can drop the packet and free the remaining (num_buf) frags.
> 
> Michael, perhaps I'm missing something, but why would you prefer
> pre-allocating buffers in this case? If the guest kernel is OOM'ing,
> dropping packets should provide backpressure.
> 
> Also, we could just as easily fail the initial skb alloc in page_to_skb,
> and I think that case also needs to be handled now in the same fashion as
> a memory allocation failure in receive_mergeable.
> 
> Best,
> 
> Mike

Yes, I missed this last night. Thanks a lot Eric and Michael for pointing
this out.
