thr3ads.net - Xen devel - netback: Delayed copy alternative [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Zoltan Kiss

2013-Nov-13 20:29 UTC

netback: Delayed copy alternative

Hi,

I''m trying to forward port delayed copy to my new grant mapping
patches.
One important problem I''ve faced is that classic used 
gnttab_copy_grant_page to replace the granted page with a local copy and 
unmap the grant. And this function has never been upstreamed as only 
netback used it. Unfortunately upstreaming it is not a very easy task, 
as the kernel''s grant table infrastructure doesn''t track at
the moment
whether the page is DMA mapped or not. It is required because we 
shouldn''t proceed with the copy and replace if a device already mapped 
the page for DMA.
David came up with an alternative idea: we do this delayed copy because 
we don''t want the guest''s page to get stucked in Dom0
indefinitely. The
only realistic case for that would be if the egress interface would be 
an another guest''s vif, where the guest (either due to a bug or as a 
malicious attempt) doesn''t empty its ring. I think it''s a safe
assumption that Dom0 otherwise doesn''t hold on to packets for too long.
Or if it does, then that''s a bug we should fix instead of doing a copy 
of the packet.
If we accept that only other vif''s can keep the skb indefinitely, then 
an easier solution would be to handle this problem on the RX side: the 
RX thread can also check whether this skb hanged around for too long and 
drop it. Actually, xenvif_start_xmit already checks if the guest 
provided enough slots for us to do the grant copy. If I understand it 
correctly. What do you think about such an approach?

Regards,

Zoli

Paul Durrant

2013-Nov-14 09:42 UTC

head link

Re: netback: Delayed copy alternative

> -----Original Message-----
> From: Zoltan Kiss
> Sent: 13 November 2013 20:30
> To: xen-devel@lists.xenproject.org; Ian Campbell; Wei Liu; Paul Durrant;
> Malcolm Crossley; David Vrabel
> Cc: Jonathan Davies
> Subject: netback: Delayed copy alternative
> 
> Hi,
> 
> I''m trying to forward port delayed copy to my new grant mapping
patches.
> One important problem I''ve faced is that classic used
> gnttab_copy_grant_page to replace the granted page with a local copy and
> unmap the grant. And this function has never been upstreamed as only
> netback used it. Unfortunately upstreaming it is not a very easy task,
> as the kernel''s grant table infrastructure doesn''t track
at the moment
> whether the page is DMA mapped or not. It is required because we
> shouldn''t proceed with the copy and replace if a device already
mapped
> the page for DMA.
> David came up with an alternative idea: we do this delayed copy because
> we don''t want the guest''s page to get stucked in Dom0
indefinitely. The
> only realistic case for that would be if the egress interface would be
> an another guest''s vif, where the guest (either due to a bug or as
a
> malicious attempt) doesn''t empty its ring. I think it''s a
safe
> assumption that Dom0 otherwise doesn''t hold on to packets for too
long.
> Or if it does, then that''s a bug we should fix instead of doing a
copy
> of the packet.
> If we accept that only other vif''s can keep the skb indefinitely,
then
> an easier solution would be to handle this problem on the RX side: the
> RX thread can also check whether this skb hanged around for too long and
> drop it. Actually, xenvif_start_xmit already checks if the guest
> provided enough slots for us to do the grant copy. If I understand it
> correctly. What do you think about such an approach?
> 
Well, now that David fixed the DMA unmap tracking thing, I believe that another
vif is *generally* the only place an skb can hang around for a long time. The
problem is that there is an edge case... If a network driver turns off queue
processing (for flow control reasons, and NB that 10G Ethernet requires the
driver to do this if the PHY signals flow control and internal buffering is
exhausted, 1G is allowed to be an open drain) then the skb can sit in the queue
indefinitely and there''s no way you can deal with this from the guest
RX side of netback. You need to have a copy-aside option to handle this.

  Paul

David Vrabel

2013-Nov-14 11:04 UTC

head link

Re: netback: Delayed copy alternative

On 14/11/13 09:42, Paul Durrant wrote:> Zoltan Kiss wrote:
>>
>> I''m trying to forward port delayed copy to my new grant
mapping patches.
>> One important problem I''ve faced is that classic used
>> gnttab_copy_grant_page to replace the granted page with a local copy
and
>> unmap the grant. And this function has never been upstreamed as only
>> netback used it. Unfortunately upstreaming it is not a very easy task,
>> as the kernel''s grant table infrastructure doesn''t
track at the moment
>> whether the page is DMA mapped or not. It is required because we
>> shouldn''t proceed with the copy and replace if a device
already mapped
>> the page for DMA.
>>
>> David came up with an alternative idea: we do this delayed copy because
>> we don''t want the guest''s page to get stucked in Dom0
indefinitely. The
>> only realistic case for that would be if the egress interface would be
>> an another guest''s vif, where the guest (either due to a bug
or as a
>> malicious attempt) doesn''t empty its ring. I think
it''s a safe
>> assumption that Dom0 otherwise doesn''t hold on to packets for
too long.
>> Or if it does, then that''s a bug we should fix instead of
doing a copy
>> of the packet.
>>
>> If we accept that only other vif''s can keep the skb
indefinitely, then
>> an easier solution would be to handle this problem on the RX side: the
>> RX thread can also check whether this skb hanged around for too long
and
>> drop it. Actually, xenvif_start_xmit already checks if the guest
>> provided enough slots for us to do the grant copy. If I understand it
>> correctly. What do you think about such an approach?
>>
> 
> Well, now that David fixed the DMA unmap tracking thing, I believe
> that another vif is *generally* the only place an skb can hang around
> for a long time. The problem is that there is an edge case... If a
> network driver turns off queue processing (for flow control reasons, and
> NB that 10G Ethernet requires the driver to do this if the PHY signals
> flow control and internal buffering is exhausted, 1G is allowed to be an
> open drain) then the skb can sit in the queue indefinitely and
there''s
> no way you can deal with this from the guest RX side of netback. You
> need to have a copy-aside option to handle this.
But the delayed copy won''t always help in this case as the packet may
be
DMA mapped and queued on hardware queues.

David

Zoltan Kiss

2013-Nov-14 19:27 UTC

head link

Timout packets in device''s TX queue

Discussing this further with Paul, we came to the conclusion that 
probably the best solution would be to drop these packets in qdisc. 
Netback RX path stop accepting new packets if the target guest doesn''t 
have enough room in the ring. Also (AFAIK) NIC drivers do the same if 
they don''t have more resource for TX, and this is all good for us. Our 
problem is that the queue in qdisc layer (sorry if my terminology not 
clear!) can still accumulate these packets indefinitely. The drastic 
measure would be to reduce txqueuelen to 0 during setup for every 
affected device, but that''s not really nice.
Instead, we should be able to configure qdisc to timeout packets on 
those queues, at least the SKBs where (skb_shinfo(skb)->tx_flags & 
SKBTX_DEV_ZEROCOPY). I''m not that familiar with it to know if
that''s
already possible, or if not, then how good idea would it be to implement it.
I''ve changed the subject and included netdev and Eric, maybe someone
can
shed some more light on this question.

Regards,

Zoli

On 14/11/13 09:42, Paul Durrant wrote:>> -----Original Message-----
>> From: Zoltan Kiss
>> Sent: 13 November 2013 20:30
>> To: xen-devel@lists.xenproject.org; Ian Campbell; Wei Liu; Paul
Durrant;
>> Malcolm Crossley; David Vrabel
>> Cc: Jonathan Davies
>> Subject: netback: Delayed copy alternative
>>
>> Hi,
>>
>> I''m trying to forward port delayed copy to my new grant
mapping patches.
>> One important problem I''ve faced is that classic used
>> gnttab_copy_grant_page to replace the granted page with a local copy
and
>> unmap the grant. And this function has never been upstreamed as only
>> netback used it. Unfortunately upstreaming it is not a very easy task,
>> as the kernel''s grant table infrastructure doesn''t
track at the moment
>> whether the page is DMA mapped or not. It is required because we
>> shouldn''t proceed with the copy and replace if a device
already mapped
>> the page for DMA.
>> David came up with an alternative idea: we do this delayed copy because
>> we don''t want the guest''s page to get stucked in Dom0
indefinitely. The
>> only realistic case for that would be if the egress interface would be
>> an another guest''s vif, where the guest (either due to a bug
or as a
>> malicious attempt) doesn''t empty its ring. I think
it''s a safe
>> assumption that Dom0 otherwise doesn''t hold on to packets for
too long.
>> Or if it does, then that''s a bug we should fix instead of
doing a copy
>> of the packet.
>> If we accept that only other vif''s can keep the skb
indefinitely, then
>> an easier solution would be to handle this problem on the RX side: the
>> RX thread can also check whether this skb hanged around for too long
and
>> drop it. Actually, xenvif_start_xmit already checks if the guest
>> provided enough slots for us to do the grant copy. If I understand it
>> correctly. What do you think about such an approach?
>>
>
> Well, now that David fixed the DMA unmap tracking thing, I believe that
another vif is *generally* the only place an skb can hang around for a long
time. The problem is that there is an edge case... If a network driver turns off
queue processing (for flow control reasons, and NB that 10G Ethernet requires
the driver to do this if the PHY signals flow control and internal buffering is
exhausted, 1G is allowed to be an open drain) then the skb can sit in the queue
indefinitely and there''s no way you can deal with this from the guest
RX side of netback. You need to have a copy-aside option to handle this.
>
>    Paul
>

Zoltan Kiss

2013-Nov-19 16:42 UTC

head link

Re: netback: Delayed copy alternative

After further discussions and investigations, it seems it is a viable 
approach to drop the packets in the RX path of the another VIF after a 
timeout, and don''t care about the rest of the cases (packets get
stucked
somewhere in the core stack, a driver, or in the queue of a Dom0 
userspace socket. In the latter case, they get copied anyway, so it 
shouldn''t happen)
Does anyone has a counterargument?

Zoli

On 13/11/13 20:29, Zoltan Kiss wrote:> Hi,
>
> I''m trying to forward port delayed copy to my new grant mapping
patches.
> One important problem I''ve faced is that classic used
> gnttab_copy_grant_page to replace the granted page with a local copy and
> unmap the grant. And this function has never been upstreamed as only
> netback used it. Unfortunately upstreaming it is not a very easy task,
> as the kernel''s grant table infrastructure doesn''t track
at the moment
> whether the page is DMA mapped or not. It is required because we
> shouldn''t proceed with the copy and replace if a device already
mapped
> the page for DMA.
> David came up with an alternative idea: we do this delayed copy because
> we don''t want the guest''s page to get stucked in Dom0
indefinitely. The
> only realistic case for that would be if the egress interface would be
> an another guest''s vif, where the guest (either due to a bug or as
a
> malicious attempt) doesn''t empty its ring. I think it''s a
safe
> assumption that Dom0 otherwise doesn''t hold on to packets for too
long.
> Or if it does, then that''s a bug we should fix instead of doing a
copy
> of the packet.
> If we accept that only other vif''s can keep the skb indefinitely,
then
> an easier solution would be to handle this problem on the RX side: the
> RX thread can also check whether this skb hanged around for too long and
> drop it. Actually, xenvif_start_xmit already checks if the guest
> provided enough slots for us to do the grant copy. If I understand it
> correctly. What do you think about such an approach?

Ian Campbell

2013-Nov-20 11:16 UTC

head link

Re: netback: Delayed copy alternative

On Tue, 2013-11-19 at 16:42 +0000, Zoltan Kiss wrote:> After further discussions and investigations, it seems it is a viable 
> approach to drop the packets in the RX path of the another VIF after a 
> timeout,
Since RX/TX in netback is a bit confusing (since it is inverted, but you
don''t seem to be using it that way): A diagram:

domU (netfront) --> dom0 (netback) --> network stack --> bridge ,
                                                                |
domU'' (netfront) <- timeout & drop <- dom0 (netback) <-
stack <-''

You are proposing dropping at "timeout & drop". Since the
dom0->domU''
path is based on copying there should be no problem with an skb getting
stuck with domU'' holding on to it. In effect you will be dropping
traffic from some internal queue before it hits the shared ring anyway.
You will be making sure that either the full skb fits on the ring or it
remains in the queue.

What about any queueing which occurs in "network stack" (either
instance) or "bridge?" How can you cancel an skb out of those? Are you
intending that by dropping packets a "timeout & drop" they would
eventually make their way to the second netback and be droppable? How
convinced are you that this is viable?
>  and don''t care about the rest of the cases (packets get stucked 
> somewhere in the core stack, a driver, or in the queue of a Dom0 
> userspace socket. In the latter case, they get copied anyway, so it 
> shouldn''t happen)
I think that is OK iff you are copying for dom0 delivery. If you are not
copying here then an dom0 process (including an anonymous one) which can
open a socket and receive traffic could block things indefinitely.

The more general case of an unprivileged or deprivileged (i.e. a process
which has dropped its root privs somehow) being able to interfere with
the traffic in a way which causes gridlock might need a little more
thought though.
> Does anyone has a counterargument?
> 
> Zoli
> 
> On 13/11/13 20:29, Zoltan Kiss wrote:
> > Hi,
> >
> > I''m trying to forward port delayed copy to my new grant
mapping patches.
> > One important problem I''ve faced is that classic used
> > gnttab_copy_grant_page to replace the granted page with a local copy
and
> > unmap the grant. And this function has never been upstreamed as only
> > netback used it. Unfortunately upstreaming it is not a very easy task,
> > as the kernel''s grant table infrastructure doesn''t
track at the moment
> > whether the page is DMA mapped or not. It is required because we
> > shouldn''t proceed with the copy and replace if a device
already mapped
> > the page for DMA.
> > David came up with an alternative idea: we do this delayed copy
because
> > we don''t want the guest''s page to get stucked in
Dom0 indefinitely. The
> > only realistic case for that would be if the egress interface would be
> > an another guest''s vif, where the guest (either due to a bug
or as a
> > malicious attempt) doesn''t empty its ring. I think
it''s a safe
> > assumption that Dom0 otherwise doesn''t hold on to packets for
too long.
> > Or if it does, then that''s a bug we should fix instead of
doing a copy
> > of the packet.
> > If we accept that only other vif''s can keep the skb
indefinitely, then
> > an easier solution would be to handle this problem on the RX side: the
> > RX thread can also check whether this skb hanged around for too long
and
> > drop it. Actually, xenvif_start_xmit already checks if the guest
> > provided enough slots for us to do the grant copy. If I understand it
> > correctly. What do you think about such an approach?
>

Zoltan Kiss

2013-Nov-20 12:28 UTC

head link

Re: netback: Delayed copy alternative

On 20/11/13 11:16, Ian Campbell wrote:> On Tue, 2013-11-19 at 16:42 +0000, Zoltan Kiss wrote:
>> After further discussions and investigations, it seems it is a viable
>> approach to drop the packets in the RX path of the another VIF after a
>> timeout,
>
> Since RX/TX in netback is a bit confusing (since it is inverted, but you
> don''t seem to be using it that way): A diagram:I meant "RX path" to be the Dom0->DomU path.
> domU (netfront) --> dom0 (netback) --> network stack --> bridge ,
>                                                                  |
> domU'' (netfront) <- timeout & drop <- dom0 (netback)
<- stack <-''
>
> You are proposing dropping at "timeout & drop". Since the
dom0->domU''
> path is based on copying there should be no problem with an skb getting
> stuck with domU'' holding on to it. In effect you will be dropping
> traffic from some internal queue before it hits the shared ring anyway.
> You will be making sure that either the full skb fits on the ring or it
> remains in the queue.I propose to drop from the qdisc queue, as it happens in classic kernel. 
I actually reimplemented the wake_queue timer.
When the stack calls xenvif_start_xmit, it checks if it can fit the 
packet into the ring, and it drops it if not. If it can fit, we are 
good, as it will be copied shortly. If not, dropping is also good for 
us, because we get back the pages.
Next start_xmit checks if there is room in the ring for a max sized 
packet, and stops queueing. That''s bad, because packets gather up in 
qdisc indefinitely, if the ring doesn''t move. The wake_queue timer is 
set therefore, and when it fires, it wakes the queueing. Then qdisc 
start calling start_xmit again, and as the ring is full, it drops the 
packet. And because we don''t stop the queueing when we drop, it will 
keep calling start_xmit and drops the packets, draining qdisc eventually.
> What about any queueing which occurs in "network stack" (either
> instance) or "bridge?" How can you cancel an skb out of those?
Are you
> intending that by dropping packets a "timeout & drop" they
would
> eventually make their way to the second netback and be droppable? How
> convinced are you that this is viable?I don''t know if the stack does too much queueing on that path apart
from
qdisc, I assume not. But even if it does, eventually those queues will 
advance when the qdisc one gets drained out.
>
>>   and don''t care about the rest of the cases (packets get
stucked
>> somewhere in the core stack, a driver, or in the queue of a Dom0
>> userspace socket. In the latter case, they get copied anyway, so it
>> shouldn''t happen)
>
> I think that is OK iff you are copying for dom0 delivery. If you are not
> copying here then an dom0 process (including an anonymous one) which can
> open a socket and receive traffic could block things indefinitely.
>
> The more general case of an unprivileged or deprivileged (i.e. a process
> which has dropped its root privs somehow) being able to interfere with
> the traffic in a way which causes gridlock might need a little more
> thought though.deliver_skb() will copy the skbs sent to Dom0 stack:

https://lkml.org/lkml/2012/7/20/363
>
>> Does anyone has a counterargument?
>>
>> Zoli
>>
>> On 13/11/13 20:29, Zoltan Kiss wrote:
>>> Hi,
>>>
>>> I''m trying to forward port delayed copy to my new grant
mapping patches.
>>> One important problem I''ve faced is that classic used
>>> gnttab_copy_grant_page to replace the granted page with a local
copy and
>>> unmap the grant. And this function has never been upstreamed as
only
>>> netback used it. Unfortunately upstreaming it is not a very easy
task,
>>> as the kernel''s grant table infrastructure
doesn''t track at the moment
>>> whether the page is DMA mapped or not. It is required because we
>>> shouldn''t proceed with the copy and replace if a device
already mapped
>>> the page for DMA.
>>> David came up with an alternative idea: we do this delayed copy
because
>>> we don''t want the guest''s page to get stucked in
Dom0 indefinitely. The
>>> only realistic case for that would be if the egress interface would
be
>>> an another guest''s vif, where the guest (either due to a
bug or as a
>>> malicious attempt) doesn''t empty its ring. I think
it''s a safe
>>> assumption that Dom0 otherwise doesn''t hold on to packets
for too long.
>>> Or if it does, then that''s a bug we should fix instead of
doing a copy
>>> of the packet.
>>> If we accept that only other vif''s can keep the skb
indefinitely, then
>>> an easier solution would be to handle this problem on the RX side:
the
>>> RX thread can also check whether this skb hanged around for too
long and
>>> drop it. Actually, xenvif_start_xmit already checks if the guest
>>> provided enough slots for us to do the grant copy. If I understand
it
>>> correctly. What do you think about such an approach?
>>
>
>

Xen devel - Nov 2013 - netback: Delayed copy alternative

netback: Delayed copy alternative

Re: netback: Delayed copy alternative

Re: netback: Delayed copy alternative

Timout packets in device''s TX queue

Re: netback: Delayed copy alternative

Re: netback: Delayed copy alternative

Re: netback: Delayed copy alternative