Hi Ian,

Today I fired up an old VM with a bittorrent client, trying to download some torrents. I seem to be hitting the unlikely case of "xennet: skb rides the rocket: xx slots", and this results in some dropped packets in domU; I don't see any warnings in dom0. I have added some extra info, but I don't know enough to tell whether this could or should be prevented from happening?

[16798.629141] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16800.575182] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16801.589166] xennet: skb rides the rocket: 20 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:19 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:5 page_size:4096 prot:0800
[16803.279039] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:3 page_size:4096 prot:0800
[16809.973268] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16811.420048] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:6 page_size:4096 prot:0800
[16814.872686] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:5 page_size:4096 prot:0800
[16815.359099] xennet: skb rides the rocket: 21 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:20 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:5 page_size:4096 prot:0800
[16825.851906] xennet: skb rides the rocket: 20 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:19 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16828.295083] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:5 page_size:4096 prot:0800
[16837.386684] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16838.609683] xennet: skb rides the rocket: 20 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:19 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16841.783729] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:106 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:6 page_size:4096 prot:0800
[16843.841678] xennet: skb rides the rocket: 20 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:19 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800
[16849.847614] xennet: skb rides the rocket: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:18 offset:3690 skb_headlen:54 skb->len:64294, skb->data_len:64240 skb->truesize:65008 nr_frags:4 page_size:4096 prot:0800
[16853.787128] xennet: skb rides the rocket: 20 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:19 offset:2154 skb_headlen:1622 skb->len:64294, skb->data_len:62672 skb->truesize:64976 nr_frags:4 page_size:4096 prot:0800

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3e:c4:20:46
          inet addr:192.168.1.12  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9090214 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5902090 errors:0 dropped:304 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19757934770 (18.4 GiB)  TX bytes:16855238200 (15.6 GiB)
          Interrupt:25

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c26e28b..50ac403 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -552,7 +552,11 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
 		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
+			"xennet: skb rides the rocket: %d slots MAX_SKB_FRAGS: %d div_roundup:%d xennet_count_skb_frag_slots:%d offset:%d skb_headlen:%d skb->len:%d, skb->data_len:%d skb->truesize:%d nr_frags:%d page_size:%d prot:%04x\n",
+			slots, MAX_SKB_FRAGS, DIV_ROUND_UP(offset + len, PAGE_SIZE),
+			xennet_count_skb_frag_slots(skb), offset, len, skb->len,
+			skb->data_len, skb->truesize, skb_shinfo(skb)->nr_frags,
+			PAGE_SIZE, ntohs(skb->protocol));
 		goto drop;
 	}
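For reference, the arithmetic behind those numbers can be reproduced with a small standalone program: the linear area and each frag contribute DIV_ROUND_UP(offset + size, PAGE_SIZE) slots, so a handful of compound-page frags with nonzero offsets can need 19+ slots while MAX_SKB_FRAGS is 17. The frag offsets and sizes below are made-up values chosen to be consistent with the first log line, since the real per-frag layout is not in the log.

#include <stdio.h>

#define PAGE_SIZE 4096
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct frag { unsigned int offset, size; };

int main(void)
{
	/* Hypothetical frag layout summing to skb->data_len = 62672. */
	struct frag frags[] = {
		{  106, 16384 },	/* straddles 5 pages */
		{ 2048, 16384 },	/* straddles 5 pages */
		{    0, 16384 },	/* exactly 4 pages   */
		{    0, 13520 },	/* rounds up to 4    */
	};
	unsigned int head_offset = 106, headlen = 1622;	/* skb_headlen */
	unsigned int i, slots;

	/* Linear area: the "div_roundup:1" in the log lines above. */
	slots = DIV_ROUND_UP(head_offset + headlen, PAGE_SIZE);

	/* Paged area: the "xennet_count_skb_frag_slots:18" in the log. */
	for (i = 0; i < sizeof(frags) / sizeof(frags[0]); i++)
		slots += DIV_ROUND_UP(frags[i].offset + frags[i].size,
				      PAGE_SIZE);

	printf("slots: %u (limit is MAX_SKB_FRAGS + 1 = 18)\n", slots);
	return 0;
}

Run as-is this prints "slots: 19": four frags are enough to overflow the ring-slot limit even though nr_frags is well below MAX_SKB_FRAGS.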
On Fri, 2013-01-04 at 16:28 +0000, Sander Eikelenboom wrote:
> Hi Ian,

I'm not actually the netfront maintainer. Adding Konrad. Annie too since
IIRC she was at one point looking at various buffer coalescing schemes.

> Today I fired up an old VM with a bittorrent client, trying to
> download some torrents.

How old? Which kernel?

> I seem to be hitting the unlikely case of "xennet: skb rides the
> rocket: xx slots", and this results in some dropped packets in domU; I
> don't see any warnings in dom0.
>
> I have added some extra info, but I don't know enough to tell whether
> this could or should be prevented from happening?

MAX_SKB_FRAGS has never, AFAIK, been as big as 19 or 20, so I'm not sure
this can have ever worked.

These SKBs seem to be pretty big (not quite 64KB), seemingly most of the
data is contained in a smallish number of frags, which suggests compound
pages and therefore a reasonably modern kernel?

This probably relates somewhat to the issues described in the
"netchannel vs MAX_SKB_FRAGS" thread last year, or at least the solution
to that would necessarily involve fixing this issue.

In the meantime you could try disabling sg and/or various offloads for
that domain's vif.

> [quoted log lines, ifconfig output and debug patch snipped]
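One possible way to experiment with that suggestion, assuming ethtool is available; the interface names are examples (eth0 inside the guest, vif1.0 for that domain's vif in dom0). With sg off the stack must hand the driver linear skbs, and TSO/GSO are disabled so packets stay near MTU size:

# inside the guest:
ethtool -K eth0 sg off tso off gso off

# or in dom0, on that domain's vif:
ethtool -K vif1.0 sg off tso off gso off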
Monday, January 7, 2013, 11:55:15 AM, you wrote:

> On Fri, 2013-01-04 at 16:28 +0000, Sander Eikelenboom wrote:
>> Hi Ian,

> I'm not actually the netfront maintainer. Adding Konrad. Annie too since
> IIRC she was at one point looking at various buffer coalescing schemes.

Ah ok, I thought you were :-) sorry.

>> Today I fired up an old VM with a bittorrent client, trying to
>> download some torrents.

> How old? Which kernel?

3.8.0-rc2 at the moment; it's a PV guest.

>> I seem to be hitting the unlikely case of "xennet: skb rides the
>> rocket: xx slots", and this results in some dropped packets in domU; I
>> don't see any warnings in dom0.
>>
>> I have added some extra info, but I don't know enough to tell whether
>> this could or should be prevented from happening?

> MAX_SKB_FRAGS has never, AFAIK, been as big as 19 or 20, so I'm not sure
> this can have ever worked.

> These SKBs seem to be pretty big (not quite 64KB), seemingly most of the
> data is contained in a smallish number of frags, which suggests compound
> pages and therefore a reasonably modern kernel?

> This probably relates somewhat to the issues described in the
> "netchannel vs MAX_SKB_FRAGS" thread last year, or at least the solution
> to that would necessarily involve fixing this issue.

> In the meantime you could try disabling sg and/or various offloads for
> that domain's vif.

Will do.

>> [quoted log lines, ifconfig output and debug patch snipped]
On Mon, 2013-01-07 at 12:30 +0000, Sander Eikelenboom wrote:
> Monday, January 7, 2013, 11:55:15 AM, you wrote:
>
> > On Fri, 2013-01-04 at 16:28 +0000, Sander Eikelenboom wrote:
> >> Hi Ian,
>
> > I'm not actually the netfront maintainer. Adding Konrad. Annie too since
> > IIRC she was at one point looking at various buffer coalescing schemes.
>
> Ah ok, I thought you were :-) sorry.

I'm netback ;-)

> >> Today I fired up an old VM with a bittorrent client, trying to
> >> download some torrents.
>
> > How old? Which kernel?
>
> 3.8.0-rc2 at the moment; it's a PV guest.

Are you sure? That's pretty new, not old.

Ian.
Monday, January 7, 2013, 2:27:50 PM, you wrote:

> On Mon, 2013-01-07 at 12:30 +0000, Sander Eikelenboom wrote:
>> Monday, January 7, 2013, 11:55:15 AM, you wrote:
>>
>> > On Fri, 2013-01-04 at 16:28 +0000, Sander Eikelenboom wrote:
>> >> Hi Ian,
>>
>> > I'm not actually the netfront maintainer. Adding Konrad. Annie too since
>> > IIRC she was at one point looking at various buffer coalescing schemes.
>>
>> Ah ok, I thought you were :-) sorry.

> I'm netback ;-)

Ah, I thought of front and back as sides of the same coin :-p

>> >> Today I fired up an old VM with a bittorrent client, trying to
>> >> download some torrents.
>>
>> > How old? Which kernel?
>>
>> 3.8.0-rc2 at the moment; it's a PV guest.

> Are you sure? That's pretty new, not old.

Yes, I'm sure; with "old" I meant not booted for some time.
All my PV guests use the same dom0-supplied kernel, which at the moment happens to be the same as dom0's too.

> Ian.
On Mon, 2013-01-07 at 14:05 +0000, Sander Eikelenboom wrote:
> Monday, January 7, 2013, 2:27:50 PM, you wrote:
>
> > I'm netback ;-)
>
> Ah, I thought of front and back as sides of the same coin :-p

IMHO having them be somewhat separate encourages greater transparency
when evolving the protocol, not least because there are front- and
backends other than just the Linux ones.

> >> >> Today I fired up an old VM with a bittorrent client, trying to
> >> >> download some torrents.
> >>
> >> > How old? Which kernel?
> >>
> >> 3.8.0-rc2 at the moment; it's a PV guest.
>
> > Are you sure? That's pretty new, not old.
>
> Yes, I'm sure; with "old" I meant not booted for some time.
> All my PV guests use the same dom0-supplied kernel, which at the moment
> happens to be the same as dom0's too.

Ah, I see. I assumed you meant an old kernel.

Ian.
On 2013-1-7 18:55, Ian Campbell wrote:
> On Fri, 2013-01-04 at 16:28 +0000, Sander Eikelenboom wrote:
>> Hi Ian,
> I'm not actually the netfront maintainer. Adding Konrad. Annie too since
> IIRC she was at one point looking at various buffer coalescing schemes.
>
>> Today I fired up an old VM with a bittorrent client, trying to
>> download some torrents.
> How old? Which kernel?
>
>> I seem to be hitting the unlikely case of "xennet: skb rides the
>> rocket: xx slots", and this results in some dropped packets in domU; I
>> don't see any warnings in dom0.

If debug is enabled, the following netback code should print messages in
netbk_count_requests:

	if (unlikely(frags >= MAX_SKB_FRAGS)) {
		netdev_dbg(vif->dev, "Too many frags\n");
		return -frags;
	}

>> I have added some extra info, but I don't know enough to tell whether
>> this could or should be prevented from happening?
> MAX_SKB_FRAGS has never, AFAIK, been as big as 19 or 20, so I'm not sure
> this can have ever worked.
>
> These SKBs seem to be pretty big (not quite 64KB), seemingly most of the
> data is contained in a smallish number of frags, which suggests compound
> pages and therefore a reasonably modern kernel?
>
> This probably relates somewhat to the issues described in the
> "netchannel vs MAX_SKB_FRAGS" thread last year, or at least the solution
> to that would necessarily involve fixing this issue.

If netback complains about "Too many frags", then it would be the
MAX_SKB_FRAGS limitation in netback that results in dropped packets in
netfront. It is possible that other netfronts (windows?) also hit this.

Thanks
Annie

> In the meantime you could try disabling sg and/or various offloads for
> that domain's vif.
>
>> [quoted log lines, ifconfig output and debug patch snipped]
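As an aside, netdev_dbg() output is compiled in but silent by default. With CONFIG_DYNAMIC_DEBUG it can be enabled at runtime, for example as below (assuming netback is built as a module, so its dynamic-debug name is xen_netback, and that debugfs is available):

# in dom0:
mount -t debugfs none /sys/kernel/debug 2>/dev/null
echo 'module xen_netback +p' > /sys/kernel/debug/dynamic_debug/control
dmesg | grep -i 'too many frags'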
On Tue, 2013-01-08 at 02:12 +0000, ANNIE LI wrote:
> >> I have added some extra info, but I don't know enough to tell whether
> >> this could or should be prevented from happening?
> > MAX_SKB_FRAGS has never, AFAIK, been as big as 19 or 20, so I'm not sure
> > this can have ever worked.
> >
> > These SKBs seem to be pretty big (not quite 64KB), seemingly most of the
> > data is contained in a smallish number of frags, which suggests compound
> > pages and therefore a reasonably modern kernel?
> >
> > This probably relates somewhat to the issues described in the
> > "netchannel vs MAX_SKB_FRAGS" thread last year, or at least the solution
> > to that would necessarily involve fixing this issue.
>
> If netback complains about "Too many frags", then it would be the
> MAX_SKB_FRAGS limitation in netback that results in dropped packets in
> netfront. It is possible that other netfronts (windows?) also hit this.

It's very possible. I rather suspect that non-Linux frontends have
workarounds (e.g. manual resegmenting etc) for this case.

Ian.
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: 08 January 2013 10:06
> To: ANNIE LI
> Cc: Sander Eikelenboom; Konrad Rzeszutek Wilk; xen-devel
> Subject: Re: [Xen-devel] xennet: skb rides the rocket: 20 slots
>
> On Tue, 2013-01-08 at 02:12 +0000, ANNIE LI wrote:
> > If netback complains about "Too many frags", then it would be the
> > MAX_SKB_FRAGS limitation in netback that results in dropped packets in
> > netfront. It is possible that other netfronts (windows?) also hit this.
>
> It's very possible. I rather suspect that non-Linux frontends have
> workarounds (e.g. manual resegmenting etc) for this case.

Citrix Windows PV drivers certainly do; packets will be copied if they
have more than MAX_SKB_FRAGS fragments. We also choose a maximal TSO size
carefully to try to avoid Windows' network stack sending down more than
MAX_SKB_FRAGS fragments in the first place.

Paul
Tuesday, January 8, 2013, 3:12:33 AM, you wrote:

> On 2013-1-7 18:55, Ian Campbell wrote:
>> [earlier discussion snipped]
>
> If debug is enabled, the following netback code should print messages in
> netbk_count_requests:
>
> 	if (unlikely(frags >= MAX_SKB_FRAGS)) {
> 		netdev_dbg(vif->dev, "Too many frags\n");
> 		return -frags;
> 	}

I have added some rate-limited warnings in this function. However, none
seems to be triggered while the PV guest reports the "skb rides the
rocket"...

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f2d6b78..2f02da9 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -902,11 +902,13 @@ static int netbk_count_requests(struct xenvif *vif,
 	do {
 		if (frags >= work_to_do) {
 			netdev_dbg(vif->dev, "Need more frags\n");
+			net_warn_ratelimited("%s: Need more frags:%d, work_to_do:%d\n", vif->dev->name, frags, work_to_do);
 			return -frags;
 		}

 		if (unlikely(frags >= MAX_SKB_FRAGS)) {
 			netdev_dbg(vif->dev, "Too many frags\n");
+			net_warn_ratelimited("%s: Too many frags:%d, MAX_SKB_FRAGS:%d\n", vif->dev->name, frags, MAX_SKB_FRAGS);
 			return -frags;
 		}

@@ -914,6 +916,7 @@ static int netbk_count_requests(struct xenvif *vif,
 			       sizeof(*txp));
 		if (txp->size > first->size) {
 			netdev_dbg(vif->dev, "Frags galore\n");
+			net_warn_ratelimited("%s: Frags galore:%d, txp->size:%d first->size:%d\n", vif->dev->name, frags, txp->size, first->size);
 			return -frags;
 		}

@@ -923,6 +926,7 @@ static int netbk_count_requests(struct xenvif *vif,
 		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
 			netdev_dbg(vif->dev, "txp->offset: %x, size: %u\n",
 				   txp->offset, txp->size);
+			net_warn_ratelimited("%s: Hmm:%d, (txp->offset + txp->size):%d PAGE_SIZE:%d\n", vif->dev->name, frags, (txp->offset + txp->size), PAGE_SIZE);
 			return -frags;
 		}
 	} while ((txp++)->flags & XEN_NETTXF_more_data);

> [remainder of quoted message, log lines and netfront patch snipped]
> > If netback complains about "Too many frags", then it would be the
> > MAX_SKB_FRAGS limitation in netback that results in dropped packets in
> > netfront. It is possible that other netfronts (windows?) also hit this.
>
> It's very possible. I rather suspect that non-Linux frontends have
> workarounds (e.g. manual resegmenting etc) for this case.

GPLPV gathers fragments together if there are too many. Mostly this is
just the header, but there could be a whole lot of copying going on in a
worst-case packet.

It would be nice if a max-frags value was written to xenstore... do
different backends (Solaris? BSD?) have different requirements? Or maybe
that's already been discussed - I haven't followed this thread closely.

James
On Tue, Jan 08, 2013 at 08:57:19PM +0000, James Harper wrote:
> GPLPV gathers fragments together if there are too many. Mostly this is
> just the header, but there could be a whole lot of copying going on in a
> worst-case packet.
>
> It would be nice if a max-frags value was written to xenstore... do
> different backends (Solaris? BSD?) have different requirements? Or maybe
> that's already been discussed - I haven't followed this thread closely.

I think we discussed it last year as something that should be exported
via XenBus, but we never got down to figuring out what it entails, I
think. This was pre-holidays.

> James
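For concreteness, the backend half of what James suggests might look something like the sketch below, run during vif setup. This is purely hypothetical: the "feature-max-skb-frags" key, the helper name and its placement are all assumptions, as nothing like it exists yet.

/* Hypothetical: advertise the backend's frag limit in xenstore so
 * frontends can adapt; dev is the backend's xenbus_device. */
static int netback_advertise_max_frags(struct xenbus_device *dev)
{
	return xenbus_printf(XBT_NIL, dev->nodename,
			     "feature-max-skb-frags", "%u",
			     (unsigned int)MAX_SKB_FRAGS);
}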
On 2013-1-9 4:55, Sander Eikelenboom wrote:
>> 	if (unlikely(frags >= MAX_SKB_FRAGS)) {
>> 		netdev_dbg(vif->dev, "Too many frags\n");
>> 		return -frags;
>> 	}
> I have added some rate-limited warnings in this function. However, none
> seems to be triggered while the PV guest reports the "skb rides the
> rocket"...

Oh, yes, "skb rides the rocket" is a protection mechanism in netfront,
and it is not caused by the netback checking code, but they are both
concerned about the same thing (frags >= MAX_SKB_FRAGS). I thought those
packets were dropped by the backend check, sorry for the confusion.

In netfront, the following code checks whether the required slots exceed
MAX_SKB_FRAGS, and directly drops skbs which do not meet this
requirement:

	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
		net_alert_ratelimited(
			"xennet: skb rides the rocket: %d slots\n", slots);
		goto drop;
	}

In netback, the following code also compares frags with MAX_SKB_FRAGS,
and creates an error response for a netfront which does not meet this
requirement. In that case, netfront will also drop the corresponding
skbs:

	if (unlikely(frags >= MAX_SKB_FRAGS)) {
		netdev_dbg(vif->dev, "Too many frags\n");
		return -frags;
	}

So it is correct that the netback log was not printed, because those
packets are dropped directly by the frontend check, not by the backend
check. Without the frontend check, it is likely that the netback check
would block these skbs and create error responses for netfront.

So two ways are available: a workaround in netfront for those packets,
doing re-fragmentation by copying (though I am not sure how much the
copying hurts performance), or an implementation in netback, as discussed
in "netchannel vs MAX_SKB_FRAGS". Maybe both mechanisms are necessary?

Thanks
Annie

> [quoted netback debug patch snipped]
On Wed, Jan 09, 2013 at 03:10:56PM +0800, ANNIE LI wrote:
> Oh, yes, "skb rides the rocket" is a protection mechanism in netfront,
> and it is not caused by the netback checking code, but they are both
> concerned about the same thing (frags >= MAX_SKB_FRAGS). I thought those
> packets were dropped by the backend check, sorry for the confusion.
>
> [netfront and netback checks snipped]
>
> So it is correct that the netback log was not printed, because those
> packets are dropped directly by the frontend check, not by the backend
> check. Without the frontend check, it is likely that the netback check
> would block these skbs and create error responses for netfront.
>
> So two ways are available: a workaround in netfront for those packets,
> doing re-fragmentation by copying (though I am not sure how much the
> copying hurts performance), or an implementation in netback, as
> discussed in

There is already some copying done (the copying of the socket data from
userspace to the kernel) - so the extra copy might not be that bad, as
the data can be in the cache. This would probably be a way to deal with
old backends that cannot deal with this new feature-flag.

> "netchannel vs MAX_SKB_FRAGS". Maybe both mechanisms are necessary?

Let's see first if this is indeed the problem. Perhaps a simple debug
patch that just does:

	s/MAX_SKB_FRAGS/DEBUG_MAX_FRAGS/
	#define DEBUG_MAX_FRAGS 21

in both netback and netfront, to set the maximum number of frags we can
handle to 21? If that works with Sander's test - then yes, it looks like
we really need to get this 'feature-max-skb-frags' done.
On Wed, 2013-01-09 at 15:08 +0000, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 09, 2013 at 03:10:56PM +0800, ANNIE LI wrote:
> > So two ways are available: a workaround in netfront for those packets,
> > doing re-fragmentation by copying (though I am not sure how much the
> > copying hurts performance), or an implementation in netback, as
> > discussed in
>
> There is already some copying done (the copying of the socket data from
> userspace to the kernel) - so the extra copy might not be that bad, as
> the data can be in the cache. This would probably be a way to deal with
> old backends that cannot deal with this new feature-flag.

Or for any backend which doesn't handle enough slots for the current
skb. Essentially you need to do a fragmentation in software in the
frontend.

For backends which don't support the flag we should just hardcode some
number (some historically common value) as the default unless
negotiation says otherwise; no need to copy everything...

> > "netchannel vs MAX_SKB_FRAGS". Maybe both mechanisms are necessary?
>
> Let's see first if this is indeed the problem. Perhaps a simple debug
> patch that just does:
>
> 	s/MAX_SKB_FRAGS/DEBUG_MAX_FRAGS/
> 	#define DEBUG_MAX_FRAGS 21

Do you mean to do this globally or just in the netfront (or back)
driver?

Doing it globally would just be a case of changing the define, without
the s//, but there would be performance implications to growing the
shinfo.

Just changing netfront won't work because there will only be
MAX_SKB_FRAGS in the skb's shinfo, so you won't be able to add any more.

Ian.
On Wed, Jan 09, 2013 at 04:34:01PM +0000, Ian Campbell wrote:
> On Wed, 2013-01-09 at 15:08 +0000, Konrad Rzeszutek Wilk wrote:
> > Let's see first if this is indeed the problem. Perhaps a simple debug
> > patch that just does:
> >
> > 	s/MAX_SKB_FRAGS/DEBUG_MAX_FRAGS/
> > 	#define DEBUG_MAX_FRAGS 21
>
> Do you mean to do this globally or just in the netfront (or back)
> driver?

Our frontend and backend driver.

> Doing it globally would just be a case of changing the define, without
> the s//, but there would be performance implications to growing the
> shinfo.
>
> Just changing netfront won't work because there will only be
> MAX_SKB_FRAGS in the skb's shinfo, so you won't be able to add any more.

Oh, then doing the negotiation for the "slots" is gated to a maximum of
MAX_SKB_FRAGS. Hmmm.. that presents a problem for this bug, as just
negotiating the max_skb_frags feature won't solve the "riding the rocket"
problem.
On Wed, 2013-01-09 at 17:05 +0000, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 09, 2013 at 04:34:01PM +0000, Ian Campbell wrote:
> > Just changing netfront won't work because there will only be
> > MAX_SKB_FRAGS in the skb's shinfo, so you won't be able to add any more.
>
> Oh, then doing the negotiation for the "slots" is gated to a maximum of
> MAX_SKB_FRAGS. Hmmm.. that presents a problem for this bug, as just
> negotiating the max_skb_frags feature won't solve the "riding the rocket"
> problem.

Right, I think a corollary to the negotiation is that, having agreed on a
maximum, you are responsible for ensuring you only send out requests
which meet it. IOW if you advertise this feature you also have to be
prepared to resegment if the OS above you gives you something larger
(either too many frags or via compound pages).

e.g. if the dom0 OS can support 16 frags and the domU 18, then netfront
and netback would agree on 16 slots maximum. If the domU OS passes an SKB
which requests more slots to netfront, then netfront would have to
segment it before sending.

Ian.
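To make that negotiation concrete, a hypothetical read side in netfront could look like the sketch below; the "feature-max-skb-frags" key, the fallback default of 17 and the helper name are all assumptions, since no such key exists yet.

/* Hypothetical helper, e.g. called from netfront's connect path. */
static unsigned int xennet_negotiated_max_frags(struct xenbus_device *dev)
{
	unsigned int backend_max;

	/* xenbus_scanf() returns the number of values matched. */
	if (xenbus_scanf(XBT_NIL, dev->otherend,
			 "feature-max-skb-frags", "%u", &backend_max) != 1)
		backend_max = 17;	/* assumed legacy default */

	/* Never agree to more than the local skb layout can hold. */
	return min_t(unsigned int, backend_max, MAX_SKB_FRAGS);
}

With something like this in place, netfront would then have to resegment (or copy) any skb needing more slots than the agreed value, exactly as Ian describes above.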
On 2013-1-9 23:08, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 09, 2013 at 03:10:56PM +0800, ANNIE LI wrote:
> > [snip: the netfront/netback slot-check discussion quoted in full above]
> >
> > So two ways are available: a workaround in netfront for those packets, doing re-fragment copying, though I am not sure how much the copying hurts performance. Another is to implement it in netback, as discussed in
>
> There is already some copying done (the copying of the socket data from userspace to the kernel) - so the extra copy might not be that bad as the data can be in the cache. This would probably be a way to deal with old backends that cannot deal with this new feature-flag.

I am thinking of doing the re-fragmenting in netfront for these skbs as follows:

Create a new skb, copy the linear data and frag data from the original skb into this one, and make every frag's data size PAGE_SIZE except for the last fragment (it is possible that the last fragment's length is less than PAGE_SIZE), then free the original skb. The skb packet is large, so there will be lots of copies.

struct sk_buff *xennet_refrag_skb(skb)
{
        create newskb
        copy data and do fragmentation
        return newskb
}

.......

if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
        net_alert_ratelimited(
                "xennet: skb rides the rocket: %d slots\n", slots);
        skb = xennet_refrag_skb(skb);
}
.....

Thanks
Annie

> > "netchannel vs MAX_SKB_FRAGS". Maybe these two mechanisms are both necessary?
>
> Lets see first if this is indeed the problem. Perhaps a simple debug patch that just does:
>
> s/MAX_SKB_FRAGS/DEBUG_MAX_FRAGS/
> #define DEBUG_MAX_FRAGS 21
>
> in both netback and netfront to set the maximum number of frags we can handle to 21? If that works with Sander's test - then yes, it looks like we really need to get this 'feature-max-skb-frags' done.
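As a rough idea of what such a copy-based helper could look like, here is a self-contained sketch assuming GFP_ATOMIC allocation in the transmit path; the function name, the NET_SKB_PAD headroom choice, and the set of metadata fields carried over are illustrative, and this is not the patch that was eventually posted:

    #include <linux/skbuff.h>
    #include <linux/netdevice.h>
    #include <linux/gfp.h>

    /* Copy skb into a new skb whose frags are freshly allocated,
     * page-aligned pages, so each frag costs exactly one ring slot. */
    static struct sk_buff *xennet_refrag_skb_sketch(struct sk_buff *skb)
    {
            struct sk_buff *nskb;
            unsigned int offset = skb_headlen(skb);

            nskb = alloc_skb(NET_SKB_PAD + skb_headlen(skb), GFP_ATOMIC);
            if (!nskb)
                    return NULL;
            skb_reserve(nskb, NET_SKB_PAD);

            /* The linear part is copied as-is. */
            skb_put(nskb, skb_headlen(skb));
            skb_copy_from_linear_data(skb, nskb->data, skb_headlen(skb));

            /* Paged data is copied PAGE_SIZE at a time; only the last
             * frag may be shorter. */
            while (offset < skb->len) {
                    unsigned int chunk = min_t(unsigned int,
                                               skb->len - offset, PAGE_SIZE);
                    struct page *page = alloc_page(GFP_ATOMIC);

                    if (!page)
                            goto err;
                    if (skb_copy_bits(skb, offset, page_address(page), chunk)) {
                            __free_page(page);
                            goto err;
                    }
                    skb_fill_page_desc(nskb, skb_shinfo(nskb)->nr_frags,
                                       page, 0, chunk);
                    nskb->len += chunk;
                    nskb->data_len += chunk;
                    nskb->truesize += PAGE_SIZE;
                    offset += chunk;
            }

            /* Carry over what the ring cares about (a real version would
             * copy more fields), then free the original. */
            nskb->protocol = skb->protocol;
            nskb->ip_summed = skb->ip_summed;
            skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
            skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
            dev_kfree_skb_any(skb);
            return nskb;
    err:
            kfree_skb(nskb);
            return NULL;
    }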
Thursday, January 10, 2013, 12:22:22 PM, you wrote:

> [snip: full quote of Annie's re-fragmenting proposal above]

What I'm wondering about (without any knowledge of the inner workings of the network stack):

- Why can't netfront handle the larger packets in the first place?

- Would it be right to do something like xennet_get_responses() does for rx:

        int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);

        if (unlikely(frags > max)) {
                net_warn_ratelimited("%s: Too many frags max:%d, frags:%d\n",
                                     skb->dev->name, max, frags);
                err = -E2BIG;
        }

  instead of dropping the packet? I don't know if this propagates via the network protocols and would cause clients to reduce the packet size, thereby avoiding the need for copying in the first place?

--
Sander
On Thu, 2013-01-10 at 11:22 +0000, ANNIE LI wrote:
> I am thinking of doing the re-fragmenting in netfront for these skbs as follows:
>
> Create a new skb, copy the linear data and frag data from the original skb into this one, and make every frag's data size PAGE_SIZE except for the last fragment. It is possible that the last fragment's length is less than PAGE_SIZE; then free the original skb. The skb packet is large, and there will be lots of copies.

You don't need (or I suspect want) to copy, you can directly add pages from the source skb's frags to the destination skb's frags, with appropriate refcount frobbing. You can also share a page between two (or more) skbs in the case where the boundary between two skbs happens to be in the middle of a page.

But more importantly than all that, you need to do more than just refragment, you actually need to resegment, i.e. you need to duplicate the headers (Ethernet, IP, TCP) at the front of each new skb and adjust the (pseudo-)checksums as appropriate (which will depend on whether the skb is GSO, or just has checksum offload, or nothing).

So you need to go from

        <Ether><IP><TCP><...data... ...data...>

to

        <Ether><IP><TCP><...data...> <Ether><IP><TCP><...data...>

where the headers are adjusted to cope with this. Some of the data might be in frags but equally some might be in skb->data.

I'm guessing that Linux already has code which can do this for you, since it has a software fallback for GSO.

Ian.

> [snip: pseudocode and debug-patch discussion trimmed]
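That software fallback is reachable via skb_gso_segment(), which hands back a singly-linked list of skbs, each carrying its own adjusted Ethernet/IP/TCP headers and checksums. A hedged sketch of how a transmit path might use it; xmit_one is a placeholder for queuing one segment on the ring, and passing 0 as the feature mask (to force full software segmentation) is an assumption about intent, not the real driver's code:

    #include <linux/err.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static int xmit_via_gso_fallback(struct sk_buff *skb,
                                     struct net_device *dev,
                                     int (*xmit_one)(struct sk_buff *,
                                                     struct net_device *))
    {
            struct sk_buff *segs, *nskb;

            segs = skb_gso_segment(skb, 0);
            if (IS_ERR_OR_NULL(segs))
                    return -EIO;    /* caller decides whether to drop */

            consume_skb(skb);       /* original is now superseded */

            while (segs) {
                    nskb = segs;
                    segs = segs->next;
                    nskb->next = NULL;
                    xmit_one(nskb, dev);    /* error handling elided */
            }
            return 0;
    }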
On Thu, Jan 10, 2013 at 12:26:56PM +0000, Ian Campbell wrote:
> On Thu, 2013-01-10 at 11:22 +0000, ANNIE LI wrote:
> > [snip: re-fragmenting proposal quoted in full above]
>
> You don't need (or I suspect want) to copy, you can directly add pages from the source skb's frags to the destination skb's frags, with appropriate refcount frobbing. You can also share a page between two (or more) skbs in the case where the boundary between two skbs happens to be in the middle of a page.

That is allowed by the networking subsystem? I guess it will allow it, as the refcount keeps the users of the skb sane so that you don't inadvertently pull the page out of the skb (or, for highmem pages, unmap it).

> But more importantly than all that, you need to do more than just refragment, you actually need to resegment, i.e. duplicate the headers (Ethernet, IP, TCP) at the front of each new skb and adjust the (pseudo-)checksums as appropriate.
>
> I'm guessing that Linux already has code which can do this for you, since it has a software fallback for GSO.

Hm, my fear is that global extra copying will mean that we lose the benefit of having this in the cache, as the payload from user-space will not land on the same cache lines anymore with this adjustment.

This will require a bit more thinking on how to solve this particular problem.
On Thu, 2013-01-10 at 15:39 +0000, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 10, 2013 at 12:26:56PM +0000, Ian Campbell wrote:
> > You don't need (or I suspect want) to copy, you can directly add pages from the source skb's frags to the destination skb's frags, with appropriate refcount frobbing. You can also share a page between two (or more) skbs in the case where the boundary between two skbs happens to be in the middle of a page.
>
> That is allowed by the networking subsystem?

Yes, happens all the time (well, not quite, but it happens).

> I guess it will allow it, as the refcount keeps the users of the skb sane so that you don't inadvertently pull the page out of the skb (or, for highmem pages, unmap it).

Correct.

> > I'm guessing that Linux already has code which can do this for you, since it has a software fallback for GSO.
>
> Hm, my fear is that global extra copying will mean that we lose the benefit of having this in the cache, as the payload from user-space will not land on the same cache lines anymore with this adjustment.

Remember that this only happens at all in extreme or unusual cases. The common case should be that an skb fits into the negotiated number of slots. Also, there isn't that much copying: you need to copy the heads from skb->data to new_skb->data, but the frags you can just reference directly.

> This will require a bit more thinking on how to solve this particular problem.
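The "refcount frobbing" amounts to taking an extra reference on the page before pointing a second skb's frag at it, so neither skb can free (or, for highmem, unmap) the page under the other. A sketch, using the frag field names of that kernel era:

    #include <linux/skbuff.h>

    /* Attach part of frag i of src to dst without copying the payload. */
    static void share_frag_page(struct sk_buff *src, int i,
                                struct sk_buff *dst,
                                unsigned int off, unsigned int len)
    {
            skb_frag_t *frag = &skb_shinfo(src)->frags[i];
            struct page *page = skb_frag_page(frag);

            get_page(page); /* both skbs now hold a reference */
            skb_fill_page_desc(dst, skb_shinfo(dst)->nr_frags,
                               page, frag->page_offset + off, len);
            dst->len += len;
            dst->data_len += len;
            dst->truesize += len;
    }

This is also how the boundary-in-the-middle-of-a-page case works: two destination skbs simply reference the same page with different offsets and lengths.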
On 2013-1-10 20:26, Ian Campbell wrote:
> On Thu, 2013-01-10 at 11:22 +0000, ANNIE LI wrote:
> > [snip: re-fragmenting proposal quoted in full above]
>
> You don't need (or I suspect want) to copy, you can directly add pages from the source skb's frags to the destination skb's frags, with appropriate refcount frobbing. [...]
>
> But more importantly than all that, you need to do more than just refragment, you actually need to resegment [...]
>
> I'm guessing that Linux already has code which can do this for you, since it has a software fallback for GSO.

There is an existing function, skb_gso_segment, that performs segmentation on an skb and returns a list of segments. From the code, it seems this function is specific to GSO packets; it may not be usable for other non-GSO large packets whose required slots exceed MAX_SKB_FRAGS + 1. So for non-GSO offload packets, I will go ahead and write a function to do the resegmenting.

Thanks
Annie

> Ian.
>
> > [snip: pseudocode and debug-patch discussion trimmed]
On Fri, 2013-01-11 at 07:34 +0000, ANNIE LI wrote:
> > I'm guessing that Linux already has code which can do this for you, since it has a software fallback for GSO.
>
> There is an existing function, skb_gso_segment, that performs segmentation on an skb and returns a list of segments. From the code, it seems this function is specific to GSO packets; it may not be usable for other non-GSO large packets whose required slots exceed MAX_SKB_FRAGS + 1. So for non-GSO offload packets, I will go ahead and write a function to do the resegmenting.

Without GSO I don't think you should be seeing packets larger than the MTU, which would normally be either ~1500 or ~9000 and fit easily within any sensible negotiation for the max frags. I don't think you should worry unduly about this case.

Ian.
> -----Original Message-----
> Without GSO I don't think you should be seeing packets larger than the MTU, which would normally be either ~1500 or ~9000 and fit easily within any sensible negotiation for the max frags. I don't think you should worry unduly about this case.

A stack could still send down a packet with one byte per frag though, right? A copy-and-coalesce path would still be needed in this case.

Paul
On Fri, 2013-01-11 at 10:09 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > Without GSO I don't think you should be seeing packets larger than the MTU, which would normally be either ~1500 or ~9000 and fit easily within any sensible negotiation for the max frags. I don't think you should worry unduly about this case.
>
> A stack could still send down a packet with one byte per frag though, right? A copy-and-coalesce path would still be needed in this case.

True. In that case skb_linearize would probably do the job on Linux.

Ian
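A sketch of that coalesce path; count_ring_slots() is a hypothetical stand-in for the driver's slot-counting logic, not a real function:

    #include <linux/skbuff.h>

    static int coalesce_if_needed(struct sk_buff *skb, unsigned int max_slots)
    {
            if (count_ring_slots(skb) <= max_slots) /* hypothetical helper */
                    return 0;

            /* skb_linearize() copies all paged data into skb->data, so a
             * one-byte-per-frag MTU-sized packet collapses to a handful
             * of slots afterwards. Returns 0 on success, -ENOMEM on
             * allocation failure. */
            return skb_linearize(skb);
    }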
This is a forwarded message
From: ANNIE LI <annie.li@oracle.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Date: Monday, January 14, 2013, 10:39:53 AM
Subject: [Xen-devel] xennet: skb rides the rocket: 20 slots

Resend because xen-devel wasn't copied on the original ...

===8<==============Original message text==============

Hi

I created a patch for this, but I failed to reproduce this issue and verify it. The patch is attached.

Thanks
Annie

On 2013-1-11 18:16, Ian Campbell wrote:
> On Fri, 2013-01-11 at 10:09 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > Without GSO I don't think you should be seeing packets larger than the MTU, which would normally be either ~1500 or ~9000 and fit easily within any sensible negotiation for the max frags. I don't think you should worry unduly about this case.
> >
> > A stack could still send down a packet with one byte per frag though, right? A copy-and-coalesce path would still be needed in this case.
>
> True. In that case skb_linearize would probably do the job on Linux.
>
> Ian

===8<===========End of original message text==========
This is a forwarded message
From: Sander Eikelenboom <linux@eikelenboom.it>
To: ANNIE LI <annie.li@oracle.com>
Date: Thursday, January 24, 2013, 9:45:42 AM
Subject: [Xen-devel] xennet: skb rides the rocket: 20 slots

Resend because xen-devel wasn't copied on the original ...

===8<==============Original message text==============

Monday, January 14, 2013, 10:39:53 AM, you wrote:

> Hi
>
> I created a patch for this, but I failed to reproduce this issue and verify it. The patch is attached.

Hi Annie,

I finally had time to seriously test the patch. I put in some more warns and made the bracing a bit more explicit (I hope I did the bracing right).

The problem is that the current code crashes with:

[ 4189.815911] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[29601.932324] xennet: xennet_xmit_skb err_end: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:0 offset:106 skb_headlen:54 skb->len:64454, skb->data_len:0 skb->truesize:65168 nr_frags:0 page_size:4096 prot:0800 gso:1 linearize:0 gso_segs:46 dev:eth0 transp:0006
[29601.932426] BUG: unable to handle kernel NULL pointer dereference at (null)
[29601.932461] IP: [<ffffffff816551f4>] xennet_xmit_skb+0x204/0x370
[29601.932498] PGD 2d497067 PUD 2dd94067 PMD 0
[29601.932526] Oops: 0000 [#1] PREEMPT SMP
[29601.932549] Modules linked in:
[29601.932566] CPU 0
[29601.932581] Pid: 2948, comm: deluged Not tainted 3.8.0-rc4-20130123-netpatched-rocketscience-radeon-qmax-new-a #1
[29601.932615] RIP: e030:[<ffffffff816551f4>] [<ffffffff816551f4>] xennet_xmit_skb+0x204/0x370
[29601.932650] RSP: e02b:ffff88002cd95698 EFLAGS: 00010207
[29601.932669] RAX: 0000000000000000 RBX: ffff8800049574e8 RCX: 0000000000000036
[29601.932691] RDX: ffff88000398006a RSI: ffff88002ce4b8c0 RDI: 0000000000000000
[29601.932712] RBP: ffff88002cd95758 R08: 0000000000000000 R09: 0000000000000000
[29601.932734] R10: 0000000000000000 R11: 0000000000000002 R12: ffff88002cc98000
[29601.932755] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000036
[29601.932786] FS: 00007fa9ca911700(0000) GS:ffff88002fc00000(0000) knlGS:0000000000000000
[29601.932811] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[29601.932830] CR2: 0000000000000000 CR3: 000000002cf0d000 CR4: 0000000000000660
[29601.932853] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[29601.932877] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[29601.932900] Process deluged (pid: 2948, threadinfo ffff88002cd94000, task ffff88002ce4b150)
[29601.932923] Stack:
[29601.932934] ffff880000000036 000000000000fbc6 ffff880000000000 24e008890000fe90
[29601.932970] ffff880000000000 0000000000001000 ffff880000000800 ffff880000000001
[29601.933004] 0000000000000000 ffff88000000002e ffff88002cc98000 ffff880000000006
[29601.933795] Call Trace:
[29601.933795] [<ffffffff818570b9>] dev_hard_start_xmit+0x219/0x480
[29601.933795] [<ffffffff81873856>] sch_direct_xmit+0xf6/0x290
[29601.933795] [<ffffffff818574c6>] dev_queue_xmit+0x1a6/0x5a0
[29601.933795] [<ffffffff81857320>] ? dev_hard_start_xmit+0x480/0x480
[29601.933795] [<ffffffff810af255>] ? trace_softirqs_off+0x85/0x1b0
[29601.933795] [<ffffffff818f2596>] ip_finish_output+0x226/0x530
[29601.933795] [<ffffffff818f243d>] ? ip_finish_output+0xcd/0x530
[29601.933795] [<ffffffff818f28f9>] ip_output+0x59/0xe0
[29601.933795] [<ffffffff818f1438>] ip_local_out+0x28/0x90
[29601.933795] [<ffffffff818f19df>] ip_queue_xmit+0x17f/0x490
[29601.933795] [<ffffffff818f1860>] ? ip_send_unicast_reply+0x330/0x330
[29601.933795] [<ffffffff810a55f7>] ? getnstimeofday+0x47/0xe0
[29601.933795] [<ffffffff818471e9>] ? __skb_clone+0x29/0x120
[29601.933795] [<ffffffff81907c2d>] tcp_transmit_skb+0x3fd/0x8d0
[29601.933795] [<ffffffff8190ac9a>] tcp_write_xmit+0x22a/0xa80
[29601.933795] [<ffffffff81137ebe>] ? alloc_pages_current+0xde/0x1c0
[29601.933795] [<ffffffff8190b51b>] tcp_push_one+0x2b/0x40
[29601.933795] [<ffffffff818fbdc4>] tcp_sendmsg+0x8d4/0xe10
[29601.933795] [<ffffffff819225d6>] inet_sendmsg+0xa6/0x100
[29601.933795] [<ffffffff81922530>] ? inet_autobind+0x60/0x60
[29601.933795] [<ffffffff8183eb52>] sock_sendmsg+0x82/0xb0
[29601.933795] [<ffffffff810b5d87>] ? lock_release+0x117/0x250
[29601.933795] [<ffffffff81118a34>] ? might_fault+0x84/0x90
[29601.933795] [<ffffffff811189eb>] ? might_fault+0x3b/0x90
[29601.933795] [<ffffffff8114082b>] ? __kmalloc+0xfb/0x160
[29601.933795] [<ffffffff8184c9ad>] ? verify_iovec+0x7d/0xf0
[29601.933795] [<ffffffff8183fb73>] __sys_sendmsg+0x393/0x3a0
[29601.933795] [<ffffffff819bafc5>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
[29601.933795] [<ffffffff810b58f8>] ? lock_acquire+0xd8/0x100
[29601.933795] [<ffffffff810b5d87>] ? lock_release+0x117/0x250
[29601.933795] [<ffffffff81168777>] ? fget_light+0xd7/0x140
[29601.933795] [<ffffffff811686da>] ? fget_light+0x3a/0x140
[29601.933795] [<ffffffff8183fd34>] sys_sendmsg+0x44/0x80
[29601.933795] [<ffffffff819bbe29>] system_call_fastpath+0x16/0x1b
[29601.933795] Code: e9 ca fe ff ff 49 8b b4 24 80 00 00 00 48 89 df 45 31 ed e8 1f 16 20 00 48 3d 00 f0 ff ff 48 89 c7 76 08 e9 17 01 00 00 4c 89 f7 <4c> 8b 37 4c 89 e6 48 c7 07 00 00 00 00 41 ff c5 e8 b7 f3 ff ff
[29601.933795] RIP [<ffffffff816551f4>] xennet_xmit_skb+0x204/0x370
[29601.933795] RSP <ffff88002cd95698>
[29601.933795] CR2: 0000000000000000
[29602.018741] ---[ end trace 5ec54203e8f81a1b ]---
[29602.018747] Kernel panic - not syncing: Fatal exception in interrupt

Which according to addr2line is:

        segs = segs->next;

I have attached the resulting patch.

--
Sander

===8<===========End of original message text==========
On 2013-3-9 20:57, Sander Eikelenboom wrote:
> This is a forwarded message
> From: Sander Eikelenboom <linux@eikelenboom.it>
> To: ANNIE LI <annie.li@oracle.com>
> Date: Thursday, January 24, 2013, 9:45:42 AM
> Subject: [Xen-devel] xennet: skb rides the rocket: 20 slots
>
> [snip: crash log and backtrace quoted in full above]
>
> Which according to addr2line is:
>
>         segs = segs->next;
>
> I have attached the resulting patch.

Hi Sander,

Thanks for testing this. Maybe something went wrong when doing the segmenting...

Ian,

It is likely that the coalescing work in netback you mentioned in the thread "Is: SKB_MAX_LEN bites again. Was: Re: bug disabling guest interface" is not enough for some extremely large packets. So I assume the segmenting work in netfront is also necessary.

Thanks
Annie
On Sun, 2013-03-10 at 05:22 +0000, ANNIE LI wrote:
> It is likely that the coalescing work in netback you mentioned in the thread "Is: SKB_MAX_LEN bites again. Was: Re: bug disabling guest interface" is not enough for some extremely large packets. So I assume the segmenting work in netfront is also necessary.

I expect so too, although the segmentation thing seems like it is lower priority, at least right now.

Ian.
Hi Sander,

I checked this patch, and I think skb_gso_segment may return NULL (see the description of skb_gso_segment). Maybe that causes the NULL pointer dereference issue.

BTW, how did you send out those skbs with 64454 size? Only copying files, or doing an iperf/netperf test? I hope I can verify it locally.

Thanks
Annie

On 2013-3-9 20:57, Sander Eikelenboom wrote:
> This is a forwarded message
> From: Sander Eikelenboom <linux@eikelenboom.it>
> To: ANNIE LI <annie.li@oracle.com>
> Date: Thursday, January 24, 2013, 9:45:42 AM
> Subject: [Xen-devel] xennet: skb rides the rocket: 20 slots
>
> [snip: crash log and backtrace quoted in full above]
>
> Which according to addr2line is:
>
>         segs = segs->next;
>
> I have attached the resulting patch.
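If skb_gso_segment() can indeed come back NULL, or an ERR_PTR on failure, then the oops at "segs = segs->next" is exactly what unchecked use of the result would produce. A guarded sketch of the spot in question, under that assumption:

    #include <linux/err.h>
    #include <linux/skbuff.h>

    /* Segment skb, dropping it if the stack hands back nothing usable. */
    static struct sk_buff *segment_or_drop(struct sk_buff *skb)
    {
            struct sk_buff *segs = skb_gso_segment(skb, 0);

            if (IS_ERR_OR_NULL(segs)) {
                    kfree_skb(skb); /* never walk a bogus list */
                    return NULL;
            }
            return segs;    /* caller may now safely follow segs->next */
    }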
From: Sander Eikelenboom
Date: 2013-Mar-15 21:29 UTC
Subject: Re: Fwd: Re: xennet: skb rides the rocket: 20 slots
Friday, March 15, 2013, 6:14:47 AM, you wrote:

> Hi Sander,
>
> I checked this patch, and I think skb_gso_segment may return NULL (see the description of skb_gso_segment). Maybe that causes the NULL pointer dereference issue.

Will adjust the patch and see what it does ..

> BTW, how did you send out those skbs with 64454 size? Only copying files, or doing an iperf/netperf test? I hope I can verify it locally.

Unfortunately I wasn't able to determine a simple and reliable test based on iperf/netperf and the like. I can quite reliably trigger it in a PV guest with the deluge torrent client. Since the VM uses network-based storage for the data (glusterfs), that also causes some traffic, but I think it's the torrent traffic that triggers it, since bittorrent can stretch networks to their limits in bandwidth, packet size, and number of connections.

> Thanks
> Annie
>
> [snip: quoted crash log and backtrace trimmed]

--
Sander