Hi Xen developers,

I had some questions regarding XSA39, and I hope you all can answer them.

We've been rolling out some hosts with the XSA39 patch applied and have
come across a problem where a few of our customer DomUs keep hitting the
conditions which call netbk_fatal_tx_err(), mostly "Frag is bigger than
frame." These specific DomUs hit it repeatedly, between once every few
hours to every few days, and the customers say that they're sending
legitimate traffic (which I'm inclined to believe is true).

If the XSA39 protection is this easy to hit under normal circumstances,
should the solution really be as harsh as disconnecting the vif? Would
it be possible to just drop the packet(s) without causing netback to
spin while processing them?

FWIW, I have gotten a pcap dump from one of the customers facing this
problem. Wireshark does complain about a UDP packet in the sample being
too large (65538 bytes, over the max of 65535). Could this packet
possibly be the culprit? Unfortunately Wireshark refuses to load in the
packet, so I have not yet been able to dissect it.

Thanks in advance,
-Nick
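One way to look at the oversized record without Wireshark is to walk the capture file directly. The following is a minimal Python sketch, assuming the dump is a classic libpcap file (not pcapng) with microsecond timestamps. The function name `oversized_records` and the 65535-byte default limit are illustrative choices, not anything from the thread's tooling. The lengths it reports are per-record frame lengths including the link-layer header, so they are not directly comparable to a UDP length field.

```python
import struct

# Classic pcap magic numbers as they appear on disk, mapped to struct byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # written by a little-endian host
    b"\xa1\xb2\xc3\xd4": ">",  # written by a big-endian host
}

GLOBAL_HEADER_LEN = 24   # fixed-size pcap file header
RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len


def oversized_records(data, limit=65535):
    """Yield (index, incl_len, orig_len) for each record whose original
    on-the-wire length exceeds `limit` (an assumed threshold)."""
    endian = PCAP_MAGICS.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    offset = GLOBAL_HEADER_LEN
    index = 0
    while offset + RECORD_HEADER_LEN <= len(data):
        _ts_sec, _ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        if orig_len > limit:
            yield index, incl_len, orig_len
        # Skip the record header plus the bytes actually stored in the file.
        offset += RECORD_HEADER_LEN + incl_len
        index += 1
```

To use it, read the dump as bytes (for example `data = open("capture.pcap", "rb").read()`, where the filename is a placeholder) and iterate over `oversized_records(data)`. The record index it returns can then be used to cut that one packet out for closer inspection.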
On Thu, 2013-02-28 at 20:36 +0000, Nick Pegg wrote:
> Hi Xen developers,
>
> I had some questions regarding XSA39, and I hope you all can answer them.
>
> We've been rolling out some hosts with the XSA39 patch applied and have
> come across a problem where a few of our customer DomUs keep hitting the
> conditions which call netbk_fatal_tx_err(), mostly "Frag is bigger than

"mostly" or "all"? All the reports I've seen so far are "Frag is
bigger than frame", so it would be interesting to know whether there's
any other error as well.

We're investigating whether this is a genuine bug in the Xen netfront /
netback implementation, or whether something in the kernel is generating
bogus packets. The solution to this issue may differ depending on various
factors.

Wei.
Nick Pegg writes ("[Xen-devel] Questions about XSA39"):
> I had some questions regarding XSA39, and I hope you all can answer them.
>
> We've been rolling out some hosts with the XSA39 patch applied and have
> come across a problem where a few of our customer DomUs keep hitting the
> conditions which call netbk_fatal_tx_err(), mostly "Frag is bigger than
> frame." These specific DomUs hit it repeatedly, between once every few
> hours to every few days, and the customers say that they're sending
> legitimate traffic (which I'm inclined to believe is true).
>
> If the XSA39 protection is this easy to hit under normal circumstances,
> should the solution really be as harsh as disconnecting the vif? Would
> it be possible to just drop the packet(s) without causing netback to
> spin while processing them?
>
> FWIW, I have gotten a pcap dump from one of the customers facing this
> problem. Wireshark does complain about a UDP packet in the sample being
> too large (65538 bytes, over the max of 65535). Could this packet
> possibly be the culprit? Unfortunately Wireshark refuses to load in the
> packet, so I have not yet been able to dissect it.

CCing the authors/reviewers of the XSA-39 patch.

Ian.
On 3/1/13 6:52 AM, Wei Liu wrote:
> On Thu, 2013-02-28 at 20:36 +0000, Nick Pegg wrote:
>> Hi Xen developers,
>>
>> I had some questions regarding XSA39, and I hope you all can answer them.
>>
>> We've been rolling out some hosts with the XSA39 patch applied and have
>> come across a problem where a few of our customer DomUs keep hitting the
>> conditions which call netbk_fatal_tx_err(), mostly "Frag is bigger than
>
> "mostly" or "all"? All the reports I've seen so far are "Frag is
> bigger than frame", so it would be interesting to know whether there's
> any other error as well.
>
> We're investigating whether this is a genuine bug in the Xen netfront /
> netback implementation, or whether something in the kernel is generating
> bogus packets. The solution to this issue may differ depending on various
> factors.
>
>
> Wei.
>

After digging through our recent logs, we had four occurrences of "Too
many frags." All others (>85%) were "Frag is bigger than frame."

-Nick
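For readers trying to tell the two log messages apart, here is a rough Python model of the kind of per-packet check that could produce them. It is a simplified sketch, not the netback source. The function name `check_tx_chain`, the `max_frags` default of 18, and the idea that each follow-on slot is compared against the remaining size announced by the first slot are all assumptions made for illustration.

```python
def check_tx_chain(first_size, frag_sizes, max_frags=18):
    """Return the error string for the first failed check, or None.

    first_size -- total frame size announced by the first request (assumed)
    frag_sizes -- sizes of the follow-on slots, in ring order (assumed)
    max_frags  -- assumed cap on follow-on slots per packet
    """
    remaining = first_size
    for count, size in enumerate(frag_sizes):
        # More follow-on slots than the backend is willing to accept.
        if count >= max_frags:
            return "Too many frags"
        # A single slot claims more bytes than are left in the frame.
        if size > remaining:
            return "Frag is bigger than frame."
        remaining -= size
    return None
```

Under this model, "Frag is bigger than frame." appears whenever the slot sizes sum to more than the announced frame size, which would be one way for an otherwise ordinary large packet to trip the check if the announced size were wrong.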