Hi,

I have run into a strange issue: in some rare cases the xennet interface in a DomU stops sending anything. I got the chance to collect more information from one occurrence; here are my findings.

The netfront driver finds that no TX slot is available any more, so it stops the TX queue:

    static inline int netfront_tx_slot_available(struct netfront_info *np)
    {
        return ((np->tx.req_prod_pvt - np->tx.rsp_cons) <
                (TX_MAX_TARGET - MAX_SKB_FRAGS - 2));
    }

Here is some runtime debugging information captured after the issue occurred:

    [3833225.489956] tx.req_prod_pvt=0x210daaa tx.rsp_cons=0x210d9be
    [3833225.489958] TX_MAX_TARGET = 0x100, MAX_SKB_FRAGS = 0x12 dev->state=0x7
    [3833225.489961] np->tx.sring->rsp_prod = 0x210d9be np->tx.sring->req_prod=0x210daaa
    [3833225.489964] np->tx.sring->req_event=0x210d9bf np->tx.sring->rsp_event=0x210da35

The "dev->state" of the xennet interface in the DomU:

    [3833225.489968] __LINK_STATE_XOFF: yes
    [3833225.489970] __LINK_STATE_START: yes
    [3833225.489971] __LINK_STATE_PRESENT: yes
    [3833225.489973] __LINK_STATE_SCHED: no
    [3833225.489975] __LINK_STATE_NOCARRIER: no
    [3833225.489976] __LINK_STATE_RX_SCHED: no
    [3833225.489978] __LINK_STATE_LINKWATCH_PENDING: no
    [3833225.489979] __LINK_STATE_DORMANT: no
    [3833225.489981] __LINK_STATE_QDISC_RUNNING: no

Because tx.rsp_cons == np->tx.sring->rsp_prod == 0x210d9be, network_tx_buf_gc() does nothing. The problem is that the TX queue is never enabled again.

Could anybody help me understand this? Any input is appreciated.

Platform information:

    Xen:  3.4.2 x86_64, 8 GB MEM + 8 CPU cores
    Dom0: 2.6.28.8 (xenified kernel) x86, 1 GB MEM + 1 CPU core
    DomU: 2.6.28.8 (xenified kernel) x86, 3 GB MEM + 2 CPU cores
    (two other DomUs are running without any problems)

Thanks,
-Shunli

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
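The slot check quoted in the message above can be replayed with the logged values to see why the queue stays stopped. The following is a minimal user-space sketch, not driver code: the helper names and the `ring_idx_t` typedef are hypothetical stand-ins, and the two constants are simply the values printed in the debug output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as printed in the debug output (not read from kernel headers). */
#define TX_MAX_TARGET 0x100u /* 256 */
#define MAX_SKB_FRAGS 0x12u  /* 18 */

static_assert(TX_MAX_TARGET > MAX_SKB_FRAGS + 2u,
              "threshold must stay positive");

/* Hypothetical stand-in for the 32-bit unsigned ring index type. */
typedef uint32_t ring_idx_t;

/* Requests produced by the frontend whose responses it has not consumed yet. */
static inline uint32_t tx_in_flight(ring_idx_t req_prod_pvt, ring_idx_t rsp_cons)
{
    return req_prod_pvt - rsp_cons; /* unsigned, wraps modulo 2^32 */
}

/* Right-hand side of the comparison in netfront_tx_slot_available(). */
static inline uint32_t tx_slot_threshold(void)
{
    return TX_MAX_TARGET - MAX_SKB_FRAGS - 2u;
}

/* Same comparison as the quoted netfront_tx_slot_available(). */
static inline bool tx_slot_available(ring_idx_t req_prod_pvt, ring_idx_t rsp_cons)
{
    return tx_in_flight(req_prod_pvt, rsp_cons) < tx_slot_threshold();
}

/* Responses published on the shared ring that the frontend has not consumed. */
static inline uint32_t tx_unconsumed_responses(ring_idx_t rsp_prod, ring_idx_t rsp_cons)
{
    return rsp_prod - rsp_cons;
}
```

With the logged values, `tx_in_flight(0x210daaa, 0x210d9be)` is 0xEC = 236 and `tx_slot_threshold()` is 256 - 18 - 2 = 236, so the comparison 236 < 236 is false and no slot is reported as available. At the same time `tx_unconsumed_responses(0x210d9be, 0x210d9be)` is 0, which matches the observation that network_tx_buf_gc() has nothing to reclaim: the queue can only restart once the backend publishes new responses.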
On Wed, 2012-10-31 at 03:47 +0000, Yi, Shunli wrote:
> Dom0: 2.6.28.8(xenified kernel) x86 1 GB MEM + 1 CPU cores
>
> DomU: 2.6.28.8(xenified kernel) x86 3 GB MEM + 2 CPU cores

Those are both ancient and AFAIK not supported by any distro.

I strongly recommend you update to something more recent -- either a kernel supported by your distro or an up-to-date mainline version.

I see a commit in mainline, "xen-netfront: correct MAX_TX_TARGET calculation", which might be related.

Ian.
Ian,

Thanks for the information. Sorry, I wrote the wrong kernel version by mistake: we are using 2.6.18.8, which we downloaded from Xen.org together with the Xen 3.4.2 release.

I have seen that patch, but I don't think it affects this issue.
(http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/ddb83aec5afc)
You can see "TX_MAX_TARGET = 0x100" in the log below, so it is already 256.

Actually, we suspect an overflow issue, and I am reading the netfront and netback code to look for a possible overflow.

For one occurrence I got the details from both netfront and netback. I hope they help us find something:

On the backend interface:

    [3357955.991282] rx.rsp_prod_pvt=0xc11734a0 rx.req_cons=0xf4d0eb80
    [3357955.991283] tx.rsp_prod_pvt=0xc0af64 tx.req_cons=0xc0af64

On the frontend interface:

    [3363909.017567] tx.req_prod_pvt=0x12ef8f3 tx.rsp_cons=0x12ef807
    [3363909.017569] TX_MAX_TARGET = 0x100, MAX_SKB_FRAGS = 0x12 dev->state=0x7
    [3363909.017572] np->tx.sring->rsp_prod = 0x12ef807 np->tx.sring->req_prod=0x12ef8f3
    [3363909.017575] np->tx.sring->req_event=0x12ef800 np->tx.sring->rsp_event=0x12ef87e

Some of our production rack servers hit this rarely, and so far we have no way to reproduce it in the lab. I have set up two rack servers to reproduce it in the lab; both have run for about 20 days without reproducing it.

Could somebody share some experience with this? Any input would be appreciated.

Many thanks.
-Shunli

-----Original Message-----
From: Ian Campbell [mailto:ian.campbell@citrix.com]
Sent: Wednesday, October 31, 2012 5:13 PM
To: Yi, Shunli
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] Xennet half die---netfront TX queue was stopped.

On Wed, 2012-10-31 at 03:47 +0000, Yi, Shunli wrote:
> Dom0: 2.6.28.8(xenified kernel) x86 1 GB MEM + 1 CPU cores
>
> DomU: 2.6.28.8(xenified kernel) x86 3 GB MEM + 2 CPU cores

Those are both ancient and AFAIK not supported by any distro.

I strongly recommend you update to something more recent -- either a kernel supported by your distro or an up-to-date mainline version.

I see a commit in mainline, "xen-netfront: correct MAX_TX_TARGET calculation", which might be related.

Ian.
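On the overflow suspicion raised in the message above, it may help to look at how free-running 32-bit ring indices behave when they wrap. The sketch below is an illustration under stated assumptions, not netfront or netback code: it assumes the indices are unsigned 32-bit counters and that notifications use the "did the producer cross the event index" comparison found in Xen's public io/ring.h push macros. The names `ring_idx_t`, `ring_distance` and `crossed_event` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit unsigned ring index type. */
typedef uint32_t ring_idx_t;

static_assert(sizeof(ring_idx_t) == 4, "ring indices are assumed to be 32-bit");

/* Number of slots from `from` up to `to`, computed modulo 2^32. */
static inline uint32_t ring_distance(ring_idx_t from, ring_idx_t to)
{
    return (ring_idx_t)(to - from);
}

/*
 * True when `event` lies in the window (old_prod, new_prod], modulo 2^32.
 * This mirrors the comparison (new - event) < (new - old) that decides
 * whether a push should raise a notification.
 */
static inline bool crossed_event(ring_idx_t old_prod, ring_idx_t new_prod,
                                 ring_idx_t event)
{
    return (ring_idx_t)(new_prod - event) < (ring_idx_t)(new_prod - old_prod);
}
```

Because the subtraction is unsigned, `ring_distance(0xfffffff0, 0x00000010)` is still 32 across the wrap, so index wraparound alone does not break this kind of comparison; the logged indices (around 0x210daaa and 0x12ef8f3) are also far from 2^32. This does not rule out other overflow paths. In both frontend dumps `ring_distance(rsp_prod, rsp_event)` is 0x77 = 119, meaning the producer would have to publish 119 more responses before the comparison asks for a notification. That equals (236 >> 1) + 1, which would be consistent with rsp_event being placed roughly halfway through the outstanding requests; this reading is an inference from the numbers, not something the thread confirms.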
On Wed, 2012-10-31 at 10:16 +0000, Yi, Shunli wrote:
> Ian,
> Thanks for your information.
> And sorry for writing a wrong kernel version by mistake, we are using
> 2.6.18.8,

2.6.18.8 is even more ancient.

> which was downloaded from Xen.org when got Xen-3.4.2 release.

Well, 3.4.2 is pretty old too.

In general there is no requirement to use the kernel which is supplied with a given version of Xen (we don't even supply one any more). So there is no real reason to stick with the 2.6.18 that happened to come with 3.4.2.

You should upgrade at least your kernel as I suggested.

Ian.
>> Ian,
>> Thanks for your information.
>> And sorry for writing a wrong kernel version by mistake, we are using
>> 2.6.18.8,
>
> 2.6.18.8 is even more ancient.
>
>> which was downloaded from Xen.org when got Xen-3.4.2 release.
>
> Well, 3.4.2 is also pretty old too.
>
> In general there is no requirement to use the kernel which is supplied
> with a given version of Xen (we don't even supply one any more). So
> there is no real reason to stick with the 2.6.18 that happened to come
> with 3.4.2.
>
> You should upgrade at least your kernel as I suggested.

Yes, I agree, and we are planning to migrate to newer versions (of both Xen and the kernel). But before that, I am still trying to find a quick fix for this.

Thanks for your time. ^_^
-Shunli