Mariusz Mazur
2010-Mar-15 11:07 UTC
[Xen-users] domU network interface half-dies regularly
I'm trying to figure out how to debug this. Any suggestions would be appreciated.

Every once in a while, a random domU on one of our Xen servers has its network interface die. I've recently figured out the exact symptoms: the TX count on that interface (as seen from inside the domU) stops increasing. There's no way of actually sending anything from within the domU; even ARP packets aren't sent. Receiving packets, however, works fine.

Things I did check:
- Doing an "ip link set ... down/up" on both dom0 and domU doesn't do anything.
- Removing/reattaching the dom0 interface from/to its bridge doesn't help.
- It's interface-specific. I'm currently logged onto a domU that has one of its network interfaces half-dead as described, but the other perfectly functional.
- Interestingly, the problem prevents "xm save" from working. It times out without anything getting written to disk (except a kilobyte or so of what I'm guessing are headers).
- I'm seeing this problem across:
  - 2.6.18 xen.org dom0 3.3.X and 3.4.X
  - xen.org hypervisor 3.3.X and 3.4.X
  - domU kernels: xen.org 2.6.18.8_xen3.3.0U and kernel.org 2.6.29.6 (pvops)
  - a few different machines from different vendors.
- Nothing in the dom0/domU kernel logs.

Whatever the cause is, I seriously doubt it's the domU's fault, considering I'm seeing the problem on both xen.org and kernel.org domU kernels. I also don't know what the trigger is (plus, those are production systems), so enabling a bunch of DEBUG prints in Xen isn't much of an option.

Any suggestions/hints on where to look next? I'm guessing there are ways of inspecting various network code structures.

--mmazur

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
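Since the symptom is "TX stops while RX keeps going, on one interface only", one way to spot it from inside a domU is to take two snapshots of /proc/net/dev a little while apart and compare the per-interface packet counters. The following is a minimal Python 3 sketch under that assumption; the function names are hypothetical, and it relies only on the standard /proc/net/dev layout (two header lines, then 8 Receive fields followed by 8 Transmit fields per interface).

```python
from typing import Dict, List, Tuple


def parse_proc_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each interface in /proc/net/dev to its (rx_packets, tx_packets)."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # "eth0:123" can appear with no space
        fields = data.split()
        # Fields 0-7 are the Receive block (bytes, packets, ...),
        # fields 8-15 the Transmit block (bytes, packets, ...).
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def half_dead(before: Dict[str, Tuple[int, int]],
              after: Dict[str, Tuple[int, int]]) -> List[str]:
    """Interfaces whose RX packet count grew while TX stayed exactly the same."""
    return sorted(
        iface
        for iface, (rx0, tx0) in before.items()
        if iface in after and after[iface][0] > rx0 and after[iface][1] == tx0
    )
```

To use it, read /proc/net/dev into `before`, wait some seconds, read it again into `after`, and call `half_dead(before, after)`. This is a heuristic: an interface that only receives broadcast traffic and has nothing to send could also be flagged, so a longer interval makes a false positive less likely.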
James Dingwall
2010-Mar-15 12:57 UTC
RE: [Xen-users] domU network interface half-dies regularly
I was experiencing some network issues in the past, but typically they affected all domUs at once and not individually.
I managed to solve my problems with these config options in the kernel:

# CONFIG_XEN_COMPAT_030002_AND_LATER is not set
# CONFIG_XEN_COMPAT_030004_AND_LATER is not set
# CONFIG_XEN_COMPAT_030100_AND_LATER is not set
# CONFIG_XEN_COMPAT_030200_AND_LATER is not set
# CONFIG_XEN_COMPAT_030300_AND_LATER is not set
CONFIG_XEN_COMPAT_LATEST_ONLY=y
(was CONFIG_XEN_COMPAT_030002_AND_LATER=y)

CONFIG_XEN_NETDEV_TX_SHIFT=10
(was CONFIG_XEN_NETDEV_TX_SHIFT=8)

I think it is actually the second change that really made the difference: it looks like it controls the size of some buffer, and it seems that heavy network traffic caused that buffer to fill up and break things.

James
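To confirm which of these settings a running kernel was actually built with, one could parse its .config (many distributions ship a copy as /boot/config-<version>, though that location is an assumption about your setup). The Python 3 sketch below treats "# CONFIG_FOO is not set" lines as "n". It also assumes, based only on the `_SHIFT` suffix and not verified against the driver source, that XEN_NETDEV_TX_SHIFT is a power-of-two exponent; under that assumption, going from 8 to 10 grows whatever it sizes from 2^8 = 256 to 2^10 = 1024 entries. Both function names are hypothetical.

```python
import re
from typing import Dict


def parse_kernel_config(text: str) -> Dict[str, str]:
    """Parse kernel .config text; '# CONFIG_FOO is not set' is recorded as 'n'."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def tx_slots(options: Dict[str, str]) -> int:
    """ASSUMPTION: XEN_NETDEV_TX_SHIFT is an exponent, so slots = 2 ** shift."""
    return 1 << int(options["CONFIG_XEN_NETDEV_TX_SHIFT"])
```

Running this against the config of each dom0 and domU kernel involved would show quickly whether a half-dying interface correlates with the old TX_SHIFT value.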
Daniel De Marco
2010-Mar-16 17:20 UTC
Re: [Xen-users] domU network interface half-dies regularly
I had the same problem yesterday. One of the domUs running on a server had the same symptoms: the TX counter stopped while the RX one was increasing normally.

I'm running CentOS 5.3 with 2.6.18-92.1.22.el5xen on the dom0 and CentOS 5.2 with 2.6.18-92.1.22.el5xen on the domU.

Rebooting the domU solves the problem, but it isn't an attractive solution...

Daniel.
Daniel De Marco
2010-Mar-16 17:21 UTC
Re: [Xen-users] domU network interface half-dies regularly
Sorry, the domU is running CentOS 5.3 with 2.6.18-128.1.10.el5xen.

Daniel.
Mariusz Mazur
2010-Mar-17 12:21 UTC
Re: [Xen-users] domU network interface half-dies regularly
Is it repeatable? I'm trying to debug and fix this, but the fact that I don't know how to trigger it and have to wait for it to appear all by itself doesn't help.

--mmazur
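Since the trigger is unknown and the failure has to be waited for, a small watchdog could at least record when a given interface goes half-dead, so the moment can be correlated with dom0 logs or load. The Python 3 sketch below reads the standard Linux sysfs counters (/sys/class/net/<iface>/statistics/rx_packets and tx_packets) and flags a run of polls in which RX keeps rising while TX does not move. The `min_steps` threshold and the function names are hypothetical choices, not anything from the thread.

```python
from typing import List, Optional, Tuple

# One poll of an interface: (timestamp_seconds, rx_packets, tx_packets)
Sample = Tuple[float, int, int]


def read_counters(iface: str) -> Tuple[int, int]:
    """Read (rx_packets, tx_packets) for iface from Linux sysfs."""
    base = "/sys/class/net/%s/statistics/" % iface
    with open(base + "rx_packets") as rx, open(base + "tx_packets") as tx:
        return int(rx.read()), int(tx.read())


def first_stall(samples: List[Sample], min_steps: int = 3) -> Optional[float]:
    """Return the timestamp from which TX stayed frozen while RX kept rising
    for at least min_steps consecutive polls, or None if that never happens."""
    run = 0
    for i in range(1, len(samples)):
        _, rx_prev, tx_prev = samples[i - 1]
        _, rx_now, tx_now = samples[i]
        if rx_now > rx_prev and tx_now == tx_prev:
            run += 1
            if run >= min_steps:
                # samples[i - run] is the first poll of the frozen-TX run
                return samples[i - run][0]
        else:
            run = 0
    return None
```

A loop that calls `read_counters("eth0")` every few seconds, appends `(time.time(), rx, tx)` to a list, and checks `first_stall` on it would be one way to drive this. Requiring several consecutive stalled polls rather than one trades detection latency for fewer false alarms on quiet interfaces.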
Daniel De Marco
2010-Mar-17 13:21 UTC
Re: [Xen-users] domU network interface half-dies regularly
It happened to me just once, on a domU that had an uptime of ~110 days. The other domUs on the same dom0 had the same uptime, but they didn't have the problem.

Daniel.