I've been running some tests using PV drivers in a Linux HVM domain and have been unable to determine why, after some period of time, the network connection just stops working. I'm using the default bridging setup. I've seen this on xen-unstable changeset 15017, and all the way back to 14280. Guest and host are PAE.

Any pointers on where to start debugging this? Nothing interesting shows up in xm dmesg, in dmesg in the guest, in any of the logs, or in any of the networking configuration output. I've not been able to recreate this using just PV domains.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Ryan Harper <ryanh@us.ibm.com> [2007-05-09 11:07]:
> I've been running some tests using PV drivers in a linux HVM domain and
> have been unable to determine why after some period of time the network
> connection just stops working.

If I pause the domain and then unpause it, networking comes back. Does this help narrow down where I should be looking when debugging this issue?

-- 
Ryan Harper
* Ryan Harper <ryanh@us.ibm.com> [2007-05-09 12:42]:
> If I pause the domain and then unpause, networking comes back. Does
> this help narrow down where I should be looking for debugging this
> issue?

Actually, what works more reliably is to run ifdown vifX.0 and then ifconfig vifX.0 0, which brings it back up; we get bridge topology state changes, and then network traffic resumes.

Using tcpdump, I can see traffic arrive in the domain, but no traffic leaves the guest.

-- 
Ryan Harper
> Actually, what works more reliably is to ifdown vifX.0; and then
> ifconfig vifX.0 0, which brings it back up, we get bridge topology
> state changes, and then network traffic resumes.

Presumably taking the guest interface down makes no difference? (Not sure you can unload the module, but have you tried?)

> Using tcpdump, I can see traffic arrive in the domain, but no traffic
> leaves the guest.

So, packets seem to be received by the guest, but if you tcpdump the associated vifX.0 you don't see anything (whereas a tcpdump in the guest indicates packets are being sent).

One way to debug this would be to add a dom0 sysrq key handler to dump the producer/consumer pointers, or otherwise export them via sysfs. Does cat /proc/interrupts show rx interrupts on the vif?

Ian
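For reference when reading such a dump: each shared ring in Xen's split drivers carries free-running producer indices (req_prod, rsp_prod) plus event thresholds (req_event, rsp_event), and a side raises an event-channel notification only when the other side's threshold falls inside the range it has just published. The Python sketch below models the request-side check of the RING_PUSH_REQUESTS_AND_CHECK_NOTIFY macro from Xen's public io/ring.h header, including 32-bit wraparound. The function names are hypothetical; this is an illustration of the rule, not driver code.

```python
RING_IDX_MASK = 0xFFFFFFFF  # RING_IDX is an unsigned 32-bit, free-running index


def ring_idx(value: int) -> int:
    """Truncate an integer to RING_IDX (uint32) arithmetic."""
    return value & RING_IDX_MASK


def needs_notify(old_prod: int, new_prod: int, req_event: int) -> bool:
    """Model of the notify decision in RING_PUSH_REQUESTS_AND_CHECK_NOTIFY.

    old_prod:  shared req_prod before this push
    new_prod:  shared req_prod after this push (the private req_prod_pvt)
    req_event: index at which the backend asked to be woken
    """
    # Notify only if req_event lies in (old_prod, new_prod], i.e. the
    # backend asked to be woken for one of the requests just published.
    return ring_idx(new_prod - req_event) < ring_idx(new_prod - old_prod)
```

For example, needs_notify(5, 6, 6) is True (the backend asked to be woken at index 6), while needs_notify(5, 6, 10) is False. If a dump shows req_prod ahead of the backend's consumer index even though a notification should have been due under this rule, that would point toward a lost event rather than at the guest's network stack; treat it as a hypothesis to check, not a diagnosis.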
* Ian Pratt <Ian.Pratt@cl.cam.ac.uk> [2007-05-09 18:00]:
> Presumably taking the guest interface down makes no difference? (Not
> sure you can unload the module, but have you tried?)

I tried. It doesn't completely work; I'll capture the dmesg output again for future reference. Reloading the module didn't help, as it set the device's MAC address to all zeros.

> So, packets seem to be received by the guest, but if you tcpdump the
> associated vifX.0 you don't see anything (whereas a tcpdump in the guest
> indicates packets are being sent).

tcpdump on vifX.0 shows traffic on the bridge: ARPs for the guest IP. tcpdump in the guest showed it receiving the ARPs but sending no reply, i.e., no outgoing traffic.

I've worked around this issue by cycling the vif in the host.

What I am seeing now is that sometimes the guest just doesn't seem to be making progress (no CPU time). Over xm console the guest hangs, and new processes don't seem to execute. For example, I can have a console session connected and watch networking die; after I cycle the vif, pings start working again, but running ps in the guest just blocks. xm list shows the guest in the blocked state. At this point, the guest is pretty much dead, even though it will continue to process ICMP packets.
There isn't much output in the qemu-dm log file, but I'll toss that in here to see if it rings any bells:

  domid: 5
  qemu: the number of cpus is 1
  Watching /local/domain/5/logdirty/next-active
  qemu_map_cache_init nr_buckets = 4000
  shared page at pfn 1ffff
  buffered io page at pfn 1fffd
  Time offset set 0
  xs_read(): vncpasswd get error. /vm/73c84d4e-220c-5e88-5cf4-2786f4ce5a44/vncpasswd.
  char device redirected to /dev/pts/3
  I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
  Triggered log-dirty buffer switch
  xs_write(/vm/73c84d4e-220c-5e88-5cf4-2786f4ce5a44/rtc/timeoffset, rtc/timeoffset): write error

More details:

  Host 32-bit PAE, guest 32-bit, 1 VCPU, 512M RAM

I've tried running with acpi=0 apic=0, and with acpi=1 apic=1, but saw no change in behavior.

> One way to debug this would be to add a dom0 sysrq key handler to dump
> the producer consumer pointers, or otherwise export them via sysfs. Does
> cat /proc/interrupts show rx interrupts on the vif?

I'll give these a spin.

-- 
Ryan Harper
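To answer the /proc/interrupts question over time rather than by eye, a small parser helps. The Python sketch below assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ with per-CPU counts followed by the controller type and device names) and assumes the dom0 backend interrupt is labelled with the vif's name, such as vif5.0; check the actual label on your kernel. irq_counts_for_device is a hypothetical helper.

```python
from typing import Dict


def irq_counts_for_device(proc_interrupts: str, device: str) -> Dict[str, int]:
    """Return {irq: total count across CPUs} for rows that name `device`."""
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result: Dict[str, int] = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        fields = rest.split()
        counts = []
        for tok in fields[:ncpus]:
            if not tok.isdigit():
                break  # rows such as ERR: may have fewer count columns
            counts.append(int(tok))
        description = fields[len(counts):]
        # Device names are comma-separated when an IRQ is shared.
        names = " ".join(description).replace(",", " ").split()
        if device in names:
            result[irq.strip()] = sum(counts)
    return result
```

Sampling the file twice a few seconds apart (e.g. open("/proc/interrupts").read()) and comparing the totals shows whether the vif's interrupt count is still rising while the guest believes it is transmitting.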
> I tried. It doesn't completely work, I'll get the dmesg output again
> for future reference. Reloading the module didn't help as it set the
> device mac add to all nulls.

It should be able to read the MAC from xenstore. This must be a bug.

> tcpdump on vifX.0 shows traffic on the bridge, arps for the guest ip.
> tcpdump in the guest showed it getting the arps, but no reply. ie, no
> outgoing traffic.

Hang on, you mean within the guest you don't see it sending a reply? If true, that must be a guest issue, and it's hard to see how doing anything in dom0 will help.

> What I am seeing now is that sometimes the guest just doesn't seem to
> be making progress, no cpu time. xm console the guest hangs any new
> processes don't seem to execute.

That sounds like a symptom of the block devices being wedged. Are you using a PV block device or emulated IDE?
Ian
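Ian's point that the frontend should obtain its MAC from xenstore suggests a simple sanity check on whatever value is read. The sketch below assumes the address is stored as a colon-separated hex string (for example in a node named mac under the vif's xenstore directory; treat the exact path as an assumption) and rejects malformed or all-zero values, the symptom seen after reloading the module. parse_vif_mac is a hypothetical name, not netfront's code.

```python
import string


def parse_vif_mac(value: str) -> bytes:
    """Parse an 'xx:xx:xx:xx:xx:xx' MAC string; reject all-zero addresses."""
    parts = value.strip().split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}: {value!r}")
    octets = []
    for part in parts:
        if len(part) != 2 or not all(c in string.hexdigits for c in part):
            raise ValueError(f"bad octet {part!r} in {value!r}")
        octets.append(int(part, 16))
    mac = bytes(octets)
    if mac == bytes(6):
        # All-zero is the value reported after reloading the module.
        raise ValueError("all-zero MAC address")
    return mac
```

Failing loudly here, instead of silently bringing the interface up with 00:00:00:00:00:00, would make the reload bug visible in dmesg.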
* Ian Pratt <Ian.Pratt@cl.cam.ac.uk> [2007-05-09 18:46]:
> Hang on, you mean within the guest you don't see it sending a reply? If
> true, that must be a guest issue and it's hard to see how doing anything
> in dom0 will help.

I'm not sure networking is the real problem. The trouble is that in many cases when the networking chokes, the console is hosed as well, which makes it rather difficult to capture a tcpdump from within the guest. The next time networking is down but the console is up, I'll confirm with a tcpdump from the guest.
> That sounds like a symptom of the block devices being wedged. Are you
> using a PV block device or emulated IDE?

Using PV block.

-- 
Ryan Harper
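With the PV block path in use, the "wedged" hypothesis can be checked the same way as the network ring: take two snapshots of a ring's indices some seconds apart and see whether outstanding work is being consumed. The sketch below is a hypothetical helper, not existing tooling. Note that req_cons and rsp_cons live in the backend's and frontend's private ring structures rather than in the shared page, so a real dump has to collect values from both ends (for example via the sysrq handler or sysfs export suggested earlier in the thread).

```python
from dataclasses import dataclass

MASK = 0xFFFFFFFF  # ring indices are free-running 32-bit counters


@dataclass(frozen=True)
class RingSnapshot:
    req_prod: int  # requests published by the frontend (shared page)
    req_cons: int  # requests consumed by the backend (backend-private)
    rsp_prod: int  # responses published by the backend (shared page)
    rsp_cons: int  # responses consumed by the frontend (frontend-private)


def pending_requests(s: RingSnapshot) -> int:
    """Requests published by the frontend but not yet consumed."""
    return (s.req_prod - s.req_cons) & MASK


def pending_responses(s: RingSnapshot) -> int:
    """Responses published by the backend but not yet consumed."""
    return (s.rsp_prod - s.rsp_cons) & MASK


def classify(before: RingSnapshot, after: RingSnapshot) -> str:
    """Compare two snapshots taken some seconds apart."""
    if pending_requests(after) and after.req_cons == before.req_cons:
        return "backend stalled: requests pending, req_cons not advancing"
    if pending_responses(after) and after.rsp_cons == before.rsp_cons:
        return "frontend stalled: responses pending, rsp_cons not advancing"
    return "progressing or idle"
```

A "frontend stalled" result on the block ring would fit the guest processes blocking while ICMP is still answered, since responses would be sitting unprocessed in the guest; a "backend stalled" result would point back at dom0.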