Hi All!

We now have a small and growing group of customers running on Xen-hosted machines -- Chris Clarke (in the Cc:) was the first, a few months ago, under Xen 1.0 (would that make him the first commercial Xenoserver customer?). We switched to 1.2 in mid-February. Other than the following, the only recent issues are related to working out the bugs and features in my own controller code, which I owe you another copy of.

But we have seen a recurring issue where a few domains hang for no readily apparent reason, don't respond to 'xc_dom_control.py shutdown', but do respond to 'xc_dom_control.py destroy'. I usually see alternating "NFS server not found" and "NFS server OK" messages on the domain 0 console around the time that a guest on that node hangs. When this happens, it usually seems to be associated with someone running something I/O intensive like 'rsync' or 'apt-get' in the guest domain.

Right now I'm running all swap partitions in VBDs, and the root partitions are all on a central NFS server so that:

- I can mirror them and back them up.
- We can migrate guests between nodes by assigning a guest to a different node -- right now that's implemented via shutdown/reboot.
- We can recover from hardware failure in a couple of minutes, just by assigning a guest to a different node.

But when researching this problem I noted a message from Ian (18 Mar 2004) saying:

> We've seen some weird hangs under extreme conditions with NFS root, but we can reproduce these on stock Linux :-(

Ian, do these symptoms sound like this is what we're hitting? Until I can reliably reproduce the problem myself, I'm going to assume this is the case.

What are other people doing to meet those requirements of backups, migration, and failover? How is the live migration code? The copy-on-write NFSd, or COW VBDs? Any other backup or mirroring code added to VBDs lately? Other alternatives (ENBD etc.) that anyone knows from experience to be production-quality?
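The hang symptom described above (no response to 'shutdown', but a response to 'destroy') suggests a fallback pattern for controller code. The following is only a minimal sketch, not Steve's controller: `stop_domain`, its parameters, and the three callables are hypothetical names, and how each callable would actually invoke xc_dom_control.py is deliberately left to the caller, since the exact command-line arguments are not given in this thread.

```python
import time
from typing import Callable


def stop_domain(
    dom_id: int,
    shutdown: Callable[[int], None],
    destroy: Callable[[int], None],
    is_running: Callable[[int], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask a guest to shut down cleanly; destroy it if it never stops.

    All three callables are hypothetical hooks supplied by the caller.
    Returns "shutdown" if the guest stopped by itself, "destroy" otherwise.
    """
    shutdown(dom_id)

    waited = 0.0
    while waited < timeout_s:
        if not is_running(dom_id):
            return "shutdown"
        sleep(poll_s)
        waited += poll_s

    # One last check before taking the forceful path.
    if not is_running(dom_id):
        return "shutdown"

    # The guest ignored the shutdown request (the hang case): force it.
    destroy(dom_id)
    return "destroy"
```

Injecting `sleep` and the callables keeps the logic testable without a Xen node; a real controller would also want to log which path was taken so that hangs can be correlated with the NFS messages on the domain 0 console.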
Here's what I'm going to have to do unless I hear otherwise:

- Try moving the NFS server to the Xen server node itself. This will provide better bandwidth and latency versus the 100Mb switch we're going through now. I don't know if that will help. I will need to back up each individual node's disk then. Each node's disks will need to be mirrored (who else is using md raid 1 for DOM0's root partition?) And we won't be able to cleanly migrate guests between nodes. No hardware failover either. Grrr.

- If that doesn't work, then I'll need to migrate each root into a Xen virtual block device on the node (right now only swap is there). Then I won't be able to ensure backups get done myself -- any backups will have to be done from within each guest's O/S. They can't be mirrored. And migrating between nodes becomes doubly hard, and can take hours depending on partition size. No hardware failover.

Thoughts/suggestions?

Steve

--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org

-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest 6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
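To put a rough number on "can take hours depending on partition size", here is a back-of-the-envelope sketch of copy time over the 100Mb switch mentioned above. The function name `transfer_hours`, the `efficiency` factor, and the example partition sizes are assumptions for illustration, not figures from this thread.

```python
def transfer_hours(size_gb: float, link_mbps: float = 100.0,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to copy size_gb (decimal gigabytes) over a link.

    efficiency is the assumed fraction of line rate actually achieved
    (1.0 = ideal; real copies are slower because of protocol and disk
    overhead).
    """
    if size_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("invalid size, link speed, or efficiency")
    bits = size_gb * 8e9                      # 1 GB = 8e9 bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical partition sizes, at an assumed 50% of line rate.
    for gb in (10, 45, 120):
        print(f"{gb:>4} GB: {transfer_hours(gb, 100.0, 0.5):.2f} h")
```

At ideal line rate a 100 Mb/s link moves 12.5 MB/s, so a 45 GB partition needs about one hour; at half that throughput it needs about two, which is consistent with the "hours" estimate for large roots.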
Jacob Gorm Hansen
2004-May-17 19:23 UTC
RE: [Xen-devel] Xen hangs with NFS root under high loads
There seems to be a problem with packets being lost inside Xen with recent versions of unstable. The NFS code in Linux may react badly to this, but with some loads of the unprivileged domain I am able to get about 1% packet loss with intra-machine traffic, which is probably why it freaks.

I don't think I had these problems with the ~2 months old version of unstable I was running before.

/Jacob
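One way to quantify loss like the roughly 1% reported above is to send numbered packets and count which sequence numbers arrive. The sketch below covers only the accounting step; `loss_percent` is a hypothetical helper, and how the packets are generated and captured (for example, a UDP sender in one domain and a receiver in another) is left as an assumption.

```python
from typing import Iterable


def loss_percent(sent: int, received_seqs: Iterable[int]) -> float:
    """Return the percentage of packets lost.

    sent          -- number of packets transmitted, numbered 0..sent-1
    received_seqs -- sequence numbers seen by the receiver; duplicates and
                     out-of-range values are ignored
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    unique = {seq for seq in received_seqs if 0 <= seq < sent}
    return 100.0 * (sent - len(unique)) / sent


if __name__ == "__main__":
    # Hypothetical run: 1000 packets sent, the last 10 never arrived.
    print(f"{loss_percent(1000, range(990)):.1f}% loss")
```

Running the same measurement separately in each direction, and separately for traffic between domains versus traffic to an external host, would help separate transmit loss from receive loss.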
Keir Fraser
2004-May-17 22:52 UTC
Re: [Xen-devel] Xen hangs with NFS root under high loads
> There seems to be a problem with packets being lost inside Xen with recent
> versions of unstable. The NFS code in Linux may react badly to this, but
> with some loads of the unprivileged domain I am able to get about 1% packet
> loss with intra-machine traffic, which is probably why it freaks.
>
> I don't think I had these problems with the ~2 months old version of
> unstable I was running before.
>
> /Jacob

Any idea whether this is transmit or receive? Is it only inter-dom traffic?

 -- Keir
Jacob Gorm Hansen
2004-May-18 10:03 UTC
RE: [Xen-devel] Xen hangs with NFS root under high loads
> Any idea whether this is transmit or receive? Is it only inter-dom
> traffic?

I will try to test it a little more; it appears to be both inter and intra, though.

Jacob
Keir Fraser
2004-May-18 10:08 UTC
Re: [Xen-devel] Xen hangs with NFS root under high loads
> > Any idea whether this is transmit or receive? Is it only inter-dom
> > traffic?
>
> I will try and test it a little more, it appears to be both inter and intra,
> though.
>
> Jacob

If you make a debug build of Xen ('debug=y make') then you may well get a message whenever a packet is dropped.

 -- Keir