I have been using Xen for over a year now. For the most part I have had very good success with it, and we are now working on rolling it out throughout my company. But I just ran across something really annoying and dangerous.

When I first started playing with Xen I read all of the docs I could find, and at that time I am pretty sure Xen did not automatically save domains when the machine was shut down. Later on I noticed that it was trying to do so but was failing because the directory to save to did not exist on my machine for some reason (it was not created during the install). After that I completely forgot about this behavior. A month or two ago I upgraded to Xen 3.0 from mercurial (I don't have the sources around anymore and I don't see how to get Xen to tell me its exact version) and it seems that domain saving on shutdown is now working. Great.

I recently had some unrelated system problems which required me to shut down, boot from a rescue disk, mount the logical volume normally used by my mail server, and do quite a bit of work on it. Once done I booted the system normally, Xen started the mail domain, and all kinds of weird filesystem-related stuff started happening. I shut down the domain, ran fsck on the mail server logical volume, and found thousands of errors.

Then I realized what had happened. Xen had saved the domain's state, including internal buffers and who knows what else that had not been synced to the disk. So I mounted a very dirty filesystem, made a bunch of changes, and then the mail server domain came back up expecting the fs to be in the same state it was left in. It proceeded as if everything were normal, which ended up causing massive corruption and many lost emails. Fortunately this is a dev machine which hosts a bunch of personal domains and other stuff, not business-critical things. But it is still highly annoying.
I recommend that whenever Xen saves a domain, the domain somehow sync its filesystem state to disk. Ideally the fs would even be marked clean so that someone who needs to mount the fs while the domain is not running, as I did, can do so safely. There really needs to be a way for a Xen domain, upon being restored, to know that the fs is in a sane and consistent state, exactly as it was when the domain was saved. Ensuring that only filesystems marked clean are left after a save and mounted upon restore is one way to do that. Or is there some sort of time stamp in the fs, such as a last mount time, that could be saved with the domain state and then checked when the domain is restored to make sure it has not changed?

I realize that most of these things are filesystem/OS specific. It would be really nice to have a general solution to this. I think something needs to be done because the current situation seems quite dangerous. For now I have disabled the saving/restoring of domains and will do so on all of our production systems also. It's a risk I just can't take.

I mentioned this to someone on the IRC channel and they said "That is documented behavior." Unfortunately that doesn't bring back my data. It wasn't documented when I started using Xen and I can't possibly keep up on everything written about Xen in the meantime.

--
Tracy R Reed
http://ultraviolet.org

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
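The time-stamp idea above could be prototyped from dom0 roughly as follows. This is only a sketch under stated assumptions: the guest filesystem is ext2/ext3, `tune2fs -l` from e2fsprogs is available, and the function names, device path, and fingerprint file location are hypothetical. It is not something the Xen tools do themselves.

```shell
#!/bin/sh
# Sketch only (hypothetical helper names, ext2/ext3 assumed).

# Reduce `tune2fs -l` output, read on stdin, to the superblock fields that
# change when the filesystem is mounted or written.
fs_fingerprint() {
    grep -E '^(Last mount time|Last write time|Mount count):'
}

# Succeed only when the two fingerprints passed as arguments are identical.
fs_unchanged() {
    [ "$1" = "$2" ]
}

# Possible use around save/restore (device and file paths are hypothetical):
#   tune2fs -l /dev/vg0/mail | fs_fingerprint > /var/lib/xen/save/mail.fp
#   ...later, before restoring the domain...
#   now=$(tune2fs -l /dev/vg0/mail | fs_fingerprint)
#   fs_unchanged "$(cat /var/lib/xen/save/mail.fp)" "$now" || echo "fs changed; do not restore"
```

A mismatch would mean the volume was mounted or written after the save, so the saved image should be discarded and the guest booted fresh (after an fsck) rather than restored.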
On 25 Jun 2006, at 01:32, Tracy R Reed wrote:

> I mentioned this to someone on the IRC channel and they said "That is
> documented behavior." Unfortunately that doesn't bring back my data. It
> wasn't documented when I started using Xen and I can't possibly keep up
> on everything written about Xen in the meantime.

I'm not sure if the behaviour is documented, but it certainly isn't new. Save/restore has always behaved like that -- a filesystem should be considered 'locked down' by a guest except when the guest OS is shut down cleanly. No interlock is enforced or metadata maintained for this in open source tools.

-- Keir
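Since no interlock is enforced by the tools, an administrator could add a crude one by hand. The sketch below assumes that a saved image for a guest is stored as a file named after the guest in the save directory (the thread mentions /var/lib/xen/save); that naming, and both function names, are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of the Xen tools.

# Succeed if a saved image for guest $1 exists in directory $2
# (defaults to /var/lib/xen/save when $2 is missing or empty).
guest_is_saved() {
    [ -e "${2:-/var/lib/xen/save}/$1" ]
}

# Succeed only when it looks safe to touch the guest's volume from dom0,
# that is, when no saved image for the guest is present.
safe_to_mount() {
    if guest_is_saved "$1" "$2"; then
        echo "refusing: saved state exists for $1" >&2
        return 1
    fi
    return 0
}
```

A rescue procedure could call `safe_to_mount mail` before mounting the mail server's volume, and delete the saved image deliberately if the mount really has to happen.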
Anthony Liguori
2006-Jun-26 21:13 UTC
Re: [Xen-devel] Domain saving and filesystem corruption
Keir Fraser wrote:

> On 25 Jun 2006, at 01:32, Tracy R Reed wrote:
>
>> I mentioned this to someone on the IRC channel and they said "That is
>> documented behavior." Unfortunately that doesn't bring back my data. It
>> wasn't documented when I started using Xen and I can't possibly keep up
>> on everything written about Xen in the meantime.
>
> I'm not sure if the behaviour is documented, but it certainly isn't
> new. Save/restore has always behaved like that -- a filesystem should
> be considered 'locked down' by a guest except when the guest OS is
> shut down cleanly. No interlock is enforced or metadata maintained for
> this in open source tools.

You really ought to avoid save/restore/migrate when not using network or checkpointable storage. You will almost certainly eventually get some sort of corruption.

I didn't realize xend actually tries to save domains on shutdown. Seems like a bad idea to me. Is this correct? Is this only for domains started with /etc/init.d/xendomains?

Regards,

Anthony Liguori

> -- Keir
Tracy R Reed
2006-Jun-26 22:05 UTC
Re: [Xen-devel] Domain saving and filesystem corruption
Anthony Liguori wrote:

> You really ought to avoid save/restore/migrate when not using network or
> checkpointable storage. You will almost certainly eventually get some
> sort of corruption.

No doubt. Thing is, I didn't realize it was doing this. The machine so rarely gets rebooted that I never noticed it saving out the state of the domains to disk. I am impressed with how fast it does it though.

> I didn't realize xend actually tries to save domains on shutdown. Seems
> like a bad idea to me. Is this correct? Is this only for domains
> started with /etc/init.d/xendomains?

On RedHat (I run FC5 in my domain0 and CentOS 4.3 in my domains) you can look in /etc/sysconfig/xendomains to see how this all works. It looks like by default it will try to save the state of all domains unless you set XENDOMAINS_AUTO_ONLY to true. It is set to false by default. One odd thing I see is this:

    # Directory to save running domains to when the system (dom0) is
    # shut down. Will also be used to restore domains from if
    # XENDOMAINS_RESTORE
    # is set (see below). Leave empty to disable domain saving on shutdown
    # (e.g. because you rather shut domains down).
    # If domain saving does succeed, SHUTDOWN will not be executed.
    #
    #XENDOMAINS_SAVE=/var/lib/xen/save

So XENDOMAINS_SAVE is commented out by default, which should leave it as "". So why are the domains being saved? As I read it, since XENDOMAINS_SAVE is not defined, the script should have skipped saving the domains and executed the commands in XENDOMAINS_SHUTDOWN instead. I am not in front of my Xen console right now where I can play with this, but I will try to look into it tonight when I am.

--
Tracy R Reed
http://ultraviolet.org

A: Because we read from top to bottom, left to right
Q: Why should I start my reply below the quoted text
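The question above turns on the difference between a variable that is unset (commented out) and one that is set to an empty string. A small sketch like the following can report which case a given config file produces; the function name is hypothetical, and whether the init script applies its own default in the unset case is not confirmed here.

```shell
#!/bin/sh
# Sketch only: source a xendomains-style config file and report how
# XENDOMAINS_SAVE ends up. Runs in a subshell so the caller's
# environment is left untouched.
describe_save_setting() {
    (
        unset XENDOMAINS_SAVE
        . "$1"
        if [ -z "${XENDOMAINS_SAVE+x}" ]; then
            echo "unset"                     # line absent or commented out
        elif [ -z "$XENDOMAINS_SAVE" ]; then
            echo "empty"                     # e.g. XENDOMAINS_SAVE=""
        else
            echo "set: $XENDOMAINS_SAVE"
        fi
    )
}

# Possible use (pass a path containing a slash so the file is not
# looked up via PATH):
#   describe_save_setting /etc/sysconfig/xendomains
```

If this reports "unset" and domains are still being saved, the default is presumably being supplied somewhere in the init script itself, which is the place to look next.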
Eric Peterson
2006-Jun-27 02:45 UTC
Re: [Xen-devel] Domain saving and filesystem corruption
On 6/26/06, Tracy R Reed <treed@ultraviolet.org> wrote:

> One odd thing I see is this:
>
> # Directory to save running domains to when the system (dom0) is
> # shut down. Will also be used to restore domains from if
> # XENDOMAINS_RESTORE
> # is set (see below). Leave empty to disable domain saving on shutdown
> # (e.g. because you rather shut domains down).
> # If domain saving does succeed, SHUTDOWN will not be executed.
> #
> #XENDOMAINS_SAVE=/var/lib/xen/save
>
> So XENDOMAINS_SAVE is commented out by default. So it should be "". So

I believe this is just a placeholder to indicate the default value that is used in the code. The comment block indicates that you would need to have something like this to disable it:

    XENDOMAINS_SAVE=""

That's how I interpret code such as this. I may be wrong.

-Eric