I'm writing a bash script to do a block-level backup of my LVM-backed Xen domUs from the current Xen server to a backup Xen server. I'm using LVM snapshots in this script, and I'm aware that there is a risk when using snapshots that uncommitted data may be lost. However, as I understand it, when saving a Xen domain to restore later, a checkpoint file is created that contains a copy of everything in the domU's RAM so that the domU can be restored later.

The idea behind my script is to pause the domU and save a checkpoint file, then create LVM snapshots of the important volumes, then restore the domU. This would allow me to restore the domU at a later time on another system without losing any data (because uncommitted data would be in RAM). I understand that this will cause some downtime, but before I thought of this, I was being asked to shut down the domU completely before creating the LVM snapshots.

Anyway, I would like to open this up for comments and criticism. Here is what I'm doing. I cut out all the boring comments and error checking. I think the variable names should be pretty self-explanatory, but if anyone wants me to, I can post the whole thing.

    xm save "$XENNAME" "$CHECKPATH"
    lvcreate --snapshot --size "$LVPATH_SIZE" --name "$LVPATH_SNAP" "$LVPATH"
    lvcreate --snapshot --size "$LVPATH_SWAP_SIZE" --name "$LVPATH_SWAP_SNAP" "$LVPATH_SWAP"
    xm restore "$CHECKPATH"
    blocksync.py "$LVPATH_SWAP_SNAP" "$BACKUPSERVER" "$LVPATH_SWAP"
    lvremove -f "$LVPATH_SWAP_SNAP"
    blocksync.py "$LVPATH_SNAP" "$BACKUPSERVER" "$LVPATH"
    lvremove -f "$LVPATH_SNAP"
    rsync -a "$CHECKPATH" "$BACKUPSERVER:$CHECKPATH"
    rsync -a "$CONFPATH" "$BACKUPSERVER:$CONFPATH"

Oh yeah, this uses the blocksync.py script that I got from <http://www.bouncybouncy.net/ramblings/posts/xen_live_migration_without_shared_storage/>. It simply syncs block devices between two computers. This means that the first backup will be pretty slow and network intensive, but subsequent backups should be pretty snappy.
Let me know what you think or if there is something I've overlooked. Thanks.

--
Agent Rooker
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
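The save/snapshot/restore part of the sequence above could be sketched with some of the error checking put back. This is only a sketch under assumptions: the `run` helper, the `DRY_RUN` switch and the `ORIGIN:SNAPNAME:SIZE` argument format are made up for illustration and are not part of the original script. The one design point it tries to show is that `xm restore` is attempted even when a snapshot fails, so the domU is not left down.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper, not part of xm or LVM.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# snapshot_saved_domu NAME CHECKPOINT ORIGIN:SNAPNAME:SIZE...
# Saves the domU, snapshots each listed volume, then restores the domU.
snapshot_saved_domu() {
    name=$1
    checkpoint=$2
    shift 2
    status=0

    # Nothing to snapshot or restore if the save did not succeed.
    run xm save "$name" "$checkpoint" || return 1

    for spec in "$@"; do
        origin=${spec%%:*}      # text before the first colon
        rest=${spec#*:}         # text after the first colon
        snap=${rest%%:*}
        size=${rest#*:}
        if ! run lvcreate --snapshot --size "$size" --name "$snap" "$origin"; then
            status=1
            break
        fi
    done

    # Always attempt the restore so a failed snapshot does not leave the domU down.
    run xm restore "$checkpoint" || status=1
    return "$status"
}
```

With `DRY_RUN=1`, calling `snapshot_saved_domu web /tmp/web.chk /dev/vg0/web:web-snap:2G` only prints the three commands it would run, which is a cheap way to check the ordering before pointing it at a real domU.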
> The idea behind my script is to pause the domU and save a checkpoint
> file, then create LVM snapshots of the important volumes, then restore
> the domU. This would allow me to restore the domU at a later time on
> another system without losing any data (because uncommitted data would
> be in RAM). I understand that this will cause some downtime, but
> before I thought of this, I was being asked to shut down the domU
> completely before creating the LVM snapshots. Anyway, I would like to
> open this up for comments and criticism.

I had the same idea; however, the general consensus on the list seems to be that it's not safe. I was never able to get a clear answer as to how saving the checkpoints did not solve the majority of those issues. The one case where I figured there would be a window for corruption was if information was coming over the wire that was being written.
On Tue, Oct 28, 2008 at 1:10 PM, Nick Anderson <nick@anders0n.net> wrote:
>> The idea behind my script is to pause the domU and save a checkpoint
>> file, then create LVM snapshots of the important volumes, then restore
>> the domU. This would allow me to restore the domU at a later time on
>> another system without losing any data (because uncommitted data would
>> be in RAM). I understand that this will cause some downtime, but
>> before I thought of this, I was being asked to shut down the domU
>> completely before creating the LVM snapshots. Anyway, I would like to
>> open this up for comments and criticism.
>
> I had the same idea; however, the general consensus on the list seems
> to be that it's not safe. I was never able to get a clear answer as to
> how saving the checkpoints did not solve the majority of those issues.
> The one case where I figured there would be a window for corruption
> was if information was coming over the wire that was being written.

I should also mention that we're also doing domU-level backups with NetBackup. So in case something bad happens, the worst thing that could happen is that we would have to rebuild the system and restore from one of the backup tapes. The idea behind this script is to have a quick and easy failover option in case something bad happens to the main Xen server.

But if someone could make clear the dangers and pitfalls associated with this plan, that would be very helpful.

--
Agent Rooker
On Tuesday 28 October 2008 04:04:39 pm Agent Rooker wrote:
> But if someone could make clear the dangers and pitfalls associated
> with this plan, that would be very helpful.

Yeah, pausing a domain does nothing to its disks from the domU's OS's point of view. A pause freezes everything in memory, but commits nothing that's in domU's filesystem buffers and whatnot to disk -- the whole point is to not disrupt the OS. If you block-level back up a paused domain, the restored filesystem will be corrupted. Given your existing backup from within the domU, I don't believe you're gaining anything here.

John

--
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
On Tue, Oct 28, 2008 at 3:22 PM, John Madden <jmadden@ivytech.edu> wrote:
> On Tuesday 28 October 2008 04:04:39 pm Agent Rooker wrote:
>> But if someone could make clear the dangers and pitfalls associated
>> with this plan, that would be very helpful.
>
> Yeah, pausing a domain does nothing to its disks from the domU's OS's point of
> view. A pause freezes everything in memory, but commits nothing that's in
> domU's filesystem buffers and whatnot to disk -- the whole point is to not
> disrupt the OS. If you block-level back up a paused domain, the restored
> filesystem will be corrupted. Given your existing backup from within the
> domU, I don't believe you're gaining anything here.

the point of pausing the DomU is to get hold of a snapshot of memory state, as well as the block device(s). when restoring, the DomU would return to the 'same' moment it was in when paused. both the disk and CPU would go back in time.

as mentioned, any dangers would be those related to 'external' state: network connections, hardware clock, and such. i'd guess that a mostly autonomous server should survive this kind of snapshotting, and go back to work; but if it depends on other systems, losing so many connections at once might (should?) trigger a reboot. or at least a service restart.

doesn't sound like something worthy to replace real backups; but might buy you much lower restore times, if you have the capacity to do both.

--
Javier
> > Yeah, pausing a domain does nothing to its disks from the domU's OS's
> > point of view. A pause freezes everything in memory, but commits nothing
> > that's in domU's filesystem buffers and whatnot to disk -- the whole
> > point is to not disrupt the OS. If you block-level back up a paused
> > domain, the restored filesystem will be corrupted. Given your existing
> > backup from within the domU, I don't believe you're gaining anything
> > here.
>
> the point of pausing the DomU is to get hold of a snapshot of memory
> state, as well as the block device(s). when restoring, the DomU would
> return to the 'same' moment it was in when paused. both the disk and CPU
> would go back in time.

...And? ...If his backup of domU happens while paused, he's still got an incomplete filesystem. Why not send a `kill -STOP` to your mysqld, back up its files, restore the files somewhere else, and try to bring up mysql cleanly? This is essentially what you're doing with a domU pause.

John

--
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
On Tue, Oct 28, 2008 at 3:34 PM, Javier Guerra <javier@guerrag.com> wrote:
> On Tue, Oct 28, 2008 at 3:22 PM, John Madden <jmadden@ivytech.edu> wrote:
>> On Tuesday 28 October 2008 04:04:39 pm Agent Rooker wrote:
>>> But if someone could make clear the dangers and pitfalls associated
>>> with this plan, that would be very helpful.
>>
>> Yeah, pausing a domain does nothing to its disks from the domU's OS's point of
>> view. A pause freezes everything in memory, but commits nothing that's in
>> domU's filesystem buffers and whatnot to disk -- the whole point is to not
>> disrupt the OS. If you block-level back up a paused domain, the restored
>> filesystem will be corrupted. Given your existing backup from within the
>> domU, I don't believe you're gaining anything here.
>
> the point of pausing the DomU is to get hold of a snapshot of memory
> state, as well as the block device(s). when restoring, the DomU would
> return to the 'same' moment it was in when paused. both the disk and CPU
> would go back in time.
>
> as mentioned, any dangers would be those related to 'external'
> state: network connections, hardware clock, and such. i'd guess that
> a mostly autonomous server should survive this kind of snapshotting,
> and go back to work; but if it depends on other systems, losing so
> many connections at once might (should?) trigger a reboot. or at least
> a service restart.
>
> doesn't sound like something worthy to replace real backups; but might
> buy you much lower restore times, if you have the capacity to do both.
>
> --
> Javier

That's what I'm thinking, anyway. As I said in a previous message in this thread, "...we're also doing domU level backups with NetBackup." So in case my plan totally backfires, we're no worse off than we would be otherwise. If we do end up needing to reboot the domU after restoring it on the other Xen server, that's still a lot faster than rebuilding the server and then restoring from the tape files.

We have proper backups going at a different time each night as well; this is really more to reduce the downtime we would face in the case of a hardware failure with the Xen server. We should really be using a network area storage cluster and an HA Xen cluster to provide the best availability and reliability, but until I can convince the department to pay for that, this will have to do for now.

An alternative solution would be to bring the domUs down for a cold block-level backup each night, but that is just a little more downtime than I would like.

--
Agent Rooker
On Tue, Oct 28, 2008 at 3:44 PM, John Madden <jmadden@ivytech.edu> wrote:
>> the point of pausing the DomU is to get hold of a snapshot of memory
>> state, as well as the block device(s). when restoring, the DomU would
>> return to the 'same' moment it was in when paused. both the disk and CPU
>> would go back in time.
>
> ...And? ...If his backup of domU happens while paused, he's still got an
> incomplete filesystem.

the 'missing' parts are in the RAM backup.

--
Javier
On Tue, Oct 28, 2008 at 3:54 PM, Javier Guerra <javier@guerrag.com> wrote:
> On Tue, Oct 28, 2008 at 3:44 PM, John Madden <jmadden@ivytech.edu> wrote:
>>> the point of pausing the DomU is to get hold of a snapshot of memory
>>> state, as well as the block device(s). when restoring, the DomU would
>>> return to the 'same' moment it was in when paused. both the disk and CPU
>>> would go back in time.
>>
>> ...And? ...If his backup of domU happens while paused, he's still got an
>> incomplete filesystem.
>
> the 'missing' parts are in the RAM backup.
>
> --
> Javier

That's right. As far as I can tell, the only thing the domU would notice is that the system clock jumps forward suddenly, and all the network connections suddenly drop. I'm having trouble understanding John's worries about the filesystem integrity. It's true, it would be a problem if I just did 'xm create' on the block device, but I'm planning on doing 'xm restore', which should sidestep this problem.

--
Agent Rooker
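For the restore side on the backup server, a minimal sketch might look like the following. It assumes the checkpoint file and the block devices were copied to the same paths they had on the original host; the `run` helper, the `DRY_RUN` switch and the function name are hypothetical. The point is only that `xm restore` should not be attempted unless both the saved RAM image and the matching disks are actually present.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper, not part of xm.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_saved_domu CHECKPOINT DEVICE...
# Refuses to call 'xm restore' unless the checkpoint is readable and
# every listed backing device exists.
restore_saved_domu() {
    checkpoint=$1
    shift
    if [ ! -r "$checkpoint" ]; then
        echo "missing checkpoint: $checkpoint" >&2
        return 1
    fi
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "missing backing device: $dev" >&2
            return 1
        fi
    done
    run xm restore "$checkpoint"
}
```

An existence check like this says nothing about whether the disk contents match the checkpoint; the two still have to come from the same backup run.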
On Tue, Oct 28, 2008 at 1:10 PM, Nick Anderson <nick@anders0n.net> wrote:
> The one case where I figured there would be a window for corruption
> was if information was coming over the wire that was being written.

So would this be any worse than just yanking the network cable?

--
Agent Rooker
> the 'missing' parts are in the RAM backup.

RAM is not backup. =) The point is to have something on tape (etc) to handle a dom0 failure.

John

--
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
On Tue, Oct 28, 2008 at 4:02 PM, John Madden <jmadden@ivytech.edu> wrote:
>> the 'missing' parts are in the RAM backup.
>
> RAM is not backup. =) The point is to have something on tape (etc) to handle
> a dom0 failure.

s/RAM backup/backed up RAM/g

--
Javier
Agent Rooker wrote:
> On Tue, Oct 28, 2008 at 3:34 PM, Javier Guerra <javier@guerrag.com> wrote:
>
>> the point of pausing the DomU is to get hold of a snapshot of memory
>> state, as well as the block device(s).
>>
>> doesn't sound like something worthy to replace real backups; but might
>> buy you much lower restore times, if you have the capacity to do both.
>>
>> --
>> Javier
>
> That's what I'm thinking, anyway. As I said in a previous message in
> this thread, "...we're also doing domU level backups with NetBackup."

I assume that in domU you also use some kind of snapshot (LVM or whatever) so that NetBackup sees consistent files (e.g. all files backed up are from the same time, not changing in the middle of the backup process)? If you do, then I assume:
a. you can tolerate whatever corruption may possibly happen using that method (i.e. the same kind of corruption you can get if you yank the power cord), or
b. your application can recover from [a] (e.g. using InnoDB instead of MyISAM for MySQL)

With that in mind, it should be easier to simply use snapshot without the need of xm save/restore. It will save some domU "downtime" (the time needed to save and restore domU).

Another thing to consider: when the question "how to backup domU" arose on this list in the past (and it comes up quite often, search the list archive) I'd generally reply "try using zfs snapshot". Which means:
- for backup in domU, you either need an opensolaris or zfs-fuse/linux running on domU
- for backup in dom0, you need opensolaris dom0 (using zfs volume), whatever the OS/fs running on domU.

Another alternative is to have an opensolaris server exporting zfs volumes via iscsi, have dom0/domU import it, and do all backups on the storage server.
The benefit is that:
- zfs snapshot is much faster than lvm snapshot (when using lvm snapshot, disk writes will be doubled: to the original lv and the snapshot lv)
- subsequent zfs snapshot is much faster since zfs tracks changes between snapshots internally (compared to rsync/blocksync, which needs to read all files/blocks and compare their stats/checksum, thus eating lots of disk read i/o during the backup process)

> An alternative solution would be to bring the domUs down for a cold
> block-level backup each night, but that is just a little more downtime
> than I would like.

Your current backup solution uses lots of disk I/O, which might result in severe performance degradation during backup. Depending on your requirements, this might be okay, but you'll get better performance with zfs.

Regards,

Fajar

"Fajar A. Nugraha" <fajar@fajar.net> writes:
> With that in mind, it should be easier to simply use snapshot without
> the need of xm save/restore. It will save some domU "downtime" (the time
> needed to save and restore domU).

the idea is that 'xm save' saves your ram, including any write-back disk cache that has not been flushed. So if I do an 'xm save' and save the savefile, and take a bit-for-bit copy of the backing device while the domain is still frozen, I should be able to restore the bit-for-bit copy of the backing device at some point in the future, and then 'xm restore' the savefile I saved, and end up exactly where I was, with no inconsistencies or corruptions, as all disk writes that had not been flushed to disk are still in ram.

(I can reduce downtime by only taking a snapshot while the domU is down, then doing the bit-for-bit copy off the snapshot.)

Of course, xm save/restore is pretty picky about things like CPU architecture (and, for that matter, the path to the disk) so as always, you want to test restoring your backup to another server. A backup that isn't tested is no backup at all.

> Another thing to consider, when the question "how to backup domU" arose
> on this list in the past (and it comes up quite often, search the list
> archive) I'd generally reply "try using zfs snapshot". Which means :
> - for backup in domU, you either need an opensolaris or zfs-fuse/linux
> running on domU

Yeah, that's great if you are using opensolaris in the DomU (or something else that supports zfs well) but from what I understand, the linux zfs-fuse stuff is pretty slow.

> - for backup in dom0, you need opensolaris dom0 (using zfs volume),
> whatever the OS/fs running on domU.

This does sound interesting, though I haven't tried it.

> Another alternative is to have an opensolaris server exporting zfs
> volumes via iscsi, have dom0/domU import it, and do all backups on the
> storage server.

this is also interesting.
Software iSCSI is obviously going to be slower than native disk, but how much slower? it is an interesting question.

Right now, all my storage is local to the Dom0, and many hosts have excess disk. I've been thinking about exporting the excess disk via iscsi or NFS so that customers who want to buy more storage can do so without me worrying about balancing the local storage on various Dom0 hosts.

and it would be easy enough to do that from within an OpenSolaris DomU.

the big question in my mind is 'how much of the zfs benefits do I retain if I export over iscsi and format the block device ext3?'

> The benefit is that :
> - zfs snapshot is much faster than lvm snapshot (when using lvm snapshot
> disk writes will be doubled : to the original lv and the snapshot lv)

LVM snapshots do have... performance consequences.

> - subsequent zfs snapshot is much faster since zfs tracks changes
> between snapshots internally (compared to rsync/blocksync which needs to
> read all files/blocks and compare their stats/checksum, thus eating lots
> of disk read i/o during backup process)
>
> > An alternative solution would be to bring the domUs down for a cold
> > block-level backup each night, but that is just a little more downtime
> > than I would like.
>
> Your current backup solution uses lots of disk I/O, which might result
> in severe performance degradation during backup. Depending on your
> requirements, this might be okay, but you'll get better performance
> with zfs.

Unless you are willing to move to a system with good ZFS support, I doubt it. I bet (though I don't know for sure) that the iscsi overhead is going to be greater than the difference between zfs snapshots and lvm snapshots.

If the dd is causing performance problems, use ionice, and set it to the 'idle' class. your backup will be really, really slow but will not interfere with other I/O. (I have tested that, and it does seem to work as advertised.) Now, you're right about lvm snapshots being slow, so the domain being backed up is going to be slow until the backup finishes and the snapshot is cleared, but ionice makes a huge difference for the other domains on the box.
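The ionice suggestion can be sketched as below. `ionice -c 3` selects the idle class; whether it has any effect depends on the I/O scheduler in use on the dom0. The `run` helper, the `DRY_RUN` switch, the function name and the 1M block size are assumptions for illustration, not something taken from the thread.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper, not part of ionice or dd.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# idle_copy SOURCE DEST
# Copies a block device (typically the LVM snapshot) with dd in the idle
# I/O class, so it only gets disk time nobody else is asking for.
idle_copy() {
    run ionice -c 3 dd if="$1" of="$2" bs=1M
}
```

For example, `idle_copy /dev/vg0/web-snap /backup/web.img` in dry-run mode prints the single `ionice -c 3 dd ...` line it would execute.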
Luke S Crawford wrote:
> "Fajar A. Nugraha" <fajar@fajar.net> writes:
>
>> With that in mind, it should be easier to simply use snapshot without
>> the need of xm save/restore. It will save some domU "downtime" (the time
>> needed to save and restore domU).
>
> the idea is that 'xm save' saves your ram, including any write-back
> disk cache that has not been flushed. So if I do an 'xm save' and save the
> savefile, and take a bit-for-bit copy of the backing device while the
> domain is still frozen, I should be able to restore the bit-for-bit copy
> of the backing device at some point in the future, and then 'xm restore' the
> savefile I saved, and end up exactly where I was, with no inconsistencies
> or corruptions, as all disk writes that had not been flushed to disk are
> still in ram.
>
> (I can reduce downtime by only taking a snapshot while the domU is down, then
> doing the bit-for-bit copy off the snapshot.)

Even with snapshot, there's still the time required to "xm save" and "xm restore". I guess it's more about choice, really. If I snapshot without xm save-restore, I get a "dirty" filesystem backup, but services would run as usual. If I do xm save-restore, I get a "clean" backup, but that also means all services on that domU would be unavailable for (at least) the duration of xm save-restore. I choose the first one.

> you want to test
> restoring your backup to another server. A backup that isn't tested is
> no backup at all.

Good point on that.

>> Another thing to consider, when the question "how to backup domU" arose
>> on this list in the past (and it comes up quite often, search the list
>> archive) I'd generally reply "try using zfs snapshot". Which means :
>> - for backup in domU, you either need an opensolaris or zfs-fuse/linux
>> running on domU
>
> Yeah, that's great if you are using opensolaris in the DomU (or something
> else that supports zfs well) but from what I understand, the linux zfs-fuse
> stuff is pretty slow.

Not really. zfs-fuse is slow if you let it handle raid (about half lvm/md throughput). Since I mostly need the snapshot feature, I use zfs-fuse on top of lvm. Performance-wise, depending on how you use it, it's similar to ext3. Best case scenario, if you:
- disable checksum
- enable compression
- set application block size to match zfs block size (or vice versa)
you can actually get better read i/o performance (with a cpu usage tradeoff).

>> - for backup in dom0, you need opensolaris dom0 (using zfs volume),
>> whatever the OS/fs running on domU.
>
> This does sound interesting, though I haven't tried it.

I'm using opensolaris snv_98 dom0, and it works fine for the most part. There are differences from linux dom0 though, like the fact that (for now) you can't bridge a vlan interface to dom0 (you can only bridge physical interfaces).

>> Another alternative is to have an opensolaris server exporting zfs
>> volumes via iscsi, have dom0/domU import it, and do all backups on the
>> storage server.
>
> this is also interesting. Software iSCSI is obviously going to be slower
> than native disk, but how much slower? it is an interesting question.

This thread might give some info:
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051749.html

> Right now, all my storage is local to the Dom0, and many hosts have
> excess disk. I've been thinking about exporting the excess disk via iscsi
> or NFS so that customers who want to buy more storage can do so without me
> worrying about balancing the local storage on various Dom0 hosts.
>
> and it would be easy enough to do that from within an OpenSolaris DomU.
>
> the big question in my mind is 'how much of the zfs benefits do I retain
> if I export over iscsi and format the block device ext3?'

You can get:
- zfs checksum and raidz, which would ensure data integrity (up to the exported block level anyway)
- transparent compression. Having compressed ext3 volumes is nice for certain usage.
- snapshot and clone. Similar to qcow, but with block-device benefits.

> If the dd is causing performance problems, use ionice, and set it to the
> 'idle' class. your backup will be really, really slow but will not
> interfere with other I/O. (I have tested that, and it does seem to work
> as advertised.) Now, you're right about lvm snapshots being slow, so the
> domain being backed up is going to be slow until the backup finishes and the
> snapshot is cleared, but ionice makes a huge difference for the other
> domains on the box.

Good hint on ionice. At least it can isolate the performance penalty.

Regards,

Fajar
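The zfs points in this message can be sketched as two small functions: one applying the three tuning settings listed above, one taking a snapshot and sending only the changes since the previous snapshot to a storage or backup host. This is a rough sketch under assumptions: the dataset names, snapshot names, host name and the `run`/`DRY_RUN` helper are all hypothetical, `recordsize` applies to zfs filesystems rather than volumes, and turning checksums off gives up zfs's corruption detection, so it is a trade-off rather than a recommendation.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper, not part of zfs.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# zfs_tune DATASET RECORDSIZE
# Applies the three settings from the list above to a zfs filesystem.
zfs_tune() {
    run zfs set checksum=off "$1"
    run zfs set compression=on "$1"
    run zfs set recordsize="$2" "$1"
}

# zfs_incremental_backup DATASET PREV_SNAP NEW_SNAP HOST TARGET
# Takes a new snapshot and sends only the blocks changed since PREV_SNAP.
zfs_incremental_backup() {
    dataset=$1; prev=$2; cur=$3; host=$4; target=$5
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ zfs snapshot ${dataset}@${cur}"
        echo "+ zfs send -i ${dataset}@${prev} ${dataset}@${cur} | ssh ${host} zfs receive ${target}"
    else
        zfs snapshot "${dataset}@${cur}" || return 1
        zfs send -i "${dataset}@${prev}" "${dataset}@${cur}" | ssh "$host" zfs receive "$target"
    fi
}
```

Because zfs already knows which blocks changed between the two snapshots, the incremental send does not need to read and compare the whole device the way rsync or blocksync does.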
Agent Rooker wrote:
> On Tue, Oct 28, 2008 at 1:10 PM, Nick Anderson <nick@anders0n.net> wrote:
>
>> The one case where I figured there would be a window for corruption
>> was if information was coming over the wire that was being written.
>
> So would this be any worse than just yanking the network cable?

Yes. When you take a snapshot backup of domU you're basically doing a fork() of a running system. The parent, the original, continues and runs all its outstanding transactions (receiving mail, sending mail, buying CDs from Amazon) and those transactions complete.

When (or if) you restore the snapshot backup, those half-completed transactions will continue as well. Anything in-bound will have been lost and chances are the worst that will happen is that you'll log a message. Anything outbound will be duplicated -- two copies of a mail message, two CDs from Amazon, credit card debited twice. Oops.

You can, of course, prepare a system for a snapshot (or fork) so that transactions are completed or arrangements made for those important ones to not continue in the forked copy. This isn't a problem unique to Xen, of course; you can have this problem with normal, bare-metal backups and transactions that are held on disk rather than in memory -- it's just that when you do bare-metal backups you're selective about what you back up and you don't typically back up things like, for example, the sendmail queue.

jch
On Wednesday 29 October 2008 12:13:27 am Luke S Crawford wrote:
> the idea is that 'xm save' saves your ram, including any write-back
> disk cache that has not been flushed. So if I do an 'xm save' and save the
> savefile, and take a bit-for-bit copy of the backing device while the
> domain is still frozen, I should be able to restore the bit-for-bit copy
> of the backing device at some point in the future, and then 'xm restore'
> the savefile I saved, and end up exactly where I was, with no
> inconsistencies or corruptions, as all disk writes that had not been
> flushed to disk are still in ram.

Yes, saving the domain and backing up the saved instance will work. Pausing will not.

John

--
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
On Wed, Oct 29, 2008 at 5:34 AM, John Haxby <john.haxby@oracle.com> wrote:
> Agent Rooker wrote:
>>
>> On Tue, Oct 28, 2008 at 1:10 PM, Nick Anderson <nick@anders0n.net> wrote:
>>>
>>> The one case where I figured there would be a window for corruption
>>> was if information was coming over the wire that was being written.
>>
>> So would this be any worse than just yanking the network cable?
>
> Yes. When you take a snapshot backup of domU you're basically doing a
> fork() of a running system. The parent, the original, continues and runs
> all its outstanding transactions (receiving mail, sending mail, buying CDs
> from Amazon) and those transactions complete.
>
> When (or if) you restore the snapshot backup, those half-completed
> transactions will continue as well. Anything in-bound will have been lost
> and chances are the worst that will happen is that you'll log a message.
> Anything outbound will be duplicated -- two copies of a mail message, two
> CDs from Amazon, credit card debited twice. Oops.
>
> You can, of course, prepare a system for a snapshot (or fork) so that
> transactions are completed or arrangements made for those important ones to
> not continue in the forked copy. This isn't a problem unique to Xen, of
> course; you can have this problem with normal, bare-metal backups and
> transactions that are held on disk rather than in memory -- it's just that
> when you do bare-metal backups you're selective about what you back up and
> you don't typically back up things like, for example, the sendmail queue.
>
> jch

Assuming these transactions are done over TCP, wouldn't the duplicate outgoing packets be discarded by the receiving server as out of order? The exact workings of TCP are a little out of my depth, but that is my understanding of it.

But let's suppose that you're right. What if I just disable networking in the Xen config file before doing an emergency restore, and then shut down the domU cleanly before starting it up again with networking? That should make the process of restoring a domU less risky.

--
Agent Rooker
-----Original Message----- From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Agent Rooker Sent: Thursday, October 30, 2008 16:19 To: xen-users@lists.xensource.com Subject: Re: [Xen-users] Block level domU backup On Wed, Oct 29, 2008 at 5:34 AM, John Haxby <john.haxby@oracle.com> wrote:> Agent Rooker wrote: >> >> On Tue, Oct 28, 2008 at 1:10 PM, Nick Anderson <nick@anders0n.net> wrote: >> >>> >>> The one case where I figured there would be a window for corruption >>> was if information was coming over the wire that was being written. >>> >> >> So would this be any worse than just yanking the network cable? >> >> >> > > Yes. When you take a snapshot backup of domU you''re basically doing a > fork() of a running system. The parent, the original continues and runs > all its outstanding transactions (receiving mail, sending mail, buying CDs > from Amazon) and those transactions complete. > > When (or if) you restore the snapshot backup those half-completed > transactions will continue as well. Anything in-bound will have been lost > and chances are the worst that will happen is that you''ll log a message. > Anything outbound will be duplicated -- two copies of a mail messages,two> CDs from Amazon, credit card debited twice. Oops. > > You can, of course, prepare a system for a snapshot (or fork) so that > transactions are completed or arrangements made for those important onesto> not continue in the forked copy. This isn''t a problem unique to Xen, of > course, you can have this problem with normal, bare-metal backups and > transactions that are held on disk rather than in memory -- it''s just that > when you do bare-metal backups you''re selective about what you back up and > you don''t typically backup things like, for example, the sendmail queue. > > jch > >Assuming these transactions are done over TCP, wouldn''t the duplicate outgoing packets be discarded by the receiving server as out of order? 
The exact workings of TCP are a little out of my depth, but that is my understanding of it. But let's suppose that you're right. What if I just disable networking in the xen config file before doing an emergency restore, and then shut the domU down cleanly before starting it up again with networking? That should make the process of restoring a domU less risky.

-- 
Agent Rooker

You are correct, outgoing packets that were part of a stream would be discarded for being out of order. However, new connections (for mail about to be delivered from queue, for example) would be established as new connections and go through. This would be true even if you tried restoring without networking, which may or may not be an option, as the mail would still be in queue upon starting up.

You could certainly do a clean shutdown and startup after restore, but it wouldn't change anything unless there was a problem with such a large number of connections suddenly being stuck in TIME_WAIT. Such a problem would probably be considered a bug, since other situations (like a router going down) can leave connections in the same state and your server shouldn't crash. However, I'm not so familiar with the software side of things. Shutting down cleanly couldn't hurt; I just suspect it would be a waste of time.

Dustin
On Fri, Oct 31, 2008 at 7:59 AM, Dustin Henning <Dustin.Henning@prd-inc.com> wrote:
> You are correct, outgoing packets that were part of a stream would
> be discarded for being out of order. However, new connections (for mail
> about to be delivered from queue, for example) would be established as new
> connections and go through. This would be true even if you tried restoring
> without networking, which may or may not be an option, as the mail would
> still be in queue upon starting up.

This would also happen with a traditional backup.

Not something to lose sleep over.

-- 
Javier
On Fri, Oct 31, 2008 at 9:05 AM, Javier Guerra <javier@guerrag.com> wrote:
> On Fri, Oct 31, 2008 at 7:59 AM, Dustin Henning
> <Dustin.Henning@prd-inc.com> wrote:
>> You are correct, outgoing packets that were part of a stream would
>> be discarded for being out of order. However, new connections (for mail
>> about to be delivered from queue, for example) would be established as new
>> connections and go through. This would be true even if you tried restoring
>> without networking, which may or may not be an option, as the mail would
>> still be in queue upon starting up.
>
> this would also happen with a traditional backup.
>
> not something to lose sleep on
>
> --
> Javier

Anyway, I finally got around to testing it. Everything seems to work just fine. Here is my script's logfile for the test run with time stamps.

Oct 31 15:27:20 --> Beginning sync.
Oct 31 15:27:20 --> Pausing domU webserver and saving xen checkpoint file...
Oct 31 15:27:40 --> Creating root snapshot...
Logical volume "webserver_snapshot" created
Oct 31 15:27:43 --> Creating swap snapshot...
Logical volume "webserver_swap_snapshot" created
Oct 31 15:27:43 --> Restoring Xen domU...
Oct 31 15:28:03 --> Syncing swap volume...
Oct 31 15:28:29 --> Removing swap snapshot...
Logical volume "webserver_swap_snapshot" successfully removed
Oct 31 15:28:30 --> Syncing root volume...
Oct 31 15:30:08 --> Removing root snapshot...
Logical volume "webserver_snapshot" successfully removed
Oct 31 15:30:09 --> Transferring checkpoint file...
Oct 31 15:30:42 --> Transferring config file...
Oct 31 15:30:43 --> Done!

As you can see, the domU remains paused for a little over 40 seconds, and the snapshots exist for a little more than 2 minutes. For those 2 minutes, I/O efficiency will suffer, but as long as we do this during non-peak hours, it should be a non-issue.
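For anyone who wants to check those two figures against the log, here is a minimal sketch of the arithmetic in bash. The helper names to_seconds and elapsed are made up for this example and are not part of the backup script. It also assumes the pause runs from the "Pausing domU" line to the first line logged after the restore, and that the snapshots exist from the "Creating root snapshot" line until the root snapshot has been removed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the original script.

# Convert an HH:MM:SS timestamp to seconds since midnight.
# The 10# prefix forces base 10 so fields like "08" are not parsed as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print the number of seconds between two timestamps on the same day.
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Timestamps copied from the log above.
# Assumption: the domU is paused from the start of the save until the
# restore has returned (the "Syncing swap volume" line).
paused=$(elapsed 15:27:20 15:28:03)
# Assumption: snapshots exist from the first lvcreate until the last
# lvremove has finished (the "Transferring checkpoint file" line).
snap_lifetime=$(elapsed 15:27:40 15:30:09)

echo "domU paused for ${paused}s"
echo "snapshots existed for ${snap_lifetime}s"
```

With the timestamps from this run, that works out to 43 seconds of pause and 149 seconds of snapshot lifetime, which matches the rough numbers given above.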
I'm still nervous about some bug in Xen causing the domU to not restore after the snapshots get created, but after some more extensive testing I hope to put that worry to rest.

I did run into an issue when I restored the domU on the backup server. I was not able to ssh in. That was the only service running on that domU. However, after restarting, I was able to ssh just fine, and everything else seems normal. I'll keep testing and report any findings.

-- 
Agent Rooker
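One way to catch a failed restore early would be to have the script poll for the domain right after the restore step. The following is only a sketch under a few assumptions: the function name wait_for_domain is invented here, the attempt count and delay are arbitrary defaults, and it relies on `xm list NAME` exiting non-zero when the named domain does not exist, which is worth confirming on your Xen version before depending on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the original script.
# Poll until the named domain appears in `xm list`, or give up.
# Usage: wait_for_domain NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_domain() {
    local name=$1 attempts=${2:-10} delay=${3:-3} i
    for (( i = 1; i <= attempts; i++ )); do
        # Assumption: `xm list NAME` returns non-zero if the domain is absent.
        if xm list "$name" > /dev/null 2>&1; then
            return 0
        fi
        # Wait before retrying, but not after the final attempt.
        if (( i < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}
```

In the backup script this could be called right after `xm restore $CHECKPATH`, for example as `wait_for_domain "$XENNAME" || exit 1`, so the script stops and complains before it starts syncing. Note that it only confirms the domain exists again; it would not have caught the ssh problem described above, which needs a check against the service itself.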