Just looking for some feedback from other people who do this. I know it's not a good "backup" method, but "crash consistent" images have been very useful for me in disaster situations just to get the OS running quickly and then restore data from a data backup. My typical setup is to put the LV in snapshot mode while the guest is running, then dd the data to a backup file on an NFS mount point. The problem is that the VMs' performance gets pretty poor while the copy is happening. My guesses at why this was happening were:

1. dom0 having equal weight to the other 4 guests on the box and somehow hogging CPU time
2. lack of QoS on the I/O side / dom0 hogging I/O
3. process priorities in dom0
4. NFS overhead

For each of these items I tried to adjust things to see if it improved:

1. Tried increasing dom0's weight to 4x the other VMs'.
2. Saw Pasi mentioning dm-ioband a few times and think this might address I/O scheduling, but I haven't tried it yet.
3. Tried nice-ing the dd to the lowest priority and qemu-dm to the highest.
4. Changed the destination to a local disk.

Changing the things above didn't really seem to help, either alone or in combination. My setup is Xen 3.2 and Xen 4.0 on dual Nehalem processors, 24GB RAM, RAID 5+0 of WD RE3 1TB disks. The hardware in the boxes is quite good and there seems to be no noticeable difference between Xen versions. What I'd ideally like to accomplish is to take the backups with the least possible impact on the running VMs. I honestly don't care how long the backups take, but I want to avoid just throttling them to a fixed speed, because that seems inefficient/hacky. Can anyone share their experiences, both good and bad?

Thanks,
- chris
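For reference, the snapshot-then-dd workflow described above boils down to something like the sketch below. This is only an illustration of the approach, not the exact commands used in the setup above; the volume group, LV name, snapshot size and NFS destination path are all placeholders.

    # Sketch: crash-consistent backup of one guest LV (names/sizes are placeholders).
    VG=vg0
    LV=guest-disk
    SNAP=${LV}-backup
    DEST=/mnt/nfs-backups/${LV}-$(date +%Y%m%d).img

    # Copy-on-write snapshot taken while the guest keeps running; it only needs
    # enough space to absorb the writes made during the copy.
    lvcreate --snapshot --size 10G --name "$SNAP" "/dev/$VG/$LV"

    # Copy the frozen, crash-consistent image to the NFS mount point.
    dd if="/dev/$VG/$SNAP" of="$DEST" bs=1M

    # Drop the snapshot so it stops accumulating copy-on-write chunks.
    lvremove -f "/dev/$VG/$SNAP"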
I think this got missed during the mailing list downtime last weekend... I can't imagine no one has any input?

- chris

On Sat, Apr 17, 2010 at 2:53 PM, chris <tknchris@gmail.com> wrote:
> Just looking for some feedback from other people who do this. I know
> it's not a good "backup" method but "crash consistent" images have been
> very useful for me in disaster situations just to get OS running
> quickly then restore data from a data backup. [...]
Jeff Sturm
2010-Apr-23 19:34 UTC
RE: [Xen-users] Re: Snapshotting LVM backed guests from dom0
Chris,

Saw your original post, but hesitated to respond, since I'm not really an expert on either Linux block I/O or NFS. Anyway...

On Sat, Apr 17, 2010 at 2:53 PM, chris <tknchris@gmail.com> wrote:
> My typical setup is to put the LV in snapshot mode while guest is
> running then dd the data to a backup file which is on a NFS mount
> point. The thing that seems to be happening is that the VM's
> performance gets pretty poor during the time the copy is happening.

We see this all the time on Linux hosts. One process with heavy I/O can starve the others. I'm not quite sure why, but I suspect it has something to do with the unified buffer cache. When reading a large volume with "normal" I/O, buffer pages may quickly get replaced with pages that are never going to be read again, and your buffer cache hit ratio suffers. Every other process on the affected host that needs to do I/O may see longer latency as a result. With Xen, that includes any domU.

A quick fix that worked for us: direct I/O. Run your "dd" command with "iflag=direct" and/or "oflag=direct", if your version supports it (definitely works on CentOS 5.x, definitely *not* on CentOS 4.x). This bypasses the buffer cache completely and forces dd to read/write directly to the underlying disk device. Make sure you use an ample block size ("bs=64k" or larger) so the copy finishes in reasonable time.

Not sure if that'll work properly with NFS, however. (Having been badly burned by NFS numerous times, I tend not to use it on production hosts.) To copy disks from one host to another, we resort to tricks like piping over ssh (e.g. "dd if=<somefile> iflag=direct bs=256k | ssh <otherhost> 'dd of=<otherfile> oflag=direct bs=256k'"). These copies run slow, but steady. Importantly, they run with minimal impact on other processing going on at the time.

> 3. Tried nice-ing the dd to lowest priority and qemu-dm to highest

"nice" applies only to CPU scheduling and probably isn't helpful for this. You could try playing with ionice, which lets you override I/O scheduling priorities on a per-process basis.

Jeff
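Putting the two suggestions above together, a backup run from dom0 might look roughly like the following. This is a sketch, not a tested recipe: the device and destination paths are placeholders, iflag=/oflag=direct need a dd that supports them (per the CentOS note above), and the "idle" ionice class only has an effect when the disk is using the CFQ I/O scheduler.

    # Idle I/O class: the copy only gets disk time when nothing else wants it.
    # Direct I/O keeps the image out of dom0's page cache.
    # (oflag=direct on an NFS mount may or may not behave, per the caveat above.)
    ionice -c3 dd if=/dev/vg0/guest-disk-backup of=/mnt/nfs-backups/guest.img \
        iflag=direct oflag=direct bs=1M

    # The same idea pushed to another host over ssh instead of NFS:
    ionice -c3 dd if=/dev/vg0/guest-disk-backup iflag=direct bs=256k \
        | ssh backuphost 'dd of=/srv/backups/guest.img oflag=direct bs=256k'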
Nick Couchman
2010-Apr-23 19:37 UTC
[Xen-users] Re: Snapshotting LVM backed guests from dom0
> On Sat, Apr 17, 2010 at 2:53 PM, chris <tknchris@gmail.com> wrote:
>> For each of these items I tried to adjust things to see if it improved.
>>
>> 1. Tried increasing dom0 weight to 4x the other VM's.

Probably not going to help - if you increase the weight, you'll choke out your other domUs; if you decrease it, the domUs may also be affected, because network and disk I/O end up going through dom0 in the end anyway.

>> 2. Saw pasi mentioning dm-ioband a few times and think this might
>> address IO scheduling but haven't tried it yet.
>> 3. Tried nice-ing the dd to lowest priority and qemu-dm to highest

I would expect this to help some, but it may not be the only thing. Also, remember that network and disk I/O are still done through drivers in dom0, which means pushing qemu-dm to the highest priority really won't buy you anything. I would expect re-nicing dd to help some, though.

>> 4. Changing destination to a local disk

This indicates that the bottleneck is local and not the network. The next step would be to grab some Linux performance monitoring and debugging tools and figure out where your bottleneck is. Things like top, xentop, iostat, vmstat, and sar may be useful in determining which component is hitting its performance limit and needs to be tweaked or worked around.

-Nick
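For example, something along these lines run in dom0 while the copy is in progress should show fairly quickly whether a disk, CPU or memory limit is being hit. The flags and five-second intervals here are just one reasonable choice, not a prescribed procedure.

    # Per-domain CPU usage, logged in batch mode.
    xentop -b -d 5 > xentop.log &

    # Extended per-device disk stats: utilisation, queue size, await times.
    iostat -xk 5 > iostat.log &

    # Memory pressure, swap activity and run-queue length.
    vmstat 5 > vmstat.log &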
Thanks everyone for the tips. I will try experimenting with these over the weekend and let you know how much it helps, if any.

- chris

On Fri, Apr 23, 2010 at 3:37 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> The next step would be to grab some Linux performance monitoring and
> debugging tools and figure out where your bottleneck is. So, things
> like top, xentop, iostat, vmstat, and sar may be useful in determining
> what component is hitting its performance limit and needs to be
> tweaked or worked around.
>
> -Nick