Until now things were fine, but today, while doing a regular LVM snapshot backup, I noticed a huge lag in I/O.

Whenever I ran dd on dom0, the load on the Linux domUs increased to high values.

A simple read/write test on dom0 showed a speed of about 1.2 MB/s:

# dd if=/dev/zero of=./test bs=1k count=1048576
^C^C^C^C
545259520 bytes (520 MB) copied, 402.32934 seconds, 1.3 MB/s

Running Xen 4.2.1.

Has anyone noticed anything similar?
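P.S. For what it's worth, the write test above goes through the page cache, so the number can swing either way. A variant that forces the data to disk before reporting (GNU dd; the file name and sizes are just examples) would be:

# dd if=/dev/zero of=./test bs=1M count=512 conv=fdatasync
# rm ./test

conv=fdatasync makes dd call fdatasync() at the end, so the reported rate includes actually flushing the writes to disk.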
To add to my last email:

This happens to be related to the LVM snapshot. Every time a snapshot is created for the LV that a particular domU sits on, the load on that domU spikes up to 30 and things become sluggish.

ionice and the usual LVM parameter tweaks didn't help!

To the real guys out there: how do you use LVM snapshots with Xen dom0, if at all? To me, it seems like LVM snapshotting isn't a workable short-term backup strategy at all! (A rough sketch of the kind of flow I mean is below.)

On Tue, Jul 9, 2013 at 1:52 AM, Micky <mickylmartin@gmail.com> wrote:
> Until now, things were fine but today while doing a regular lvm
> snapshot backup, I noticed a huge lag in I/O.
>
> Whenever i ran dd on dom0, the load on Linux domUs increased to high values.
>
> A simple read/write test on Dom0 showed the speed of 1.2 MB/s.
>
> # dd if=/dev/zero of=./test bs=1k count=1048576
> ^C^C^C^C
> 545259520 bytes (520 MB) copied, 402.32934 seconds, 1.3 MB/s
>
> Running Xen 4.2.1.
>
> Has anyone noticed anything similar?
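For reference, the per-domU flow is roughly the usual create / dump / remove sequence; the VG/LV names and snapshot size here are placeholders, not my actual values:

# snapshot the LV backing the domU while it keeps running
lvcreate -s -L 10G -n vm1-snap /dev/vg0/vm1

# dump the snapshot to a backup image
dd if=/dev/vg0/vm1-snap of=/backup/vm1.img bs=1M

# drop the snapshot as soon as the dump finishes
lvremove -f /dev/vg0/vm1-snap

It's during the lifetime of that snapshot, and especially during the dd, that the load on the domU goes through the roof.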
On 09/07/13 13:15, Micky wrote:
> To add to my last email.
> This happens to be related to LVM snapshot. Every time a snapshot is
> created for the LVM where a particular domu is on, load on that domu
> spikes up to 30 and things become sluggish.
>
> Ionice and lvm parameter formalities didn't help!
>
> To the real guys out there:
> How do you use LVM snapshots with Xen dom0, if any? To me, it seems
> like LVM snapshotting isn't a short-term backup strategy at all!

I've had similar issues. In fact, for the life of the LVM snapshot, performance seems to degrade severely. Usually a single snapshot is OK, but I wanted to have three snapshots, and each day delete the oldest and create a new one.

I've found two "solutions":

1) Make your storage backend perform like a god, so that even after you take the snapshots performance is like a stroll down the road. (I.e., I've upgraded to SSD-based storage which can get approx 1.5TB/s write and 2.5TB/s read.)

2) Only keep a single snapshot, and if possible remove it as soon as your backup is completed... and/or keep writes to a minimum while the snapshot is active.

My plan is to do something like this (a rough sketch of steps 5-7 is in the P.S. below):

1) Have two storage backend machines
2) Use DRBD to sync the two of them (primary sits on a RAID device, secondary sits on LVM on a RAID device)
3) Use LVM on top of the DRBD to create LVs for each domU
5) Take a snapshot using the underlying LVM (below DRBD) on the secondary
6) Run your backup processes on the snapshot of the DRBD
7) Delete the snapshot

The problem I have is that steps 6 and 7 probably involve disconnecting the backup server from the primary (breaking the DRBD), promoting it to primary, and making various changes to it (i.e., intentionally creating a split-brain scenario). After finishing the backup process, you may need to invalidate the entire DRBD and re-sync, which could be too time consuming (and itself cause a performance issue).

I haven't yet got that far in the process, so if you do something similar it would be helpful to hear about it. Also, anything other people can share about what they do and what works well / doesn't work would be nice to see.

Finally, the other problem I have with LVM on Debian (stable) is that every week or two it will freeze on lvremove, and other lvs or LV-related commands will then freeze as well. The only solution seems to be a reboot. (Using kernel 3.2.0-4-686-pae #1 SMP Debian 3.2.41-2 i686.) I haven't tracked this down or reported it yet, but it is frustrating to have to reboot the dom0 so often.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
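P.S. A very rough sketch of steps 5-7 as I picture them, with made-up device and VG names, and glossing over the promotion / split-brain question above:

# on the secondary: snapshot the LV that sits underneath the DRBD device
lvcreate -s -L 20G -n drbd0-snap /dev/vg0/drbd0-backing

# run the backup from the snapshot, e.g. a raw compressed dump
dd if=/dev/vg0/drbd0-snap bs=1M | gzip > /backup/domU-set.img.gz

# drop the snapshot when the backup is done
lvremove -f /dev/vg0/drbd0-snap

Whether the snapshot can simply be read like this, or whether the DRBD has to be broken and the secondary promoted first, is exactly the part I haven't worked out yet.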
> I've had similar issues, in fact, for the life of the LVM snapshot,
> performance seems to severely degrade. Usually a single snapshot is ok, but
> I wanted to have three snapshots, and each day delete the oldest and create
> a new one.

What a coincidence! I am doing exactly the same.

> I've found two "solutions":
> 1) Make your storage backend perform like a god so that after you take the
> snapshots performance is like a stroll down the road. (ie, I've upgraded to
> SSD based storage which can get approx 1.5TB/s write and 2.5TB/s read) ....
> 2) Only keep a single snapshot, and if possible, remove it as soon as your
> backup is completed.... and/or keep writes to a minimum while the snapshot
> is active.

That's exactly what the script I wrote does. Check http://github.com/bassu/xen-scripts/

As for SSDs, I haven't found them stable enough for long-term production environments!

> My plan is to do something like this:
> 1) Have two storage backend machines
> 2) Use DRBD to sync the two of them (primary sits on RAID device, secondary
> sits on LVM on RAID device)
> 3) Use LVM on top of the DRBD to create LV's for each domU
> 5) Take a snapshot using the underlying LVM (below DRBD) on the secondary
> 6) Run your backup processes on the snapshot of the DRBD
> 7) Delete the snapshot

Sounds rather complicated. Block-level snapshots underneath stacked block-level devices seems like a lot of overhead! Gluster may be more useful in this case, but that's just a guess.

> I haven't yet got that far in the process, so if you do something it would
> be helpful to hear about it.
>
> Also any other people who can share what they do and what works well/doesn't
> work would be nice to see.

I am experimenting with a few tricks. I will share the outcome, like the script I just shared :)

> Finally, the other problem I have with LVM on Debian (stable) is that every
> week or two, it will freeze on lvremove, and other lvs or LV related
> commands will freeze. The only solution seems to be a reboot. (Using kernel
> 3.2.0-4-686-pae #1 SMP Debian 3.2.41-2 i686). I haven't tracked this down or
> reported it yet, but it is frustrating to have to reboot the dom0 so often.

LVM is slow as heck when it comes to snapshots. And everywhere I look, people talk about the "copy on write" magic, but no one tells you that you are going to bite your tongue!

> Regards,

Cheers.
On 09/07/13 18:49, Micky wrote:
>> I've found two "solutions":
>> 1) Make your storage backend perform like a god so that after you take the
>> snapshots performance is like a stroll down the road. (ie, I've upgraded to
>> SSD based storage which can get approx 1.5TB/s write and 2.5TB/s read) ....
>> 2) Only keep a single snapshot, and if possible, remove it as soon as your
>> backup is completed.... and/or keep writes to a minimum while the snapshot
>> is active.
> That's what the script I wrote is doing. Check
> http://github.com/bassu/xen-scripts/

I had a quick read through your script... it looks pretty nice and complete, just a couple of comments (a small sketch of points 1 and 4 is at the end of this mail):

1) On line 159 you do a killall -9 dd, but you know the PID of the dd that you launched, so you might accidentally kill another dd process run from some other script etc.; consider changing it to kill -9 $ddpid.

2) In find_lvm you call lvdisplay, and this is where I tend to have the same problem (various lvm2 processes hang forever, including lvs, and lvremove when removing snapshots). I don't know a good way to solve that except rebooting when it happens.

3) You set the snapshot chunk size to 512k; what does this do, does it really make much difference?

4) You are reading the full snapshot, writing out a full uncompressed copy of the image, then reading the copy back and writing out the compressed copy. You could optimise this by reading the snapshot and writing compressed data directly, in one step. If the CPU is faster than the disk, this will reduce the overall backup time, and might also reduce the time the snapshot hangs around.

5) I found that if the LV is on the same disk I am saving the dump to, things slow down drastically (reading and writing the same disk in different locations at the same time), so back up to a different disk if possible.

My script is currently much simpler: I just create the snapshots and remove the old ones (no full copies of the snapshots etc.). I use BackupPC, which I've got working for one system, to snapshot the VM, mount the image, back up with rsync, then umount and remove the snapshot. I still like to keep a full image snapshot, and even better to send that raw image offsite. In another scenario I shut down the VM (which uses an image file), simply copy the file via some tools into chunks of 100M, then start the VM up again.

> As for SSDs, I didn't find them stable as in long-term production environments!

Interesting. I've had problems with a number of SSDs, but since I started using the Intel 520s I've not had any issues. I have one environment with about 10 heavily used Windows domUs; the SAN is using 5 x 480G SSDs, and so far there haven't been any issues (I think over 12 months now). It would be interesting to hear if you have any additional information/comments?

>> My plan is to do something like this:
>> 1) Have two storage backend machines
>> 2) Use DRBD to sync the two of them (primary sits on RAID device, secondary
>> sits on LVM on RAID device)
>> 3) Use LVM on top of the DRBD to create LV's for each domU
>> 5) Take a snapshot using the underlying LVM (below DRBD) on the secondary
>> 6) Run your backup processes on the snapshot of the DRBD
>> 7) Delete the snapshot
> Sounds a lot complicated. Block level snapshots under grouped block
> level devices -- seems like a lot of overhead!
> Gluster may be a lot more useful in this case -- just a slight guess.

In my opinion, Gluster will add a lot of overhead anyway, maybe isn't sufficiently stable, and I certainly don't know it well enough to put it into production. LVM + MD + DRBD, on the other hand, are all simple, low overhead, and well understood. Each read/write with LVM/MD/DRBD is simply remapped to a physical device read/write, while GlusterFS is more of a filesystem, with more overhead and complexity.

>> I haven't yet got that far in the process, so if you do something it would
>> be helpful to hear about it.
>>
>> Also any other people who can share what they do and what works well/doesn't
>> work would be nice to see.
> I am experimenting with a few tricks. I will share the outcome like
> the script I just shared :)

Thanks, appreciated.

>> Finally, the other problem I have with LVM on Debian (stable) is that every
>> week or two, it will freeze on lvremove, and other lvs or LV related
>> commands will freeze. The only solution seems to be a reboot. (Using kernel
>> 3.2.0-4-686-pae #1 SMP Debian 3.2.41-2 i686). I haven't tracked this down or
>> reported it yet, but it is frustrating to have to reboot the dom0 so often.
> LVM is slow as heck when it comes to snapshots. And everywhere I look,
> people talk about the "copy on write" magic,
> but no one tells you that you are gonna bite your tongue!

If biting my tongue would help, I'd do it :)

Running multiple VMs on a single storage device, especially spinning disks, seems to be challenging; it's hard to ensure the right performance with all the contention etc. Using SSDs should make things a lot simpler/easier, but LVM performance is making that really difficult, and I still don't understand why performance is so horrible. At some point I'll join the LVM list and investigate in more detail, but I've got "good enough" performance so far, and have other higher-priority issues on my list...

Thanks again.

Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
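P.S. What I meant for points 1 and 4 above, roughly; the LV and output paths are made-up examples, not taken from your script:

# point 4: read the snapshot once and compress on the fly, no intermediate copy
dd if=/dev/vg0/vm1-snap bs=1M | gzip > /backup/vm1.img.gz

# point 1: if dd is launched in the background and may need to be aborted,
# remember its PID and kill only that process, not every dd on the box
dd if=/dev/vg0/vm1-snap of=/backup/vm1.img bs=1M &
ddpid=$!
# ... later, if the backup has to be aborted:
kill -9 "$ddpid"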
First off, thanks for checking.

Secondly, I have managed to resolve the disk dumping issues with the LVM snapshots, and the preliminary tests are satisfactory. It turns out the default CFQ scheduler was not suited to this workload:

Dom0: echo deadline > /sys/block/sda/queue/scheduler
DomU: echo noop > /sys/block/xvda/queue/scheduler

If you want the reasons, let me know and I'll explain the findings further. Since I am using a megaraid controller, I looked at the LSI recommendations and tweaked the kernel further. Overall this gave me a 50% performance boost on cheap Seagate disks. No more sluggishness!!

About the script:

1) Good catch. That was indeed the purpose of creating $ddpid. Seems like a typo.

2) We use RHEL/CentOS in production, so I have never had that issue and didn't consider it. But you could do something like the following to catch lvdisplay when it has been running for more than 5 minutes (a slightly fuller sketch is at the end of this mail):

[[ $(ps -p $(pidof lvdisplay) -o etimes:1=) -gt 300 ]] && echo "lvdisplay appears stuck"

3) My tests at the time showed that a 512k snapshot chunk size gave more speed to the dd writes. But now that I have switched to the deadline scheduler, I get the best results without passing -c to lvcreate at all and dd'ing with bs=100M. Also, there's no need for ionice, since that only works with CFQ.

4) It takes the same amount of CPU time though. Dumping and compressing large chunks at the same time through pipes and stdout can cause weird issues with FIFOs. IMHO, why risk a corrupt backup when the only real way in the world to test backups is by restoring them! A little certainty that you don't have a dirty backup is worth a little more I/O expense!

5) Affirmative. That is why two separate config variables exist there: BACKUP_DIR and PROCESS_DIR.

> My script is currently much simpler, I simply create the snapshots and
> remove the old ones (no full copies of the snapshots/etc).

Seems fine. In my case there are more than a few nodes and tens of domains, so the above works pretty well for me as a short-term backup strategy!

> I use backuppc which I've got working for one system to snapshot the VM,
> mount the image, backup with rsync, then umount and remove the snapshot. I
> still like to keep a full image snapshot, and even better to send that raw
> image offsite.

I use Burp from inside the domU.

> It would be interesting to hear if you have any additional information/comments?

Well, I started with a few small machines, and one after another the SSDs died on me, either due to firmware problems or bad blocks. I tried Crucial, switched to Intel and then Samsung. The latter were the ones that ran fine for the longest time. Now I just use them in personal laptops.

> Another scenario I shutdown the VM (using an image file), then simply copy
> the file via some tools into chunks of 100M, then startup the VM.

Seems fine from an administration point of view, but people have become uptime conscious these days.

> In my opinion, gluster will add a lot of overhead anyway, and maybe is not
> sufficiently stable, and certainly I don't know it well enough to put into
> production. While LVM + MD + DRBD are all simple, low overhead, well
> understood, etc... Each read/write with LVM/MD/DRBD is simply a remap
> process to a physical device read/write, while glusterfs seems more of a
> filesystem with more overhead/complexity.

I haven't played much with DRBD, so these are only guesses. My understanding of network-based domain I/O is that unless you have high-speed disks, high-speed network equipment, or preferably a SAN, the domains will suffer from I/O latency if there are more than a few.
Simply put, gigabit switches and so-called 6Gb/s SAS drives aren't sufficient.

> Running multiple VM's on a single storage device, especially spinning disks,
> seems to be challenging to ensure the right performance with all the
> contention/etc... Using SSD's should be a lot simpler/easier, but LVM
> performance is making that really difficult, and I still don't understand
> why performance is so horrible. At some point, I'll join the LVM list and
> investigate in more detail, but I've got "good enough" performance so far,
> and have other higher priority issues on my list...

So true. Try the workaround I mentioned above of switching the scheduler to noop or deadline, and see if you find any improvement.

> Thanks again.

Quite welcome!
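P.S. A slightly fuller sketch of that hung-LVM watchdog idea from point 2, e.g. run from cron on the dom0. The 300-second threshold, the list of commands to watch, and the logging are just examples:

#!/bin/bash
# Log a warning if any LVM command has been running for more than 5 minutes.
for name in lvdisplay lvremove lvs; do
    for pid in $(pidof "$name"); do
        elapsed=$(ps -p "$pid" -o etimes= | tr -d ' ')
        if [ "${elapsed:-0}" -gt 300 ]; then
            logger -t lvm-watchdog "$name (pid $pid) has been running for ${elapsed}s"
        fi
    done
done

Note that etimes needs a reasonably recent procps; on an older ps you would have to parse the formatted etime output instead.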
Adam:

P.S. https://github.com/bassu/xen-scripts/commit/20294000bee25fa986adfe284fc3d0c2aa11965f

On Wed, Jul 10, 2013 at 11:07 AM, Micky <mickylmartin@gmail.com> wrote:
> First off, thanks for checking.
> Secondly, I have managed to resolve the disk dumping issues from LVM
> snapshots and preliminary tests are satisfactory.
>
> Turns out, the default scheduler CFQ was not suited for this workload.
> Dom0: echo deadline > /sys/block/sda/queue/scheduler
> DomU: echo noop > /sys/block/xvda/queue/scheduler
>
> If you need reasons, let me know and I'll explain the findings further.
>
> Since I am using megaraid controller, I looked at LSI recommendations
> and tweaked kernel further.
> This overall gave me 50% performance boost on cheap Seagate disks.
>
> No more sluggishness!!
>
> About the script:
>
> 1) Good catch. That was indeed the purpose of creating $ddpid. Seems
> like a typo.
>
> 2) We use RHEL/CentOS in production so I have never had such an issue
> so didn't consider. But you could do something like:
> [[ $(ps -p $(pidof lvdisplay) -o etimes:1=) -gt 300 ]] do something if
> it executes for more than 5 mins
>
> 3) My tests at time showed 512k snapshot chunk size gave more speed to
> dd writes. But now after I have switched to deadline scheduler, there
> are best results without specifying -c parameter to lvm and dd'ing
> with bs=100M. Also, there's no need for ionice since it works with
> CFQ only.
>
> 4) It takes the same amount of CPU time though. Dumping and
> compressing large chunks at the same time with pipes and stdouts can
> cause weird issues with FIFOs. IMHO, why risk taking a chance of
> having corrupt backups when the only real way in the world to test the
> backups is by restoring them! A little certainty of knowing of not
> having a dirty backup is worth little more of I/O expense!
>
> 5) Affirmative. That is why two separate config variables exist there:
> BACKUP_DIR and PROCESS_DIR
>
>> My script is currently much simpler, I simply create the snapshots and
>> remove the old ones (no full copies of the snapshots/etc).
>
> Seems fine. In my case there are more than few nodes and tens of
> domains. So the above works pretty well for me as short term backup
> strategy!
>
>> I use backuppc which I've got working for one system to snapshot the VM,
>> mount the image, backup with rsync, then umount and remove the snapshot. I
>> still like to keep a full image snapshot, and even better to send that raw
>> image offsite.
>
> I use Burp from inside the domu.
>
>> It would be interesting to hear if you have any additional information/comments?
>
> Well, I started with few small machines and one after another SSDs
> died on me either due to firmware problems or bad blocks. I tried
> Crucial, switched to Intel and then Samsung. The latter were ones that
> ran fine for the longest time. Now I just use these for personal
> laptops.
>
>> Another scenario I shutdown the VM (using an image file), then simply copy
>> the file via some tools into chunks of 100M, then startup the VM.
>
> Seems fine from administration point of view but people have become
> uptime conscious these days.
>
>> In my opinion, gluster will add a lot of overhead anyway, and maybe is not
>> sufficiently stable, and certainly I don't know it well enough to put into
>> production. While LVM + MD + DRBD are all simple, low overhead, well
>> understood, etc...
>> Each read/write with LVM/MD/DRBD is simply a remap
>> process to a physical device read/write, while glusterfs seems more of a
>> filesystem with more overhead/complexity.
>
> And I haven't played much with DRBD so there are only guesses. My
> understanding with network based domains' I/O is that unless you have
> high speed disks or network equipment or preferably a SAN, the domains
> will suffer from I/O latency if there are more than a few. Simply the
> gigabit switches and so called 6Gb/s SAS drives aren't sufficient.
>
>> Running multiple VM's on a single storage device, especially spinning disks,
>> seems to be challenging to ensure the right performance with all the
>> contention/etc... Using SSD's should be a lot simpler/easier, but LVM
>> performance is making that really difficult, and I still don't understand
>> why performance is so horrible. At some point, I'll join the LVM list and
>> investigate in more detail, but I've got "good enough" performance so far,
>> and have other higher priority issues on my list...
>
> So true. Try the workaround I mentioned above of switching the
> scheduler to noop or deadline, and see if you find any improvements.
>
>> Thanks again.
> Quite welcome!