Hi Folks,

I've been doing some experimenting to see how far I can push some old
hardware into a virtualized environment - partially to see how much use I
can get out of the hardware, and partially to learn more about the behavior
of, and interactions between, software RAID, LVM, DRBD, and Xen.

Basic configuration:

- two machines, 4 disk drives each, two 1G ethernet ports (1 each to the
  outside world, 1 each as a cross-connect)
- each machine runs Xen 3 on top of Debian Lenny (the basic install)
- very basic dom0s - just running the hypervisor and i/o (including disk
  management)
  ---- software RAID6 (md)
  ---- LVM
  ---- DRBD
  ---- heartbeat to provide some failover migration
- dom0, on each machine, runs directly on md RAID volumes (RAID1 for boot,
  RAID6 for root and swap)
- each Xen VM uses 2 DRBD volumes - one for root, one for swap
- one of the VMs has a third volume, used for backup copies of files

One domU, on one machine, runs a medium-volume mail/list server. This used
to run non-virtualized on one of the machines, and I moved it into a domU.
Before virtualization, everything just hummed along (98% idle time as
reported by top). Virtualized, the machine is mostly idle, but now top
reports a lot of i/o wait time, usually in the 20-25% range.

As I've started experimenting with adding additional domUs, in various
configurations, I've found that my mail server can get into a state where
it's spending almost all of its cycles in an i/o wait state (95% and higher
as reported by top). This is particularly noticeable when I run a backup
job (essentially a large tar job that reads from the root volume and writes
to the backup volume). The domU grinds to a halt.

So I've been trying to track down the bottlenecks.

At first, I thought this was probably a function of pushing my disk stack
beyond reasonable limits - what with multiple domUs on top of DRBD volumes,
on top of LVM volumes, on top of software RAID6 (md). I figured I was
seeing a lot of disk churning.

But... after running some disk benchmarks, what I'm seeing is something
else:

- I took one machine, turned off all the domUs, and turned off DRBD
- I ran a disk benchmark (bonnie++) on dom0, which reported 50MB/sec to
  90MB/sec of throughput depending on the test (not exactly sure what this
  means, but it's a baseline)
- I then brought up DRBD and various combinations of domUs, and ran the
  benchmark in various places
- the most interesting result, running in the same domU as the mail server:
  34M-60M depending on the test (not much degradation from running directly
  on the RAID volume)
- but... while running the benchmark, the baseline i/o wait percentage
  jumps from 25% to the 70-90% range

So... the question becomes, if it's not disk churning, what's causing all
those i/o wait cycles? I'm starting to think it might involve buffering or
other interactions in the hypervisor.

Any thoughts or suggestions regarding diagnostics and/or tuning? (Other
than "throw hardware at it", of course :-)

Thanks very much,

Miles Fidelman
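(For readers following along: a minimal sketch of the layering described
above. The device names, sizes, and volume-group name below are made up for
illustration, not taken from the poster's actual configuration.)

    # four disks -> one md RAID6 array (boot would be a separate RAID1)
    mdadm --create /dev/md2 --level=6 --raid-devices=4 \
        /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

    # LVM on top of the array, one logical volume per domU disk
    pvcreate /dev/md2
    vgcreate vg0 /dev/md2
    lvcreate -L 10G -n mail-root vg0
    lvcreate -L 2G  -n mail-swap vg0

    # a DRBD resource is then defined per logical volume (drbd.conf not
    # shown), and the domU config points at the DRBD devices, e.g.:
    #   disk = [ 'phy:/dev/drbd1,xvda1,w', 'phy:/dev/drbd2,xvda2,w' ]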
Florian Manschwetus
2010-Jun-05 23:25 UTC
Re: [Xen-users] lots of cycles in i/o wait state
Sounds like hvm without pv-drivers, so a lot of i/o-wait would be normal.
Solution => enhance your kernel / drivers.

Florian

Am 06.06.2010 00:59, schrieb Miles Fidelman:
> [...]
Florian Manschwetus wrote:
> Sounds like hvm without pv-drivers, so a lot of i/o-wait would be normal.
> Solution => enhance your kernel / drivers.

Can you elaborate just a bit?

- I'm running the stock Debian Lenny xen install (package:
  xen-linux-system-2.6.26-2-xen-686)
- the domU was installed with xen-tools and debootstrap
- uname -r for both dom0 and domU returns 2.6.26-2-xen-686
- my hardware doesn't support HVM, so everything is PV based

So... I'm not quite sure what I'm missing, how I would determine if, in
fact, I'm missing pv-drivers, or how I'd install them if they are missing.

A few more details would be very much appreciated.

Thanks again,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In <fnord> practice, there is. .... Yogi Berra
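(In case it helps anyone with the same question: a few commands, run inside
the domU, that should confirm a PV guest is using the Xen frontend drivers.
The exact sysfs paths are from memory and may differ slightly on a Lenny
2.6.26-2-xen kernel.)

    uname -r                   # should report the -xen flavour (it does here)
    dmesg | grep -i xen        # look for "Xen version" and blkfront/netfront lines
    ls /sys/bus/xen/devices    # vbd-* / vif-* entries = PV block and net frontends
    cat /proc/partitions       # PV block devices usually appear as xvd* (or as
                               #  sda* exported through blkfront, per the config)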
re. my previous messages on this topic:

It's absolutely amazing what mounting volumes with "noatime" set will do to
reduce i/o wait times! Took a while to figure this out, though.

Miles

> Am 06.06.2010 00:59, schrieb Miles Fidelman:
>> [...]

--
In theory, there is no difference between theory and practice.
In <fnord> practice, there is. .... Yogi Berra
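(For anyone wanting to try the same thing: noatime is set per mount in
/etc/fstab inside the guest. The device names and filesystem type below are
illustrative only.)

    # /etc/fstab in the domU
    /dev/xvda1   /         ext3   defaults,noatime   0   1
    /dev/xvda3   /backup   ext3   defaults,noatime   0   2

Without noatime, every file read queues a metadata write to update the
access time, which turns a read-heavy job like a big tar into a mixed
read/write load all the way down through DRBD, LVM, and the RAID6 parity
calculation.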
brrrrrilliant! i forgot about that option, good call.

- Brian

On Jun 6, 2010, at 7:17 PM, Miles Fidelman wrote:
> re. my previous messages on this topic:
>
> It's absolutely amazing what mounting volumes with "noatime" set will do
> to reduce i/o wait times! Took a while to figure this out, though.
>
> Miles
> [...]
Florian Manschwetus
2010-Jun-07 06:15 UTC
Re: [Xen-users] lots of cycles in i/o wait state
Am 06.06.2010 23:12, schrieb Miles Fidelman:
> Hi Florian,
>
> Can you provide a few more details on this? I'm running on a machine that
> doesn't support hardware VM extensions, so I'm pretty sure everything is
> installed with PV drivers.

If your metal has no VM extensions you would need a PV-aware kernel, not
just PV drivers, to run anything at all - so the problem I mentioned is
certainly not in play here. (I observed it running win2008 (normal/r2) x64.)

> On the other hand, the VM started as a copy of a system that was running
> directly on bare iron - so maybe there are some kernel modules getting
> loaded somewhere along the line that shouldn't be.
>
> Any suggestions regarding diagnostics?

It would be worth a look at dom0 activity here - maybe you have an i/o
bottleneck there after all.

Florian

> Thank you very much,
>
> Miles Fidelman
>
> --- original query follows ---
> [...]
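(A sketch of what "a look at dom0 activity" might consist of - these are
the standard tools; package names and exact column labels may vary.)

    # in dom0
    xm top           # (or xentop) per-domain CPU and VBD request counts
    iostat -x 1      # per-device %util, await, queue depth (sysstat package)
    vmstat 1         # the 'wa' column is dom0's own iowait

    # run 'iostat 1' / 'vmstat 1' in the domU at the same time and compare:
    # high iowait in the guest while dom0's disks sit idle points at the
    # blkfront/blkback path; high %util in dom0 points at the md/LVM/DRBD stack.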
On Sat, Jun 05, 2010 at 06:59:51PM -0400, Miles Fidelman wrote:
> Hi Folks,
>
> I've been doing some experimenting to see how far I can push some old
> hardware into a virtualized environment [...]
>
> - very basic dom0s - just running the hypervisor and i/o (including disk
>   management)
> ---- software RAID6 (md)

Software RAID6 will really suck for random IO performance.. IO pattern
from running multiple VMs will be random!

> One domU, on one machine, runs a medium-volume mail/list server. [...]
> Virtualized, the machine is mostly idle, but now top reports a lot of
> i/o wait time, usually in the 20-25% range.

Is your disk/partition alignment properly set up? Doing it wrong could
cause bad performance. It's easy to mess it up with VMs.

> As I've started experimenting with adding additional domUs, in various
> configurations, I've found that my mail server can get into a state
> where it's spending almost all of its cycles in an i/o wait state (95%
> and higher as reported by top). [...] The domU grinds to a halt.

Is that iowait measured in the guest, or in dom0?

> At first, I thought this was probably a function of pushing my disk
> stack beyond reasonable limits - what with multiple domUs on top of DRBD
> volumes, on top of LVM volumes, on top of software RAID6 (md).

Yeah, that setup will slow you down a lot. RAID6 is bad for random IO
performance, and DRBD doesn't really help there..

> - the most interesting result, running in the same domU as the mail
>   server: 34M-60M depending on the test
> - but... while running the benchmark, the baseline i/o wait percentage
>   jumps from 25% to the 70-90% range

Again, run "iostat 1" in both the domU and dom0, and compare the results.
Also run "xm top" in dom0 to monitor the overall CPU usage.

> So... the question becomes, if it's not disk churning, what's causing
> all those i/o wait cycles? I'm starting to think it might involve
> buffering or other interactions in the hypervisor.
>
> Any thoughts or suggestions regarding diagnostics and/or tuning? (Other
> than "throw hardware at it", of course :-)

Remember your storage cannot do many random IOs..

-- Pasi
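(A rough sketch of how one might check the alignment Pasi mentions - the
chunk size and device names here are examples, not known values from this
setup.)

    mdadm --detail /dev/md2 | grep -i chunk   # e.g. "Chunk Size : 64K"
    fdisk -lu /dev/sda                        # -u prints partition starts in sectors
    pvs -o +pe_start                          # where LVM places the first extent
                                              #  (field in reasonably recent LVM tools)

    # with 4 disks in RAID6 (2 data + 2 parity) and a 64 KiB chunk, the data
    # stripe is 128 KiB; partitions or extents that start off-stripe turn every
    # domU write into a read-modify-write of a full parity stripe.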
Pasi Kärkkäinen wrote:
> Is your disk/partition alignment properly set up? Doing it wrong could
> cause bad performance. It's easy to mess it up with VMs.

Can you say a little more about what you mean by "properly set up" vs. not
properly set up?

> Is that iowait measured in the guest, or in dom0?

iowait ONLY suffers in the guest. When I run stress tests, iowait (in the
guest) jumps considerably when:

- running a benchmark (bonnie++) in dom0, on either host (to be expected,
  given that dom0 gets priority)
- running bonnie++ in the guest with iowait problems

Running bonnie++ in another guest does not impact the iowaits.

> Again, run "iostat 1" in both the domU and dom0, and compare the results.
> Also run "xm top" in dom0 to monitor the overall CPU usage.

Very little CPU load. iostat (and vmstat) are what really helped me track
things down; and after doing a lot of googling on "performance tuning" and
"iowait", I came across the suggestion to add "noatime" to my mount options,
which brought my iowait times way down and sped up performance. You learn
something new every day :-)

Thanks again, to all,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In <fnord> practice, there is. .... Yogi Berra
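(For completeness, since it wasn't spelled out above: the option can also be
applied to a running system without a reboot, and then made permanent in
/etc/fstab - the mount points here are just examples.)

    mount -o remount,noatime /
    mount -o remount,noatime /backup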