Luca Lesinigo
2009-Sep-30 23:33 UTC
[Xen-users] disk I/O problems under load? (Xen-3.4.1/x86_64)
I'm getting problems whenever the load on the system increases, but IMHO it should be well within hardware capabilities.

My configuration:
- HP Proliant DL160G5, with a single quad-core E5405, 14GiB RAM, 2x 1TB SATA disks (Hitachi 7K1000.B) on the onboard SATA controller (Intel chipset)
- Xen-3.4.1 64bit hypervisor, compiled from gentoo portage, with default command-line settings (I just specify the serial console and nothing else)
- Domain-0 with gentoo's xen-sources 2.6.21 (the xen 2.6.18 tarball didn't have networking; I think the HP Tigon3 gigabit driver is too old, but I haven't had time to look into that)
- Domain-0 is using the CFQ I/O scheduler and works from a software RAID-1, no tickless kernel, HZ=100. It has all the free RAM (currently some 5.x GiB)
- the rest of the disk space is also mirrored in a RAID-1 device, and I use LVM2 on top of that
- 6x paravirt 64bit DomU with 2.6.29-gentoo-r5 kernel, with the NOOP I/O scheduler, tickless kernel, 1 - 1.5GiB of RAM each
- 1x HVM 32bit Windows XP DomU, without any paravirt driver, 512MiB RAM
- I use logical volumes as storage space for the DomUs; the Linux ones also have 0.5GiB of swap space (unused, no DomU is swapping)
- all the Linux DomUs are on ext3 (noatime), and all DomUs are single-cpu (just one vcpu each)
- network is bridged (one LAN and one WAN interface on the physical system and the same for each domU), no jumbo frames

Usually load on the system is very low. But when there is some I/O related load (I can easily trigger it by rsync'ing lots of stuff between domUs, or from a different system to one of the domUs or to the dom0), load gets very high and I often see domUs spending all their cpu time in "wait" [for I/O] state. When that happens, load on Domain-0 gets high (jumps from <1 to >5) and loads on the DomUs get high too, probably because of processes waiting for I/O to happen. Sometimes iostat will even show exactly 0.00 tps on all the dm-X devices (domU storage backends) and some activity on the physical devices, as if all domU I/O activity froze up while dom0 is busy flushing caches or doing some other stuff.

vmstat in Dom0 shows one or two cores (25% or 50% cpu time) busy in 'iowait' state, and context switches go into the thousands, but not into the hundreds of thousands that http://wiki.xensource.com/xenwiki/KnownIssues talks about.

I tried pinning cpus: Domain-0 had its four VCPUs pinned to CPUs 0 and 1, some domUs pinned to CPU 2, and some domUs pinned to CPU 3. As far as I can tell it did not make any difference. I also (briefly) tested with all Linux DomUs running with the CFQ scheduler; while it didn't seem to make any difference, it also was too short a test to trust it much.

What's worse, sometimes I get qemu-dm processes (for the HVM domU) in zombie state. It also happened that the HVM domU crashed and I wasn't able to restart it: I got the "hotplug scripts not working" error from xm create, and looking in xenstore-ls I saw instances of the crashed domU with all its resources (which probably was the cause of the error?). I had to reboot the whole system to be able to start that domain again.

Normally iostat in Domain-0 shows more or less high tps (200~300 under normal load, even higher if I play around with rsync to artificially trigger the problems) on the md device where all the DomUs reside, and much less (usually just 10-20% of the previous value) on the two physical disks sda and sdb that compose the mirror. I guess I see fewer tps there because the scheduler/elevator in Dom-0 is doing its job.
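For reference, this is roughly the kind of commands involved in the checks and tests described above (sda/xvda, "domu-web"/"domu-mail" and the xenstore path are just placeholders, adjust to the actual setup):

    # current elevator for a disk: the active one is shown in brackets
    cat /sys/block/sda/queue/scheduler
    # switch a guest's virtual disk to noop at runtime (run inside the domU)
    echo noop > /sys/block/xvda/queue/scheduler

    # vcpu pinning: dom0 on physical cpus 0-1, one guest on cpu 2, another on cpu 3
    xm vcpu-pin Domain-0 all 0-1
    xm vcpu-pin domu-web all 2
    xm vcpu-pin domu-mail all 3
    xm vcpu-list    # verify the affinities actually took

    # when a crashed HVM guest leaves stale state behind ("hotplug scripts not
    # working" on the next xm create), the leftovers show up with:
    xenstore-ls /local/domain
    # and could in principle be removed with xenstore-rm /local/domain/<stale-domid>
    # (I have not verified whether that would have avoided the reboot)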
I don't know if the load problems and the HVM problem are linked or not, but I also don't know where to look to solve either of them. Any help would be appreciated, thank you very much.

Also, what are the ideal/recommended settings in dom0 and domU regarding I/O schedulers and tickless or not? Is there any reason to leave the hypervisor some extra free RAM, or is it ok to just let xend shrink dom0 when needed and leave free just the minimum? If I sum up the memory (currently) used by domains, I get 14146MiB. xm info says 14335MiB total_memory and 10MiB free_memory.

--
Luca Lesinigo
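On the dom0 memory question above, a minimal sketch of the usual alternative to letting xend balloon dom0 down on demand: fix dom0's size at boot and tell xend not to shrink it below that (the 2048M figure is only an example, not a recommendation):

    # grub.conf: append dom0_mem= to the existing xen.gz line so dom0 gets a
    # fixed amount of RAM at boot (keep whatever serial console options are there)
    kernel /boot/xen.gz dom0_mem=2048M

    # /etc/xen/xend-config.sxp: keep xend from ballooning dom0 below that figure
    (dom0-min-mem 2048)
    # recent xend versions also accept: (enable-dom0-ballooning no)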
Pasi Kärkkäinen
2009-Oct-01 07:12 UTC
Re: [Xen-users] disk I/O problems under load? (Xen-3.4.1/x86_64)
On Thu, Oct 01, 2009 at 01:33:57AM +0200, Luca Lesinigo wrote:

> I'm getting problems whenever the load on the system increases, but IMHO it should be well within hardware capabilities.
>
> My configuration:
> - HP Proliant DL160G5, with a single quad-core E5405, 14GiB RAM, 2x 1TB SATA disks (Hitachi 7K1000.B) on the onboard SATA controller (Intel chipset)
> - Xen-3.4.1 64bit hypervisor, compiled from gentoo portage, with default command-line settings (I just specify the serial console and nothing else)
> - Domain-0 with gentoo's xen-sources 2.6.21 (the xen 2.6.18 tarball didn't have networking; I think the HP Tigon3 gigabit driver is too old, but I haven't had time to look into that)
> - Domain-0 is using the CFQ I/O scheduler and works from a software RAID-1, no tickless kernel, HZ=100. It has all the free RAM (currently some 5.x GiB)
> - the rest of the disk space is also mirrored in a RAID-1 device, and I use LVM2 on top of that
> - 6x paravirt 64bit DomU with 2.6.29-gentoo-r5 kernel, with the NOOP I/O scheduler, tickless kernel, 1 - 1.5GiB of RAM each
> - 1x HVM 32bit Windows XP DomU, without any paravirt driver, 512MiB RAM
> - I use logical volumes as storage space for the DomUs; the Linux ones also have 0.5GiB of swap space (unused, no DomU is swapping)
> - all the Linux DomUs are on ext3 (noatime), and all DomUs are single-cpu (just one vcpu each)
> - network is bridged (one LAN and one WAN interface on the physical system and the same for each domU), no jumbo frames
>
> Usually load on the system is very low. But when there is some I/O related load (I can easily trigger it by rsync'ing lots of stuff between domUs, or from a different system to one of the domUs or to the dom0), load gets very high and I often see domUs spending all their cpu time in "wait" [for I/O] state. When that happens, load on Domain-0 gets high (jumps from <1 to >5) and loads on the DomUs get high too, probably because of processes waiting for I/O to happen. Sometimes iostat will even show exactly 0.00 tps on all the dm-X devices (domU storage backends) and some activity on the physical devices, as if all domU I/O activity froze up while dom0 is busy flushing caches or doing some other stuff.
>
> vmstat in Dom0 shows one or two cores (25% or 50% cpu time) busy in 'iowait' state, and context switches go into the thousands, but not into the hundreds of thousands that http://wiki.xensource.com/xenwiki/KnownIssues talks about.

You have only 2x 7200 rpm disks for 7 virtual machines and you're wondering why there's a lot of iowait? :)

> I tried pinning cpus: Domain-0 had its four VCPUs pinned to CPUs 0 and 1, some domUs pinned to CPU 2, and some domUs pinned to CPU 3. As far as I can tell it did not make any difference. I also (briefly) tested with all Linux DomUs running with the CFQ scheduler; while it didn't seem to make any difference, it also was too short a test to trust it much.
>
> What's worse, sometimes I get qemu-dm processes (for the HVM domU) in zombie state. It also happened that the HVM domU crashed and I wasn't able to restart it: I got the "hotplug scripts not working" error from xm create, and looking in xenstore-ls I saw instances of the crashed domU with all its resources (which probably was the cause of the error?). I had to reboot the whole system to be able to start that domain again.
> Normally iostat in Domain-0 shows more or less high tps (200~300 under normal load, even higher if I play around with rsync to artificially trigger the problems) on the md device where all the DomUs reside, and much less (usually just 10-20% of the previous value) on the two physical disks sda and sdb that compose the mirror. I guess I see fewer tps there because the scheduler/elevator in Dom-0 is doing its job.
>
> I don't know if the load problems and the HVM problem are linked or not, but I also don't know where to look to solve either of them.
>
> Any help would be appreciated, thank you very much. Also, what are the ideal/recommended settings in dom0 and domU regarding I/O schedulers and tickless or not?
>
> Is there any reason to leave the hypervisor some extra free RAM, or is it ok to just let xend shrink dom0 when needed and leave free just the minimum? If I sum up the memory (currently) used by domains, I get 14146MiB. xm info says 14335MiB total_memory and 10MiB free_memory.

A single 7200 rpm SATA disk can do around 120 random IOPS.. 120 IO operations per second.

120 IOPS / 7 VMs = 17 IOPS available per VM. That's not much..

--
Pasi
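To make the arithmetic explicit (the ~4 ms figure for the average seek is an assumption; longer seeks push a 7200 rpm disk well below 100 IOPS):

    # rotational latency = half a revolution: 60 s / 7200 rpm / 2 ~= 4.17 ms
    # random IOPS ~= 1000 ms / (average seek + rotational latency)
    echo "scale=1; 1000 / (4.2 + 4.17)" | bc   # ~119 IOPS per disk
    echo "scale=1; 120 / 7" | bc               # ~17 IOPS per VM across 7 guests

And since md RAID-1 only spreads reads across the two disks while every write goes to both spindles, a write-heavy job like a big rsync effectively sees single-disk IOPS.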
Luca Lesinigo
2009-Oct-01 11:33 UTC
Re: [Xen-users] disk I/O problems under load? (Xen-3.4.1/x86_64)
[resending to the list as I erroneously sent this directly to Pasi]

On 01 Oct 2009, at 09:12, Pasi Kärkkäinen wrote:

> You have only 2x 7200 rpm disks for 7 virtual machines and you're wondering why there's a lot of iowait? :)

Actually, no. I just reported the observed behaviour of the system. I fully expect, as load increases, to hit the disk I/O bottleneck before having any cpu/ram/network problem.

>> Normally iostat in Domain-0 shows more or less high tps (200~300 under normal load, even higher if I play around with rsync to artificially trigger the problems) on the md device where all the DomUs reside,

Here we're talking about the md 'virtual' device, which is a Linux kernel artifact...

>> and much less (usually just 10-20% of the previous value) on the two physical disks sda and sdb that compose the mirror.

...and here we're talking about the infamous 7200rpm disks, which under normal load usually float at 20-30 iops.

What I did _not_ expect was that all domains would literally freeze up for the whole duration of a single I/O-intensive job (the rsync example), or that any qemu-dm / HVM domain would crash (but I'm not sure if this is related or not). We're not talking about a slow website, but about people not even connecting to the HTTP (or POP, FTP, etc.) servers, without the latter ever hitting any application-level hard limit like the number of worker threads/processes and the like. I expected the CFQ ("Completely Fair Queuing") scheduler in Linux to be fair, maybe not completely fair but a little fair at least ;), in distributing I/O load on the system so it would gracefully slow down as load increased. I do not have scientific tests, but my guess is that a single, non-virtualized system would keep up with that load, maybe slowing down during peaks but not freezing up anywhere.

While I'm still learning how to get the most out of Xen, I'm not 100% sure about my choice of kernel configurations (see my questions about I/O schedulers and tickless kernels) and hypervisor usage (see free RAM, cpu pinning, etc). If my guess that the load should be 'light' on that system is correct, I'm probably just hitting some under-optimization issues in my setup. Along the same lines, I can add two other disks to that system (no networked storage for now, I have to rely on 4 local SATA bays), so I'll study up on how to get the most IOPS out of the hardware I have.

Thanks,
--
Luca Lesinigo
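On the idea of adding two more disks, a minimal sketch of the lower-risk route: mirror the new pair, add it to the existing volume group, and spread the busiest guests across the two RAID-1 pairs (sdc/sdd, md1/md2, vg0 and lv_busydomu are all placeholder names for this setup):

    # mirror the two new disks and hand the new array to LVM
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    pvcreate /dev/md2
    vgextend vg0 /dev/md2
    # move one busy guest's logical volume from the old mirror to the new one
    pvmove -n lv_busydomu /dev/md1 /dev/md2

A four-disk RAID-10 would give better random I/O still, but it means recreating the array and migrating all the data, so splitting the guests across two independent mirrors is the easier first step.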