thr3ads.net - Xen users - [Xen-users] dom0 hangs when doing heavy I/O on domU [Apr 2011]

Kiefer Chang

2011-Apr-20 15:51 UTC

[Xen-users] dom0 hangs when doing heavy I/O on domU

Hi all,

We are using XEN as the hypervisor to set up our private cloud.
The framework is Eucalyptus, with CentOS 5.4 as the dom0 OS.

Sometimes the dom0 on some of our machines becomes unresponsive. The symptoms
are:
(1) We can't log into dom0 via ssh. After typing the password, it just stops
there.
(2) We can ping dom0 successfully.
(3) We can log into domU without problem.

The unresponsive dom0 eventually comes back to life after a period of time,
maybe half an hour or even several hours.
Then we can log into dom0 without problem, and everything works fine except
for some odd things like:
(1) Some daemons stop logging during the unresponsive period. The log file has
a gap.
(2) Some daemons die during the unresponsive period.

We can't find anything suspicious in the system log (the system log isn't
written during the period, either).
I also redirected the console to com1 and set the xen loglvl to all. There are
no logs during the period either.
I can switch to the xen console by pressing Ctrl+a three times during the
unresponsive period. The xen console is working.
We don't do heavy I/O in dom0; we just run some daemons like snmpd...

We are not sure what causes this, but we found a way to reproduce the same
symptom: heavy I/O in VMs.
The following is the test configuration:

Hypervisor
==========
XEN 3.4.2
(Also tried 3.4.3)

DOM0
====
CPU: 2 Xeon E5620 (2.4GHz, 6 cores, 12 threads), with 1 core dedicated to dom0.
(The symptom is much easier to reproduce when only 1 core is dedicated to dom0.)
Memory: 2048M dedicated to dom0. (The node has 24G of memory.)
OS: CentOS 5.4, kernel 2.6.18-164. (I also tried 2.6.18-194, 2.6.18-238, and
xenlinux 2.6.18.8.)
Disk: two SATA disks (Seagate ST3500630NS, 500G), sda and sdb.
 sda is used for the dom0 OS's root/swap. sdb is formatted as an ext3 fs and
 used to store the VM images.

VMs (I use 3 VMs)
=================
CPU: 4 VCPUs
Memory: 1024M
Disks: 3 files on dom0's sdb. They are the root device, swap, and the disk used
for I/O (sda1, sda2, sda3 in the VM).
OS: CentOS 5.4 base image, with the kernel updated to 2.6.18.194.el5xen. (I've
also tried xenlinux 2.6.18.8 and 2.6.18.238.)
Tests:
Create an ext3 fs on sda3 and mount sda3 on a folder. Run vdbench filesystem
I/O on the mounted folder.
The I/O behavior is:
 (1) Create 300 files, each 99m large.
 (2) Randomly select files and sequentially write random patterns to each file
 in 64KB blocks. Read the blocks back to verify when done.
 (3) There is no rate limit, so the program does I/O as fast as it can.
 I can provide the configuration file for the workload if needed.
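
For readers without vdbench at hand, the three steps above can be approximated with a short Python sketch. This is not the vdbench configuration used in the test, only a rough stand-in for the same access pattern; the function names, the file naming scheme, and the use of CRC32 for verification are assumptions made for illustration.

```python
import os
import random
import tempfile
import zlib

BLOCK = 64 * 1024  # 64KB transfer size, as in the test description


def make_files(directory, count, size):
    """Step 1: create `count` files of `size` bytes each, written in blocks."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "file%04d.dat" % i)
        with open(path, "wb") as f:
            remaining = size
            while remaining > 0:
                n = min(BLOCK, remaining)
                f.write(b"\0" * n)
                remaining -= n
        paths.append(path)
    return paths


def write_and_verify(path, size):
    """Step 2: sequentially overwrite one file with random data, then read it back."""
    checksums = []
    with open(path, "r+b") as f:
        for offset in range(0, size, BLOCK):
            data = os.urandom(min(BLOCK, size - offset))
            f.write(data)
            checksums.append(zlib.crc32(data))
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache
        f.seek(0)
        for expected in checksums:
            if zlib.crc32(f.read(BLOCK)) != expected:
                return False
    return True


def run(directory, count=300, size=99 * 1024 * 1024, rounds=10, seed=None):
    """Step 3: no rate limit; pick random files and rewrite them back to back."""
    rng = random.Random(seed)
    paths = make_files(directory, count, size)
    return all(write_and_verify(rng.choice(paths), size) for _ in range(rounds))


if __name__ == "__main__":
    # Scaled-down demo so it finishes quickly; the real test used the defaults.
    with tempfile.TemporaryDirectory() as d:
        print(run(d, count=4, size=3 * BLOCK + 100, rounds=8, seed=1))
```

Pointing `run()` at the folder where sda3 is mounted, with the default sizes, would generate roughly 29GB of files inside the VM, so the defaults are only meant for a dedicated test disk.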

When running I/O on only 1 VM, dom0 almost stops responding. It's very hard to
log in (it rarely succeeds).
When running I/O on 3 VMs, dom0 gets worse. Logging in is not possible (it
blocks after typing the password). The symptoms are the same as described above.
I also tried to log in to dom0 on the VGA console; it blocks after typing the
password.
A session that was already logged in may still work: I can run the top command.
But once I try to open a file, it blocks there.

The files are attached to the VMs by the "file://" method: dom0 uses a loop
device to associate the file and attaches the loop device to the VM. From
XEN's manual I found this method is not recommended now, so I've tried the
tap:aio method to attach these files to the VMs. The dom0 seems fine when
using this method, but we find that when one VM is doing heavy I/O on its
disk, other VMs can't perform I/O well. They can't even finish booting.
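
To make the difference between the attach methods concrete, here is a small sketch of how the corresponding `disk` entries look in an xm-style domain config file (which is itself Python syntax). It is an illustration only: Eucalyptus generates libvirt XML rather than xm config files, and the helper name `disk_spec` and the image paths are hypothetical. The `phy:` form is included for comparison.

```python
def disk_spec(method, path, vdev, mode="w"):
    """Build one xm-style disk entry: '<method>:<path>,<vdev>,<mode>'."""
    prefixes = {
        "file": "file:",        # image file attached through a loop device in dom0
        "tap:aio": "tap:aio:",  # image file served by blktap with async I/O
        "phy": "phy:",          # block device (partition, LVM volume) passed directly
    }
    if method not in prefixes:
        raise ValueError("unknown attach method: %r" % method)
    return "%s%s,%s,%s" % (prefixes[method], path, vdev, mode)


# Hypothetical image locations on the sdb filesystem in dom0.
disk = [
    disk_spec("tap:aio", "/vmstore/vm1/root.img", "sda1"),
    disk_spec("tap:aio", "/vmstore/vm1/swap.img", "sda2"),
    disk_spec("tap:aio", "/vmstore/vm1/data.img", "sda3"),
]

if __name__ == "__main__":
    for entry in disk:
        print(entry)
```

Switching a guest between methods only changes the prefix and, for `phy:`, the path to a device node instead of a file.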

If any of this is unclear, please tell me and I will explain in more
detail.
Any suggestions and thoughts will be valuable to me. Thanks for reading.


Best Regards,
Kiefer Chang


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Florian Heigl

2011-Apr-21 10:59 UTC

Re: [Xen-users] dom0 hangs when doing heavy I/O on domU

Hi Chang,

2011/4/20 Kiefer Chang <zapchang@gmail.com>:
> Hi all,
> We are using XEN as hypervisor to setup our private cloud.
> The framework is Eucalyptus and using CentOS 5.4 as dom0 OS.
> Sometimes we find some machines' dom0 become unresponsive, the symptoms are:
> (1) We can't log into dom0 via ssh. After typing password, it just stops
> there.
> (2) We can ping dom0 successfully.
> (3) We can log into domU without problem.
> The unresponsive dom0 eventually "alive" after a period of time. Maybe half
> hour or even several hours.

So one of your domUs is thrashing the disks and dom0 can't get enough
performance, right?
- are they sharing a disk?
- can you check what I/O scheduler you are using?
  (with cfq you can then use ionice to lower the priority of all blkback
threads a little. That way dom0 will "win the race")

In general, your dom0 is privileged in terms of I/O access rights, but
not in I/O performance. So if one domU goes crazy, it will affect
everything.
... until you take measures :)
I'd suggest you switch to the deadline scheduler and re-test.
Putting dom0 on a different disk is also very advisable imho.
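
One way to check the scheduler in use is to read the device's sysfs file, where the kernel lists the available schedulers and marks the active one with square brackets. The sketch below only parses that format; the function names are made up for illustration, and `sdb` in the usage note is simply the image disk from the original report.

```python
def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_scheduler(device):
    """Read the active I/O scheduler for a block device such as 'sdb'."""
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return active_scheduler(f.read())
```

Calling `read_scheduler("sdb")` in dom0 would return the current scheduler name for that disk. Writing a scheduler name such as `deadline` into the same sysfs file (as root) switches it at runtime.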


Flo

-- 
the purpose of libvirt is to provide an abstraction layer hiding all
xen features added since 2006 until they were finally understood and
copied by the kvm devs.


Joost Roeleveld

2011-Apr-21 13:39 UTC

Re: [Xen-users] dom0 hangs when doing heavy I/O on domU

On Thursday 21 April 2011 12:59:17 Florian Heigl wrote:
> Hi Chang,
> 
> 2011/4/20 Kiefer Chang <zapchang@gmail.com>:
> > Hi all,
> > We are using XEN as hypervisor to setup our private cloud.
> > The framework is Eucalyptus and using CentOS 5.4 as dom0 OS.
> > Sometimes we find some machines' dom0 become unresponsive, the symptoms
> > are: (1) We can't log into dom0 via ssh. After typing password, it just
> > stops there.
> > (2) We can ping dom0 successfully.
> > (3) We can log into domU without problem.
> > The unresponsive dom0 eventually "alive" after a period of time. Maybe
> > half hour or even several hours.
> 
> So one of your domUs is thrashing the disks and dom0 can't get enough
> performance, right?
> - are they sharing a disk?
> - can you check what I/O scheduler you are using?
>   (with cfq you can then use ionice to lower prio on all blkback
> threads a little. that way dom0 will "win the race")
> 
> In general, your dom0 is privileged in terms of IO access rights, but
> not in IO performance. So if one domU goes crazy, it will affect
> anything.
> ... until you take measures :)
> I'd suggest you switch to deadline scheduler and re-test.
> dom0 on a different disk media is also very advisable imho.

Another possible cause for this is dom0 not being pinned to a "private" CPU
core.
I noticed similar issues before I pinned dom0 to cpu0 and configured all the
other domUs to use the other 3 cores.
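
As a sketch of the domU side of such a setup: in an xm-style domain config, the `cpus` setting restricts which physical CPUs a guest's VCPUs may run on. The helper below just computes that string from the CPUs reserved for dom0; the name `domu_cpus` is hypothetical, and the 4-CPU example mirrors the machine described above rather than any particular config file.

```python
def domu_cpus(total_cpus, dom0_cpus):
    """Return an xm-style cpus string listing every CPU not reserved for dom0."""
    reserved = set(dom0_cpus)
    free = [c for c in range(total_cpus) if c not in reserved]
    return ",".join(str(c) for c in free)


# domU config fragment for a 4-CPU host with cpu0 kept for dom0.
vcpus = 3
cpus = domu_cpus(4, [0])

if __name__ == "__main__":
    print(cpus)
```

On the dom0 side, pinning is usually done on the Xen line in grub with the hypervisor options `dom0_max_vcpus=1 dom0_vcpus_pin`; whether that matches the setup described here is an assumption.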

--
Joost


Kiefer Chang

2011-Apr-21 14:56 UTC

Re: [Xen-users] dom0 hangs when doing heavy I/O on domU

Hi Florian,

- are they sharing a disk?
  Yes, the 3 VMs' images are stored on the same disk, sdb. sda is used for the
  dom0 root filesystem.
- can you check what I/O scheduler you are using?
  It defaults to CFQ for sdb. I tried to ionice all blkback processes to
  class 2 and still had no luck.
  I also tried the deadline scheduler before.
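
For anyone wanting to repeat the ionice experiment, the following sketch finds the blkback kernel threads in dom0 and builds the matching `ionice` command lines. It assumes the threads are named `blkback.<domid>.<device>`, which should be checked with `ps` on the actual kernel; the function names are invented for illustration, and the commands are only constructed, not executed.

```python
import os


def comm_from_stat(stat_line):
    """Extract the command name from the '(name)' field of /proc/<pid>/stat."""
    start = stat_line.index("(") + 1
    end = stat_line.rindex(")")
    return stat_line[start:end]


def blkback_pids(proc="/proc"):
    """Return the PIDs of all tasks whose name starts with 'blkback.'."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                name = comm_from_stat(f.read())
        except (IOError, OSError, ValueError):
            continue  # task exited or stat line was unreadable
        if name.startswith("blkback."):
            pids.append(int(entry))
    return sorted(pids)


def ionice_command(pid, io_class=2, level=7):
    """Build an ionice call; class 2 is best-effort, level 7 its lowest priority."""
    return ["ionice", "-c", str(io_class), "-n", str(level), "-p", str(pid)]
```

Each list returned by `ionice_command()` can be passed to `subprocess.call()` as root. The I/O priority only has an effect while the disk uses the cfq scheduler.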

Right now the cure I have found is to make sdb a physical volume, set up
volume groups/logical volumes on it, and attach the logical volumes to the
VMs by the "phy" method.
The symptom is gone when the 3 VMs perform the same I/O.
I know the XEN manual suggests using the blktap and phy methods for VM storage.
But we think it's much easier to manage VM image files than LVM volumes,
since we provision VMs by downloading their images from servers.

Thanks!

--
Kiefer Chang






Kiefer Chang

2011-Apr-21 15:02 UTC

Re: [Xen-users] dom0 hangs when doing heavy I/O on domU

Hi Joost,

I am sure dom0 is pinned to the first core (by configuring xen parameters in
the grub menu), and the domUs won't use the first core because we specify the
VCPU-to-CPU mapping in the VM's xml file.
Initially we didn't dedicate a specific core to dom0, so XEN might allocate
cores to dom0 and domU dynamically.
When we dedicate a single core to dom0, we find the symptom is much easier to
reproduce.

Thanks.
--
Kiefer Chang

