I'm using xen-unstable taken from the hg repository a couple of days before the 3.0 release (how's that for timing). I've been struggling with a series of random lockups on a domU I'm using as a mail server. The initial problem was that any significant activity on either the mail server or the IMAP server would cause the domU to go dead, with absolutely no information in the log files or on the console.

At first it appeared to be something filesystem-related, so I converted everything to ext3. Then I discovered that if I had accidentally left the domU /home filesystem mounted, the load average on the domU machine would climb very rapidly. After making sure everything was clean, setting the virtual CPUs down to one, and upgrading allocated memory to 256 MB, things seemed okay even after exercising the system fairly heavily with a recursive grep through a large set of files. Then the problems with e-mail access causing lockups started up again. Coincidentally, e-mail in the inbox has vanished from view while still remaining in the filesystem. This may or may not be related, but it's another data point.

Quite frankly, I'm at my wit's end. Any ideas?

---eric

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Hi,

I was wondering if anyone can make sense of these errors in the message log:

Dec 23 14:14:31 localhost kernel: Badness in local_bh_enable at kernel/softirq.
Dec 23 14:14:31 localhost kernel: [local_bh_enable+130/144] local_bh_enable+0x
Dec 23 14:14:31 localhost kernel: [skb_checksum+317/704] skb_checksum+0x13d/0x
Dec 23 14:14:31 localhost kernel: [udp_poll+154/352] udp_poll+0x9a/0x160
Dec 23 14:14:31 localhost kernel: [sock_poll+41/64] sock_poll+0x29/0x40
Dec 23 14:14:31 localhost kernel: [do_pollfd+149/160] do_pollfd+0x95/0xa0
Dec 23 14:14:31 localhost kernel: [do_poll+106/208] do_poll+0x6a/0xd0
Dec 23 14:14:31 localhost kernel: [sys_poll+353/576] sys_poll+0x161/0x240
Dec 23 14:14:31 localhost kernel: [sys_gettimeofday+60/144] sys_gettimeofday+0
Dec 23 14:14:31 localhost kernel: [__pollwait+0/208] __pollwait+0x0/0xd0
Dec 23 14:14:31 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb

And:

Dec 18 17:19:30 localhost kernel: hdc: lost interrupt

Can anyone shed any light on what is going on? Also, dom0 randomly hangs without any errors in the logs, sometimes after a few hours of being up, sometimes after days.

Thanks,
William
Eric S. Johansson wrote:
> I'm using xen-unstable taken from the hg repository a couple of days
> before the 3.0 release (how's that for timing).
>
> I've been struggling with a series of random lockups on a domU I'm using
> as a mailserver. The initial problems were that any significant
> activity either on the mail server or the IMAP server would cause the
> domU to go dead with absolutely no information in the log files or console.

Now that I've had some sleep, here's a little more information. It looked like all the lockups were focused on one particular domU instance, but when I woke up, a different domU instance was "dead". It was not responding to connections over its ethernet interface. I connected to the console and found I could log in. ifconfig showed that the interface was up and had an IP address, but I could not get out over that interface to any other machine. Restarting the virtual machine brought the interface back to life.

I think I'm tripping over a series of bugs and getting confused. Teasing apart my experience, I would say that I definitely hit two bugs; the feeling that there are more can be chalked up to paranoia.

Bug 1: dual mounting an LVM partition creates excessively high load averages in a domU instance. By dual mounting I mean mounting the partition in dom0 as well as in one domU instance. Even though the load average climbs within the domU, there is no indication of that load from the outside with xm top. To reproduce: mount one LVM partition in both dom0 and a domU, then run some disk-intensive process like a recursive grep on that partition in the domU. The load average should climb within a couple of minutes and, in my experience, was unstoppable.

Bug 2: ethernet interfaces go dead. It only seems to happen on one domU at a time, but it seems tied to the level of ethernet activity. You should still be able to log in via the console and shut down the domU machine.
This is much harder to reproduce, but I suspect some form of rapid or intense ethernet activity should trigger it. I suspect both of these problems are easier to reproduce on a slow machine (i.e. a Pentium III 500) like the one I'm using. ;-)

---eric
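For anyone trying to reproduce bug 1: as far as I know, Xen's management tools normally refuse to attach the same block device writable to more than one running domain, so the dual-mount situation usually arises from mounting the volume in dom0 directly, or from forcing shared access in the domU config. A hypothetical config fragment (all domain, volume, and device names are made up, not taken from this thread) showing the relevant knob:

```
# /etc/xen/mailserver -- hypothetical example, not Eric's actual config
name   = "mailserver"
memory = 256
disk   = [ 'phy:/dev/vg0/mail-root,hda1,w',   # 'w'  = exclusive read-write
           'phy:/dev/vg0/mail-home,hda2,w' ]  # 'w!' would force shared write
                                              # access -- exactly the kind of
                                              # dual rw mount that corrupts a
                                              # non-cluster fs like ext3
```

The safe patterns are: writable in exactly one place, or read-only everywhere, or a cluster filesystem (GFS/OCFS2) designed for shared writers.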
>> Bug 1: dual mounting an LVM partition creates excessively high load
>> averages in a domU instance. By dual mounting I mean mounting the
>> partition in dom0 as well as in one domU instance.

I note that you didn't say which filesystem you are using or whether they are mounted ro or rw. For most filesystems, you can only multi-mount them if all the mounts are read-only. If you want to mount read-write, then the partition can only be mounted once. Some of the weird filesystem issues you mentioned in your first email might be explained by multiple rw mounts. Also, perhaps the rising load average is your filesystem driver trying to deal with the corrupted partition. If you're using read-only mounts, GFS, or OCFS2, then ignore this.

Cheers,
Dan.
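Dan's rule can be made mechanical. Here is a small sketch (the helper name and device path are my own invention, and note that /proc/mounts may show the dm-mapped name rather than the LVM path) that checks a mounts table for an existing read-write mount of a device before you add a second mount:

```shell
#!/bin/sh
# rw_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit 0) if DEVICE already appears with a read-write mount
# in MOUNTS_FILE (defaults to /proc/mounts). Field 1 of each mounts
# line is the device, field 4 the comma-separated mount options.
rw_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    awk -v dev="$dev" \
        '$1 == dev && $4 ~ /(^|,)rw(,|$)/ { found = 1 }
         END { exit !found }' "$table"
}

# Example use in dom0 before poking at a domU's volume
# (commented out; needs root and a real device):
#   if rw_mounted /dev/vg0/mail-home; then
#       echo "refusing: already mounted read-write elsewhere" >&2
#   else
#       mount -o ro /dev/vg0/mail-home /mnt/inspect
#   fi
```

This only sees mounts known to the kernel it runs on; it cannot see a mount held inside a running domU, so it protects against dom0-side mistakes, not against the domU config itself.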
Daniel Goertzen wrote:
>>> Bug 1: dual mounting an LVM partition creates excessively high load
>>> averages in a domU instance. By dual mounting I mean mounting the
>>> partition in dom0 as well as in one domU instance.
>
> I note that you didn't say which filesystem you are using or whether they
> are mounted ro or rw. For most filesystems, you can only multi-mount them
> if all the mounts are read-only. If you want to mount read-write, then the
> partition can only be mounted once. Some of the weird filesystem issues
> you mentioned in your first email might be explained by multiple rw mounts.
> Also, perhaps the rising load average is your filesystem driver trying to
> deal with the corrupted partition.

It didn't really matter: reiserfs and ext3 both seem to fail the same way. They were both mounted read/write, and apparently that only accelerated the failure. I just had my mail virtual machine, with a growing load average, lock up yet again. I was logged in when this happened and discovered that you can't shut down such a machine; you can only destroy it. As for corrupted partitions, I've been checking them with fsck and they are fine.

At this point, I'm going to scrap xen for the time being and go back to something stable like a 1995 architecture for virtual domains. I really love the concept of xen, and when it works it's wonderful, but for the past couple of weeks I've gone through hell trying to keep a small-scale set of services running, and it's just not worth it anymore. I really need to be able to sleep at night and not wake up to toasted machines yet again. I am going to keep playing with it from time to time as I wait for it to become mature enough to be what I consider trustworthy. I guess that'll be about the time that virtual-machine-friendly chips show up in laptops. :-)

Thanks for all the help, and best of luck to the xen team.
---eric
Eric S. Johansson wrote:
> It didn't really matter: reiserfs and ext3 both seem to fail the same
> way. They were both mounted read/write, and apparently that only
> accelerated the failure.

Neither reiserfs nor ext3 can support multiple mounts if any of them is read-write. Before you make any determinations about stability, stop doing that.
Charles Duffy wrote:
> Eric S. Johansson wrote:
>> It didn't really matter: reiserfs and ext3 both seem to fail the same
>> way. They were both mounted read/write, and apparently that only
>> accelerated the failure.
>
> Neither reiserfs nor ext3 can support multiple mounts if any of them is
> read-write. Before you make any determinations about stability, stop
> doing that.

I did.