I'm using xen-unstable taken from the hg repository a couple of days before the 3.0 release (how's that for timing). I've been struggling with a series of random lockups on a domU I'm using as a mail server. The initial problem was that any significant activity on either the mail server or the IMAP server would cause the domU to go dead, with absolutely no information in the log files or on the console.

At first it appeared to be something filesystem-related, so I converted everything to ext3. Then I discovered that if I had accidentally left the domU /home filesystem mounted, the load average on the domU machine would climb very rapidly. After making sure everything was clean, setting the virtual CPUs down to one, and upgrading allocated memory to 256 MB, things seemed okay even after exercising the system fairly heavily with a recursive grep through a large set of files. Then the problems with e-mail access causing lockups started up again. Coincidentally, e-mail in the inbox has vanished from view while still remaining in the filesystem. This may or may not be related, but it's another data point.

Quite frankly, I'm at my wit's end. Any ideas?

---eric

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Hi,

I was wondering if anyone can make sense of these errors in the message log:

Dec 23 14:14:31 localhost kernel: Badness in local_bh_enable at kernel/softirq.
Dec 23 14:14:31 localhost kernel: [local_bh_enable+130/144] local_bh_enable+0x
Dec 23 14:14:31 localhost kernel: [skb_checksum+317/704] skb_checksum+0x13d/0x
Dec 23 14:14:31 localhost kernel: [udp_poll+154/352] udp_poll+0x9a/0x160
Dec 23 14:14:31 localhost kernel: [sock_poll+41/64] sock_poll+0x29/0x40
Dec 23 14:14:31 localhost kernel: [do_pollfd+149/160] do_pollfd+0x95/0xa0
Dec 23 14:14:31 localhost kernel: [do_poll+106/208] do_poll+0x6a/0xd0
Dec 23 14:14:31 localhost kernel: [sys_poll+353/576] sys_poll+0x161/0x240
Dec 23 14:14:31 localhost kernel: [sys_gettimeofday+60/144] sys_gettimeofday+0
Dec 23 14:14:31 localhost kernel: [__pollwait+0/208] __pollwait+0x0/0xd0
Dec 23 14:14:31 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb

And:

Dec 18 17:19:30 localhost kernel: hdc: lost interrupt

Can anyone shed any light on what is going on? Also, dom0 randomly hangs without any errors in the logs, sometimes after a few hours of being up, sometimes after days.

Thanks,
William
Eric S. Johansson wrote:
> I'm using xen-unstable taken from the hg repository a couple of days
> before the 3.0 release (how's that for timing).
>
> I've been struggling with a series of random lockups on a domU I'm using
> as a mailserver. The initial problems were that any significant
> activity either on the mail server or the IMAP server would cause the
> domU to go dead with absolutely no information in the log files or console.

Now that I've had some sleep, here's a little more information. It looked like all the lockups were focused on one particular domU instance, but when I woke up, a different domU instance was "dead". It was not responding to connections over its ethernet interface. I connected to the console and found I could log in. ifconfig showed that the interface was up and had an IP address, but I could not get out over that interface to any other machine. Restarting the virtual machine brought the interface back to life.

I think I'm tripping over a series of bugs and getting confused. Teasing apart my experience, I would say that I definitely hit two bugs; the feeling that there are more can be chalked up to paranoia.

Bug 1: dual mounting an LVM partition creates excessively high load averages in a domU instance. By dual mounting I mean mounting the partition in dom0 as well as in one domU instance. Even though the load average climbs within the domU, there is no indication of that load from the outside with xm top. To reproduce: mount one LVM partition in both dom0 and a domU, then run some disk-intensive process like a recursive grep on that partition in the domU. The load average should climb within a couple of minutes and, in my experience, was unstoppable.

Bug 2: ethernet interfaces go dead. It only seems to happen on one domU at a time, but it seems tied to the level of ethernet activity. You should still be able to log in via the console and shut down the domU machine.
This is much harder to reproduce, but I suspect some form of rapid or intense ethernet activity should trigger it. I suspect both of these problems are easier to reproduce on a slow machine (i.e. a Pentium III 500) like the one I'm using. ;-)

---eric
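For anyone trying to reproduce bug 1: as far as I know, Xen's management tools normally refuse to attach the same block device writable to more than one running domain, so the dual-mount situation usually arises from mounting the volume in dom0 directly, or from forcing shared access in the domU config. A hypothetical config fragment (all domain, volume, and device names are made up, not taken from this thread) showing the relevant knob:

```
# /etc/xen/mailserver -- hypothetical example, not Eric's actual config
name   = "mailserver"
memory = 256
disk   = [ 'phy:/dev/vg0/mail-root,hda1,w',   # 'w'  = exclusive read-write
           'phy:/dev/vg0/mail-home,hda2,w' ]  # 'w!' would force shared write
                                              # access -- exactly the kind of
                                              # dual rw mount that corrupts a
                                              # non-cluster fs like ext3
```

The safe patterns are: writable in exactly one place, or read-only everywhere, or a cluster filesystem (GFS/OCFS2) designed for shared writers.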
>> Bug 1: dual mounting an LVM partition creates excessively high load
>> averages in a domU instance. By dual mounting I mean mounting the
>> partition in dom0 as well as in one domU instance.

I note that you didn't say which filesystem you are using or whether they are mounted ro or rw. For most filesystems, you can only multi-mount them if all the mounts are read-only. If you want to mount read-write, then the partition can only be mounted once. Some of the weird filesystem issues you mentioned in your first email might be explained by multiple rw mounts. Also, perhaps the rising load average is your filesystem driver trying to deal with the corrupted partition. If you're using read-only mounts, GFS, or OCFS2, then ignore this.

Cheers,
Dan.
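Dan's rule can be made mechanical. Here is a small sketch (the helper name and device path are my own invention, and note that /proc/mounts may show the dm-mapped name rather than the LVM path) that checks a mounts table for an existing read-write mount of a device before you add a second mount:

```shell
#!/bin/sh
# rw_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit 0) if DEVICE already appears with a read-write mount
# in MOUNTS_FILE (defaults to /proc/mounts). Field 1 of each mounts
# line is the device, field 4 the comma-separated mount options.
rw_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    awk -v dev="$dev" \
        '$1 == dev && $4 ~ /(^|,)rw(,|$)/ { found = 1 }
         END { exit !found }' "$table"
}

# Example use in dom0 before poking at a domU's volume
# (commented out; needs root and a real device):
#   if rw_mounted /dev/vg0/mail-home; then
#       echo "refusing: already mounted read-write elsewhere" >&2
#   else
#       mount -o ro /dev/vg0/mail-home /mnt/inspect
#   fi
```

This only sees mounts known to the kernel it runs on; it cannot see a mount held inside a running domU, so it protects against dom0-side mistakes, not against the domU config itself.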
Daniel Goertzen wrote:
>>> Bug 1: dual mounting an LVM partition creates excessively high load
>>> averages in a domU instance. By dual mounting I mean mounting the
>>> partition in dom0 as well as in one domU instance.
>
> I note that you didn't say which filesystem you are using or whether they
> are mounted ro or rw. For most filesystems, you can only multi-mount them
> if all the mounts are read-only. If you want to mount read-write, then the
> partition can only be mounted once. Some of the weird filesystem issues
> you mentioned in your first email might be explained by multiple rw mounts.
> Also, perhaps the rising load average is your filesystem driver trying to
> deal with the corrupted partition.

It didn't really matter: reiserfs and ext3 both seem to fail the same way. They were both mounted read/write, and apparently that only accelerated the failure. I just had my mail virtual machine, with a growing load average, lock up yet again. I was logged in when this happened and discovered that you can't shut down such a machine; you can only destroy it. As for corrupted partitions, I've been checking them with fsck and they are fine.

At this point, I'm going to scrap xen for the time being and go back to something stable like a 1995 architecture for virtual domains. I really love the concept of xen, and when it works it's wonderful, but for the past couple of weeks I've gone through hell trying to keep a small-scale set of services running, and it's just not worth it anymore. I really need to be able to sleep at night and not wake up to toasted machines yet again. I am going to keep playing with it from time to time as I wait for it to become mature enough to be what I consider trustworthy. I guess that'll be about the time that virtual-machine-friendly chips show up in laptops. :-)

Thanks for all the help, and best of luck to the xen team.
---eric
Eric S. Johansson wrote:
> It didn't really matter: reiserfs and ext3 both seem to fail the same
> way. They were both mounted read/write, and apparently that only
> accelerated the failure.

Neither reiserfs nor ext3 can support multiple mounts if any of them is read-write. Before you make any determinations about stability, stop doing that.
Charles Duffy wrote:
> Eric S. Johansson wrote:
>> It didn't really matter: reiserfs and ext3 both seem to fail the same
>> way. They were both mounted read/write, and apparently that only
>> accelerated the failure.
>
> Neither reiserfs nor ext3 can support multiple mounts if any of them is
> read-write. Before you make any determinations about stability, stop
> doing that.

I did.