Robert Verspuy
2007-Sep-12 07:10 UTC
[Fedora-xen] Fedora 7 + kernel 2.6.20-2931 + Xen + clvmd gives spinlock bug and hangs
Hi all,

I'm pretty new to Xen and clustering, but I have the following setup:

2 servers (2 x dual Xeon 2GHz, 6GB RAM, 2 x 160GB SATA + 4 x 400GB SATA), both running Fedora 7 with the latest updates (kernel 2.6.20-2931.fc7xen).

I want to set this up so that both Dom0s are the storage nodes (with GFS2), and both servers are running 2 or 3 Xen VMs with the applications (2 x bind, 2 x postfix, 1 x mysql, 1 x postgresql), with the possibility to migrate a VM to the other server in case of any problems.

The 2 x 160GB disks are set up as a RAID 1, with LVM on top providing local volumes for /boot, / and <swap>. This should hold the local files.

The 4 x 400GB disks are set up as a 3-disk RAID 5 plus 1 spare disk, giving me about 800GB of usable disk space. On top of the RAID 5 I have DRBD, to keep the RAID devices on both servers in sync. With DRBD v8 it is possible to use them active/active, if you use a cluster-aware file system (like GFS2).

So I've set up openais and cman. They are running on both servers and interacting fine.
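As a quick sanity check of the disk layout described above, here is a minimal Python sketch of the usable-capacity arithmetic. The function names are made up for illustration, and the calculation ignores metadata overhead from md, LVM and DRBD, so real figures will come out slightly lower.

```python
def raid5_usable_gb(total_disks: int, spares: int, disk_gb: int) -> int:
    """Usable size of a RAID 5 array: one member's worth of space goes to parity."""
    members = total_disks - spares  # hot spares hold no data
    if members < 3:
        raise ValueError("RAID 5 needs at least 3 member disks")
    return (members - 1) * disk_gb


def raid1_usable_gb(disk_gb: int) -> int:
    """Usable size of a two-disk RAID 1 mirror: the size of a single disk."""
    return disk_gb


if __name__ == "__main__":
    # 4 x 400GB, of which 1 is a spare -> 3-disk RAID 5
    print("RAID 5 usable:", raid5_usable_gb(4, 1, 400), "GB")
    # 2 x 160GB mirrored
    print("RAID 1 usable:", raid1_usable_gb(160), "GB")
```

With 3 member disks of 400GB each this gives 800GB, which matches the figure quoted above.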
When starting clvmd, I get the following (on both servers):

Sep 10 10:08:55 fosfor kernel: BUG: spinlock already unlocked on CPU#0, dlm_recoverd/4016 (Not tainted)
Sep 10 10:08:56 fosfor kernel:  lock: ffff88016158fd50, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
Sep 10 10:08:56 fosfor kernel:
Sep 10 10:08:56 fosfor kernel: Call Trace:
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8020b97c>] _raw_spin_unlock+0x2e/0x7f
Sep 10 10:08:56 fosfor kernel:  [<ffffffff88370233>] :dlm:dlm_lowcomms_get_buffer+0xf7/0x1cb
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836c375>] :dlm:create_rcom+0x3a/0xb3
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836cb81>] :dlm:dlm_rcom_status+0x58/0x137
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836d066>] :dlm:dlm_set_recover_status+0x1a/0x2e
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836beb8>] :dlm:dlm_recover_members+0x332/0x3ea
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80294813>] keventd_create_kthread+0x0/0x6a
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836df8f>] :dlm:dlm_recoverd+0x399/0x3e3
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836dbf6>] :dlm:dlm_recoverd+0x0/0x3e3
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80294813>] keventd_create_kthread+0x0/0x6a
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80232bae>] kthread+0xd0/0xff
Sep 10 10:08:56 fosfor kernel: dlm: got connection from 1
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8025ba68>] child_rip+0xa/0x12
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80294813>] keventd_create_kthread+0x0/0x6a
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80232ade>] kthread+0x0/0xff
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8025ba5e>] child_rip+0x0/0x12
Sep 10 10:08:56 fosfor kernel:

After that, two dlm processes are running at 100% CPU load on one processor.
(This also happens on both servers.)

When I stop the clvmd service, the server hangs (only the server where I stop clvmd).

According to
http://www.redhat.com/archives/linux-cluster/2007-April/msg00133.html
and especially
http://www.redhat.com/archives/linux-cluster/2007-April/msg00171.html
I should use kernel 2.6.21 or newer, but this is not available for Fedora 7 with Xen. I also tried an older kernel (2.6.20-2925.9.fc7xen), but that doesn't work either.

Or could the cause of this problem be located somewhere else?

Does anybody know when a newer kernel for Fedora 7 with Xen will be released? (I don't see any new kernel in testing either...)

Thanks in advance,
Robert Verspuy
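
For anyone wanting to check whether a node meets the 2.6.21 minimum mentioned in the linked threads, here is a minimal Python sketch. The helper names are hypothetical, and the parsing assumes the release string starts with a plain x.y.z version, as the Fedora kernel names quoted above do. It only compares version numbers; it says nothing about whether a distribution kernel carries a backported fix.

```python
import platform
import re


def kernel_version_tuple(release: str) -> tuple:
    """Return the leading x.y.z of a kernel release string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(release: str, minimum=(2, 6, 21)) -> bool:
    """True if the kernel release is at least the given minimum version."""
    return kernel_version_tuple(release) >= minimum


if __name__ == "__main__":
    # The two kernels tried in this post, both below 2.6.21
    for release in ("2.6.20-2931.fc7xen", "2.6.20-2925.9.fc7xen"):
        print(release, "->", meets_minimum(release))
    # The kernel of the machine running this script; release strings that
    # do not start with x.y.z are reported instead of crashing the script
    running = platform.release()
    try:
        print(running, "->", meets_minimum(running))
    except ValueError as exc:
        print(exc)
```

Tuples are compared element by element, so (2, 6, 20) sorts below (2, 6, 21) without any string-comparison pitfalls.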