I've gotten the error below several times on different builds on different
hardware:
The setup is a bit different from the norm: it's a Xen-ified 2.6.16 kernel
running Debian Etch, with block device backends for the OCFS2 storage (yes, I
know it's an adventurous setup). I'm using the ocfs2 module from the kernel
and the ocfs2-tools from Debian (1.2.1-1).
99% of the time it's great: it fences well and does its job. Only one node is
actually being "used", but both are up and mounted. I have seen these errors
when just leaving it overnight; after a while it bombs, and system load
doesn't seem to be a factor. When I logged in this morning, on the node
that had very little load, I found this in the console:
(2016,0):dlm_proxy_ast_handler:321 ERROR: got ast for unknown lockres!
cookie=144115188078155225, name=M0000000000000006676050a149f878, namelen=31
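For what it's worth, if I decode that cookie assuming the o2dlm layout from the 1.2-era code (owner node number in the top 8 bits of the u64, lock sequence in the remaining 56 bits; this is my reading of the source, not an official tool), it points at node 2:

```python
# Decode an o2dlm lock cookie, assuming the 1.2-era ocfs2_dlm layout:
# owner node in the top 8 bits, lock sequence in the low 56 bits.
def decode_dlm_cookie(cookie):
    node = (cookie >> 56) & 0xff         # top byte: owning node number
    seq = cookie & 0x00ffffffffffffff    # low 56 bits: sequence number
    return node, seq

node, seq = decode_dlm_cookie(144115188078155225)
print(node, seq)  # node 2, which matches the "num 2" node in the o2net messages below
```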
When I attempted to shut down the 2nd node, it exploded with the following
error and locked up both nodes. I'm looking for clarification, or even
just a starting point:
Thanks
kernel BUG at <bad filename>:58347!
invalid opcode: 0000 [#1]
SMP
Modules linked in: ocfs2 ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager
configfs dm_snapshot dm_mirror dm_mod ext3 jbd
CPU:    0
EIP:    0061:[<d113cec5>]    Not tainted VLI
EFLAGS: 00010202   (2.6.16-xen-domU #1)
EIP is at __dlm_lockres_reserve_ast+0x35/0x40 [ocfs2_dlm]
eax: 00000028   ebx: cd9bfa80   ecx: f578d000   edx: 00000000
esi: 00000002   edi: cb695200   ebp: cd9bfa80   esp: cb079d3c
ds: 007b   es: 007b   ss: 0069
Process umount (pid: 8046, threadinfo=cb078000 task=cf7ac030)
Stack: <0>cd9bfa80 cd9bfa8c d1140c57 cd9bfa80 cc412ea0 0000001f 00000002 00000028
       00000000 c03b8203 00000400 c011e200 cb695230 cd9bfa8c d114c6cc cd9bfac8
       02000000 c4fe7000 c82d2e00 c4fe7000 00000000 cc412ea0 0000001f 00000000
Call Trace:
 [<d1140c57>] dlm_migrate_lockres+0x677/0x15f0 [ocfs2_dlm]
 [<c011e200>] vprintk+0x290/0x330
 [<c02f45d6>] schedule+0x536/0x860
 [<d11347a5>] dlm_purge_lockres+0x75/0x230 [ocfs2_dlm]
 [<d1131868>] dlm_unregister_domain+0x108/0x740 [ocfs2_dlm]
 [<c02f49cf>] wait_for_completion+0xaf/0x110
 [<c0116b70>] default_wake_function+0x0/0x20
 [<d126ae8d>] ocfs2_remove_lockres_tracking+0xd/0x40 [ocfs2]
 [<c0133552>] kthread_stop_sem+0x82/0xb0
 [<d127038d>] ocfs2_dlm_shutdown+0xed/0x360 [ocfs2]
 [<d129f295>] ocfs2_unregister_net_handlers+0x25/0xc0 [ocfs2]
 [<d129a521>] ocfs2_dismount_volume+0x181/0x4c0 [ocfs2]
 [<d129aa81>] ocfs2_put_super+0x31/0xe0 [ocfs2]
 [<c016c3f2>] generic_shutdown_super+0x92/0x150
 [<c016c4d9>] kill_block_super+0x29/0x50
 [<c016c5ea>] deactivate_super+0x7a/0xa0
 [<c018453b>] sys_umount+0x4b/0x2d0
 [<c0105171>] syscall_call+0x7/0xb
Code: 43 48 84 c0 7f 21 0f b7 43 5a a8 20 75 0b a8 20 75 19 f0 ff 43 44 59 5b c3 89 1c 24 e8 15 6a ff ff 0f b7 43 5a eb e7 0f 0b eb db <0f> 0b eb e3 8d b4 26 00 00 00 00 53 8b 5c 24 0c 8d 43 48 e8 73
 Badness in do_exit at kernel/exit.c:802
 [<c012134d>] do_exit+0x89d/0x8b0
 [<c011007b>] prepare_for_smp+0x4b/0x160
 [<c0105c9a>] die+0x23a/0x240
 [<c0106590>] do_invalid_op+0x0/0xc0
 [<c010663f>] do_invalid_op+0xaf/0xc0
 [<d113cec5>] __dlm_lockres_reserve_ast+0x35/0x40 [ocfs2_dlm]
 [<d113d7a5>] dlm_init_mle+0x85/0x180 [ocfs2_dlm]
 [<c0105303>] error_code+0x2b/0x30
 [<d113cec5>] __dlm_lockres_reserve_ast+0x35/0x40 [ocfs2_dlm]
 [<d1140c57>] dlm_migrate_lockres+0x677/0x15f0 [ocfs2_dlm]
 [<c011e200>] vprintk+0x290/0x330
 [<c02f45d6>] schedule+0x536/0x860
 [<d11347a5>] dlm_purge_lockres+0x75/0x230 [ocfs2_dlm]
 [<d1131868>] dlm_unregister_domain+0x108/0x740 [ocfs2_dlm]
 [<c02f49cf>] wait_for_completion+0xaf/0x110
 [<c0116b70>] default_wake_function+0x0/0x20
 [<d126ae8d>] ocfs2_remove_lockres_tracking+0xd/0x40 [ocfs2]
 [<c0133552>] kthread_stop_sem+0x82/0xb0
 [<d127038d>] ocfs2_dlm_shutdown+0xed/0x360 [ocfs2]
 [<d129f295>] ocfs2_unregister_net_handlers+0x25/0xc0 [ocfs2]
 [<d129a521>] ocfs2_dismount_volume+0x181/0x4c0 [ocfs2]
 [<d129aa81>] ocfs2_put_super+0x31/0xe0 [ocfs2]
 [<c016c3f2>] generic_shutdown_super+0x92/0x150
 [<c016c4d9>] kill_block_super+0x29/0x50
 [<c016c5ea>] deactivate_super+0x7a/0xa0
 [<c018453b>] sys_umount+0x4b/0x2d0
 [<c0105171>] syscall_call+0x7/0xb
(6766,0):o2net_idle_timer:1284 connection to node vserver1-3 (num 2) at
10.10.69.113:7777 has been idle for 10 seconds, shutting it down.
(6766,0):o2net_idle_timer:1297 here are some times that might help debug the
situation: (tmr 1160659302.852081 now 1160659312.846775 dr 1160659307.850562
adv 1160659302.852111:1160659302.852111 func (b9bad2f8:506)
1160659302.852082:1160659302.852088)
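Doing the arithmetic on those timestamps bears out the message: "now" minus "tmr" (the last time the idle timer was reset) is just under the 10-second idle timeout o2net reports:

```python
# Sanity-check the o2net idle-timer math from the message above.
tmr = 1160659302.852081   # last time the idle timer was (re)armed
now = 1160659312.846775   # time the timer fired
idle = now - tmr
print(round(idle, 6))     # just shy of the 10-second idle timeout
```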
BUG: soft lockup detected on CPU#0!
Pid: 6766, comm:           dlm_thread
EIP: 0061:[<c02f5e57>] CPU: 0
EIP is at _spin_lock+0x7/0x10
 EFLAGS: 00000286    Not tainted  (2.6.16-xen-domU #1)
EAX: cd9bfac8 EBX: cd9bfac8 ECX: cb69520c EDX: cb695200
ESI: cd9bfaa4 EDI: 00000000 EBP: cb695214 DS: 007b ES: 007b
CR0: 8005003b CR2: b7eec83c CR3: 00e76000 CR4: 00000660
 [<d1134e0c>] dlm_thread+0x26c/0x11f0 [ocfs2_dlm]
 [<c0133720>] kthread+0xc0/0x110
 [<c0133960>] autoremove_wake_function+0x0/0x60
 [<c0133734>] kthread+0xd4/0x110
 [<d1134ba0>] dlm_thread+0x0/0x11f0 [ocfs2_dlm]
 [<c0133660>] kthread+0x0/0x110
 [<c0102bd5>] kernel_thread_helper+0x5/0x10
The ocfs2 shipping with that kernel is missing a few dlm patches. I'll put
together some patches. There is a bug logged in bugzilla for this.

Bleeding Edge wrote:
> [...]