Hello

I'm still having weekly panics on my system, but now I've at least got something to report back from the netconsole.

To summarize the system: 2x Dell 1950 connected to an EMC CX3-20 SAN. CentOS 5 x86_64 2.6.18-8.1.8.el5 #1 SMP.

Tonight both servers locked up - both while idling afaik. But this time tilesrv2 reported the following via netconsole before it went dead.

(4225,2):dlm_drop_lockres_ref:2289 ERROR: while dropping ref on 359E1C1D38374654BC5E5896EB7D5187:M0000000000000009cb578f4d7803fc (master=0) got -22.
(4225,2):dlm_print_one_lock_resource:294 lockres: M0000000000000009cb578f4d7803fc, owner=0, state=64
(4225,2):__dlm_print_one_lock_resource:309 lockres: M0000000000000009cb578f4d7803fc, owner=0, state=64
(4225,2):__dlm_print_one_lock_resource:311 last used: 4354492857, on purge list: yes
(4225,2):dlm_print_lockres_refmap:277 refmap nodes: [ ], inflight=0
(4225,2):__dlm_print_one_lock_resource:313 granted queue:
(4225,2):__dlm_print_one_lock_resource:328 converting queue:
(4225,2):__dlm_print_one_lock_resource:343 blocked queue:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...mushran/BUILD/ocfs2-1.2.6/fs/ocfs2/dlm/dlmmaster.c:2291
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:04.0/0000:0c:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:4/vendor
CPU 2
Modules linked in: netconsole autofs4 hidp ocfs2(U) nfs lockd fscache nfs_acl rfcomm l2cap bluetooth ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs sunrpc ipt_REJECT ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables dm_emc dm_round_robin dm_multipath video sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac ipv6 parport_pc lp parport joydev shpchp bnx2 sr_mod ide_cd serio_raw cdrom sg pcspkr dm_snapshot dm_zero dm_mirror dm_mod usb_storage qla2xxx scsi_transport_fc megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
Pid: 4225, comm: dlm_thread Not tainted 2.6.18-8.1.8.el5 #1
[<ffffffff884d60d3>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1d3/0x1ec
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802da65c
R13: ffff81012d087000 R14: ffff8100435c5f60 R15: ffffffff8009b4f6
CR2: 000000001ec07000 CR3: 00000001289c9000 CR4: 00000000000006e0
303030303030304d
0000000000000000 0000000000000000 ffff81001defe648
[<ffffffff884e9031>] :ocfs2_dlm:dlm_purge_lockres+0x175/0x34a
[<ffffffff8009b6b9>] autoremove_wake_function+0x0/0x2e
[<ffffffff884e93c2>] :ocfs2_dlm:dlm_thread+0x0/0x579
[<ffffffff80032189>] kthread+0xfe/0x132
[<ffffffff8005bfe5>] child_rip+0xa/0x11
[<ffffffff8009b4f6>] keventd_create_kthread+0x0/0x61
[<ffffffff8005bfdb>] child_rip+0x0/0x11
0f d6 c2 83 d8 5c [<ffffffff884d60d3>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1d3/0x1ec
<0>Kernel panic - not syncing: Fatal exception

I'd be happy to provide more info or open a bug-report. Just tell me what you need. I hope this is a better report than last time :)

Daniel
Sunil Mushran
2007-Aug-28 09:48 UTC
[Ocfs2-users] Re: Kernel panic on OCFS2 1.2.6-6 for EL5
Please file a bugzilla. It is very hard to track issues via email. Attach the trace below. You should also see a corresponding message on one of the other nodes, specifically node 0. Add that to the bugzilla too.

Daniel wrote:
> Hello
>
> I'm still having weekly panics on my system, but now I've at least got
> something to report back from the netconsole.
>
> To summarize the system: 2x Dell 1950 connected to an EMC CX3-20 SAN.
> CentOS 5 x86_64 2.6.18-8.1.8.el5 #1 SMP.
>
> Tonight both servers locked up - both while idling afaik. But this
> time tilesrv2 reported the following via netconsole before it went dead.
>
> (4225,2):dlm_drop_lockres_ref:2289 ERROR: while dropping ref on
> 359E1C1D38374654BC5E5896EB7D5187:M0000000000000009cb578f4d7803fc
> (master=0) got -22.
> (4225,2):dlm_print_one_lock_resource:294 lockres:
> M0000000000000009cb578f4d7803fc, owner=0, state=64
> (4225,2):__dlm_print_one_lock_resource:309 lockres:
> M0000000000000009cb578f4d7803fc, owner=0, state=64
> (4225,2):__dlm_print_one_lock_resource:311 last used: 4354492857, on
> purge list: yes
> (4225,2):dlm_print_lockres_refmap:277 refmap nodes: [ ], inflight=0
> (4225,2):__dlm_print_one_lock_resource:313 granted queue:
> (4225,2):__dlm_print_one_lock_resource:328 converting queue:
> (4225,2):__dlm_print_one_lock_resource:343 blocked queue:
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at ...mushran/BUILD/ocfs2-1.2.6/fs/ocfs2/dlm/dlmmaster.c:2291
> invalid opcode: 0000 [1] SMP
> last sysfs file: /devices/pci0000:00/0000:00:04.0/0000:0c:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:4/vendor
> CPU 2
> Modules linked in: netconsole autofs4 hidp ocfs2(U) nfs lockd fscache
> nfs_acl rfcomm l2cap bluetooth ocfs2_dlmfs(U) ocfs2_dlm(U)
> ocfs2_nodemanager(U) configfs sunrpc ipt_REJECT ip6t_REJECT xt_tcpudp
> ip6table_filter ip6_tables x_tables dm_emc dm_round_robin dm_multipath
> video sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac
> ipv6 parport_pc lp parport joydev shpchp bnx2 sr_mod ide_cd serio_raw
> cdrom sg pcspkr dm_snapshot dm_zero dm_mirror dm_mod usb_storage
> qla2xxx scsi_transport_fc megaraid_sas sd_mod scsi_mod ext3 jbd
> ehci_hcd ohci_hcd
> Pid: 4225, comm: dlm_thread Not tainted 2.6.18-8.1.8.el5 #1
> [<ffffffff884d60d3>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1d3/0x1ec
> RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802da65c
> R13: ffff81012d087000 R14: ffff8100435c5f60 R15: ffffffff8009b4f6
> CR2: 000000001ec07000 CR3: 00000001289c9000 CR4: 00000000000006e0
> 303030303030304d
> 0000000000000000 0000000000000000 ffff81001defe648
> [<ffffffff884e9031>] :ocfs2_dlm:dlm_purge_lockres+0x175/0x34a
> [<ffffffff8009b6b9>] autoremove_wake_function+0x0/0x2e
> [<ffffffff884e93c2>] :ocfs2_dlm:dlm_thread+0x0/0x579
> [<ffffffff80032189>] kthread+0xfe/0x132
> [<ffffffff8005bfe5>] child_rip+0xa/0x11
> [<ffffffff8009b4f6>] keventd_create_kthread+0x0/0x61
> [<ffffffff8005bfdb>] child_rip+0x0/0x11
> 0f d6 c2 83 d8 5c [<ffffffff884d60d3>]
> :ocfs2_dlm:dlm_drop_lockres_ref+0x1d3/0x1ec
> <0>Kernel panic - not syncing: Fatal exception
>
> I'd be happy to provide more info or open a bug-report. Just tell me
> what you need. I hope this is a better report than last time :)
>
> Daniel
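
For anyone collecting the corresponding messages from node 0 for the bugzilla, a small script along these lines may help. It is only a sketch: it assumes node 0's kernel messages have been saved to a plain-text file (the file name below is hypothetical) and it simply picks out the lines that mention the lock resource name from the trace. It also decodes the "got -22" return value, which corresponds to EINVAL on Linux.

```python
import errno
import os

# Lock resource name taken from the netconsole trace in this thread.
LOCKRES = "M0000000000000009cb578f4d7803fc"


def find_lockres_lines(lines, lockres=LOCKRES):
    """Return the log lines that mention the given lock resource name."""
    return [line.rstrip("\n") for line in lines if lockres in line]


def describe_errno(code):
    """Map a negative kernel return value such as -22 to its errno symbol."""
    return errno.errorcode.get(abs(code), "unknown")


if __name__ == "__main__":
    # Hypothetical path: wherever node 0's messages were saved.
    log_path = "node0-messages.log"
    if os.path.exists(log_path):
        with open(log_path, errors="replace") as fh:
            for line in find_lockres_lines(fh):
                print(line)
    print("-22 ->", describe_errno(-22))
```

The matching lines, together with the trace from tilesrv2, are what would be attached to the bugzilla.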