The specific bug behind this message has been fixed in this commit:
http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commit;h=1f667766cb67ed05b4d706aa82e8ad0b12eaae8b
The bug should result in an oops, and thus a panic, but only on this node.
If other nodes are rebooting, then I suspect some sysctl values are
incorrect. Ensure /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops
are set appropriately. See the user's guide for more.
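
A quick way to see what each node will do after an oops is to read the two
sysctl files and interpret them. The following is only a minimal sketch, not
part of OCFS2: the helper names read_panic_settings and describe are
hypothetical, and the interpretation follows the kernel's documented meaning
of the two values (panic_on_oops: 0 = keep running, nonzero = panic; panic:
0 = hang, positive = reboot after that many seconds, negative = reboot
immediately on newer kernels). I believe the user's guide suggests
panic_on_oops = 1 with a nonzero panic timeout such as 30, but check the
guide for your release.

```python
from pathlib import Path


def read_panic_settings(base="/proc/sys/kernel"):
    """Return kernel.panic and kernel.panic_on_oops as integers.

    `base` is a parameter only so the function can be pointed at a
    test directory; on a real node the default is what you want.
    """
    base = Path(base)
    return {
        name: int((base / name).read_text().strip())
        for name in ("panic", "panic_on_oops")
    }


def describe(settings):
    """Describe what a node does after an oops with these settings."""
    if settings["panic_on_oops"] == 0:
        return "oops does not panic; the node keeps running"
    timeout = settings["panic"]
    if timeout == 0:
        return "oops panics; the node hangs and does not reboot"
    if timeout < 0:
        return "oops panics; the node reboots immediately"
    return "oops panics; the node reboots after %d seconds" % timeout
```

Running print(describe(read_panic_settings())) on each node shows whether
that node is configured to panic and reboot after an oops.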

On 03/18/2011 03:51 PM, Nikola Savic wrote:
> Hi,
>
> I have a 3-node cluster using OCFS2 1.4 on CentOS 5.5 (kernel 2.6.18-194).
> Two nodes (server1 and server2) provide shared storage using DRBD. The
> shared storage is exported to the nodes using iSCSI (server1 is the target
> and all other nodes are iSCSI initiators).
>
> Today the cluster went down. server1 was not accessible, while server2 and
> server3 rebooted, with logs showing that the connection to server1 was lost
> and that the servers were rebooted because of that. However, server1's logs
> don't have errors like that. There are only the following lines, a minute
> before the other two servers rebooted:
>
> Mar 18 20:58:03 server1 kernel: (dlm_thread,5154,3):dlm_drop_lockres_ref:2216 ERROR: while dropping ref on BDB600C633D74D6B85C496D78F566879:O0000000000000002e81a8700000000 (master=1) got -22.
> Mar 18 20:58:03 server1 kernel: lockres: O0000000000000002e81a8700000000, owner=1, state=64
> Mar 18 20:58:03 server1 kernel: last used: 4501088944, refcnt: 3, on purge list: yes
> Mar 18 20:58:03 server1 kernel: on dirty list: no, on reco list: no, migrating pending: no
> Mar 18 20:58:03 server1 kernel: inflight locks: 0, asts reserved: 0
> Mar 18 20:58:03 server1 kernel: refmap nodes: [ ], inflight=0
> Mar 18 20:58:03 server1 kernel: granted queue:
> Mar 18 20:58:03 server1 kernel: converting queue:
> Mar 18 20:58:03 server1 kernel: blocked queue:
>
> I think I have seen errors like this logged in /var/log/messages from time
> to time, but the servers continued to work without hanging. Is this kind of
> error serious enough for a server to go down? If it is, why is it happening
> and how can I prevent it?
>
> Thanks,
> Nikola