Hi list,
Something went wrong this morning and we had a node (#0) reboot.
Something blocked NFS access from both nodes: one node rebooted, and on
the other we restarted nfsd, which brought it back.
Looking at the logs of node #0 - the one that rebooted - everything seems
normal, but in the other node's dmesg we saw these messages.
First, o2net detected that node #0 was dead (everything seems OK
here):
(0,0):o2net_idle_timer:1422 here are some times that might help debug
the situation: (tmr 1233748167.271522 now 1233748227.272666 dr
1233748167.271516 adv 1233748167.271532:1233748167.271533 func
(300d6acb:500) 1233748167.271522:1233748167.271526)
o2net: no longer connected to node soap02 (num 0) at 192.168.0.10:7777
(5244,2):ocfs2_dlm_eviction_cb:108 device (8,33): dlm has evicted node 0
(12281,1):dlm_get_lock_resource:913
F59B45831EEA41F384BADE6C4B7A932B:M000000000000000000001aa9d5b7e0: at
least one node (0) to recover before lock mastery can begin
(12281,1):dlm_get_lock_resource:967
F59B45831EEA41F384BADE6C4B7A932B:M000000000000000000001aa9d5b7e0: at
least one node (0) to recover before lock mastery can begin
(6968,7):dlm_get_lock_resource:913
F59B45831EEA41F384BADE6C4B7A932B:$RECOVERY: at least one node (0) to
recover before lock mastery can begin
(6968,7):dlm_get_lock_resource:947 F59B45831EEA41F384BADE6C4B7A932B:
recovery map is not empty, but must master $RECOVERY lock now
(6968,7):dlm_do_recovery:524 (6968) Node 1 is the Recovery Master for
the Dead Node 0 for Domain F59B45831EEA41F384BADE6C4B7A932B
(12281,2):ocfs2_replay_journal:1004 Recovering node 0 from slot 0 on
device (8,33)
(fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0,
recovered transactions 66251376 to 66251415
(fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 3176 and
revoked 0/0 blocks
kjournald starting. Commit interval 5 seconds
We restarted nfsd on node #1 (seems OK):
nfsd: non-standard errno: -512
nfsd: non-standard errno: -512
nfsd: non-standard errno: -512
nfsd: last server has exited
nfsd: unexporting all filesystems
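For what it's worth, -512 is not a userspace errno at all: in the kernel
sources it is ERESTARTSYS, an internal "interrupted syscall, restart it"
code that should never escape to callers, which is presumably why nfsd
flags it as non-standard. A quick check (Python, assuming a Linux errno
table):

```python
import errno

# 512 (ERESTARTSYS in the kernel's include/linux/errno.h) is
# kernel-internal and never exported to userspace, so the errno
# module has no name for it -- matching nfsd's "non-standard" complaint.
print(512 in errno.errorcode)  # False
```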
And here are the strange messages:
(6965,2):ocfs2_orphan_del:1869 ERROR: status = -2
(6965,2):ocfs2_remove_inode:614 ERROR: status = -2
(6965,2):ocfs2_wipe_inode:740 ERROR: status = -2
(6965,2):ocfs2_delete_inode:974 ERROR: status = -2
(6965,2):ocfs2_orphan_del:1869 ERROR: status = -2
(6965,2):ocfs2_remove_inode:614 ERROR: status = -2
(6965,2):ocfs2_wipe_inode:740 ERROR: status = -2
(6965,2):ocfs2_delete_inode:974 ERROR: status = -2
(6965,2):ocfs2_orphan_del:1869 ERROR: status = -2
(6965,2):ocfs2_remove_inode:614 ERROR: status = -2
(6965,2):ocfs2_wipe_inode:740 ERROR: status = -2
(6965,2):ocfs2_delete_inode:974 ERROR: status = -2
(6965,2):ocfs2_orphan_del:1869 ERROR: status = -2
(6965,2):ocfs2_remove_inode:614 ERROR: status = -2
(6965,2):ocfs2_wipe_inode:740 ERROR: status = -2
(6965,2):ocfs2_delete_inode:974 ERROR: status = -2
(6965,2):ocfs2_orphan_del:1869 ERROR: status = -2
(6965,2):ocfs2_remove_inode:614 ERROR: status = -2
(6965,2):ocfs2_wipe_inode:740 ERROR: status = -2
(6965,2):ocfs2_delete_inode:974 ERROR: status = -2
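In case it helps with the diagnosis: kernel error returns are negated
errno values, so status = -2 is -ENOENT, i.e. the orphan-directory entry
the node tried to remove was apparently already gone. A minimal decoding
sketch:

```python
import errno, os

status = -2  # value from the ocfs2_orphan_del/ocfs2_delete_inode messages
# Negate to get the errno and look up its name and description:
print(errno.errorcode[-status], '->', os.strerror(-status))
# ENOENT -> No such file or directory
```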
I think these messages are related to the nfsd restart. Any clue?
Regards,
--
.:''''':.
.:' ` Sérgio Surkamp | Gerente de Rede
:: ........ sergio at gruposinternet.com.br
`:. .:'
`:, ,.:' *Grupos Internet S.A.*
`: :' R. Lauro Linhares, 2123 Torre B - Sala 201
: : Trindade - Florianópolis - SC
:.'
:: +55 48 3234-4109
:
' http://www.gruposinternet.com.br