thr3ads.net - Ocfs2 users - [Ocfs2-users] dlm timeouts and following errors -112 [Feb 2007]

Sebastian Reitenbach

2007-Feb-26 12:18 UTC

[Ocfs2-users] dlm timeouts and following errors -112

Hi list,

I am experimenting with ocfs2 (rpm package: 1.2.2-0.2), using linux-ha 2.0.8
(all running on SLES 10 x86-64, rpm packages from linux-ha.org) for the
heartbeat. The three nodes are connected to a gigabit switch. From time to
time I have problems unmounting a drive, and I have to reboot the whole
system to fix the problem. When these lockups occur, I see these messages
in /var/log/messages:


Feb 26 21:03:47 ppsbackup101 heartbeat: [5394]: ERROR: Irretrievably lost packet: node ppsdb102 seq 6
Feb 26 21:03:47 ppsbackup101 heartbeat: [5394]: ERROR: Irretrievably lost packet: node ppsdb102 seq 6
Feb 26 21:04:32 ppsbackup101 kernel: o2net: connection to node ppsnfs102 (num 3) at 192.168.102.32:7777 has been idle for 300.0 seconds, shutting it down.
Feb 26 21:04:32 ppsbackup101 kernel: (5394,1):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1172519972.626184 now 1172520272.653263 dr 1172519972.626167 adv 1172519972.626208:1172519972.626210 func (666c6172:510) 1172519972.626186:1172519972.626195)
Feb 26 21:04:32 ppsbackup101 kernel: o2net: no longer connected to node ppsnfs102 (num 3) at 192.168.102.32:7777
Feb 26 21:04:32 ppsbackup101 kernel: (8915,0):dlm_drop_lockres_ref:2283 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_request_join:899 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_try_to_join_domain:1048 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (8915,0):dlm_purge_lockres:189 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_join_domain:1321 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_register_domain:1514 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):ocfs2_dlm_init:2007 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11375,0):dlm_leave_domain:565 Error -112 sending domain exit message to node 3
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):ocfs2_mount_volume:1093 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,145) on (node 4)
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_request_join:899 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_try_to_join_domain:1048 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_join_domain:1321 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_register_domain:1514 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):ocfs2_dlm_init:2007 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):ocfs2_mount_volume:1093 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,97) on (node 4)
Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,129) on (node 4)
Feb 26 21:04:33 ppsbackup101 kernel: ocfs2: Unmounting device (8,113) on (node 4)


I think it is because of the timeout at the beginning of the logs, but I
don't know whether I am right, or what I can do to prevent it from happening
again. Is there anything I can do to overcome these problems?

kind regards
Sebastian

Sunil Mushran

2007-Feb-26 12:27 UTC

Yes, the messages are related. -112 is EHOSTDOWN.
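
For anyone who wants to check that mapping on their own machine, here is a
minimal sketch using Python's standard errno module. The helper name
describe_status is made up for illustration and is not part of ocfs2 or
linux-ha. Note that errno numbers are platform-specific; 112 corresponds to
EHOSTDOWN on Linux, which is what the nodes in this thread run.

```python
import errno
import os


def describe_status(status: int) -> tuple[str, str]:
    """Map a kernel-style negative status (e.g. -112) to (errno name, message).

    Hypothetical helper: the kernel logs errors as negated errno values,
    so we take the absolute value before looking the code up.
    """
    code = abs(status)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_status(-112)
    print(f"status = -112 -> {name}: {message}")
```

Running it on one of the cluster nodes shows the symbolic name and the
system's own description for the status seen in the dlm messages.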

