Hi,
While the system is booting, we get the error message "modprobe: FATAL:
Module ocfs2_stackglue not found" in /var/log/messages. Some nodes also
reboot without any error message.
-------------------------------------------------
Jul 27 10:02:19 alf3 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Jul 27 10:02:19 alf3 kernel: Netfilter messages via NETLINK v0.30.
Jul 27 10:02:19 alf3 kernel: ip_conntrack version 2.4 (8192 buckets, 65536
max) - 304 bytes per conntrack
Jul 27 10:02:19 alf3 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: None
Jul 27 10:02:20 alf3 setroubleshoot: [server.ERROR] cannot start system DBus
service: Failed to connect to socket /var/run/dbus/system_bus_socket: No such
file or directory
Jul 27 10:02:20 alf3 kernel: VMware memory control driver initialized
Jul 27 10:02:20 alf3 kernel: e1000: eth0: e1000_set_tso: TSO is Enabled
Jul 27 10:02:21 alf3 modprobe: FATAL: Module ocfs2_stackglue not found.
Jul 27 10:02:21 alf3 kernel: OCFS2 Node Manager 1.4.2 Wed Jul 1 19:55:44
PDT 2009 (build 0b9eb999c4d39c0d4b66219a2752cda6)
Jul 27 10:02:21 alf3 kernel: OCFS2 DLM 1.4.2 Wed Jul 1 19:55:44 PDT 2009
(build 0faae8d4263a8c594749be558d8d7edd)
Jul 27 10:02:21 alf3 kernel: OCFS2 DLMFS 1.4.2 Wed Jul 1 19:55:44 PDT 2009
(build 0faae8d4263a8c594749be558d8d7edd)
Jul 27 10:02:21 alf3 kernel: OCFS2 User DLM kernel interface loaded
Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf0 (num 0) at
172.25.29.10:7777
Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf2 (num 2) at
172.25.29.12:7777
Jul 27 10:02:25 alf3 kernel: o2net: accepted connection from node alf5 (num
5) at 172.25.29.15:7777
Jul 27 10:02:26 alf3 kernel: o2net: accepted connection from node alf4 (num
4) at 172.25.29.14:7777
Jul 27 10:02:27 alf3 kernel: o2net: connected to node alf1 (num 1) at
172.25.29.11:7777
Jul 27 10:02:31 alf3 kernel: OCFS2 1.4.2 Wed Jul 1 19:55:41 PDT 2009 (build
966fd2793489955b2271e7bb7e691088)
Jul 27 10:02:31 alf3 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
The kernel log on another node (alf1) covering the above node (alf3) looks
like this:
Jul 29 10:15:57 alf1 kernel: o2net: connection to node alf3 (num 3) at
172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down.
Jul 29 10:15:57 alf1 kernel: (0,1):o2net_idle_timer:1506 here are some times
that might help debug the situation: (tmr 1248876927.861591 now
1248876957.858464 dr 1248876927.861556 adv
1248876927.861622:1248876927.861623 func (0ffa2aed:506)
1248876927.861592:1248876927.861604)
Jul 29 10:15:57 alf1 kernel: o2net: no longer connected to node alf3 (num 3)
at 172.25.29.13:7777
Jul 29 10:16:27 alf1 kernel: (2600,1):o2net_connect_expired:1667 ERROR: no
connection established with node 3 after 30.0 seconds, giving up and
returning errors.
Jul 29 10:17:27 alf1 last message repeated 2 times
Jul 29 10:17:30 alf1 kernel: (2618,0):ocfs2_dlm_eviction_cb:98 device
(8,33): dlm has evicted node 3
Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:844
7BE7E9E2026A40F8801B56257D805C88:$RECOVERY: at least one node
(3) to recover before lock mastery can begin
Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:878
7BE7E9E2026A40F8801B56257D805C88: recovery map is not empty,
but must master $RECOVERY lock now
Jul 29 10:17:32 alf1 kernel: (2629,1):dlm_do_recovery:524 (2629) Node 1 is
the Recovery Master for the Dead Node 3 for Domain
7BE7E9E2026A40F8801B56257D805C88
Jul 29 10:17:34 alf1 kernel: o2net: accepted connection from node alf3 (num
3) at 172.25.29.13:7777
Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Node 3 joins domain
7BE7E9E2026A40F8801B56257D805C88
Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 1 2 3 4 5
Jul 29 11:09:42 alf1 kernel: o2net: connected to node alf0 (num 0) at
172.25.29.10:7777
Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Node 0 joins domain
7BE7E9E2026A40F8801B56257D805C88
Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
OS = Red Hat 5.2
[root@alf3 /]# uname -a
Linux alf3 2.6.18-128.1.16.el5 #1 SMP Fri Jun 26 10:53:31 EDT 2009 x86_64
x86_64 x86_64 GNU/Linux
[root@alf3 /]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.2-1.el5
ocfs2-2.6.18-128.1.16.el5-1.4.2-1.el5
ocfs2console-1.4.2-1.el5
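
For reference, the node layout in the logs above would correspond to an
/etc/ocfs2/cluster.conf along the following lines (the cluster name
"alfcluster" is only a placeholder, and the stanzas for alf2 through alf5
follow the same pattern with their own numbers and addresses):

cluster:
        node_count = 6
        name = alfcluster

node:
        ip_port = 7777
        ip_address = 172.25.29.10
        number = 0
        name = alf0
        cluster = alfcluster

node:
        ip_port = 7777
        ip_address = 172.25.29.11
        number = 1
        name = alf1
        cluster = alfcluster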
Any help will be appreciated; the OCFS2 cluster is not stable. We mount the
file system for file sharing with Alfresco.
Thanks
Raheel
Raheel Akhtar wrote:
> While the system is booting, we get the error message "modprobe: FATAL:
> Module ocfs2_stackglue not found" in /var/log/messages. Some nodes also
> reboot without any error message.

The ocfs2_stackglue not found error message is harmless. We use the same
init script for all versions of the fs.... stackglue is present in the
current mainline and will be in ocfs2 1.6.
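
If you want to confirm that on a node, checking which OCFS2 modules the
installed 1.4 packages actually provide (the paths below are the usual EL5
locations; adjust for your running kernel) would look something like:

[root@alf3 /]# find /lib/modules/$(uname -r) -name 'ocfs2*.ko'
[root@alf3 /]# lsmod | grep ocfs2
[root@alf3 /]# service o2cb status

ocfs2_stackglue.ko will simply not show up with the 1.4 modules, which is
why the modprobe in the init script fails and why that failure can be
ignored.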
Thanks. One of the nodes (alf3) rebooted; here are the log messages from
another node (alf1) showing errors about node 3.
Why did node 3 reboot?
-------------------------------
Jul 29 10:15:57 alf1 kernel: o2net: connection to node alf3 (num 3) at
172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down.
Jul 29 10:15:57 alf1 kernel: (0,1):o2net_idle_timer:1506 here are some times
that might help debug the situation: (tmr 1248876927.861591 now
1248876957.858464 dr 1248876927.861556 adv
1248876927.861622:1248876927.861623 func (0ffa2aed:506)
1248876927.861592:1248876927.861604)
Jul 29 10:15:57 alf1 kernel: o2net: no longer connected to node alf3 (num 3)
at 172.25.29.13:7777
Jul 29 10:16:27 alf1 kernel: (2600,1):o2net_connect_expired:1667 ERROR: no
connection established with node 3 after 30.0 seconds, giving up and
returning errors.
Jul 29 10:17:27 alf1 last message repeated 2 times
Jul 29 10:17:30 alf1 kernel: (2618,0):ocfs2_dlm_eviction_cb:98 device
(8,33): dlm has evicted node 3
Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:844
7BE7E9E2026A40F8801B56257D805C88:$RECOVERY: at least one node (3) to recover
before lock mastery can begin
Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:878
7BE7E9E2026A40F8801B56257D805C88: recovery map is not empty, but must master
$RECOVERY lock now
Jul 29 10:17:32 alf1 kernel: (2629,1):dlm_do_recovery:524 (2629) Node 1 is
the Recovery Master for the Dead Node 3 for Domain
7BE7E9E2026A40F8801B56257D805C88
Jul 29 10:17:34 alf1 kernel: o2net: accepted connection from node alf3 (num
3) at 172.25.29.13:7777
Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Node 3 joins domain
7BE7E9E2026A40F8801B56257D805C88
Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 1 2 3 4 5
Jul 29 11:09:42 alf1 kernel: o2net: connected to node alf0 (num 0) at
172.25.29.10:7777
Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Node 0 joins domain
7BE7E9E2026A40F8801B56257D805C88
Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
----------------------------------
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Wednesday, July 29, 2009 1:25 PM
To: Raheel Akhtar
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Error message while booting system
ocfs2_stackglue not found error message is harmless.
We use the same init script for all versions of the fs.... stackglue
is present in the current mainline and will be in ocfs2 1.6.
Raheel Akhtar wrote:
> Thanks. One of the nodes (alf3) rebooted; here are the log messages from
> another node (alf1) showing errors about node 3. Why did node 3 reboot?

We appear to be stuck in a loop. You have to have netconsole set up. Ping
support if you need help setting up netconsole.
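
Netconsole forwards the kernel's console output over UDP to a remote syslog
host, so the panic or fencing message a node prints just before it reboots
ends up in another machine's log. A minimal sketch for EL5, assuming eth0 and
a log host at 172.25.29.10 running syslogd with remote reception enabled (the
addresses below are only placeholders taken from the node list above):

[root@alf3 /]# cat /etc/sysconfig/netconsole
# host (and port) that receives the kernel console messages over UDP
SYSLOGADDR=172.25.29.10
SYSLOGPORT=514
[root@alf3 /]# service netconsole start
[root@alf3 /]# chkconfig netconsole on

The same can be done by hand with the module parameter, e.g.
"modprobe netconsole netconsole=@/eth0,514@172.25.29.10/", if the initscript
is not available. If the captured output points at the o2cb network idle or
heartbeat timeout, also confirm that the timeout values in
/etc/sysconfig/o2cb (O2CB_IDLE_TIMEOUT_MS and the related settings) are
identical on every node.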