search for: o2net

Displaying 20 results from an estimated 58 matches for "o2net".

2009 Nov 06
0
iscsi connection drop, comes back in seconds, then deadlock in cluster
..., now 4325774337 Nov 6 01:00:12 mgr01 kernel: connection1:0: detected conn error (1011) Nov 6 01:00:13 mgr01 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3) Nov 6 01:00:15 mgr01 iscsid: connection1:0 is operational after recovery (1 attempts) Nov 6 01:00:38 mgr01 kernel: o2net: no longer connected to node rack105 (num 7) at 10.244.1.105:7777 Nov 6 01:00:38 mgr01 kernel: (3270,0):dlm_send_remote_convert_request:395 ERROR: status = -112 Nov 6 01:00:38 mgr01 kernel: (3270,0):dlm_wait_for_node_death:370 4FF4E858AF6E4AEEB2650A543A320C2F: waiting 5000ms for notification o...
2009 Jul 29
3
Error message while booting system
...S2 DLM 1.4.2 Wed Jul 1 19:55:44 PDT 2009 (build 0faae8d4263a8c594749be558d8d7edd) Jul 27 10:02:21 alf3 kernel: OCFS2 DLMFS 1.4.2 Wed Jul 1 19:55:44 PDT 2009 (build 0faae8d4263a8c594749be558d8d7edd) Jul 27 10:02:21 alf3 kernel: OCFS2 User DLM kernel interface loaded Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf0 (num 0) at 172.25.29.10:7777 Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf2 (num 2) at 172.25.29.12:7777 Jul 27 10:02:25 alf3 kernel: o2net: accepted connection from node alf5 (num 5) at 172.25.29.15:7777 Jul 27 10:02:26 alf3 kernel: o2net: accepted connection...
2011 May 10
3
ERROR: -91 after Kernel Upgrade
...2-tools-1.4.3. Modules are loaded and /config type configfs and /dlm type ocfs2_dlmfs are mounted. server2 ~ # mount /data/ mount.ocfs2: Protocol not available while mounting /dev/sdb1 on /data. Check 'dmesg' for more information on this error. server2 ~ # dmesg [ 802.267217] o2net: accepted connection from node server4 (num 3) at 10.10.21.14:7777 [ 802.871908] o2net: accepted connection from node server3 (num 2) at 10.10.21.13:7777 [ 805.295632] (mount.ocfs2,13964,2):dlm_send_nodeinfo:1233 ERROR: node mismatch -92, node 2 [ 805.295637] (mount.ocfs2,13964,2):dlm_try_to_...
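The negative numbers in these messages (status = -112, status = -107, status = -75, node mismatch -92, and the -91 in the subject above) look like negated Linux errno values, which is the usual kernel convention; the mount.ocfs2 text "Protocol not available" is the glibc message for errno 92 (ENOPROTOOPT), and "Value too large for defined data type" in a later thread is errno 75 (EOVERFLOW). A small Python sketch for decoding them offline. The table is hard-coded to the Linux values so it gives the same answer on any host, and it only covers the codes seen in these threads; the function name is hypothetical.

```python
import re

# Linux errno values (asm-generic/errno.h) for the codes that appear in the
# OCFS2/o2net messages above. Hard-coded so the lookup does not depend on the
# OS running this script (errno numbers differ on, e.g., macOS).
LINUX_ERRNO = {
    75: ("EOVERFLOW", "Value too large for defined data type"),
    91: ("EPROTOTYPE", "Protocol wrong type for socket"),
    92: ("ENOPROTOOPT", "Protocol not available"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}

# Matches "status = -112", "ERROR: -91" and "mismatch -92" style fragments.
STATUS_RE = re.compile(r"(?:status = |ERROR: |mismatch )(-\d+)")


def decode_statuses(line: str) -> list[tuple[int, str, str]]:
    """Return (code, symbol, message) for each negative status found in line."""
    results = []
    for match in STATUS_RE.finditer(line):
        code = int(match.group(1))
        # Kernel statuses are negated errno values, so look up -code.
        symbol, message = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        results.append((code, symbol, message))
    return results
```

For example, `decode_statuses("dlm_send_nodeinfo:1233 ERROR: node mismatch -92, node 2")` returns `[(-92, "ENOPROTOOPT", "Protocol not available")]`.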
2013 Apr 28
2
Is this one issue? Do you have any good ideas? Thanks a lot.
...the log below. Why does the syslog contain the message "Node 255 (he) is the Recovery Master for the dead node 255"? Why was the host ZHJD-VM6 blocked until it was rebooted a day later, and what was it still waiting for? Thanks a lot. Apr 27 17:35:59 ZHJD-VM6 kernel: [ 3734.057330] o2net: Connection to node ZHJD-VM5 (num 5) at 185.200.1.16:7100 has been idle for 30.100 secs, shutting it down. Apr 27 17:35:59 ZHJD-VM6 kernel: [ 3734.057359] o2net: No longer connected to node ZHJD-VM5 (num 5) at 185.200.1.16:7100 Apr 27 17:35:59 ZHJD-VM6 kernel: [ 3734.058212] o2net: Connected to nod...
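The o2net state messages quoted across these threads follow a few fixed phrasings ("connected to", "accepted connection from", "no longer connected to", "has been idle for"), with capitalization and "secs" versus "seconds" differing between releases. A minimal Python sketch for pulling them out of a syslog, assuming one message per line and IPv4 addresses; the event kinds and the O2netEvent type are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "node NAME (num N) at IP:PORT", as printed in the messages above.
_NODE = r"node (?P<name>\S+) \(num (?P<num>\d+)\) at (?P<ip>[\d.]+):(?P<port>\d+)"

# Older releases print "connection ... 30.0 seconds", newer ones
# "Connection ... 30.100 secs"; re.I plus the alternation covers both.
_PATTERNS = [
    ("idle", re.compile(r"o2net: connection to " + _NODE
                        + r" has been idle for (?P<secs>[\d.]+) (?:secs|seconds)", re.I)),
    ("disconnected", re.compile(r"o2net: no longer connected to " + _NODE, re.I)),
    ("connected", re.compile(r"o2net: connected to " + _NODE, re.I)),
    ("accepted", re.compile(r"o2net: accepted connection from " + _NODE, re.I)),
]


@dataclass
class O2netEvent:
    kind: str
    node: str
    num: int
    addr: str
    idle_secs: Optional[float] = None


def parse_o2net(line: str) -> Optional[O2netEvent]:
    """Classify one syslog line; return None if it is not an o2net state message."""
    for kind, pattern in _PATTERNS:
        m = pattern.search(line)
        if m:
            secs = m.groupdict().get("secs")  # only the "idle" pattern has it
            return O2netEvent(kind, m["name"], int(m["num"]),
                              f"{m['ip']}:{m['port']}",
                              float(secs) if secs else None)
    return None
```

Feeding a whole log through `parse_o2net` and keeping the non-None results gives a per-node timeline of connects, idle timeouts and disconnects.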
2007 Feb 06
2
Network 10 sec timeout setting?
Hello! Hey, didn't a setting for the 10-second network timeout get into the 2.6.20 kernel? If so, how do we set it? I am getting OCFS2 1.3.3 (2201,0):o2net_connect_expired:1547 ERROR: no connection established with node 1 after 10.0 seconds, giving up and returning errors. (2458,0):dlm_request_join:802 ERROR: status = -107 (2458,0):dlm_try_to_join_domain:950 ERROR: status = -107 (2458,0):dlm_join_domain:1202 ERROR: status = -107 (2458,0):dlm_register_...
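Later OCFS2 releases expose the o2net timeouts as configfs attributes of the cluster (idle_timeout_ms, keepalive_delay_ms, reconnect_delay_ms), typically configured through the o2cb service configuration rather than written by hand. These threads do not establish whether the 2.6.20 kernel asked about here already has them. The Python sketch below reports whichever of those attributes exist, assuming the cluster directory lives under /sys/kernel/config/cluster/<name>, or /config/cluster/<name> on systems that mount configfs at /config as in the -91 thread above. The function name is hypothetical.

```python
from pathlib import Path

# Attribute names as exposed in the o2cb cluster directory in configfs by
# later OCFS2 releases; their presence on a given kernel is an assumption
# to check on the system in question.
TIMEOUT_ATTRS = ("idle_timeout_ms", "keepalive_delay_ms", "reconnect_delay_ms")


def read_o2net_timeouts(cluster_dir: Path) -> dict[str, int]:
    """Read whichever o2net timeout attributes exist in cluster_dir (values in ms)."""
    values = {}
    for name in TIMEOUT_ATTRS:
        attr = cluster_dir / name
        if attr.is_file():
            values[name] = int(attr.read_text().strip())
    return values
```

For example, `read_o2net_timeouts(Path("/sys/kernel/config/cluster/mycluster"))`, where `mycluster` stands in for the real cluster name, returns an empty dict on a kernel without these attributes.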
2009 Jul 22
2
OCFS2 Node restart
...emote logging for the kernel, and here is the log. I noticed that the VM becomes unresponsive and then suddenly reboots. I am running the Alfresco (document sharing) application; all nodes access a common share on OCFS. --------------------------------------------------------- -Jul 22 09:01:25 172.25.29.10 kernel: o2net: connection to node alf3 (num 3) at 172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down. -Jul 22 09:01:25 172.25.29.10 kernel: (0,1):o2net_idle_timer:1506 here are some times that might help debug the situation: (tmr 1248267655.660420 now 1248267685.655778 dr 1248267655.660405...
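The o2net_idle_timer line prints several timestamps. Assuming tmr is the time the idle timer was last armed, now minus tmr should come out close to the idle timeout in the preceding message: 30 s minus 4642 microseconds for alf3 here, and 60 s minus 141 microseconds in the node2 log further down. A Python sketch for computing that gap; the function name is hypothetical. Seconds and microseconds are parsed as separate integers, so the result is exact and does not depend on whether the microsecond field is zero-padded.

```python
import re

# Pulls "tmr <sec>.<usec> now <sec>.<usec>" out of an o2net_idle_timer line.
_TIMES_RE = re.compile(r"tmr (\d+)\.(\d+) now (\d+)\.(\d+)")


def idle_timer_elapsed_us(line: str) -> int:
    """Microseconds between the 'tmr' and 'now' stamps of an idle-timer message."""
    m = _TIMES_RE.search(line)
    if m is None:
        raise ValueError("no 'tmr ... now ...' pair found")
    tmr_s, tmr_us, now_s, now_us = map(int, m.groups())
    # Integer arithmetic avoids float rounding at ~1.2e9 seconds.
    return (now_s - tmr_s) * 1_000_000 + (now_us - tmr_us)
```

A result far from the configured idle timeout would suggest the timer fired late, for example because the CPU was stalled.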
2010 Jul 29
3
[PATCH 1/1] O2net: Disallow o2net accept connection request from itself.
Currently, o2net_accept_one() is allowed to accept a connection from the listening node itself. Such a fake connection is never successfully established, because no handshake is detected afterwards, and it ends up triggering the connecting worker in a loop. We're going to fix this by treating such connection requ...
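The snippet is cut off before it says exactly how such a request is treated. The following Python toy model assumes it is simply rejected once the peer address resolves to the local node number; the function name, the address map and the returned strings are hypothetical and are not the patch's code.

```python
from typing import Mapping, Optional


def accept_decision(local_num: int,
                    peer_ip: str,
                    ip_to_node: Mapping[str, int]) -> str:
    """Toy model of the self-connection check described for o2net_accept_one()."""
    peer_num: Optional[int] = ip_to_node.get(peer_ip)
    if peer_num is None:
        return "reject: unknown node"
    if peer_num == local_num:
        # A connection from ourselves never completes the handshake, and
        # accepting it would keep re-triggering the connect worker.
        return "reject: connection from self"
    return "accept"
```

With `ip_to_node = {"10.0.0.1": 0, "10.0.0.2": 1}` and local node 0, a request from 10.0.0.1 is rejected as a self-connection while one from 10.0.0.2 is accepted.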
2010 Oct 23
1
Reg: ocfs2 two node cluster crashed, node2 crashed, when I rebooted node1 for maintenance.
...: Nodes in domain ("C54B4F6991954F98AA6A37C4F3901CD8"): 2 Oct 23 15:42:58 node2 kernel: ocfs2_dlm: Node 1 leaves domain D96AC8E8BDD54913AE6D8EC0EB539603 Oct 23 15:42:58 node2 kernel: ocfs2_dlm: Nodes in domain ("D96AC8E8BDD54913AE6D8EC0EB539603"): 2 Oct 23 15:44:06 node2 kernel: o2net: connection to node node1 (num 1) at 192.168.3.1:7777 has been idle for 60.0 seconds, shutting it down. Oct 23 15:44:06 node2 kernel: (swapper,0,15):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1287848586.872368 now 1287848646.872227 dr 1287848586.872346 adv...
2010 Jan 14
1
another fencing question
Hi, periodically one of the nodes in my two-node cluster is fenced. Here are the logs: Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-rc.minint.it (num 0) at 1.1.1.6:7777 Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334 ERROR: link to 0 went down! Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR: status = -112 Jan 14 07:01:44 nvr1-rc kernel: (4007,4...
2010 Dec 09
2
servers blocked on ocfs2
...servers (ocfs2-1.4.7). Some days ago, two servers sharing an ocfs2 filesystem, and with quite virtual services, stalled in what seems to be an ocfs2 issue. These are the lines in their messages files: =====node heraclito (0)======================================== Dec 4 09:15:06 heraclito kernel: o2net: connection to node parmenides (num 1) at 192.168.1.2:7777 has been idle for 30.0 seconds, shutting it down. Dec 4 09:15:06 heraclito kernel: (swapper,0,7):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1291450476.228826 now 1291450506.229456 dr 1291450476....
2009 Nov 20
3
o2net patch that avoids socket disconnect/reconnect
This fix modifies o2net layer behavior, which seems to trigger some DLM race issues during umount/evictions that need to be fixed as well. I am working on the DLM issues, but meanwhile please review this patch. Thanks, --Srini
2008 Feb 04
0
[PATCH] o2net: Reconnect after idle time out.
Currently, o2net connects to a node on hb_up and disconnects on hb_down and on net timeout. Disconnecting on net timeout is OK, but it should then attempt to reconnect. This is because sometimes nodes get overloaded enough that the network connection breaks but the disk hb does not. And if we get into that situation...
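The rule this patch proposes can be modeled as a small state update: an idle timeout drops the link, and a reconnect is worth attempting only while the disk heartbeat still shows the node alive. This Python sketch is a toy model with hypothetical names (PeerState, on_net_idle_timeout, on_hb_down), not the patch's code; the reconnect_after_idle flag switches between the behavior the patch describes as current and the proposed one.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """Hypothetical per-peer view: disk heartbeat and network link status."""
    disk_hb_up: bool
    net_connected: bool = True
    reconnect_pending: bool = False


def on_net_idle_timeout(peer: PeerState, reconnect_after_idle: bool) -> None:
    """Drop the link; with the proposed behavior, retry while disk hb is up."""
    peer.net_connected = False
    # Current behavior per the patch text: no retry until the next hb_up.
    # Proposed: an overloaded node whose disk heartbeat survived gets
    # another connection attempt instead of staying unreachable.
    peer.reconnect_pending = reconnect_after_idle and peer.disk_hb_up


def on_hb_down(peer: PeerState) -> None:
    """Disk heartbeat lost: the node is considered dead, so stop retrying."""
    peer.disk_hb_up = False
    peer.net_connected = False
    peer.reconnect_pending = False
```

The key design point is that the disk heartbeat, not the network link, decides whether the peer is still a cluster member worth reconnecting to.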
2011 Feb 10
0
(o2net, 6301, 0):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 60.0 seconds, giving up and returning errors.
Hello, I am installing a two-node cluster. When the file systems are automounted, I get an o2net_connect_expired error and the cluster file systems are not mounted. If I mount the cluster file systems manually with mount -a, they mount without any issues. 1. If I bring Node1 up with Node2 down, the cluster file system automounts fine without any issues. 2. I checked the c...
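o2net_connect_expired reports that no connection to the peer was established within a fixed window (60.0 seconds here, 10.0 seconds in the 2007 report above) and gives up. The general give-up-after-a-deadline pattern is sketched below in Python with a hypothetical is_connected probe; it is a generic illustration, not how the kernel schedules its connect_expired work. The clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_connection(is_connected: Callable[[], bool],
                        timeout_s: float,
                        poll_s: float = 0.5,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_connected() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_connected():
            return True
        if clock() >= deadline:
            # Analogous to "no connection established ... giving up and
            # returning errors" in the message above.
            return False
        sleep(poll_s)
```

Passing `timeout_s=60.0` mirrors the window in the error above; a peer that only becomes reachable after the deadline is reported as a failure even though a later attempt would succeed.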
2008 Feb 13
2
[PATCH] o2net: Reconnect after idle time out.V2
Modification from V1 to V2: 1. Use atomic ops instead of spin_lock in timer. 2. Add some comments when queueing connect_expired work. These comments are copied from Zach's mail. ;) Currently, o2net connects to a node on hb_up and disconnects on hb_down and on net timeout. Disconnecting on net timeout is OK, but it should then attempt to reconnect. This is because sometimes nodes get overloaded enough that the network connection breaks but the disk hb does not. And if we get into that situation...
2014 Sep 26
2
One node hang issue, requesting good ideas, thanks
Hi all, we use OCFS2, and our network is not reliable. When a convert request message can't be sent to the other node, one node hangs, still waiting for the DLM. CAS2/logdir/var/log/syslog.1-6778-Sep 16 20:57:16 CAS2 kernel: [516366.623623] o2net: Connection to node CAS1 (num 1) at 10.172.254.1:7100 has been idle for 30.87 secs, shutting it down. CAS2/logdir/var/log/syslog.1-6779-Sep 16 20:57:16 CAS2 kernel: [516366.623631] o2net_idle_timer 1621: Local and remote node is heartbeating, and try connect CAS2/logdir/var/log/syslog.1-6780-Sep 16...
2007 Aug 22
1
mount.ocfs2: Value too large ...
...too large for defined data type while mounting /dev/sdb1 on /ext_arrays/ds3200_1/. Check 'dmesg' for more information on this error. --------------- In serv_x86_64's dmesg are following lines ---------------- ocfs2_dlm: Nodes in domain ("892E82953F2147A4BD75E2AAC5750BD3"): 1 o2net: connected to node serv_i386 (num 0) at 19X.XXX.69.194:7777 ocfs2_dlm: Nodes in domain ("892E82953F2147A4BD75E2AAC5750BD3"): 0 1 kjournald starting. Commit interval 5 seconds (11637,3):ocfs2_broadcast_vote:434 ERROR: status = -75 (11637,3):ocfs2_do_request_vote:504 ERROR: status = -75 (1...
2008 Jan 23
1
OCFS2 DLM problems
...unters showing and even during the problem we can communicate via the bond0 interface. This setup has been running for more than 2 months, but last Wednesday morning, and again today, we had 2 nodes causing locking problems. The problem starts with messages like this: Jan 23 03:20:44 dbprd01 kernel: o2net: no longer connected to node dbprd02 (num 1) at 192.168.202.2:7777 Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459 ERROR: status = -107 Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR: status = -107 Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_ms...
2009 Apr 20
2
BUG: soft lockup - CPU#1 stuck for 61s
Hi, I have a cluster with 5 nodes hosting a web application. All web servers save log info into a shared access.log file. The awstats log analyzer runs on the first node. Sometimes this node fails with the following messages (captured on another server): Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds, shutting it down. Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to node um-fe-1 (num 1) at 192.168.10.10:7777 Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node um-be-1 (num...
2006 Jan 09
0
[PATCH 01/11] ocfs2: event-driven quorum
This patch separates o2net and o2quo from knowing about one another as much as possible. This is the first in a series of patches that will allow userspace cluster interaction. Quorum is separated out first, and will ultimately only be associated with the disk heartbeat as a separate module. To do so, this patch perform...