thr3ads.net - Ocfs2 devel - [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect [Jun 2014]

Junxiao Bi

2014-Jun-13 01:48 UTC

[Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect

Hi,

This patch series fixes a possible message-loss bug in ocfs2 that occurs when
the network goes bad. The bug can cause ocfs2 to hang forever, even after the
network becomes good again.

Messages can be lost as follows. After the TCP connection between two nodes is
established, an idle timer is set to check its state periodically. If no
messages are received during that interval, the idle timer expires, the
connection is shut down and a reconnect is attempted, so pending messages in
the TCP queues are lost. These messages may come from the DLM, which may then
hang, and that may hang the whole ocfs2 cluster.

This is quite likely to happen when the network goes bad. Reconnecting is
useless, because it will fail while the network is still bad. Simply waiting
for the network to recover is a better idea: no messages are lost, and some
nodes will be fenced until the cluster reaches a split-brain state. For that
case, the TCP user timeout is used to override the TCP retransmit timeout. It
expires after about 25 days; by then the user should have noticed the problem
through the provided log and fixed the network. If they have not, ocfs2 falls
back to the original reconnect behaviour.

This is a resend of the patches, with no changes since last time. Please help
review.

Thanks,
Junxiao.
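
As a quick check of the "25 days" figure above: TCP_USER_TIMEOUT is expressed in milliseconds, and patch 2/3 sets it to 0x7fffffff. The following is a minimal sketch of the conversion. The helper names ms_to_days and print_user_timeout_days are hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdio.h>

/* Value used by patch 2/3 for O2NET_TCP_USER_TIMEOUT, in milliseconds. */
#define O2NET_TCP_USER_TIMEOUT 0x7fffffff

/* Hypothetical helper: convert a duration in milliseconds to days. */
double ms_to_days(long long ms)
{
	const double ms_per_day = 24.0 * 60.0 * 60.0 * 1000.0;

	assert(ms >= 0);	/* a timeout cannot be negative */
	return (double)ms / ms_per_day;
}

/* Hypothetical helper: print the user timeout expressed in days. */
void print_user_timeout_days(void)
{
	printf("%.2f days\n", ms_to_days(O2NET_TCP_USER_TIMEOUT));
}
```

The result is a little under 25 days (roughly 24.86), which matches the figure quoted in the cover letter.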

Junxiao Bi

2014-Jun-13 01:48 UTC

[Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout

Some messages in the TCP queue may be lost if we shut down the connection
and reconnect when the idle timer expires. If packets are lost and the
reconnect succeeds, the ocfs2 cluster may hang.

To fix this, we can leave the connection in place and make the fence
decision when the idle timer expires. If the network recovers before the
fence decision is made, the connection survives without losing any messages.

This bug can be seen when the network goes bad. It may cause ocfs2 to hang
forever if some packets are lost. With this fix, ocfs2 recovers from the
hang once the network becomes good again.

Reviewed-by: Srinivas Eeda <srinivas.eeda at oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi at oracle.com>
---
 fs/ocfs2/cluster/tcp.c |   25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index c6b90e6..76ef3d8 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1536,16 +1536,20 @@ static void o2net_idle_timer(unsigned long data)
 #endif
 
 	printk(KERN_NOTICE "o2net: Connection to " SC_NODEF_FMT " has been "
-	       "idle for %lu.%lu secs, shutting it down.\n", SC_NODEF_ARGS(sc),
-	       msecs / 1000, msecs % 1000);
+	       "idle for %lu.%lu secs.\n",
+	       SC_NODEF_ARGS(sc), msecs / 1000, msecs % 1000);
 
-	/*
-	 * Initialize the nn_timeout so that the next connection attempt
-	 * will continue in o2net_start_connect.
+	/* idle timerout happen, don't shutdown the connection, but
+	 * make fence decision. Maybe the connection can recover before
+	 * the decision is made.
 	 */
 	atomic_set(&nn->nn_timeout, 1);
+	o2quo_conn_err(o2net_num_from_nn(nn));
+	queue_delayed_work(o2net_wq, &nn->nn_still_up,
+			msecs_to_jiffies(O2NET_QUORUM_DELAY_MS));
+
+	o2net_sc_reset_idle_timer(sc);
 
-	o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
 }
 
 static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc)
@@ -1560,6 +1564,15 @@ static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc)
 
 static void o2net_sc_postpone_idle(struct o2net_sock_container *sc)
 {
+	struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
+
+	/* clear fence decision since the connection recover from timeout*/
+	if (atomic_read(&nn->nn_timeout)) {
+		o2quo_conn_up(o2net_num_from_nn(nn));
+		cancel_delayed_work(&nn->nn_still_up);
+		atomic_set(&nn->nn_timeout, 0);
+	}
+
 	/* Only push out an existing timer */
 	if (timer_pending(&sc->sc_idle_timeout))
 		o2net_sc_reset_idle_timer(sc);
-- 
1.7.9.5
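
The behaviour change in this patch can be modelled outside the kernel. The following is a minimal user-space sketch, not the kernel code. struct conn_model and its two helpers are hypothetical names that stand in for nn->nn_timeout, the o2quo_conn_err()/o2quo_conn_up() calls and the delayed nn_still_up work seen in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-node state touched by the patch. */
struct conn_model {
	bool timed_out;		/* models nn->nn_timeout */
	bool quorum_conn_up;	/* models what quorum believes about the link */
	bool fence_pending;	/* models the queued nn_still_up delayed work */
	bool socket_open;	/* the TCP connection itself */
};

/* Idle timer fired: keep the socket, but start the fence decision. */
void model_idle_timeout(struct conn_model *c)
{
	assert(c != NULL);
	c->timed_out = true;
	c->quorum_conn_up = false;	/* stands for o2quo_conn_err() */
	c->fence_pending = true;	/* stands for queue_delayed_work() */
	/* socket_open is left untouched: no shutdown, no reconnect. */
}

/* A message arrived: the link recovered, so cancel the fence decision. */
void model_message_received(struct conn_model *c)
{
	assert(c != NULL);
	if (c->timed_out) {
		c->quorum_conn_up = true;	/* stands for o2quo_conn_up() */
		c->fence_pending = false;	/* stands for cancel_delayed_work() */
		c->timed_out = false;
	}
}
```

In this model the socket stays open across an idle timeout, so queued messages are not discarded, and a later message clears the pending fence decision.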

Junxiao Bi

2014-Jun-13 01:48 UTC

[Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value

When the TCP retransmit timeout (15 minutes) expires, the connection is
closed, and pending messages may be lost. So we set the TCP user timeout
to its maximum value to override the retransmit timeout.
This is OK for ocfs2 because we have the disk heartbeat: if a peer crashes,
its disk heartbeat times out and it is evicted. If the disk heartbeat has
not timed out and the connection has been idle for a long time, the cluster
has entered a split-brain state. Since fencing can't happen in that state,
we had better keep the connection and wait for the network to recover.

Reviewed-by: Srinivas Eeda <srinivas.eeda at oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi at oracle.com>
---
 fs/ocfs2/cluster/tcp.c |   20 ++++++++++++++++++++
 fs/ocfs2/cluster/tcp.h |    1 +
 2 files changed, 21 insertions(+)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 76ef3d8..eae58d8 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1480,6 +1480,14 @@ static int o2net_set_nodelay(struct socket *sock)
 	return ret;
 }
 
+static int o2net_set_usertimeout(struct socket *sock)
+{
+	int user_timeout = O2NET_TCP_USER_TIMEOUT;
+
+	return kernel_setsockopt(sock, SOL_TCP, TCP_USER_TIMEOUT,
+				(char *)&user_timeout, sizeof(user_timeout));
+}
+
 static void o2net_initialize_handshake(void)
 {
 	o2net_hand->o2hb_heartbeat_timeout_ms = cpu_to_be32(
@@ -1663,6 +1671,12 @@ static void o2net_start_connect(struct work_struct *work)
 		goto out;
 	}
 
+	ret = o2net_set_usertimeout(sock);
+	if (ret) {
+		mlog(ML_ERROR, "set TCP_USER_TIMEOUT failed with %d\n", ret);
+		goto out;
+	}
+
 	o2net_register_callbacks(sc->sc_sock->sk, sc);
 
 	spin_lock(&nn->nn_lock);
@@ -1842,6 +1856,12 @@ static int o2net_accept_one(struct socket *sock)
 		goto out;
 	}
 
+	ret = o2net_set_usertimeout(new_sock);
+	if (ret) {
+		mlog(ML_ERROR, "set TCP_USER_TIMEOUT failed with %d\n", ret);
+		goto out;
+	}
+
 	slen = sizeof(sin);
 	ret = new_sock->ops->getname(new_sock, (struct sockaddr *) &sin,
 				       &slen, 1);
diff --git a/fs/ocfs2/cluster/tcp.h b/fs/ocfs2/cluster/tcp.h
index 5bada2a..c571e84 100644
--- a/fs/ocfs2/cluster/tcp.h
+++ b/fs/ocfs2/cluster/tcp.h
@@ -63,6 +63,7 @@ typedef void (o2net_post_msg_handler_func)(int status, void *data,
 #define O2NET_KEEPALIVE_DELAY_MS_DEFAULT	2000
 #define O2NET_IDLE_TIMEOUT_MS_DEFAULT		30000
 
+#define O2NET_TCP_USER_TIMEOUT			0x7fffffff
 
 /* TODO: figure this out.... */
 static inline int o2net_link_down(int err, struct socket *sock)
-- 
1.7.9.5
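
For readers who want to experiment with the same socket option from user space, here is a minimal sketch assuming Linux and a libc whose <netinet/tcp.h> defines TCP_USER_TIMEOUT. The helper names set_user_timeout and get_user_timeout are hypothetical. The patch itself uses the in-kernel kernel_setsockopt() call, not this interface.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set TCP_USER_TIMEOUT (in milliseconds) on a TCP socket.
 * Returns 0 on success, -1 on error with errno set. */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Read the option back into *out.
 * Returns 0 on success, -1 on error with errno set. */
int get_user_timeout(int fd, unsigned int *out)
{
	socklen_t len = sizeof(*out);

	assert(out != NULL);
	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, out, &len);
}
```

A caller would create a socket with socket(AF_INET, SOCK_STREAM, 0), call set_user_timeout() before connecting, and treat a nonzero return as a setup failure, much as the patch bails out with "goto out" when the option cannot be set.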

Junxiao Bi

2014-Jun-13 01:48 UTC

[Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced

For debugging purposes, the log now shows whether the fence decision was
made and, if the node was not fenced, why.

Reviewed-by: Srinivas Eeda <srinivas.eeda at oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi at oracle.com>
---
 fs/ocfs2/cluster/quorum.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/cluster/quorum.c b/fs/ocfs2/cluster/quorum.c
index 1ec141e..62e8ec6 100644
--- a/fs/ocfs2/cluster/quorum.c
+++ b/fs/ocfs2/cluster/quorum.c
@@ -160,9 +160,18 @@ static void o2quo_make_decision(struct work_struct *work)
 	}
 
 out:
-	spin_unlock(&qs->qs_lock);
-	if (fence)
+	if (fence) {
+		spin_unlock(&qs->qs_lock);
 		o2quo_fence_self();
+	} else {
+		mlog(ML_NOTICE, "not fencing this node, heartbeating: %d, "
+			"connected: %d, lowest: %d (%sreachable)\n",
+			qs->qs_heartbeating, qs->qs_connected, lowest_hb,
+			lowest_reachable ? "" : "un");
+		spin_unlock(&qs->qs_lock);
+
+	}
+
 }
 
 static void o2quo_set_hold(struct o2quo_state *qs, u8 node)
-- 
1.7.9.5
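
To show what the new log line looks like, the following user-space sketch renders the same format string with snprintf(). The function name format_not_fenced is hypothetical, and the mlog() prefix that the kernel adds is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space rendering of the "not fencing" message.
 * Returns the number of characters written, as snprintf() does. */
int format_not_fenced(char *buf, size_t len, int heartbeating,
		      int connected, int lowest_hb, int lowest_reachable)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"not fencing this node, heartbeating: %d, "
			"connected: %d, lowest: %d (%sreachable)",
			heartbeating, connected, lowest_hb,
			lowest_reachable ? "" : "un");
}
```

The "%sreachable" conversion prints "reachable" or "unreachable" depending on whether the lowest-numbered heartbeating node can be reached, so a single format string covers both cases.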

Junxiao Bi

2014-Jun-13 01:56 UTC

[Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect

Not sure why Joseph Qi is excluded from cc list of git send-email.
Cc him.

