Hi,
as I wrote yesterday, I applied all the patches. Unfortunately, it did not
bring the desired results. The same node crashed again today with very similar
messages. I have also attached the messages of the other node, which stayed
alive.
I should also mention that in the meantime I swapped out all of the hardware
to make sure it is not a hardware problem. At first glance it looks like a
network problem to me, but as I already wrote, the two nodes are IBM blades
in the same BladeCenter, directly connected by the BladeCenter's internal
switch. All the other blades in the same BladeCenter show no problems.
I am really at my wits' end and hope you can still help me.
Thanks very much,
- Rainer
+----------------------------------------------+
| These are the messages of the crashing node: |
+----------------------------------------------+
Dec 5 12:58:14 webhost2 kernel: o2net: no longer connected to node webhost1
(num 0) at 10.2.0.70:7777
Dec 5 12:58:14 webhost2 kernel: (10409,1):dlm_send_remote_convert_request:395
ERROR: status = -112
Dec 5 12:58:14 webhost2 kernel: (14860,2):dlm_send_remote_convert_request:395
ERROR: status = -112
Dec 5 12:58:14 webhost2 kernel: (14860,2):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:14 webhost2 kernel: (10409,1):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:14 webhost2 kernel: (8536,0):dlm_send_remote_convert_request:395
ERROR: status = -112
Dec 5 12:58:14 webhost2 kernel: (8536,0):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:20 webhost2 kernel: (10409,1):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:20 webhost2 kernel: (14860,3):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:20 webhost2 kernel: (8536,0):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:20 webhost2 kernel: (14860,3):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:20 webhost2 kernel: (8536,0):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:20 webhost2 kernel: (10409,1):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:25 webhost2 kernel: (10409,1):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:25 webhost2 kernel: (14860,3):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:25 webhost2 kernel: (8536,0):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:25 webhost2 kernel: (14860,3):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:25 webhost2 kernel: (8536,0):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:25 webhost2 kernel: (10409,1):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:30 webhost2 kernel: (14860,2):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:30 webhost2 kernel: (10409,0):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:30 webhost2 kernel: (8536,1):dlm_send_remote_convert_request:395
ERROR: status = -107
Dec 5 12:58:30 webhost2 kernel: (10409,0):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:30 webhost2 kernel: (8536,1):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
Dec 5 12:58:30 webhost2 kernel: (14860,2):dlm_wait_for_node_death:374
225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of
node 0
+---------------------------------------------------------------------------+
| During that crash, the other (stable) node shows the following messages:   |
+---------------------------------------------------------------------------+
Dec 5 12:58:15 webhost1 kernel: o2net: connection to node webhost2 (num 1) at
10.2.0.71:7777 has been idle for 10 seconds, shutting it down.
Dec 5 12:58:15 webhost1 kernel: (0,2):o2net_idle_timer:1313 here are some times
that might help debug the situation: (tmr 1196859485.13835 now 1196859495.12881
dr 1196859485.13824 adv 1196859485.13837:1196859485.13838 func (434028bd:504)
1196859485.12053:1196859485.12057)
Dec 5 12:58:15 webhost1 kernel: o2net: no longer connected to node webhost2
(num 1) at 10.2.0.71:7777
Dec 5 12:58:15 webhost1 kernel: (8511,2):dlm_send_proxy_ast_msg:457 ERROR:
status = -112
Dec 5 12:58:15 webhost1 kernel: (8511,2):dlm_flush_asts:589 ERROR: status =
-112
Dec 5 12:58:55 webhost1 kernel: (11011,3):ocfs2_replay_journal:1184 Recovering
node 1 from slot 0 on device (147,0)
--------------------------------------------------------------------------------------------
On Monday, December 3, 2007 7:18:12 PM Mark Fasheh wrote:
On Mon, Dec 03, 2007 at 04:45:01AM -0800, rain c wrote:
> Thanks very much for your answer.
> My problem is that I cannot really use kernel 2.6.22, because I also need
> the OpenVZ patch, which is not available in a stable version for 2.6.22. Is
> there a way to backport ocfs2-Retry-if-it-returns-EAGAIN to 2.6.18?
Attached is a pair of patches which applied more cleanly. Basically it
includes another tcp.c fix which the -EAGAIN fix built on top of. Both would
be good for you to have one way or the other. Fair warning though - I don't
really have the ability to test 2.6.18 fixes right now, so you're going to
have to be a bit of a beta tester ;) That said, they look pretty clean to me
so I have a relatively high confidence that they should work.
Be sure to apply them in order:
$ cd linux-2.6.18
$ patch -p1 < 0001-ocfs2-Backport-message-locking-fix-to-2.6.18.patch
$ patch -p1 < 0002-ocfs2-Backport-sendpage-fix-to-2.6.18.patch
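If you want to confirm beforehand that a patch applies cleanly without
actually modifying the tree, GNU patch's --dry-run option should do the trick:
$ patch -p1 --dry-run < 0001-ocfs2-Backport-message-locking-fix-to-2.6.18.patch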
> Further, I wonder why only one (and always the same one) of my nodes is so
> unstable.
I'm not sure why it would always be one node and not the other. We'd
probably need more detailed information about what's going on to figure
that out. Maybe some combination of user application + cluster stack
conspires to put a larger messaging load on it?
Are there any other ocfs2 messages in your logs for that node?
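Assuming your kernel messages end up in /var/log/messages, something like
this should pull out everything relevant:
$ grep -E 'o2net|ocfs2|dlm' /var/log/messages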
> Are you sure that it cannot be any other problem?
No, not 100% sure. My first hunch was the -EAGAIN bug because your messages
looked exactly like what I saw there. Looking a bit deeper, it seems that your
value (when turned into a signed integer) is -32, which would actually make
it -EPIPE.
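As a purely illustrative aside (not part of the patches below), a few lines
of C are enough to translate these negative status values back into errno
descriptions; on Linux, 32 is EPIPE, and the -107 and -112 in the logs above
correspond to ENOTCONN and EHOSTDOWN:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* o2net/dlm log failures as "status = -NN", i.e. a negated errno.
	 * Strip the sign and ask libc for the matching description. */
	int codes[] = { 32, 107, 112 };	/* EPIPE, ENOTCONN, EHOSTDOWN */
	unsigned int i;

	for (i = 0; i < sizeof(codes) / sizeof(codes[0]); i++)
		printf("status = -%d -> %s\n", codes[i], strerror(codes[i]));

	return 0;
}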
-EPIPE gets returned from several places in the tcp code, in particular
do_tcp_sendpages() and sk_stream_wait_memory(). If you look at the 1st patch
that's attached, you'll see that it fixes some races that occurred when
sending outgoing messages, including when those functions were called. While
I'm not 100% sure these patches will fix it, I definitely think it's the 1st
thing we should try.
By the way, while you're doing this it might be a good idea to also apply
some of the other patches we backported to 2.6.18 a long time ago:
http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.18/
If the two patches here work for you, I'll probably just add them to that
directory for others to use.
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
-----Inline Attachment Follows-----
From 42318a6658696711baf25d8bd17e3d2827472d66 Mon Sep 17 00:00:00 2001
From: Zhen Wei <zwei@novell.com>
Date: Tue, 23 Jan 2007 17:19:59 -0800
Subject: ocfs2: Backport message locking fix to 2.6.18
Untested fix, apply at your own risk.
Original commit message follows.
ocfs2: introduce sc->sc_send_lock to protect outbound messages
When there is a lot of multithreaded I/O usage, two threads can collide
while sending out a message to the other nodes. This is due to the lack of
locking between threads while sending out the messages.
When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
itself by acquiring a lock at invocation by calling lock_sock().
tcp_sendmsg() then loops over the buffers in the iovec, allocating
associated sk_buff's and cache pages for use in the actual send. As it does
so, it pushes the data out to tcp for actual transmission. However, if one
of those allocations fails (because a large number of large sends is being
processed, for example), it must wait for memory to become available. It
does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
contains the call to release_sock(), which drops the socket lock taken at
the start of tcp_sendmsg() and lets a second thread enter tcp_sendmsg() and
interleave its data with the first thread's message.
The following patch adds a lock to the socket container in order to
properly serialize outbound requests.
From: Zhen Wei <zwei@novell.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
---
fs/ocfs2/cluster/tcp.c | 8 ++++++++
fs/ocfs2/cluster/tcp_internal.h | 2 ++
2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index b650efa..3c5bf4d 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -520,6 +520,8 @@ static void o2net_register_callbacks(struct sock *sk,
sk->sk_data_ready = o2net_data_ready;
sk->sk_state_change = o2net_state_change;
+ mutex_init(&sc->sc_send_lock);
+
write_unlock_bh(&sk->sk_callback_lock);
}
@@ -818,10 +820,12 @@ static void o2net_sendpage(struct o2net_sock_container *sc,
ssize_t ret;
+ mutex_lock(&sc->sc_send_lock);
ret = sc->sc_sock->ops->sendpage(sc->sc_sock,
virt_to_page(kmalloced_virt),
(long)kmalloced_virt & ~PAGE_MASK,
size, MSG_DONTWAIT);
+ mutex_unlock(&sc->sc_send_lock);
if (ret != size) {
mlog(ML_ERROR, "sendpage of size %zu to " SC_NODEF_FMT
" failed with %zd\n", size, SC_NODEF_ARGS(sc), ret);
@@ -936,8 +940,10 @@ int o2net_send_message_vec(u32 msg_type, u32 key, struct kvec *caller_vec,
/* finally, convert the message header to network byte-order
* and send */
+ mutex_lock(&sc->sc_send_lock);
ret = o2net_send_tcp_msg(sc->sc_sock, vec, veclen,
sizeof(struct o2net_msg) + caller_bytes);
+ mutex_unlock(&sc->sc_send_lock);
msglog(msg, "sending returned %d\n", ret);
if (ret < 0) {
mlog(0, "error returned from o2net_send_tcp_msg=%d\n", ret);
@@ -1068,8 +1074,10 @@ static int o2net_process_message(struct o2net_sock_container *sc,
out_respond:
/* this destroys the hdr, so don't use it after this */
+ mutex_lock(&sc->sc_send_lock);
ret = o2net_send_status_magic(sc->sc_sock, hdr, syserr,
handler_status);
+ mutex_unlock(&sc->sc_send_lock);
hdr = NULL;
mlog(0, "sending handler status %d, syserr %d returned %d\n",
handler_status, syserr, ret);
diff --git a/fs/ocfs2/cluster/tcp_internal.h b/fs/ocfs2/cluster/tcp_internal.h
index ff9e2e2..008fcf9 100644
--- a/fs/ocfs2/cluster/tcp_internal.h
+++ b/fs/ocfs2/cluster/tcp_internal.h
@@ -142,6 +142,8 @@ struct o2net_sock_container {
struct timeval sc_tv_func_stop;
u32 sc_msg_key;
u16 sc_msg_type;
+
+ struct mutex sc_send_lock;
};
struct o2net_msg_handler {
--
1.5.3.4
-----Inline Attachment Follows-----
From 355053cdec5205ff35398d78f5c93a59eeb502ce Mon Sep 17 00:00:00 2001
From: Sunil Mushran <sunil.mushran@oracle.com>
Date: Mon, 30 Jul 2007 11:02:50 -0700
Subject: ocfs2: Backport sendpage() fix to 2.6.18
Untested fix, apply at your own risk.
Original commit message follows.
ocfs2: Retry sendpage() if it returns EAGAIN
Instead of treating EAGAIN, returned from sendpage(), as an error, this
patch retries the operation.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
---
fs/ocfs2/cluster/tcp.c | 24 ++++++++++++++++--------
1 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 3c5bf4d..29554e5 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -819,17 +819,25 @@ static void o2net_sendpage(struct o2net_sock_container *sc,
struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
ssize_t ret;
-
- mutex_lock(&sc->sc_send_lock);
- ret = sc->sc_sock->ops->sendpage(sc->sc_sock,
- virt_to_page(kmalloced_virt),
- (long)kmalloced_virt & ~PAGE_MASK,
- size, MSG_DONTWAIT);
- mutex_unlock(&sc->sc_send_lock);
- if (ret != size) {
+ while (1) {
+ mutex_lock(&sc->sc_send_lock);
+ ret = sc->sc_sock->ops->sendpage(sc->sc_sock,
+ virt_to_page(kmalloced_virt),
+ (long)kmalloced_virt & ~PAGE_MASK,
+ size, MSG_DONTWAIT);
+ mutex_unlock(&sc->sc_send_lock);
+ if (ret == size)
+ break;
+ if (ret == (ssize_t)-EAGAIN) {
+ mlog(0, "sendpage of size %zu to " SC_NODEF_FMT
+ " returned EAGAIN\n", size, SC_NODEF_ARGS(sc));
+ cond_resched();
+ continue;
+ }
mlog(ML_ERROR, "sendpage of size %zu to " SC_NODEF_FMT
" failed with %zd\n", size, SC_NODEF_ARGS(sc), ret);
o2net_ensure_shutdown(nn, sc, 0);
+ break;
}
}
--
1.5.3.4