Hi,

1. o2net_sendpage() calls the socket's sendpage() callback.
2. If sendpage() (the TCP socket send) keeps returning -EAGAIN,
o2net_sendpage() loops forever, even though cond_resched() is called on
every iteration.
3. I don't think this is reasonable. I propose that after 20 consecutive
-EAGAIN returns we stop retrying and shut down the socket, so that one stuck
connection does not affect the entire cluster.

Any feedback on this approach (positive or negative) would be greatly
appreciated.
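
To make the intended behaviour easier to discuss, here is a minimal userspace sketch of the bounded-retry idea. It is not the o2net code: send_fn_t, send_with_retry_limit(), stub_send() and struct stub_state are hypothetical names used only for illustration, and the -EIO returned for a short send is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical send callback: returns bytes sent or a negative errno. */
typedef ssize_t (*send_fn_t)(void *ctx, size_t size);

/*
 * Call send_fn until it sends the full size, fails with an error other
 * than -EAGAIN, or has returned -EAGAIN more than max_retries times.
 * Returns 0 on success and a negative errno otherwise; on failure the
 * caller is expected to shut the connection down.
 */
static int send_with_retry_limit(send_fn_t send_fn, void *ctx,
                                 size_t size, int max_retries)
{
    int retries_left = max_retries;

    assert(send_fn != NULL);

    for (;;) {
        ssize_t ret = send_fn(ctx, size);

        if (ret == (ssize_t)size)
            return 0;

        if (ret == -EAGAIN && retries_left > 0) {
            /* Transient failure: retry, but spend one unit of budget. */
            --retries_left;
            continue;
        }

        /* Budget exhausted, hard error, or short send (assumed -EIO). */
        return ret < 0 ? (int)ret : -EIO;
    }
}

/* Illustration-only stub standing in for a socket send. */
struct stub_state {
    int calls;           /* number of times stub_send() was invoked */
    int succeed_on_call; /* 0 = never succeed, N = succeed on Nth call */
};

static ssize_t stub_send(void *ctx, size_t size)
{
    struct stub_state *s = ctx;

    s->calls++;
    if (s->succeed_on_call > 0 && s->calls >= s->succeed_on_call)
        return (ssize_t)size;
    return -EAGAIN;
}
```

With max_retries set to 20, a sender that never makes progress is called 21 times (the first attempt plus 20 retries) before the error is reported, which matches the counter in the patch below.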
--- tcp.c 2015-06-30 11:46:54.727447919 +0800
+++ tcp.c.diff 2015-06-30 11:52:12.823447881 +0800
@@ -949,6 +949,7 @@
 {
 	struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
 	ssize_t ret;
+	int send_fails = 20;

 	while (1) {
 		mutex_lock(&sc->sc_send_lock);
@@ -959,10 +960,11 @@
 		mutex_unlock(&sc->sc_send_lock);
 		if (ret == size)
 			break;
-		if (ret == (ssize_t)-EAGAIN) {
+		if (ret == (ssize_t)-EAGAIN && send_fails > 0) {
 			mlog(0, "sendpage of size %zu to " SC_NODEF_FMT
 			     " returned EAGAIN\n", size, SC_NODEF_ARGS(sc));
 			cond_resched();
+			--send_fails;
 			continue;
 		}
 		mlog(ML_ERROR, "sendpage of size %zu to " SC_NODEF_FMT
syslog:
/var/log/syslog:Jun 29 09:32:58 cvk47 kernel: [156022.769539] (kworker/u130:1,12041,9):o2net_sendpage:1026 sendpage of size 24 to node cvk61 (num 5) at 172.16.202.61:7100 returned EAGAIN
/var/log/syslog:Jun 29 09:32:58 cvk47 kernel: [156022.769542] (kworker/u130:1,12041,9):o2net_sendpage:1026 sendpage of size 24 to node cvk61 (num 5) at 172.16.202.61:7100 returned EAGAIN
/var/log/syslog:Jun 29 09:32:58 cvk47 kernel: [156022.769544] (kworker/u130:1,12041,9):o2net_sendpage:1026 sendpage of size 24 to node cvk61 (num 5) at 172.16.202.61:7100 returned EAGAIN
/var/log/syslog:Jun 29 09:32:58 cvk47 kernel: [156022.769546] (kworker/u130:1,12041,9):o2net_sendpage:1026 sendpage of
________________________________
zhangguanghui 10102