Hello,
On three of eight OSTs I can see sporadic messages like these:
sadosrd21
Nov 24 09:11:52 sadosrd21 LustreError: 5518:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.133
Nov 24 09:12:01 sadosrd21 LustreError: 5516:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
sadosrd24
Nov 21 01:42:13 sadosrd24 LustreError: 9097:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 21 01:42:13 sadosrd24 LustreError: 9098:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
Nov 22 04:01:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.116
Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34
Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34
Nov 23 01:42:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.116
sadosrd25
Nov 22 04:02:06 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
Nov 23 04:00:53 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
Nov 23 04:01:01 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.115
Nov 23 04:01:02 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.109
Nov 23 09:12:57 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 24 01:41:40 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.110
Nov 24 01:42:57 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 24 01:43:03 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.110
Nov 24 01:43:08 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.100
Nov 24 01:43:11 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.122
Error numbers:
/usr/include/asm-generic/errno-base.h:#define EAGAIN 11 /* Try again */
/usr/include/asm-generic/errno.h:#define ECONNRESET 104 /* Connection reset by peer */
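To see how the two error codes are distributed across servers, the messages can be tallied with a few lines of Python. This is only a rough sketch: it assumes each message sits on a single line, as syslog writes it (wrapped mail copies would need joining first), and the names `HELLO_RE` and `tally_hello_errors` are made up for this example. The errno names are taken from the two header lines quoted above.

```python
import re
from collections import Counter

# Errno values as quoted from the kernel headers above (Linux numbering).
ERRNO_NAMES = {11: "EAGAIN", 104: "ECONNRESET"}

# Matches the single-line syslog form of the ksocknal_recv_hello() messages.
HELLO_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+LustreError:.*"
    r"Error (?P<err>-?\d+) reading HELLO from\s+(?P<peer>\d+\.\d+\.\d+\.\d+)"
)


def tally_hello_errors(lines):
    """Count HELLO read failures per (server, errno name)."""
    counts = Counter()
    for line in lines:
        match = HELLO_RE.search(line)
        if match is None:
            continue  # not a HELLO error line
        code = abs(int(match.group("err")))
        name = ERRNO_NAMES.get(code, str(code))
        counts[(match.group("host"), name)] += 1
    return counts


# Two of the lines reported above, used as sample input.
sample = [
    "Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34",
    "Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34",
]

for key, count in sorted(tally_hello_errors(sample).items()):
    print(key, count)
```

Feeding it the full syslog (for example `tally_hello_errors(open("/var/log/messages"))`, where the log path is an assumption about the local setup) gives a per-server count. The `stamp` and `peer` groups are captured as well, so the same pattern could be used to group by hour or by client.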
They seem to be related to heavy network traffic to and from this OST.
Network driver: e1000
Lustre: 1.6.6
Kernel: vanilla 2.6.22.19
What triggers such messages?
Is this anything to worry about?
Thanks and Regards
Heiko
Network adapter statistics of the above RAIDs:
sadosrd21 ~ # ethtool -S eth0
NIC statistics:
rx_packets: 3476732178
tx_packets: 8161698729
rx_bytes: 1261677735249
tx_bytes: 11684960617899
rx_broadcast: 96324977
tx_broadcast: 31080
rx_multicast: 885
tx_multicast: 12
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 885
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 112425
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 485691240
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
tx_restart_queue: 202994789
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 2220028952
tx_tcp_seg_failed: 0
rx_flow_control_xon: 926991076
rx_flow_control_xoff: 2476536244
tx_flow_control_xon: 3754
tx_flow_control_xoff: 6876
rx_long_byte_count: 1261677735249
rx_csum_offload_good: 3415421552
rx_csum_offload_errors: 1134
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 53162812
dropped_smbus: 0
sadosrd24 ~ # ethtool -S eth0
NIC statistics:
rx_packets: 4090343679
tx_packets: 2636690225
rx_bytes: 5479498759229
tx_bytes: 2039673228907
rx_broadcast: 32078587
tx_broadcast: 28901
rx_multicast: 316
tx_multicast: 6
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 316
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 11278
rx_missed_errors: 78171
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 194098104
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
tx_restart_queue: 68502186
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 410577015
tx_tcp_seg_failed: 0
rx_flow_control_xon: 234761468
rx_flow_control_xoff: 1632413652
tx_flow_control_xon: 1516
tx_flow_control_xoff: 2889
rx_long_byte_count: 5479498759229
rx_csum_offload_good: 4067175471
rx_csum_offload_errors: 0
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 20807887
dropped_smbus: 0
sadosrd25 ~ # ethtool -S eth0
NIC statistics:
rx_packets: 4305347487
tx_packets: 3031165604
rx_bytes: 5797498509449
tx_bytes: 2043989105691
rx_broadcast: 37618726
tx_broadcast: 28310
rx_multicast: 386
tx_multicast: 6
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 386
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 4738
rx_missed_errors: 223116
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 156915562
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
tx_restart_queue: 50086469
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 396787000
tx_tcp_seg_failed: 0
rx_flow_control_xon: 184756690
rx_flow_control_xoff: 1346260879
tx_flow_control_xon: 7451
tx_flow_control_xoff: 13175
rx_long_byte_count: 5797498509449
rx_csum_offload_good: 4277898711
rx_csum_offload_errors: 0
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 24585106
dropped_smbus: 0
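The interesting counters in these dumps are easier to compare as ratios than as raw totals. The following is a small sketch, not a diagnostic tool: it parses `ethtool -S` text and reports missed packets per received packet next to the no-buffer and pause-frame counters. The function names and the choice of counters are my own assumptions; the counter names themselves are the ones shown in the output above.

```python
def parse_ethtool_stats(text):
    """Turn 'name: value' lines from `ethtool -S` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skips the "NIC statistics:" header and any non-numeric line.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def rx_summary(stats):
    """Rough receive-side health figures for one interface."""
    rx = stats.get("rx_packets", 0)
    missed = stats.get("rx_missed_errors", 0)
    return {
        # Packets the NIC reported as missed, relative to packets received.
        "missed_per_rx_packet": missed / rx if rx else 0.0,
        "rx_no_buffer_count": stats.get("rx_no_buffer_count", 0),
        # Pause frames received from and sent to the link partner.
        "rx_flow_control_xoff": stats.get("rx_flow_control_xoff", 0),
        "tx_flow_control_xoff": stats.get("tx_flow_control_xoff", 0),
    }


# A few counters from the sadosrd25 dump above, as sample input.
sample = """NIC statistics:
rx_packets: 4305347487
rx_no_buffer_count: 4738
rx_missed_errors: 223116
rx_flow_control_xoff: 1346260879
tx_flow_control_xoff: 13175
"""

for key, value in rx_summary(parse_ethtool_stats(sample)).items():
    print(key, value)
```

Since these counters are cumulative since the driver was loaded, taking two snapshots around one of the error bursts and comparing the differences would say more than the totals do.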
Hi,
This is likely related to TCP stack tuning. Possibly the OSS node does not have enough free sockets to accept new connections.
On Tue, 2009-11-24 at 09:35 +0100, Heiko Schröter wrote:
> Hello,
>
> on three of eight OSTs i can see sporadic messages like these:
> [...]
> They seem to be related to heavy network traffic to and from this OST.
> Network driver e1000.
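As a starting point for checking the TCP side, the current listen-queue settings on the OSS can be read from `/proc/sys`. This is a minimal sketch under an assumption: the reply above does not name specific settings, so `net.core.somaxconn` and `net.ipv4.tcp_max_syn_backlog` are just my guess at the first ones worth looking at. The helper name `read_sysctls` is hypothetical.

```python
from pathlib import Path

# Assumed candidates; the thread itself does not say which settings matter.
CANDIDATES = ("net/core/somaxconn", "net/ipv4/tcp_max_syn_backlog")


def read_sysctls(names=CANDIDATES, root="/proc/sys"):
    """Return {name: value string, or None if the entry cannot be read}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            values[name] = None  # missing entry or not a Linux host
    return values


for name, value in read_sysctls().items():
    print(name.replace("/", "."), "=", value)
```

The script only reads values and changes nothing. Whether raising any of them helps here would have to be tested on the affected servers.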