Hello,

on three of eight OSTs I can see sporadic messages like these:

sadosrd21
Nov 24 09:11:52 sadosrd21 LustreError: 5518:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.133
Nov 24 09:12:01 sadosrd21 LustreError: 5516:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
sadosrd24
Nov 21 01:42:13 sadosrd24 LustreError: 9097:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 21 01:42:13 sadosrd24 LustreError: 9098:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
Nov 22 04:01:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.116
Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34
Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34
Nov 23 01:42:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.116
sadosrd25
Nov 22 04:02:06 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
Nov 23 04:00:53 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
Nov 23 04:01:01 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.115
Nov 23 04:01:02 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.109
Nov 23 09:12:57 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 24 01:41:40 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.110
Nov 24 01:42:57 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 24 01:43:03 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.110
Nov 24 01:43:08 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.100
Nov 24 01:43:11 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.122

Error numbers:
/usr/include/asm-generic/errno-base.h:#define EAGAIN 11 /* Try again */
/usr/include/asm-generic/errno.h:#define ECONNRESET 104 /* Connection reset by peer */

They seem to be related to heavy network traffic to and from this OST.
Network driver e1000.
lustre-1.6.6, vanilla kernel 2.6.22.19

What triggers such messages? Is this anything to worry about?

Thanks and Regards
Heiko

Network adapter statistics of the above RAIDs:

sadosrd21 ~ # ethtool -S eth0
NIC statistics:
     rx_packets: 3476732178
     tx_packets: 8161698729
     rx_bytes: 1261677735249
     tx_bytes: 11684960617899
     rx_broadcast: 96324977
     tx_broadcast: 31080
     rx_multicast: 885
     tx_multicast: 12
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 885
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 112425
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 485691240
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 202994789
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 2220028952
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 926991076
     rx_flow_control_xoff: 2476536244
     tx_flow_control_xon: 3754
     tx_flow_control_xoff: 6876
     rx_long_byte_count: 1261677735249
     rx_csum_offload_good: 3415421552
     rx_csum_offload_errors: 1134
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 53162812
     dropped_smbus: 0

sadosrd24 ~ # ethtool -S eth0
NIC statistics:
     rx_packets: 4090343679
     tx_packets: 2636690225
     rx_bytes: 5479498759229
     tx_bytes: 2039673228907
     rx_broadcast: 32078587
     tx_broadcast: 28901
     rx_multicast: 316
     tx_multicast: 6
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 316
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 11278
     rx_missed_errors: 78171
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 194098104
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 68502186
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 410577015
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 234761468
     rx_flow_control_xoff: 1632413652
     tx_flow_control_xon: 1516
     tx_flow_control_xoff: 2889
     rx_long_byte_count: 5479498759229
     rx_csum_offload_good: 4067175471
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 20807887
     dropped_smbus: 0

sadosrd25 ~ # ethtool -S eth0
NIC statistics:
     rx_packets: 4305347487
     tx_packets: 3031165604
     rx_bytes: 5797498509449
     tx_bytes: 2043989105691
     rx_broadcast: 37618726
     tx_broadcast: 28310
     rx_multicast: 386
     tx_multicast: 6
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 386
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 4738
     rx_missed_errors: 223116
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 156915562
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 50086469
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 396787000
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 184756690
     rx_flow_control_xoff: 1346260879
     tx_flow_control_xon: 7451
     tx_flow_control_xoff: 13175
     rx_long_byte_count: 5797498509449
     rx_csum_offload_good: 4277898711
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 24585106
     dropped_smbus: 0
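The three sets of `ethtool -S` counters are easier to compare as rates than as raw lifetime totals. The following is a minimal sketch, assuming the output is available as plain text. It parses the counters into a dict and expresses a few receive-side counters per million received packets. The helper names `parse_ethtool_stats` and `loss_ppm` are hypothetical, and the choice of watched counters is ours, not the original poster's. On e1000, `rx_missed_errors` is generally understood to count packets the NIC dropped for lack of receive FIFO space.

```python
from __future__ import annotations

import re

# Matches counter lines such as "     rx_missed_errors: 112425" from `ethtool -S`.
# Header lines like "NIC statistics:" carry no number and are skipped.
_COUNTER_RE = re.compile(r"^\s*([A-Za-z0-9_]+):\s*(\d+)\s*$")

# Receive-side counters to compare across servers (an illustrative choice).
WATCHED = ("rx_missed_errors", "rx_no_buffer_count", "rx_csum_offload_errors")


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Hypothetical helper: turn `ethtool -S` output into {counter: value}."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        match = _COUNTER_RE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def loss_ppm(stats: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: express WATCHED counters per million rx_packets."""
    rx_packets = stats.get("rx_packets", 0)
    if rx_packets == 0:
        return {}  # counter missing or zero: no meaningful rate
    return {name: stats.get(name, 0) * 1e6 / rx_packets for name in WATCHED}
```

With the figures posted above, `rx_missed_errors` works out to roughly 32 per million received packets on sadosrd21, 19 on sadosrd24, and 52 on sadosrd25. The counters appear to be cumulative, so sampling twice around one of the error bursts and taking the difference would probably say more than the lifetime totals.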
Hi,

This is likely a TCP stack tuning issue. Possibly the OSS node does not have enough free sockets for incoming connections.

On Tue, 2009-11-24 at 09:35 +0100, Heiko Schröter wrote:
> They seem to be related to heavy network traffic to and from this OST.
> Network driver e1000.
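The reply's hypothesis, a shortage of sockets on the OSS, can be checked roughly by counting TCP sockets on the LNET acceptor port, grouped by state. The following is an illustrative sketch, not something the reply proposes. It assumes the default LNET accept port 988 and IPv4 sockets only (`/proc/net/tcp`, not `tcp6`). `count_states` is a hypothetical helper name. An unusually large SYN_RECV or TIME_WAIT population around the times of the HELLO errors might be consistent with the reply's suggestion.

```python
from __future__ import annotations

from collections import Counter

# TCP state codes as printed (in hex) in the "st" column of /proc/net/tcp;
# the values follow the Linux kernel's include/net/tcp_states.h.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}

# Assumption: the LNET acceptor listens on its default TCP port 988.
LNET_PORT = 988


def count_states(proc_net_tcp: str, local_port: int = LNET_PORT) -> Counter:
    """Hypothetical helper: count sockets by state whose local port matches."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "IPHEX:PORTHEX"; the port is plain hex.
        _, port_hex = fields[1].rsplit(":", 1)
        if int(port_hex, 16) != local_port:
            continue
        state = int(fields[3], 16)
        counts[TCP_STATES.get(state, f"UNKNOWN({state:#x})")] += 1
    return counts


# Usage on the OSS (not run here):
#   with open("/proc/net/tcp") as f:
#       print(count_states(f.read()).most_common())
```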