I am trying to get a basic HAST setup working on 8-stable (as of today). Hardware is two Supermicro blades, each with 2x Xeon E5620 processors, 48GB RAM, an integrated LSI2008 controller, two 600GB SAS2 Toshiba drives, two Intel gigabit interfaces and two Intel 10Gbit interfaces.

On each of the drives there is a GPT partition intended to be used by HAST. Each host thus has two HAST resources, data0 and data1. HAST runs over the 10Gbit interfaces, connected via the blade chassis 10Gbit switch.

/etc/hast.conf is:

resource data0 {
        on b1a {
                local /dev/gpt/data0
                remote 10.2.101.12
        }
        on b1b {
                local /dev/gpt/data0
                remote 10.2.101.11
        }
}

resource data1 {
        on b1a {
                local /dev/gpt/data1
                remote 10.2.101.12
        }
        on b1b {
                local /dev/gpt/data1
                remote 10.2.101.11
        }
}

On top of data0 and data1 I run a ZFS mirror, although this doesn't seem to be relevant here.

What I am observing is very jumpy performance, and the two nodes often disconnect.

On the primary:

May 29 13:06:33 b1b hastd[2372]: [data0] (primary) Unable to receive reply header: Socket is not connected.
May 29 13:06:33 b1b hastd[2372]: [data0] (primary) Unable to send request (Broken pipe): WRITE(60470853632, 131072).
May 29 13:06:33 b1b hastd[2372]: [data0] (primary) Disconnected from 10.2.101.11.
May 29 13:06:33 b1b hastd[2372]: [data0] (primary) Unable to write synchronization data: Socket is not connected.

On the secondary:

May 29 03:03:14 b1a hastd[28357]: [data1] (secondary) Unable to receive request header: RPC version wrong.
May 29 03:03:19 b1a hastd[11659]: [data1] (secondary) Worker process exited ungracefully (pid=28357, exitcode=75).
May 29 03:05:31 b1a hastd[35535]: [data0] (secondary) Unable to receive request header: RPC version wrong.
May 29 03:05:36 b1a hastd[11659]: [data0] (secondary) Worker process exited ungracefully (pid=35535, exitcode=75).

When it works, the replication rate observed with 'systat -if' is over 140MB/sec (perhaps limited by the drives' write throughput).

The only reference to these error messages that I found is in
http://lists.freebsd.org/pipermail/freebsd-stable/2010-November/059817.html,
and that thread indicated the fix was committed.

About the only tuning these machines have is kern.ipc.nmbclusters=51200, because with the default value the 10Gbit interfaces would not work and the system would run out of mbufs anyway.

Has anyone observed something similar? Any ideas how to fix it?

Daniel
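P.S. In case the ZFS layering matters: the pool sits directly on the providers that hastd exposes under /dev/hast/ once a node is primary. It was created roughly like this (from memory; the pool name "tank" is just a placeholder, not the real one):

        # on the node that is currently primary, switch the resources to primary role
        hastctl role primary all
        # the GEOM providers then show up as /dev/hast/data0 and /dev/hast/data1
        zpool create tank mirror /dev/hast/data0 /dev/hast/data1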
Some further investigation: the HAST nodes do not disconnect when checksum is enabled (either crc32 or sha256).

One strange thing is that there is never an ESTABLISHED TCP connection between the two nodes:

tcp4       0      0 10.2.101.11.48939     10.2.101.12.8457      FIN_WAIT_2
tcp4       0   1288 10.2.101.11.57008     10.2.101.12.8457      CLOSE_WAIT
tcp4       0      0 10.2.101.11.46346     10.2.101.12.8457      FIN_WAIT_2
tcp4       0  90648 10.2.101.11.13916     10.2.101.12.8457      CLOSE_WAIT
tcp4       0      0 10.2.101.11.8457      *.*                   LISTEN

When using sha256, one CPU core is 100% utilized by each hastd process, while 70-80MB/sec per HAST resource is being transferred (a total of up to 140MB/sec of traffic for both). When using crc32, each CPU core is at 22% utilization. When using no checksum, CPU usage is under 10%.

Eventually, after many hours, I got corrupted communication:

May 30 17:32:35 b1b hastd[9827]: [data0] (secondary) Hash mismatch.
May 30 17:32:35 b1b hastd[9827]: [data0] (secondary) Unable to receive request data: No such file or directory.
May 30 17:32:38 b1b hastd[9397]: [data0] (secondary) Worker process exited ungracefully (pid=9827, exitcode=75).

and

May 30 17:32:27 b1a hastd[1837]: [data0] (primary) Unable to receive reply header: Operation timed out.
May 30 17:32:30 b1a hastd[1837]: [data0] (primary) Disconnected from 10.2.101.12.
May 30 17:32:30 b1a hastd[1837]: [data0] (primary) Unable to send request (Broken pipe): WRITE(99128470016, 131072).

Daniel
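P.S. For completeness, the checksum was switched on in /etc/hast.conf. If I remember the syntax right, it is a single keyword that can sit in the global section (or inside a resource); something like:

        # global section of /etc/hast.conf - applies to every resource below
        checksum crc32        # tried crc32 and sha256; the default is none

        resource data0 {
                ...
        }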
Well, apparently my HAST joy was short-lived. On a second run, I got stuck with

Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive reply header: Operation timed out.

on the primary. No messages on the secondary.

On the primary:

# netstat -an | grep 8457
tcp4       0      0 10.2.101.11.42659     10.2.101.12.8457      FIN_WAIT_2
tcp4       0      0 10.2.101.11.62058     10.2.101.12.8457      CLOSE_WAIT
tcp4       0      0 10.2.101.11.34646     10.2.101.12.8457      FIN_WAIT_2
tcp4       0      0 10.2.101.11.11419     10.2.101.12.8457      CLOSE_WAIT
tcp4       0      0 10.2.101.11.37773     10.2.101.12.8457      FIN_WAIT_2
tcp4       0      0 10.2.101.11.21911     10.2.101.12.8457      FIN_WAIT_2
tcp4       0      0 10.2.101.11.40169     10.2.101.12.8457      CLOSE_WAIT
tcp4       0  97749 10.2.101.11.44360     10.2.101.12.8457      CLOSE_WAIT
tcp4       0      0 10.2.101.11.8457      *.*                   LISTEN

On the secondary:

# netstat -an | grep 8457
tcp4       0      0 10.2.101.12.8457      10.2.101.11.42659     CLOSE_WAIT
tcp4       0      0 10.2.101.12.8457      10.2.101.11.62058     FIN_WAIT_2
tcp4       0      0 10.2.101.12.8457      10.2.101.11.34646     CLOSE_WAIT
tcp4       0      0 10.2.101.12.8457      10.2.101.11.11419     FIN_WAIT_2
tcp4       0      0 10.2.101.12.8457      10.2.101.11.37773     CLOSE_WAIT
tcp4       0      0 10.2.101.12.8457      10.2.101.11.21911     CLOSE_WAIT
tcp4       0      0 10.2.101.12.8457      10.2.101.11.40169     FIN_WAIT_2
tcp4   66415      0 10.2.101.12.8457      10.2.101.11.44360     FIN_WAIT_2
tcp4       0      0 10.2.101.12.8457      *.*                   LISTEN

On the primary:

# hastctl status
data0:
  role: primary
  provname: data0
  localpath: /dev/gpt/data0
  extentsize: 2097152 (2.0MB)
  keepdirty: 64
  remoteaddr: 10.2.101.12
  sourceaddr: 10.2.101.11
  replication: fullsync
  status: complete
  dirty: 0 (0B)
data1:
  role: primary
  provname: data1
  localpath: /dev/gpt/data1
  extentsize: 2097152 (2.0MB)
  keepdirty: 64
  remoteaddr: 10.2.101.12
  sourceaddr: 10.2.101.11
  replication: fullsync
  status: complete
  dirty: 0 (0B)
data2:
  role: primary
  provname: data2
  localpath: /dev/gpt/data2
  extentsize: 2097152 (2.0MB)
  keepdirty: 64
  remoteaddr: 10.2.101.12
  sourceaddr: 10.2.101.11
  replication: fullsync
  status: complete
  dirty: 6291456 (6.0MB)
data3:
  role: primary
  provname: data3
  localpath: /dev/gpt/data3
  extentsize: 2097152 (2.0MB)
  keepdirty: 64
  remoteaddr: 10.2.101.12
  sourceaddr: 10.2.101.11
  replication: fullsync
  status: complete
  dirty: 0 (0B)

It has been sitting in this state for over 10 minutes. Unfortunately, there is no KDB in this kernel.

Any ideas what else to look for?

Daniel
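P.S. Next time it wedges I will try to grab the kernel-side stacks of the stuck worker with procstat (assuming procstat -kk is available on this 8-stable, and that pgrep -f matches the worker's process title; the pattern below is just a guess at it):

        # kernel thread stacks of the hastd worker handling data2
        procstat -kk $(pgrep -f 'hastd: data2')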