Hi,
I've got a fairly simple setup: two hosts running 9.0-RELEASE (will upgrade
to -STABLE if told to, but want to check here first), ZFS and HAST. HAST is
configured to run on top of zvols created on each host, as illustrated:
   FS                          FS
+------+                    +------+
| hvol | <---- hastd -----> | hvol |
+------+                    +------+
| zvol |                    | zvol |
+------+                    +------+
| zfs  |                    | zfs  |
+------+                    +------+
   h1                          h2
Connection is gigabit to the same switch. No issues with large TCP
transfers such as SCP/FTP.
Config is vanilla:
# zfs create -V 10G zfs/hvol
hast.conf:
resource hvol {
        on h1 {
                local /dev/zvol/zfs/hvol
                remote tcp4://192.168.1.100
        }
        on h2 {
                local /dev/zvol/zfs/hvol
                remote tcp4://192.168.1.200
        }
}
h1 behaves fine as primary, either with h2 turned off or with h2 in the
init role - but as soon as I set the role to secondary on h2, the receiver
repeatedly crashes and restarts - see the traces below.
I've seen
http://lists.freebsd.org/pipermail/freebsd-current/2011-May/024871.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2012-01/msg00510.html
... but in the first case the fix has been in 9 since last year, and the
second refers to async replication - I'm using the default (fullsync).
hastctl status on the primary shows the dirty size diminishing slowly,
but obviously this isn't optimal: it also freezes I/O to the primary
hvol, which causes all kinds of issues for the consumers of the hvol.
Any ideas? Am I doing something wrong?
Primary:
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31649693696, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31691243520, 131072).
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31783256064, 131072).
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31782731776, 131072).
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31803441152, 131072).
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31881953280, 131072).
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Disconnected from
tcp4://192.168.1.200.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization
data: Cannot allocate memory.
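As an aside, here is a minimal Python sketch for pulling the failing WRITE
offsets and lengths out of the primary log above, in case anyone wants to
compare them against the volume size. The names WRITE_RE and failed_writes
are made up for illustration and are not part of HAST; the regex assumes
only the WRITE(offset, length) format visible in the messages.

```python
import re

# Matches the "WRITE(offset, length)" part of hastd's
# "Unable to send request" messages (format assumed from the log above).
WRITE_RE = re.compile(r"WRITE\((\d+),\s*(\d+)\)")


def failed_writes(log_text):
    """Return (offset, length) pairs, in bytes, for every WRITE in log_text."""
    return [(int(off), int(length)) for off, length in WRITE_RE.findall(log_text)]


# Two entries copied from the primary log, including the mail line wrapping.
sample = """\
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot
allocate memory): WRITE(31649693696, 131072).
"""

for offset, length in failed_writes(sample):
    # Report offsets in GiB and request sizes in KiB for readability.
    print(f"offset {offset / 2**30:.2f} GiB, length {length // 1024} KiB")
```

The same function can be run over any text containing these messages; it
ignores every line that does not carry a WRITE(...) request.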
Secondary:
Mar 11 01:01:30 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2874, exitcode=75).
Mar 11 01:01:38 h2 hastd[2875]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:01:44 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2875, exitcode=75).
Mar 11 01:01:45 h2 hastd[2876]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:01:50 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2876, exitcode=75).
Mar 11 01:01:56 h2 hastd[2877]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:02:01 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2877, exitcode=75).
Mar 11 01:02:05 h2 hastd[2878]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:02:11 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2878, exitcode=75).
Mar 11 01:02:15 h2 hastd[2879]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:02:20 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2879, exitcode=75).
Mar 11 01:02:30 h2 hastd[2880]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:02:34 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2880, exitcode=75).
Mar 11 01:02:41 h2 hastd[2881]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:02:47 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2881, exitcode=75).
Mar 11 01:02:48 h2 hastd[2882]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:02:54 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2882, exitcode=75).
Mar 11 01:02:59 h2 hastd[2883]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:03:04 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2883, exitcode=75).
Mar 11 01:03:13 h2 hastd[2884]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:03:17 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2884, exitcode=75).
Mar 11 01:03:18 h2 hastd[2885]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:03:23 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2885, exitcode=75).
Mar 11 01:03:28 h2 hastd[2886]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:03:33 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2886, exitcode=75).
Mar 11 01:03:42 h2 hastd[2887]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.
Mar 11 01:03:48 h2 hastd[2506]: [hvol] (secondary) Worker process exited
ungracefully (pid=2887, exitcode=75).
Mar 11 01:03:48 h2 hastd[2888]: [hvol] (secondary) Unable to receive request
header: Socket is not connected.