Well, I bit the bullet and moved to using hast - all went beautifully, and I migrated the pool with no downtime. The one thing I do notice, however, is that the synchronisation with hast is much slower than the older ggate+gmirror combination. It's about half the speed in fact.

When I originally set up my ggate configuration I did a lot of tweaks to get the speed good - these consisted of expanding the send and receive space for the sockets using sysctl.conf, and then providing large buffers to ggate. Is there a way to control this with hast? I still have the sysctls set (as the machines have not rebooted) but I can't see any options in hast.conf which are equivalent to the "-S 262144 -R 262144" which I use with ggate.

Any advice, or am I barking up the wrong tree here?

cheers,

-pete.
On Thu, 21 Oct 2010 13:25:34 +0100 Pete French wrote:

PF> Well, I bit the bullet and moved to using hast - all went beautifully,
PF> and I migrated the pool with no downtime. The one thing I do notice,
PF> however, is that the synchronisation with hast is much slower
PF> than the older ggate+gmirror combination. It's about half the
PF> speed in fact.
PF> When I orginaly setup my ggate configuration I did a lot of tweaks to
PF> get the speed good - these copnsisted of expanding the send and
PF> receive space for the sockets using sysctl.conf, and then providing
PF> large buffers to ggate. Is there a way to control this with hast ?
PF> I still have the sysctls set (as the machines have not rebooted)
PF> but I cant see any options in hast.conf which are equivalent to the
PF> "-S 262144 -R 262144" which I use with ggate
PF> Any advice, or am I barking up the wrong tree here ?

Currently there are no options in hast.conf to change the send and receive buffer sizes. They are hardcoded in sbin/hastd/proto_tcp4.c:

	val = 131072;
	if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_SNDBUF, &val,
	    sizeof(val)) == -1) {
		pjdlog_warning("Unable to set send buffer size on %s", addr);
	}
	val = 131072;
	if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_RCVBUF, &val,
	    sizeof(val)) == -1) {
		pjdlog_warning("Unable to set receive buffer size on %s", addr);
	}

You could change the values and recompile hastd :-). It would be interesting to know the results of your experiment (if you try it).

Also note there is another hardcoded value, in sbin/hastd/proto_common.c:

	/* Maximum size of packet we want to use when sending data. */
	#define MAX_SEND_SIZE	32768

which looks like it might affect synchronization speed too. Previously we had 128kB here, but it was changed to 32kB after slow synchronization with MAX_SEND_SIZE=128kB was reported:
http://svn.freebsd.org/viewvc/base?view=revision&revision=211452

I wonder whether the slow synchronization with MAX_SEND_SIZE=131072 could have been due to SO_SNDBUF/SO_RCVBUF being equal to this size? Maybe by increasing SO_SNDBUF/SO_RCVBUF we could get better performance with MAX_SEND_SIZE=128kB?

--
Mikolaj Golub
> You can check if the queue size is an issue by monitoring with netstat the Recv-Q and
> Send-Q values for the hastd connections during the test. Running something like below:
>
> while sleep 1; do netstat -na | grep '\.8457.*ESTAB'; done

Interesting - I ran those and started a complete resilver (I do this by changing the secondary to 'init', running 'create' and then changing the role back to secondary). On the primary I get...

tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0  29872 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0    115 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0  80928 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0  32883 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0    115 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10062       10.17.18.2.8457        ESTABLISHED
tcp4       0      0 10.17.18.1.10061       10.17.18.2.8457        ESTABLISHED

And on the secondary....
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4  105544      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4    8688      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4   84360      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4  102648      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4   17376      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4   64088      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4   34216      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED
tcp4       0      0 10.17.18.2.8457        10.17.18.1.10061       ESTABLISHED
tcp4       0     27 10.17.18.2.8457        10.17.18.1.10062       ESTABLISHED

That's just an example - I see the same kind of behaviour throughout the synchronisation process.
I can't compare it to gmirror+ggated, but it looks far more bursty than I would expect.

-pete.
Actually, I just looked in dmesg on the secondary - it is full of messages like this:

Oct 26 15:44:59 serpentine-passive hastd[10394]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
Oct 26 15:45:00 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10394, exitcode=75).
Oct 26 15:46:59 serpentine-passive hastd[10421]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
Oct 26 15:47:04 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10421, exitcode=75).

Does that help explain my issues? I have the same OS build running on both machines, so I don't see how I can have a version mismatch. The ethernet here consists of a pair of bge devices, which are bundled using LACP and lagg. I didn't see this on my test setup, but that was using ethernet directly - could there be a difference there?

-pete.
Just to report back on this - I just tried the patches from last week, which fixed the sending of the keepalives in a different thread, but my original issue (the synchronisation speed) remains, I'm afraid - so much for the theory that the corruption was causing the speed decrease. It's obviously good to have the threading issue fixed though.

-pete.