thr3ads.net - Gluster users - [Gluster-users] slow write to non-hosted replica in distributed-replicated volume [Oct 2012]

Rowley, Shane K

2012-Oct-17 20:47 UTC

[Gluster-users] slow write to non-hosted replica in distributed-replicated volume

I have four servers, absolutely identical, connected to the same switches. One
interface is on a 100Mb switch, the other is on a 1Gb switch. I access the nodes
via the 100Mb port, gluster is configured on the 1Gb port. The nodes are all
loaded with Scientific Linux 6.3, Virtualization Host, with glusterfs-3.2.7 from
EPEL. The nodes are 2-socket quad-core AMD (so 8 cores total) servers with 6x
300GB internal drives. I'm using LVM on top of h/w RAID0, and have a 1.5TB
xfs brick on each node. I have libvirtd running, but no VMs created yet.

I initially configured each pair of servers as a separate cluster with a 1x2
replicated volume. With the volumes mounted as glusterfs from localhost, dd tests
give me ~90MB/s, pretty decent for a 1Gb network (max 125MB/s). So I tore that
all down, joined all four nodes together, and created a 2x2
distributed-replicated volume. Now is where it gets interesting. First node dd
test is consistent. Second node dd test is half-speed. Third node dd test is
back to full speed. Fourth node dd test is back to half-speed. When I look in
the bricks directly, I see that the nodes that were slower had their file in a
replica pair that they were not hosting a brick of.

For example..
gluster volume create vol1 replica 2 transport tcp server1:/brick1
server2:/brick2 server3:/brick3 server4:/brick4

server1:/brick1 and server2:/brick2 are the first replica pair
server3:/brick3 and server4:/brick4 are the second replica pair

server1.. file1 goes into brick1/brick2 - fast
server2.. file2 goes into brick3/brick4 - slow
server3.. file3 goes into brick3/brick4 - fast
server4.. file4 goes into brick1/brick2 - slow
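
The pairing above follows from the order of the bricks on the command line: with replica 2, each consecutive pair of bricks forms one replica set. A minimal sketch of that grouping rule (the function name `replica_sets` is made up for illustration and is not part of any gluster tool):

```python
def replica_sets(bricks, replica):
    """Group bricks, in command-line order, into consecutive replica sets."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Brick order exactly as given to "gluster volume create vol1 replica 2 ..."
vol1 = ["server1:/brick1", "server2:/brick2", "server3:/brick3", "server4:/brick4"]

for n, pair in enumerate(replica_sets(vol1, 2), start=1):
    print(f"replica set {n}: {', '.join(pair)}")
```

Reordering the list reorders the pairs, which is all that changing the brick order in the create command does.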

So I delete that volume, and create another..
gluster volume create vol2 replica 2 transport tcp server2:/brick2
server3:/brick3 server4:/brick4 server1:/brick1

server2:/brick2 and server3:/brick3 are the first replica pair
server4:/brick4 and server1:/brick1 are the second replica pair

server2.. file2 goes into brick2/brick3 - fast
server3.. file3 goes into brick2/brick3 - fast
server4.. file4 goes into brick4/brick1 - fast
server1.. file1 goes into brick4/brick1 - fast

So now I'm like seriously WTF. So I remove all output files, and try four
consecutive tests from the same node with output file1, file2, file3, file4. And
sure enough two of them are fast and two are slow, and the fast ones are placed
in "its" replica pair and the slow ones are in the other. And I notice
that every time I delete them, the files get created in the same replica pair
each time, no matter what order I create them. I've tried this with nfs
mounts also (instead of glusterfs), and the results are the same.

Has anyone seen this behavior before? Is this a known issue or
mis-configuration?

Bryan Whitehead

2012-Oct-22 19:04 UTC

> gluster volume create vol1 replica 2 transport tcp server1:/brick1
> server2:/brick2 server3:/brick3 server4:/brick4
>
> server1:/brick1 and server2:/brick2 are the first replica pair
>
> server3:/brick3 and server4:/brick4 are the second replica pair
>
> server1.. file1 goes into brick1/brick2 - fast
This is fast because 1 copy of the file goes to brick1 (local) and 1
copy goes to brick2 (remote). The 1 remote copy gets the full
bandwidth of the NIC; the other copy is local, so the NIC isn't hit.
> server2.. file2 goes into brick3/brick4 - slow
This is slow because 1 copy of the file goes to brick3 (remote) and 1
copy goes to brick4 (remote). Transferring to 2 remote bricks at the
same time splits the NIC between 2 streams, so your write rate tops
out at only ~50MB/sec instead of the full ~100MB/sec.
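
A rough model of that arithmetic, as a sketch only. It assumes the client sends a separate copy to every replica, that a local copy costs no network bandwidth, and that the NIC is the only bottleneck (disk and CPU are ignored). The function name and the 90 MB/s figure (the dd rate reported for the 1x2 volumes) are used purely for illustration:

```python
def write_rate_mb_s(client, replica_servers, link_mb_s):
    """Estimate the client's write rate when all remote copies share one NIC.

    Assumptions (not measured): one copy is sent per replica, local copies
    use no network bandwidth, and the link is the only limit.
    """
    remote = [s for s in replica_servers if s != client]
    if not remote:
        return float("inf")  # every copy is local; the NIC is not the limit
    return link_mb_s / len(remote)


LINK = 90.0  # MB/s, roughly what dd showed on the 1x2 volumes

# server1 writing to its own pair: one local copy, one remote copy
print(write_rate_mb_s("server1", ["server1", "server2"], LINK))  # 90.0

# server2 writing to the other pair: both copies are remote
print(write_rate_mb_s("server2", ["server3", "server4"], LINK))  # 45.0
```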
> server3.. file3 goes into brick3/brick4 - fast
>
> server4.. file4 goes into brick1/brick2 - slow
> So I delete that volume, and create another..
>
> gluster volume create vol2 replica 2 transport tcp server2:/brick2
> server3:/brick3 server4:/brick4 server1:/brick1
All this will do is change which replica pair each filename ends up
landing on. If you keep creating files you should get an even
distribution, so this will not change anything in the long run.
Eventually you'll have serverX writing files to brickY and brickZ that
are not local.
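
To illustrate why the same filename keeps landing on the same pair while many filenames spread out evenly: placement in a distributed volume is decided by hashing the file name, not by which server does the writing. The sketch below uses MD5 truncated to 32 bits as a stand-in hash. That is an assumption for illustration, NOT the hash GlusterFS actually uses, so the pair it picks for any given name will not match a real volume.

```python
import hashlib


def pick_replica_set(filename, n_sets):
    """Map a filename to a replica set index by hashing its name.

    Stand-in hash (MD5 truncated to 32 bits), NOT the real GlusterFS hash.
    The 32-bit hash space is split into n_sets equal ranges.
    """
    h = int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")
    return h * n_sets // 2**32


# The same name always maps to the same replica set, whatever the creation order.
for name in ["file1", "file2", "file3", "file4"]:
    print(name, "-> replica set", pick_replica_set(name, 2))

# Over many names the two replica sets fill up roughly evenly.
counts = [0, 0]
for i in range(1, 1001):
    counts[pick_replica_set(f"file{i}", 2)] += 1
print(counts)
```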

Gluster is working as intended from the description you are giving.
