Rowley, Shane K
2012-Oct-17 20:47 UTC
[Gluster-users] slow write to non-hosted replica in distributed-replicated volume
I have four servers, absolutely identical, connected to the same switches. One interface is on a 100Mb switch, the other is on a 1Gb switch. I access the nodes via the 100Mb port; gluster is configured on the 1Gb port. The nodes are all loaded with Scientific Linux 6.3, Virtualization Host, with glusterfs-3.2.7 from EPEL. The nodes are 2-socket quad-core AMD (so 8 cores total) servers with 6x 300GB internal drives. I'm using LVM on top of h/w RAID0, and have a 1.5TB xfs brick on each node. I have libvirtd running, but no VMs created yet.

I initially configured each pair of servers as a separate cluster with a 1x2 replicated volume. With the volumes mounted as glusterfs from localhost, dd tests give me ~90MB/s.. pretty decent for a 1Gb network (max 125MB/s).

So I tore that all down, joined all four nodes together, and created a 2x2 distributed-replicated volume. Now is where it gets interesting. First node dd test is consistent. Second node dd test is half-speed. Third node dd test is back to full speed. Fourth node dd test is back to half-speed.

So when I look in the bricks directly, I see that the nodes that were slower had their file in a brick that was not part of the replica they were hosting. For example..

    gluster volume create vol1 replica 2 transport tcp server1:/brick1 server2:/brick2 server3:/brick3 server4:/brick4

server1:/brick1 and server2:/brick2 are the first replica pair
server3:/brick3 and server4:/brick4 are the second replica pair

server1.. file1 goes into brick1/brick2 - fast
server2.. file2 goes into brick3/brick4 - slow
server3.. file3 goes into brick3/brick4 - fast
server4.. file4 goes into brick1/brick2 - slow

So I delete that volume, and create another..

    gluster volume create vol2 replica 2 transport tcp server2:/brick2 server3:/brick3 server4:/brick4 server1:/brick1

server2:/brick2 and server3:/brick3 are the first replica pair
server4:/brick4 and server1:/brick1 are the second replica pair

server2.. file2 goes into brick2/brick3 - fast
server3.. file3 goes into brick2/brick3 - fast
server4.. file4 goes into brick4/brick1 - fast
server1.. file1 goes into brick4/brick1 - fast

So now I'm like seriously WTF. So I remove all output files, and try four consecutive tests from the same node with output file1, file2, file3, file4. And sure enough two of them are fast and two are slow; the fast ones are placed in "its" replica pair and the slow ones are in the other. And I notice that every time I delete and recreate them, each file lands in the same replica pair as before, no matter what order I create them in.

I've tried this with nfs mounts also (instead of glusterfs), and the results are the same.

Has anyone seen this behavior before? Is this a known issue or a mis-configuration?
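The observation that each file name lands in the same replica pair every time is what you would expect from hash-based placement on the file name. The following is only a minimal sketch of that idea: CRC32 and the modulo step are stand-ins chosen for illustration, not GlusterFS's actual hash or layout logic, and the function name pick_replica_pair is hypothetical.

```python
import zlib

# Replica pairs as in the vol1 example above: with "replica 2", consecutive
# bricks on the command line form a pair.
REPLICA_PAIRS = [
    ("server1:/brick1", "server2:/brick2"),
    ("server3:/brick3", "server4:/brick4"),
]


def pick_replica_pair(filename, pairs=REPLICA_PAIRS):
    """Map a file name to a replica pair with a stand-in hash.

    Assumption: CRC32 modulo the number of pairs is used purely to show
    determinism; the real volume uses its own hash and hash ranges.
    """
    index = zlib.crc32(filename.encode("utf-8")) % len(pairs)
    return pairs[index]


# The same name always maps to the same pair, whichever node creates it
# and in whatever order the files are created.
for name in ["file1", "file2", "file3", "file4"]:
    print(name, "->", pick_replica_pair(name))
```

Because the mapping depends only on the name, deleting and recreating file1..file4 cannot move them to a different pair; only reordering the bricks in the volume changes which pair a given name maps to.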
Bryan Whitehead
2012-Oct-22 19:04 UTC
[Gluster-users] slow write to non-hosted replica in distributed-replicated volume
> gluster volume create vol1 replica 2 transport tcp server1:/brick1 server2:/brick2 server3:/brick3 server4:/brick4
>
> server1:/brick1 and server2:/brick2 are the first replica pair
> server3:/brick3 and server4:/brick4 are the second replica pair
>
> server1.. file1 goes into brick1/brick2 - fast

This is fast because 1 copy of the file goes to brick1 (local) and 1 copy goes to brick2 (remote). So the 1 remote copy gets the full bandwidth of the NIC; the other is just local, so the NIC isn't hit.

> server2.. file2 goes into brick3/brick4 - slow

This is slow because 1 copy of the file goes to brick3 (remote) and 1 copy goes to brick4 (remote). Transferring to 2 remote bricks at the same time will cap your rate at only ~50MB/sec instead of the full 100MB/sec, because the 2 streams share the one NIC.

> server3.. file3 goes into brick3/brick4 - fast
> server4.. file4 goes into brick1/brick2 - slow

> So I delete that volume, and create another..
>
> gluster volume create vol2 replica 2 transport tcp server2:/brick2 server3:/brick3 server4:/brick4 server1:/brick1

All this will do is change where the filenames end up landing. If you keep creating files you should get an even distribution, so this will not change anything in the long run. Eventually you'll have serverX writing files to brickY and brickZ that are not local.

Gluster is working as intended from the description you are giving.
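The arithmetic behind the fast and slow cases can be sketched as follows. This is a rough model only, assuming the writing node sends every remote copy itself and that its NIC is the sole bottleneck; the function name and the 90 MB/s figure (taken from the dd result in the original post) are illustrative.

```python
def expected_write_rate(link_mb_s, local_replicas, replica_count=2):
    """Estimate the client-visible write rate in MB/s.

    Assumptions: the writer streams each remote copy over one NIC, the
    streams split that NIC evenly, and disks are never the limit.
    Returns None when no copy crosses the network.
    """
    remote_copies = replica_count - local_replicas
    if remote_copies <= 0:
        return None  # nothing goes over the wire, so the NIC does not apply
    return link_mb_s / remote_copies


observed_link = 90.0  # MB/s, the single-stream dd figure from the thread

# One copy local, one remote: the single stream gets the whole link.
print(expected_write_rate(observed_link, local_replicas=1))

# Both copies remote: two streams share the link, so the rate halves.
print(expected_write_rate(observed_link, local_replicas=0))
```

Under these assumptions, a node writing to the replica pair it hosts sees the full link rate, while a node writing to the other pair sees about half, which matches the full-speed and half-speed dd runs reported above.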