It's been a while since I did some gluster replication testing, so I
spun up a quick cluster *cough, plug* using puppet-gluster+vagrant (of
course) and here are my results.

* Setup is a 2x2 distributed-replicated cluster
* Hosts are named: annex{1..4}
* Volume name is 'puppet'
* Client VMs mount (fuse) the volume.

* On the client:

# cd /mnt/gluster/puppet/
# dd if=/dev/urandom of=random.51200 count=51200
# sha1sum random.51200
# rsync -v --bwlimit=10 --progress random.51200 root@localhost:/tmp

* This gives me about an hour to mess with the bricks...
* By looking on the hosts directly, I see that the random.51200 file is
  on annex3 and annex4... (a client-side way to confirm this is sketched
  after this message)

* On annex3:
# poweroff
[host shuts down...]

* On client1:
# time ls
random.51200

real    0m42.705s
user    0m0.001s
sys     0m0.002s

[hangs for about 42 seconds, and then returns successfully...]

* I then power up annex3, and then pull the plug on annex4. The same sort
  of thing happens... It hangs for 42 seconds, but then everything works
  as normal. This is of course the cluster timeout value and the answer to
  life, the universe and everything.

Question: Why doesn't glusterfs automatically flip over to using the
other available host right away? If you agree, I'll report this as a
bug. If there's a way to do this, let me know.

Apart from the delay, glad that this is of course still HA ;)

Cheers,
James
@purpleidea (twitter/irc)
https://ttboj.wordpress.com/
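A note on checking brick placement: the file-to-brick mapping mentioned
above can also be read from the client itself via GlusterFS's pathinfo
virtual xattr, without logging into the annex hosts. A minimal sketch,
assuming the FUSE mount at /mnt/gluster/puppet and the getfattr tool
from the attr package:

# getfattr -n trusted.glusterfs.pathinfo /mnt/gluster/puppet/random.51200

The output should list the backend brick paths (here, the annex3 and
annex4 bricks) holding the file's replicas.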
Krishnan Parthasarathi
2014-Feb-11 06:27 UTC
[Gluster-users] [Gluster-devel] Testing replication and HA
James,

Could you provide the logs of the mount process, where you see the hang
for 42s? My initial guess, seeing 42s, is that the client translator's
ping timeout is in play.

I would encourage you to report a bug and attach the relevant logs. If
the observed issue turns out to be an acceptable/explicable behavioural
quirk of glusterfs, then we could close the bug :-)

cheers,
Krish

----- Original Message -----
> It's been a while since I did some gluster replication testing, so I
> spun up a quick cluster *cough, plug* using puppet-gluster+vagrant (of
> course) and here are my results.
>
> * Setup is a 2x2 distributed-replicated cluster
> * Hosts are named: annex{1..4}
> * Volume name is 'puppet'
> * Client VMs mount (fuse) the volume.
>
> * On the client:
>
> # cd /mnt/gluster/puppet/
> # dd if=/dev/urandom of=random.51200 count=51200
> # sha1sum random.51200
> # rsync -v --bwlimit=10 --progress random.51200 root@localhost:/tmp
>
> * This gives me about an hour to mess with the bricks...
> * By looking on the hosts directly, I see that the random.51200 file is
>   on annex3 and annex4...
>
> * On annex3:
> # poweroff
> [host shuts down...]
>
> * On client1:
> # time ls
> random.51200
>
> real    0m42.705s
> user    0m0.001s
> sys     0m0.002s
>
> [hangs for about 42 seconds, and then returns successfully...]
>
> * I then power up annex3, and then pull the plug on annex4. The same sort
>   of thing happens... It hangs for 42 seconds, but then everything works
>   as normal. This is of course the cluster timeout value and the answer to
>   life, the universe and everything.
>
> Question: Why doesn't glusterfs automatically flip over to using the
> other available host right away? If you agree, I'll report this as a
> bug. If there's a way to do this, let me know.
>
> Apart from the delay, glad that this is of course still HA ;)
>
> Cheers,
> James
> @purpleidea (twitter/irc)
> https://ttboj.wordpress.com/
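For context, the 42-second hang matches the default value of the
network.ping-timeout volume option, which controls how long a client
waits for an unresponsive brick before marking it down. A minimal
sketch of where to look and what could be tuned, assuming the FUSE
mount at /mnt/gluster/puppet and the volume name 'puppet' from the
setup above; note that lowering the timeout too far can turn short
network blips into spurious disconnects:

* On the client, the FUSE mount log (filename is typically the mount
  path with '/' replaced by '-'):

# less /var/log/glusterfs/mnt-gluster-puppet.log

* On any of the servers, lower the ping timeout for the volume and
  verify the setting:

# gluster volume set puppet network.ping-timeout 10
# gluster volume info puppet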