samuel
2012-Mar-07 16:42 UTC
[Gluster-users] servers with multiple bricks do not get removed from LOOKUP function
Dear all,

First of all, apologies if this question has been answered before. In that case, I would really appreciate a pointer to where the information is.

I am using gluster version 3.2.5 with 2 nodes, each with 2 bricks, in a distributed-replicated topology:

  (d1a)(d1b)
  (d2a)(d2b)

Disks d1a and d2a form a replica set, as do d1b and d2b. This covers the failure of one server.

The scenario I'm testing is a simulated disk failure, for instance of disk d2b. The data stays consistent (from a customer's point of view all the files are in the right directory and it's possible to read, modify, delete, etc.). The problem is that all these operations take far too long to complete. A single ls of a 4-file directory takes almost a minute. The main problem that may arise is I/O errors due to timeouts in complex operations.

Halting a server, on the other hand, keeps the cluster working without any performance penalty.

My guess is that since only one disk is down but server 2 is still up, the client (which is using the gluster native library) still tries to use the brick at node 2, because node 2 is still up although the brick is missing. When the LOOKUP times out, the client sends a "broadcast" LOOKUP and the other node (1) replies with the information from the running bricks. That may explain why it takes so long to complete while the information is still right.

Is there any configuration option to solve this issue? Is my guess anywhere close to reality? Is there any workaround besides using nodes with a single brick each?

Thank you in advance for any help,
Samuel.
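
P.S. To make the layout concrete, here is a minimal sketch of the two replica sets and what survives in each failure scenario. It only models which bricks remain, not timeouts or the LOOKUP behaviour. The host names node1/node2 and the brick paths are made up for illustration, and grouping consecutive bricks into replica sets is an assumption about how the volume was created.

```python
# Hypothetical brick identifiers; the real host names and paths will differ.
REPLICA_COUNT = 2
bricks = ["node1:/d1a", "node2:/d2a", "node1:/d1b", "node2:/d2b"]


def replica_sets(bricks, replica_count):
    """Group consecutive bricks into replica sets (assumed creation order)."""
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


def surviving(sets, failed):
    """For each replica set, list the bricks that are not in `failed`."""
    return [[b for b in s if b not in failed] for s in sets]


sets = replica_sets(bricks, REPLICA_COUNT)

# Scenario 1: only disk d2b fails, node2 itself stays up.
disk_failure = surviving(sets, {"node2:/d2b"})

# Scenario 2: the whole of node2 is halted.
server_failure = surviving(sets, {"node2:/d2a", "node2:/d2b"})

print(disk_failure)
print(server_failure)
```

In both scenarios every replica set keeps at least one brick, which matches what I see: the data stays available either way, and only the response time differs.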