Hi all

We are running tests on Gluster to see if it is suitable for inclusion in a live environment. Software is 3.4.0beta4. The cluster is Proxmox with 2 nodes + quorum disc. Gluster is set to replicate mode - 2 replicas.

Many tests are very satisfactory, but last week we discovered two facts which make Gluster unsuitable for our application. I assume this is a misunderstanding or misconfiguration on my part - once again I ask for your help.

The intended use is that we require the data on gluster volumes to be available when the cluster is degraded - i.e. running on a single node (+ quorum disc).

This works for ext4/drbd in primary/secondary mode: the virtual server moves to the surviving node (if it is not already there). It also works with unison/inotify: reads are unaffected on the surviving node, and writes are queued to be transferred to the failed node when it is later restored to the cluster.

Preliminary tests with gfs2/drbd in primary/primary mode indicate that writes are blocked for about 60 seconds and then continue normally on the surviving node. The updates are transferred to the failed node when it is later restored to the cluster. If we eliminate Gluster in these trials we will put the effort into more testing of gfs2/drbd.

Gluster behaves differently:

1. when one node dies the volume is half-unmounted on the surviving node, i.e. it still shows with the mount command but we get the error 'transport endpoint disconnected'.

2. it is impossible to mount the volume again although a local copy of all the data is available in the bricks. umount reports no error and mount then correctly shows the gluster mount is not there. A subsequent mount command of the gluster volume waits a long time and then reports (via the logs) that the other server is dead.

The reason why this is unworkable is that it makes a virtual server which uses a gluster volume depend on BOTH nodes being online. This is the exact opposite of high-availability.

What have I configured wrong?
I can partly understand the logic of this behaviour - you cannot possibly replicate to 2 nodes if only a single node is available. However, to deny even read access to the available data cannot be right.

What I really wanted was that writes are queued and written later when the dead node is available again (i.e. the same behaviour as gfs2 and unison).

Any help or clarification would be appreciated. My question in its simplest form is:

Is this the intended behaviour in these circumstances?
Is it possible to configure for the behaviour I expected?
If so, how do I do that?

Thanks in advance
Allan
On 07/20/2013 12:36 AM, Allan Latham wrote:
> Software is 3.4.0beta4
> Cluster is Proxmox with 2 nodes + quorum disc.
> Gluster is set to replicate mode - 2 replicas.
>
> The intended use is that we require the data on gluster volumes to be
> available when the cluster is degraded - i.e. running on a single node
> (+ quorum disc).
>
> 1. when one node dies the volume is half-unmounted on the surviving node,
> i.e. it still shows with the mount command but we get the error
> 'transport endpoint disconnected'.
>
> 2. it is impossible to mount the volume again although a local copy of
> all the data is available in the bricks. umount reports no error and
> mount then correctly shows the gluster mount is not there. A subsequent
> mount command of the gluster volume waits a long time and then reports
> (via the logs) that the other server is dead.
>
> The reason why this is unworkable is that it makes a virtual server
> which uses a gluster volume depend on BOTH nodes being online. This is
> the exact opposite of high-availability.
>
> What have I configured wrong?
>
> I can partly understand the logic of this behaviour - you cannot
> possibly replicate to 2 nodes if only a single node is available.
> However, to deny even read access to the available data cannot be right.
>
> What I really wanted was that writes are queued and written later when
> the dead node is available again (i.e. the same behaviour as gfs2 and
> unison).
>
> Any help or clarification would be appreciated.
>
> My question in its simplest form is:
>
> Is this the intended behaviour in these circumstances?
> Is it possible to configure for the behaviour I expected?
> If so, how do I do that?

Setting quorum on a two-brick replica 2 volume will prevent writes once you drop below quorum. In automatic quorum mode the threshold is replicas/2 + 1 (i.e. 2 in this case), so nothing is "queued" for writing - the writes are simply denied. Check "gluster volume status" and make sure both of your servers are running.
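As a sketch of what I mean (the volume name "gv0" below is just a placeholder - substitute your own), you can inspect and change the client-side quorum setting like this:

```shell
# Show the volume's current options ("gv0" is a placeholder name).
gluster volume info gv0

# Automatic quorum: writes require a majority of bricks to be up
# (replicas/2 + 1, i.e. both bricks in a 2-replica volume).
gluster volume set gv0 cluster.quorum-type auto

# Disabling client-side quorum lets the surviving node keep accepting
# writes when its peer is down - at the risk of split-brain if both
# sides later diverge.
gluster volume set gv0 cluster.quorum-type none
```

Be aware of the trade-off: with quorum off on a 2-replica volume, nothing stops the two bricks being written independently during a network partition.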
It sounds like your local client is not connecting to your local bricks. The expected behavior is that if you "pull the plug" on one of the servers, the client pauses for ping-timeout seconds (the default is 42) and then continues operating as normal. If you shut the server down cleanly, the TCP connections are closed properly and there is no hang at all. For further analysis, please provide a clean client log (truncate the log, mount the volume, cause your failure, send the log) and the output of "gluster volume status" taken during the failure.
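Roughly, the capture sequence would look like this (volume name "gv0", server name "node1", and the log path are placeholders - the client log file is named after the mount point, so check /var/log/glusterfs/ for the actual file):

```shell
# Truncate the client log so it only contains the failure window
# (log file name is derived from the mount point; adjust to yours).
: > /var/log/glusterfs/mnt-gv0.log

# Remount the volume ("node1" and "gv0" are placeholder names).
mount -t glusterfs node1:/gv0 /mnt/gv0

# ... now cause the failure on the other node ...

# While the failure is in progress, capture brick/daemon state.
gluster volume status gv0

# If the 42-second pause is too long for your failover needs,
# the timeout is tunable (lowering it increases the chance of
# spurious disconnects under load).
gluster volume set gv0 network.ping-timeout 42
```

That gives the list a log covering exactly the failure window plus the server-side view of which bricks were actually up.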