Olivier Lambert
2016-Nov-17 22:35 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
It's planned to have an arbiter soon :) It was just preliminary tests.

Thanks for the settings, I'll test this soon and I'll come back to you!

On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
<lindsay.mathieson at gmail.com> wrote:
> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>>
>> gluster volume info gv0
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>
> When hosting VMs it's essential to set these options:
>
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
> Also, with replica two and quorum on (required), your volume will become
> read-only when one node goes down to prevent the possibility of split-brain
> - you *really* want to avoid that :)
>
> I'd recommend a replica 3 volume; that way 1 node can go down, but the other
> two still form a quorum and will remain r/w.
>
> If the extra disks are not possible, then an arbiter volume can be set up -
> basically dummy files on the third node.
>
> --
> Lindsay Mathieson
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
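The option list quoted above can be turned into CLI invocations mechanically. The following is only a minimal sketch, not something posted in the thread: it assumes the options are applied one at a time with `gluster volume set <VOLNAME> <KEY> <VALUE>`, and the names `VM_OPTIONS` and `build_set_commands` are hypothetical helpers introduced for illustration. It only builds the command strings; it does not run them.

```python
# Options copied from the quoted recommendation for VM hosting.
VM_OPTIONS = {
    "network.remote-dio": "enable",
    "cluster.eager-lock": "enable",
    "performance.io-cache": "off",
    "performance.read-ahead": "off",
    "performance.quick-read": "off",
    "performance.stat-prefetch": "on",
    "performance.strict-write-ordering": "off",
    "cluster.server-quorum-type": "server",
    "cluster.quorum-type": "auto",
    "cluster.data-self-heal": "on",
}


def build_set_commands(volume, options):
    """Return one 'gluster volume set' command string per option.

    Hypothetical helper: nothing is executed, so the output can be
    reviewed before being pasted into a shell on a Gluster node.
    """
    return [
        f"gluster volume set {volume} {key} {value}"
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # "gv0" is the volume name used in the thread.
    for command in build_set_commands("gv0", VM_OPTIONS):
        print(command)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; whether each option suits a given Gluster version is something to check against that version's documentation.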
Olivier Lambert
2016-Nov-18 00:42 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Okay, I used the exact same config you provided, and added an arbiter node (node3).

After halting node2, the VM continues to work after a small "lag"/freeze. I restarted node2 and it was back online: OK

Then, after waiting a few minutes, I halted node1. And **just** at this moment, the VM is corrupted (segmentation fault, /var/log folder empty etc.)

dmesg of the VM:

[ 1645.852905] EXT4-fs error (device xvda1): htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 1645.854509] Aborting journal on device xvda1-8.
[ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only

And I got a lot of "comm bash: bad entry in directory" messages after that...

Here is the current config with all nodes back online:

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/bricks/brick1/gv0
Brick2: 10.0.0.2:/bricks/brick1/gv0
Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: on
features.shard-block-size: 16MB
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.stat-prefetch: on
performance.strict-write-ordering: off
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.data-self-heal: on

# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.1:/bricks/brick1/gv0           49152     0          Y       1331
Brick 10.0.0.2:/bricks/brick1/gv0           49152     0          Y       2274
Brick 10.0.0.3:/bricks/brick1/gv0           49152     0          Y       2355
Self-heal Daemon on localhost               N/A       N/A        Y       2300
Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
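Since the test above depends on the volume really carrying the recommended settings, it can help to check the "Options Reconfigured" section of `gluster volume info` programmatically. The sketch below is an illustration only, assuming the output has one `key: value` pair per line as in this thread; `parse_reconfigured_options`, `find_mismatches`, `RECOMMENDED` and `SAMPLE_INFO` are hypothetical names, and the sample text is an abbreviated copy of the output posted above.

```python
# Recommended values taken from the quoted advice in this thread.
RECOMMENDED = {
    "network.remote-dio": "enable",
    "cluster.eager-lock": "enable",
    "performance.io-cache": "off",
    "performance.read-ahead": "off",
    "performance.quick-read": "off",
    "performance.stat-prefetch": "on",
    "performance.strict-write-ordering": "off",
    "cluster.server-quorum-type": "server",
    "cluster.quorum-type": "auto",
    "cluster.data-self-heal": "on",
}

# Abbreviated copy of the 'gluster volume info' output from the thread.
SAMPLE_INFO = """\
Volume Name: gv0
Type: Replicate
Number of Bricks: 1 x (2 + 1) = 3
Options Reconfigured:
nfs.disable: on
features.shard: on
features.shard-block-size: 16MB
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.stat-prefetch: on
performance.strict-write-ordering: off
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.data-self-heal: on
"""


def parse_reconfigured_options(info_text):
    """Collect the key/value pairs listed after 'Options Reconfigured:'."""
    options = {}
    in_options = False
    for raw_line in info_text.splitlines():
        line = raw_line.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if in_options:
            # A blank line or a line without ':' ends the option section.
            if not line or ":" not in line:
                break
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
    return options


def find_mismatches(actual, expected):
    """Map each option that differs to a (wanted, found) tuple."""
    return {
        key: (wanted, actual.get(key))
        for key, wanted in expected.items()
        if actual.get(key) != wanted
    }


if __name__ == "__main__":
    found = parse_reconfigured_options(SAMPLE_INFO)
    print(find_mismatches(found, RECOMMENDED) or "all recommended options set")
```

A check like this only confirms that the settings match the recommendation; it says nothing about why the VM was corrupted when node1 was halted.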