David Gossage
2016-Nov-18 01:51 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert <lambert.olivier at gmail.com> wrote:
> Okay, used the exact same config you provided, and added an arbiter
> node (node3).
>
> After halting node2, the VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, halting node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
> etc.)
>

Other than waiting a few minutes, did you make sure heals had completed?

> dmesg of the VM:
>
> [ 1645.852905] EXT4-fs error (device xvda1):
> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> entry in directory: rec_len is smaller than minimal - offset=0(0),
> inode=0, rec_len=0, name_len=0
> [ 1645.854509] Aborting journal on device xvda1-8.
> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>
> And got a lot of "comm bash: bad entry in directory" messages after that...
>
> Here is the current config with all nodes back online:
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/bricks/brick1/gv0
> Brick2: 10.0.0.2:/bricks/brick1/gv0
> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.shard: on
> features.shard-block-size: 16MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
> # gluster volume status
> Status of volume: gv0
> Gluster process                            TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.0.0.1:/bricks/brick1/gv0          49152     0          Y       1331
> Brick 10.0.0.2:/bricks/brick1/gv0          49152     0          Y       2274
> Brick 10.0.0.3:/bricks/brick1/gv0          49152     0          Y       2355
> Self-heal Daemon on localhost              N/A       N/A        Y       2300
> Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
> Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> <lambert.olivier at gmail.com> wrote:
> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >
> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >
> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> > <lindsay.mathieson at gmail.com> wrote:
> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >>>
> >>> gluster volume info gv0
> >>>
> >>> Volume Name: gv0
> >>> Type: Replicate
> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> performance.readdir-ahead: on
> >>> transport.address-family: inet
> >>> features.shard: on
> >>> features.shard-block-size: 16MB
> >>
> >> When hosting VMs, it's essential to set these options:
> >>
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >> Also, with replica two and quorum on (required), your volume will become
> >> read-only when one node goes down, to prevent the possibility of
> >> split-brain - you *really* want to avoid that :)
> >>
> >> I'd recommend a replica 3 volume; that way one node can go down, but the
> >> other two still form a quorum and will remain r/w.
> >>
> >> If the extra disks are not possible, then an arbiter volume can be set up -
> >> basically dummy files on the third node.
> >>
> >> --
> >> Lindsay Mathieson
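For reference, the options Lindsay lists translate directly into gluster CLI commands. A minimal sketch, assuming the volume is named gv0 and the third node's brick lives at 10.0.0.3:/bricks/brick1/gv0 as shown above; the one-step conversion of an existing replica-2 volume to replica 2 + arbiter via add-brick assumes a reasonably recent (3.8-era or later) release:

# gluster volume set gv0 network.remote-dio enable
# gluster volume set gv0 cluster.eager-lock enable
# gluster volume set gv0 performance.io-cache off
# gluster volume set gv0 performance.read-ahead off
# gluster volume set gv0 performance.quick-read off
# gluster volume set gv0 performance.stat-prefetch on
# gluster volume set gv0 performance.strict-write-ordering off
# gluster volume set gv0 cluster.server-quorum-type server
# gluster volume set gv0 cluster.quorum-type auto
# gluster volume set gv0 cluster.data-self-heal on

# gluster volume add-brick gv0 replica 3 arbiter 1 10.0.0.3:/bricks/brick1/gv0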
Olivier Lambert
2016-Nov-18 09:49 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Hi David,

What are the exact commands to be sure it's fine? Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been unsuccessful on
bricks that are down. Please check if all brick processes are running.

Is it normal?

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage <dgossage at carouselchecks.com> wrote:
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert <lambert.olivier at gmail.com>
> wrote:
>> Okay, used the exact same config you provided, and added an arbiter
>> node (node3).
>>
>> After halting node2, the VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting a few minutes, halting node1. And **just** at this
>> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
>> etc.)
>>
> Other than waiting a few minutes, did you make sure heals had completed?
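On the question of which commands confirm the volume is healthy before taking another node down, a minimal sketch assuming the volume is gv0; all of these should come back clean (zero pending entries, no split-brain files) on every brick before the next node is halted. The heal-failed sub-command appears to have been deprecated, which would explain the odd message above even though all bricks are online:

# gluster volume heal gv0 info
# gluster volume heal gv0 statistics heal-count
# gluster volume heal gv0 info split-brain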