Krutika Dhananjay
2016-Nov-18 01:22 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Could you attach the fuse client and brick logs?

-Krutika

On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert <lambert.olivier at gmail.com> wrote:

> Okay, I used the exact same config you provided and added an arbiter
> node (node3).
>
> After halting node2, the VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, I halted node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
> etc.)
>
> dmesg of the VM:
>
> [ 1645.852905] EXT4-fs error (device xvda1): htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> [ 1645.854509] Aborting journal on device xvda1-8.
> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>
> And I got a lot of "comm bash: bad entry in directory" messages after that...
>
> Here is the current config with all nodes back online:
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/bricks/brick1/gv0
> Brick2: 10.0.0.2:/bricks/brick1/gv0
> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.shard: on
> features.shard-block-size: 16MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
> # gluster volume status
> Status of volume: gv0
> Gluster process                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.0.0.1:/bricks/brick1/gv0      49152     0          Y       1331
> Brick 10.0.0.2:/bricks/brick1/gv0      49152     0          Y       2274
> Brick 10.0.0.3:/bricks/brick1/gv0      49152     0          Y       2355
> Self-heal Daemon on localhost          N/A       N/A        Y       2300
> Self-heal Daemon on 10.0.0.3           N/A       N/A        Y       10530
> Self-heal Daemon on 10.0.0.2           N/A       N/A        Y       2425
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> <lambert.olivier at gmail.com> wrote:
> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >
> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >
> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> > <lindsay.mathieson at gmail.com> wrote:
> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >>>
> >>> gluster volume info gv0
> >>>
> >>> Volume Name: gv0
> >>> Type: Replicate
> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> performance.readdir-ahead: on
> >>> transport.address-family: inet
> >>> features.shard: on
> >>> features.shard-block-size: 16MB
> >>
> >> When hosting VMs, it's essential to set these options:
> >>
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >> Also, with replica two and quorum on (required), your volume will become
> >> read-only when one node goes down, to prevent the possibility of
> >> split-brain - you *really* want to avoid that :)
> >>
> >> I'd recommend a replica 3 volume; that way 1 node can go down, but the
> >> other two still form a quorum and will remain r/w.
> >>
> >> If the extra disks are not possible, then an arbiter volume can be set up -
> >> basically dummy files on the third node.
> >>
> >> --
> >> Lindsay Mathieson
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
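The quorum argument quoted above (replica 2 goes read-only when a node is lost, replica 3 stays read-write) can be illustrated with a small sketch. This is a simplified strict-majority model, not Gluster's implementation: the function names are hypothetical, and Gluster's real rules (for example tie-breaking under `cluster.quorum-type: auto`, arbiter behaviour, and the server-quorum ratio) are more nuanced.

```python
def has_quorum(nodes_up: int, nodes_total: int) -> bool:
    """Simplified model: quorum means a strict majority of nodes is reachable.

    Assumption: this ignores Gluster-specific tie-breaking and arbiter rules.
    """
    if not 0 <= nodes_up <= nodes_total:
        raise ValueError("nodes_up must be between 0 and nodes_total")
    return 2 * nodes_up > nodes_total


def writable_after_failures(nodes_total: int, failures: int) -> bool:
    """Return True if the volume would stay read-write in this model."""
    return has_quorum(nodes_total - failures, nodes_total)


if __name__ == "__main__":
    # Compare the two layouts discussed in the thread with one node down.
    for total in (2, 3):
        state = "read-write" if writable_after_failures(total, 1) else "read-only"
        print(f"replica {total}, one node down: {state}")
```

In this model one surviving node out of two is not a majority, while two out of three is, which matches the recommendation to use replica 3 or an arbiter.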
Olivier Lambert
2016-Nov-18 01:43 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Attached, bricks log. Where could I find the fuse client log?

On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:

> Could you attach the fuse client and brick logs?
>
> -Krutika

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bricks-brick1-gv0.log
Type: text/x-log
Size: 60687 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20161118/8e4484dd/attachment.bin>