David Gossage
2016-Nov-18 01:51 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert <lambert.olivier at gmail.com> wrote:
> Okay, used the exact same config you provided, and added an arbiter
> node (node3).
>
> After halting node2, the VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, halting node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
> etc.)
>

Other than waiting a few minutes, did you make sure heals had completed?

> dmesg of the VM:
>
> [ 1645.852905] EXT4-fs error (device xvda1):
> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> entry in directory: rec_len is smaller than minimal - offset=0(0),
> inode=0, rec_len=0, name_len=0
> [ 1645.854509] Aborting journal on device xvda1-8.
> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>
> And got a lot of "comm bash: bad entry in directory" messages after that...
>
> Here is the current config with all nodes back online:
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/bricks/brick1/gv0
> Brick2: 10.0.0.2:/bricks/brick1/gv0
> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.shard: on
> features.shard-block-size: 16MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
> # gluster volume status
> Status of volume: gv0
> Gluster process                            TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.0.0.1:/bricks/brick1/gv0          49152     0          Y       1331
> Brick 10.0.0.2:/bricks/brick1/gv0          49152     0          Y       2274
> Brick 10.0.0.3:/bricks/brick1/gv0          49152     0          Y       2355
> Self-heal Daemon on localhost              N/A       N/A        Y       2300
> Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
> Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> <lambert.olivier at gmail.com> wrote:
> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >
> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >
> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> > <lindsay.mathieson at gmail.com> wrote:
> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >>>
> >>> gluster volume info gv0
> >>>
> >>> Volume Name: gv0
> >>> Type: Replicate
> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> performance.readdir-ahead: on
> >>> transport.address-family: inet
> >>> features.shard: on
> >>> features.shard-block-size: 16MB
> >>
> >> When hosting VMs, it's essential to set these options:
> >>
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >> Also, with replica two and quorum on (required), your volume will become
> >> read-only when one node goes down, to prevent the possibility of
> >> split-brain - you *really* want to avoid that :)
> >>
> >> I'd recommend a replica 3 volume; that way one node can go down, but the
> >> other two still form a quorum and will remain r/w.
> >>
> >> If the extra disks are not possible, then an arbiter volume can be set up -
> >> basically dummy files on the third node.
> >>
> >> --
> >> Lindsay Mathieson
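For reference, the options Lindsay lists translate directly into gluster CLI commands. A minimal sketch, assuming the volume is named gv0 and the third node's brick lives at 10.0.0.3:/bricks/brick1/gv0 as shown above; the one-step conversion of an existing replica-2 volume to replica 2 + arbiter via add-brick assumes a reasonably recent (3.8-era or later) release:

# gluster volume set gv0 network.remote-dio enable
# gluster volume set gv0 cluster.eager-lock enable
# gluster volume set gv0 performance.io-cache off
# gluster volume set gv0 performance.read-ahead off
# gluster volume set gv0 performance.quick-read off
# gluster volume set gv0 performance.stat-prefetch on
# gluster volume set gv0 performance.strict-write-ordering off
# gluster volume set gv0 cluster.server-quorum-type server
# gluster volume set gv0 cluster.quorum-type auto
# gluster volume set gv0 cluster.data-self-heal on

# gluster volume add-brick gv0 replica 3 arbiter 1 10.0.0.3:/bricks/brick1/gv0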
Olivier Lambert
2016-Nov-18 09:49 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Hi David,

What are the exact commands to be sure it's fine? Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been unsuccessful on
bricks that are down. Please check if all brick processes are running.

Is it normal?

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage <dgossage at carouselchecks.com> wrote:
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert <lambert.olivier at gmail.com>
> wrote:
>> Okay, used the exact same config you provided, and added an arbiter
>> node (node3).
>>
>> After halting node2, the VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting a few minutes, halting node1. And **just** at this
>> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
>> etc.)
>>
> Other than waiting a few minutes, did you make sure heals had completed?
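On the question of which commands confirm the volume is healthy before taking another node down, a minimal sketch assuming the volume is gv0; all of these should come back clean (zero pending entries, no split-brain files) on every brick before the next node is halted. The heal-failed sub-command appears to have been deprecated, which would explain the odd message above even though all bricks are online:

# gluster volume heal gv0 info
# gluster volume heal gv0 statistics heal-count
# gluster volume heal gv0 info split-brain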