Olivier Lambert
2016-Nov-18 00:42 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Okay, I used the exact same config you provided, and added an arbiter
node (node3).

After halting node2, the VM continues to work after a small "lag"/freeze.
I restarted node2 and it was back online: OK

Then, after waiting a few minutes, I halted node1. And **just** at this
moment, the VM is corrupted (segmentation fault, /var/log folder empty,
etc.)

dmesg of the VM:

[ 1645.852905] EXT4-fs error (device xvda1):
htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
entry in directory: rec_len is smaller than minimal - offset=0(0),
inode=0, rec_len=0, name_len=0
[ 1645.854509] Aborting journal on device xvda1-8.
[ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only

And I got a lot of "comm bash: bad entry in directory" messages after that...

Here is the current config with all nodes back online:

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/bricks/brick1/gv0
Brick2: 10.0.0.2:/bricks/brick1/gv0
Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: on
features.shard-block-size: 16MB
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.stat-prefetch: on
performance.strict-write-ordering: off
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.data-self-heal: on

# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.1:/bricks/brick1/gv0           49152     0          Y       1331
Brick 10.0.0.2:/bricks/brick1/gv0           49152     0          Y       2274
Brick 10.0.0.3:/bricks/brick1/gv0           49152     0          Y       2355
Self-heal Daemon on localhost               N/A       N/A        Y       2300
Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
<lambert.olivier at gmail.com> wrote:
> It's planned to have an arbiter soon :) It was just preliminary tests.
>
> Thanks for the settings, I'll test this soon and I'll come back to you!
>
> On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> <lindsay.mathieson at gmail.com> wrote:
>> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>>>
>>> gluster volume info gv0
>>>
>>> Volume Name: gv0
>>> Type: Replicate
>>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> features.shard: on
>>> features.shard-block-size: 16MB
>>
>> When hosting VMs, it's essential to set these options:
>>
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>> Also, with replica two and quorum on (required), your volume will become
>> read-only when one node goes down, to prevent the possibility of
>> split-brain - you *really* want to avoid that :)
>>
>> I'd recommend a replica 3 volume; that way one node can go down and the
>> other two still form a quorum, so the volume remains r/w.
>>
>> If the extra disks are not possible, then an arbiter volume can be set
>> up - basically dummy files on the third node.
>>
>> --
>> Lindsay Mathieson
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
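For reference, a volume like the one above is typically created and tuned
with something along these lines. This is only a rough sketch using the
brick paths quoted in this thread and Gluster 3.8-era CLI syntax; adjust
hostnames, paths and option values to your own setup:

# create a replica 3 volume with the third brick acting as arbiter
# (metadata only, no full data copy)
gluster volume create gv0 replica 3 arbiter 1 \
    10.0.0.1:/bricks/brick1/gv0 \
    10.0.0.2:/bricks/brick1/gv0 \
    10.0.0.3:/bricks/brick1/gv0

# apply the VM-oriented options recommended earlier in the thread
gluster volume set gv0 features.shard on
gluster volume set gv0 features.shard-block-size 16MB
gluster volume set gv0 network.remote-dio enable
gluster volume set gv0 cluster.eager-lock enable
gluster volume set gv0 performance.io-cache off
gluster volume set gv0 performance.read-ahead off
gluster volume set gv0 performance.quick-read off
gluster volume set gv0 performance.stat-prefetch on
gluster volume set gv0 performance.strict-write-ordering off
gluster volume set gv0 cluster.server-quorum-type server
gluster volume set gv0 cluster.quorum-type auto
gluster volume set gv0 cluster.data-self-heal on

gluster volume start gv0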
Krutika Dhananjay
2016-Nov-18 01:22 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Could you attach the fuse client and brick logs?

-Krutika

On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert
<lambert.olivier at gmail.com> wrote:
> Okay, I used the exact same config you provided, and added an arbiter
> node (node3).
>
> After halting node2, the VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, I halted node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
> etc.)
>
> [...]
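For what it's worth, with a default installation the logs being asked for
usually live under /var/log/glusterfs, with file names derived from the
mount point and brick path. The exact names below are only an assumed
example (fuse mount at /mnt/gv0, brick at /bricks/brick1/gv0):

# on the client that fuse-mounts the volume: the mount log is named
# after the mount point
tar czf gv0-client-log.tar.gz /var/log/glusterfs/mnt-gv0.log

# on each gluster node: the brick log is named after the brick path
tar czf gv0-brick-log-$(hostname -s).tar.gz \
    /var/log/glusterfs/bricks/bricks-brick1-gv0.log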
David Gossage
2016-Nov-18 01:51 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
<lambert.olivier at gmail.com> wrote:
> Okay, I used the exact same config you provided, and added an arbiter
> node (node3).
>
> After halting node2, the VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, I halted node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty,
> etc.)

Other than waiting a few minutes, did you make sure heals had completed?

> [...]
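One way to verify that before halting the second node is to wait for the
self-heal queue to drain. A minimal sketch, assuming the gv0 volume above
and that "Number of entries: 0" reported for every brick means nothing is
left to heal:

gluster volume heal gv0 info        # per-brick list of entries still pending heal

# scripted variant: block until no brick reports a non-zero entry count
while gluster volume heal gv0 info | grep -q 'Number of entries: [1-9]'; do
    sleep 10
done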