David Gossage
2016-Nov-18 14:39 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert
<lambert.olivier at gmail.com> wrote:
>
> Hi David,
>
> What are the exact commands to be sure it's fine?
>
> Right now I got:
>
> # gluster volume heal gv0 info
> Brick 10.0.0.1:/bricks/brick1/gv0
> Status: Connected
> Number of entries: 0
>
> Brick 10.0.0.2:/bricks/brick1/gv0
> Status: Connected
> Number of entries: 0
>
> Brick 10.0.0.3:/bricks/brick1/gv0
> Status: Connected
> Number of entries: 0
>
>

Did you run this before taking down 2nd node to see if any heals were
ongoing?

Also I see you have sharding enabled. Are your files being served sharded
already as well?

>
> Everything is online and working, but this command gives a strange
> output:
>
> # gluster volume heal gv0 info heal-failed
> Gathering list of heal failed entries on volume gv0 has been
> unsuccessful on bricks that are down. Please check if all brick
> processes are running.
>
> Is it normal?
>

I don't think that is a valid command anymore, as when I run it I get the
same message and this is in the logs:

[2016-11-18 14:35:02.260503] I [MSGID: 106533]
[glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume GLUSTER1
[2016-11-18 14:35:02.263341] W [MSGID: 106530]
[glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
find the heal information.
[2016-11-18 14:35:02.263365] E [MSGID: 106301]
[glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
operation 'Volume Heal' failed on localhost : Command not supported. Please
use "gluster volume heal GLUSTER1 info" and logs to find the heal
information.

> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
> <dgossage at carouselchecks.com> wrote:
> >
> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
> > <lambert.olivier at gmail.com> wrote:
> >>
> >> Okay, used the exact same config you provided, and adding an arbiter
> >> node (node3)
> >>
> >> After halting node2, VM continues to work after a small "lag"/freeze.
> >> I restarted node2 and it was back online: OK
> >>
> >> Then, after waiting few minutes, halting node1. And **just** at this
> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> >> etc.)
> >>
> > Other than waiting a few minutes did you make sure heals had completed?
> >
> >>
> >> dmesg of the VM:
> >>
> >> [ 1645.852905] EXT4-fs error (device xvda1):
> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
> >> inode=0, rec_len=0, name_len=0
> >> [ 1645.854509] Aborting journal on device xvda1-8.
> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
> >>
> >> And got a lot of " comm bash: bad entry in directory" messages then...
> >>
> >> Here is the current config with all nodes back online:
> >>
> >> # gluster volume info
> >>
> >> Volume Name: gv0
> >> Type: Replicate
> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> >> Status: Started
> >> Snapshot Count: 0
> >> Number of Bricks: 1 x (2 + 1) = 3
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> >> Options Reconfigured:
> >> nfs.disable: on
> >> performance.readdir-ahead: on
> >> transport.address-family: inet
> >> features.shard: on
> >> features.shard-block-size: 16MB
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >>
> >> # gluster volume status
> >> Status of volume: gv0
> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> >> ------------------------------------------------------------------------------
> >> Brick 10.0.0.1:/bricks/brick1/gv0           49152     0          Y       1331
> >> Brick 10.0.0.2:/bricks/brick1/gv0           49152     0          Y       2274
> >> Brick 10.0.0.3:/bricks/brick1/gv0           49152     0          Y       2355
> >> Self-heal Daemon on localhost               N/A       N/A        Y       2300
> >> Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
> >> Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425
> >>
> >> Task Status of Volume gv0
> >> ------------------------------------------------------------------------------
> >> There are no active volume tasks
> >>
> >>
> >>
> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> >> <lambert.olivier at gmail.com> wrote:
> >> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >> >
> >> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >> >
> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >> > <lindsay.mathieson at gmail.com> wrote:
> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >> >>>
> >> >>> gluster volume info gv0
> >> >>>
> >> >>> Volume Name: gv0
> >> >>> Type: Replicate
> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >> >>> Status: Started
> >> >>> Snapshot Count: 0
> >> >>> Number of Bricks: 1 x 2 = 2
> >> >>> Transport-type: tcp
> >> >>> Bricks:
> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> >>> Options Reconfigured:
> >> >>> nfs.disable: on
> >> >>> performance.readdir-ahead: on
> >> >>> transport.address-family: inet
> >> >>> features.shard: on
> >> >>> features.shard-block-size: 16MB
> >> >>
> >> >>
> >> >> When hosting VM's it's essential to set these options:
> >> >>
> >> >> network.remote-dio: enable
> >> >> cluster.eager-lock: enable
> >> >> performance.io-cache: off
> >> >> performance.read-ahead: off
> >> >> performance.quick-read: off
> >> >> performance.stat-prefetch: on
> >> >> performance.strict-write-ordering: off
> >> >> cluster.server-quorum-type: server
> >> >> cluster.quorum-type: auto
> >> >> cluster.data-self-heal: on
> >> >>
> >> >> Also with replica two and quorum on (required) your volume will
> >> >> become read-only when one node goes down to prevent the possibility
> >> >> of split-brain - you *really* want to avoid that :)
> >> >>
> >> >> I'd recommend a replica 3 volume, that way 1 node can go down, but
> >> >> the other two still form a quorum and will remain r/w.
> >> >>
> >> >> If the extra disks are not possible, then an Arbiter volume can be
> >> >> set up - basically dummy files on the third node.
> >> >>
> >> >> --
> >> >> Lindsay Mathieson
> >> >>
> >> >> _______________________________________________
> >> >> Gluster-users mailing list
> >> >> Gluster-users at gluster.org
> >> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
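For what it's worth, here is roughly what I check before pulling the next
node (the statistics/split-brain variants and the .shard path are from
memory, so double-check them against your Gluster version):

# gluster volume heal gv0 info                   (must show "Number of entries: 0" on every brick)
# gluster volume heal gv0 statistics heal-count  (should also report 0 everywhere)
# gluster volume heal gv0 info split-brain       (should list nothing)

And to confirm the VM images are really being written as shards, on any
brick:

# ls /bricks/brick1/gv0/.shard | wc -l           (non-zero once a file grows past the 16MB shard size)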
Olivier Lambert
2016-Nov-18 16:00 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Yes, I only did it once heal info showed the result above ("Number of
entries: 0"). But the outcome is the same: as soon as the second node to go
down is offline (after both nodes were back online and working), everything
is corrupted.
To recap:
* Node 1 UP Node 2 UP -> OK
* Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
the path down and change if necessary)
* Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
in heal command)
* Node 1 DOWN Node 2 UP -> NOT OK (data corruption)
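To make the recap concrete, the only thing I wait on between steps is the
heal output going back to zero; the multipath status check below is just an
illustration of the "small lag" mentioned above:

# gluster volume heal gv0 info   (wait for "Number of entries: 0" on all three bricks)
# multipath -ll                  (on the initiator: shows the failed path and the switch-over)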
On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
<dgossage at carouselchecks.com> wrote:
> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert <lambert.olivier at gmail.com>
> wrote:
>>
>> Hi David,
>>
>> What are the exact commands to be sure it's fine?
>>
>> Right now I got:
>>
>> # gluster volume heal gv0 info
>> Brick 10.0.0.1:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.2:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.3:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>>
> Did you run this before taking down 2nd node to see if any heals were
> ongoing?
>
> Also I see you have sharding enabled. Are your files being served sharded
> already as well?
>
>>
>> Everything is online and working, but this command gives a strange
>> output:
>>
>> # gluster volume heal gv0 info heal-failed
>> Gathering list of heal failed entries on volume gv0 has been
>> unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> Is it normal?
>
>
> I don't think that is a valid command anymore, as when I run it I get the
> same message and this is in the logs:
> [2016-11-18 14:35:02.260503] I [MSGID: 106533]
> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume GLUSTER1
> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
> find the heal information.
> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Heal' failed on localhost : Command not supported. Please
> use "gluster volume heal GLUSTER1 info" and logs to find the heal
> information.
>
>>
>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>> <dgossage at carouselchecks.com> wrote:
>> >
>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>> > <lambert.olivier at gmail.com>
>> > wrote:
>> >>
>> >> Okay, used the exact same config you provided, and adding an arbiter
>> >> node (node3)
>> >>
>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>> >> I restarted node2 and it was back online: OK
>> >>
>> >> Then, after waiting few minutes, halting node1. And **just** at this
>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> >> etc.)
>> >>
>> > Other than waiting a few minutes did you make sure heals had completed?
>> >
>> >>
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>> >>
>> >> Here is the current config with all nodes back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick 10.0.0.1:/bricks/brick1/gv0           49152     0          Y       1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0           49152     0          Y       2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0           49152     0          Y       2355
>> >> Self-heal Daemon on localhost               N/A       N/A        Y       2300
>> >> Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
>> >> Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425
>> >>
>> >> Task Status of Volume gv0
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >>
>> >>
>> >>
>> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> >> <lambert.olivier at gmail.com> wrote:
>> >> > It's planned to have an arbiter soon :) It was just preliminary
>> >> > tests.
>> >> >
>> >> > Thanks for the settings, I'll test this soon and I'll come back to
>> >> > you!
>> >> >
>> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >> > <lindsay.mathieson at gmail.com> wrote:
>> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >> >>>
>> >> >>> gluster volume info gv0
>> >> >>>
>> >> >>> Volume Name: gv0
>> >> >>> Type: Replicate
>> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >> >>> Status: Started
>> >> >>> Snapshot Count: 0
>> >> >>> Number of Bricks: 1 x 2 = 2
>> >> >>> Transport-type: tcp
>> >> >>> Bricks:
>> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> >>> Options Reconfigured:
>> >> >>> nfs.disable: on
>> >> >>> performance.readdir-ahead: on
>> >> >>> transport.address-family: inet
>> >> >>> features.shard: on
>> >> >>> features.shard-block-size: 16MB
>> >> >>
>> >> >>
>> >> >>
>> >> >> When hosting VM's it's essential to set these options:
>> >> >>
>> >> >> network.remote-dio: enable
>> >> >> cluster.eager-lock: enable
>> >> >> performance.io-cache: off
>> >> >> performance.read-ahead: off
>> >> >> performance.quick-read: off
>> >> >> performance.stat-prefetch: on
>> >> >> performance.strict-write-ordering: off
>> >> >> cluster.server-quorum-type: server
>> >> >> cluster.quorum-type: auto
>> >> >> cluster.data-self-heal: on
>> >> >>
>> >> >> Also with replica two and quorum on (required) your volume will
>> >> >> become read-only when one node goes down to prevent the possibility
>> >> >> of split-brain - you *really* want to avoid that :)
>> >> >>
>> >> >> I'd recommend a replica 3 volume, that way 1 node can go down, but
>> >> >> the other two still form a quorum and will remain r/w.
>> >> >>
>> >> >> If the extra disks are not possible, then an Arbiter volume can be
>> >> >> set up - basically dummy files on the third node.
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lindsay Mathieson
>> >> >>
>> >> >> _______________________________________________
>> >> >> Gluster-users mailing list
>> >> >> Gluster-users at gluster.org
>> >> >> http://www.gluster.org/mailman/listinfo/gluster-users
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users at gluster.org
>> >> http://www.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>
>
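PS: for completeness, the options Lindsay recommended were applied with
plain "gluster volume set" calls, roughly like this (from memory, so check
the exact values against the "Options Reconfigured" list above):

# gluster volume set gv0 network.remote-dio enable
# gluster volume set gv0 cluster.eager-lock enable
# gluster volume set gv0 performance.io-cache off
# gluster volume set gv0 performance.read-ahead off
# gluster volume set gv0 performance.quick-read off
# gluster volume set gv0 performance.stat-prefetch on
# gluster volume set gv0 performance.strict-write-ordering off
# gluster volume set gv0 cluster.server-quorum-type server
# gluster volume set gv0 cluster.quorum-type auto
# gluster volume set gv0 cluster.data-self-heal on
# gluster volume info gv0        (to verify they show up under "Options Reconfigured")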