Olivier Lambert
2016-Nov-18 16:00 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Yes, I did it, and only when heal info showed the previous result
("Number of entries: 0"). But same result: as soon as the second node
to go down is offline (after both were back working/online), everything
is corrupted.

To recap:

* Node 1 UP, Node 2 UP -> OK
* Node 1 UP, Node 2 DOWN -> OK (just a small lag for multipath to see
  the path down and change if necessary)
* Node 1 UP, Node 2 UP -> OK (and waiting to have no entries displayed
  in the heal command)
* Node 1 DOWN, Node 2 UP -> NOT OK (data corruption)

On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
<dgossage at carouselchecks.com> wrote:
> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert <lambert.olivier at gmail.com>
> wrote:
>>
>> Hi David,
>>
>> What are the exact commands to be sure it's fine?
>>
>> Right now I got:
>>
>> # gluster volume heal gv0 info
>> Brick 10.0.0.1:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.2:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.3:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
> Did you run this before taking down the 2nd node, to see if any heals
> were ongoing?
>
> Also I see you have sharding enabled. Are your files being served
> sharded already as well?
>
>> Everything is online and working, but this command gives a strange output:
>>
>> # gluster volume heal gv0 info heal-failed
>> Gathering list of heal failed entries on volume gv0 has been
>> unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> Is it normal?
>
> I don't think that is a valid command anymore; when I run it I get the
> same message, and this is in the logs:
>
> [2016-11-18 14:35:02.260503] I [MSGID: 106533]
> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume GLUSTER1
> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
> find the heal information.
> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Heal' failed on localhost : Command not supported. Please
> use "gluster volume heal GLUSTER1 info" and logs to find the heal
> information.
>
>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>> <dgossage at carouselchecks.com> wrote:
>> >
>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>> > <lambert.olivier at gmail.com> wrote:
>> >>
>> >> Okay, used the exact same config you provided, and added an arbiter
>> >> node (node3).
>> >>
>> >> After halting node2, the VM continues to work after a small
>> >> "lag"/freeze. I restarted node2 and it was back online: OK
>> >>
>> >> Then, after waiting a few minutes, halting node1. And **just** at
>> >> this moment, the VM is corrupted (segmentation fault, /var/log
>> >> folder empty, etc.)
>> >>
>> > Other than waiting a few minutes, did you make sure heals had completed?
>> >
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And got a lot of "comm bash: bad entry in directory" messages then...
>> >>
>> >> Here is the current config with all nodes back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick 10.0.0.1:/bricks/brick1/gv0          49152     0          Y       1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0          49152     0          Y       2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0          49152     0          Y       2355
>> >> Self-heal Daemon on localhost              N/A       N/A        Y       2300
>> >> Self-heal Daemon on 10.0.0.3               N/A       N/A        Y       10530
>> >> Self-heal Daemon on 10.0.0.2               N/A       N/A        Y       2425
>> >>
>> >> Task Status of Volume gv0
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >>
>> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> >> <lambert.olivier at gmail.com> wrote:
>> >> > It's planned to have an arbiter soon :) It was just preliminary
>> >> > tests.
>> >> >
>> >> > Thanks for the settings, I'll test this soon and I'll come back to
>> >> > you!
>> >> >
>> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >> > <lindsay.mathieson at gmail.com> wrote:
>> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >> >>>
>> >> >>> gluster volume info gv0
>> >> >>>
>> >> >>> Volume Name: gv0
>> >> >>> Type: Replicate
>> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >> >>> Status: Started
>> >> >>> Snapshot Count: 0
>> >> >>> Number of Bricks: 1 x 2 = 2
>> >> >>> Transport-type: tcp
>> >> >>> Bricks:
>> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> >>> Options Reconfigured:
>> >> >>> nfs.disable: on
>> >> >>> performance.readdir-ahead: on
>> >> >>> transport.address-family: inet
>> >> >>> features.shard: on
>> >> >>> features.shard-block-size: 16MB
>> >> >>
>> >> >> When hosting VMs it's essential to set these options:
>> >> >>
>> >> >> network.remote-dio: enable
>> >> >> cluster.eager-lock: enable
>> >> >> performance.io-cache: off
>> >> >> performance.read-ahead: off
>> >> >> performance.quick-read: off
>> >> >> performance.stat-prefetch: on
>> >> >> performance.strict-write-ordering: off
>> >> >> cluster.server-quorum-type: server
>> >> >> cluster.quorum-type: auto
>> >> >> cluster.data-self-heal: on
>> >> >>
>> >> >> Also, with replica two and quorum on (required), your volume will
>> >> >> become read-only when one node goes down, to prevent the
>> >> >> possibility of split-brain - you *really* want to avoid that :)
>> >> >>
>> >> >> I'd recommend a replica 3 volume; that way one node can go down,
>> >> >> but the other two still form a quorum and will remain r/w.
>> >> >>
>> >> >> If the extra disks are not possible, then an arbiter volume can be
>> >> >> set up - basically dummy files on the third node.
>> >> >>
>> >> >> --
>> >> >> Lindsay Mathieson
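
The recap above hinges on waiting until "gluster volume heal gv0 info"
shows no pending entries before taking the next node down. A minimal
sketch of that check (assuming the volume name gv0 used in this thread;
the poll interval is arbitrary, and it assumes the heal command itself
succeeds):

#!/bin/bash
# Block until every brick of the volume reports zero pending heal entries.
VOL=gv0
while gluster volume heal "$VOL" info | grep '^Number of entries:' | grep -qv ': 0$'
do
    echo "heals still pending on $VOL, waiting..."
    sleep 10
done
echo "no pending heals on $VOL - safe to proceed with the failover test"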
Olivier Lambert
2016-Nov-18 16:21 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
After Node 1 is DOWN, LIO on Node 2 (the iSCSI target) is no longer
writing to the local Gluster mount, but to the root partition.

This is despite "df -h" showing the Gluster brick mounted:

/dev/mapper/centos-root  3,1G  3,1G   20K 100% /
...
/dev/xvdb                 61G   61G  956M  99% /bricks/brick1
localhost:/gv0            61G   61G  956M  99% /mnt

If I unmount it, I still see the "block.img" in /mnt, which is filling
up the root space. So it's like FUSE is messing with the local Gluster
mount, which could lead to the data corruption at the client level.

It doesn't make sense to me... What am I missing?

On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert
<lambert.olivier at gmail.com> wrote:
> Yes, I did it, and only when heal info showed the previous result
> ("Number of entries: 0"). But same result: as soon as the second node
> to go down is offline (after both were back working/online),
> everything is corrupted.
>
> To recap:
>
> * Node 1 UP, Node 2 UP -> OK
> * Node 1 UP, Node 2 DOWN -> OK (just a small lag for multipath to see
>   the path down and change if necessary)
> * Node 1 UP, Node 2 UP -> OK (and waiting to have no entries displayed
>   in the heal command)
> * Node 1 DOWN, Node 2 UP -> NOT OK (data corruption)
>
> [...]
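
One way to check for a backstore file that was created underneath the
mount point, without unmounting /mnt (a sketch; it assumes the backing
file is /mnt/block.img and that /mnt is the FUSE mount of gv0, as
described above):

# What is currently mounted on /mnt?
findmnt -T /mnt

# Peek "underneath" the mount point via a bind mount of the root filesystem:
# if a block.img shows up here, the file was created on the root filesystem
# before the Gluster volume was mounted over it.
mkdir -p /tmp/rootview
mount --bind / /tmp/rootview
ls -lh /tmp/rootview/mnt
umount /tmp/rootview

# The path LIO is configured to use (which may not be the file it actually
# holds open if it started before the mount):
targetcli ls /backstores/fileio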
Дмитрий Глушенок
2016-Dec-02 20:43 UTC
[Gluster-users] corruption using gluster and iSCSI with LIO
Hi,

It may be that the LIO service starts before /mnt gets mounted. In the
absence of the backend file, LIO created a new one on the root
filesystem (in the /mnt directory). Then the Gluster volume was mounted
over it, but since the backend file was kept open by LIO, that file was
still used instead of the right one on the Gluster volume. Then, when
you turn off the first node, the active path for the iSCSI disk
switches to the second node (with the empty file placed on the root
filesystem).

> On 18 Nov 2016, at 19:21, Olivier Lambert <lambert.olivier at gmail.com> wrote:
>
> After Node 1 is DOWN, LIO on Node 2 (the iSCSI target) is no longer
> writing to the local Gluster mount, but to the root partition.
>
> This is despite "df -h" showing the Gluster brick mounted:
>
> /dev/mapper/centos-root  3,1G  3,1G   20K 100% /
> ...
> /dev/xvdb                 61G   61G  956M  99% /bricks/brick1
> localhost:/gv0            61G   61G  956M  99% /mnt
>
> If I unmount it, I still see the "block.img" in /mnt, which is filling
> up the root space. So it's like FUSE is messing with the local Gluster
> mount, which could lead to the data corruption at the client level.
>
> It doesn't make sense to me... What am I missing?
>
> [...]
--
Дмитрий Глушенок
+7-910-453-2568
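
If the cause is what Dmitry describes (LIO restoring its configuration
before /mnt is mounted), one possible mitigation is to order the target
service after the Gluster mount. A sketch, assuming CentOS 7 where the
LIO configuration is restored by target.service and the volume is
mounted on /mnt from /etc/fstab:

# In /etc/fstab, mark the Gluster mount as network-dependent, e.g.:
#   localhost:/gv0  /mnt  glusterfs  defaults,_netdev  0 0

# Make target.service wait for /mnt before it loads the fileio backstore.
mkdir -p /etc/systemd/system/target.service.d
cat > /etc/systemd/system/target.service.d/wait-for-gluster.conf <<'EOF'
[Unit]
RequiresMountsFor=/mnt
EOF
systemctl daemon-reload

RequiresMountsFor= adds a dependency on, and ordering after, the mount
unit for /mnt, so the backing file can only be opened once the Gluster
volume is actually in place; whether target.service is the unit that
restores LIO on the affected hosts would need to be verified.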