Hi,

Do you guys have any update regarding this issue?

--
Respectfully,
Mahdi A. Mahdi

________________________________
From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Tuesday, March 21, 2017 3:02:55 PM
To: Mahdi Adnan
Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai; gluster-users at gluster.org List
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

Hi,

So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging it. It will be easier that way.

-Krutika

On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Hello, and thank you for your email.
Actually no, I didn't check the GFIDs of the VMs.
If it will help, I can set up a new test cluster and get all the data you need.

Get Outlook for Android <https://aka.ms/ghei36>

From: Nithya Balachandran
Sent: Monday, March 20, 20:57
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
To: Krutika Dhananjay
Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List

Hi,

Do you know the GFIDs of the VM images which were corrupted?

Regards,
Nithya

On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:

I looked at the logs.

From the time the new graph (since the add-brick command you shared, where bricks 41 through 44 are added) is switched to (line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:

1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors. DHT, and thereafter shard, get the same error code and log the following:

[2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
[2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]

which is fine.

2. The other kind are from AFR logging of possible split-brain, which I suppose are harmless too:

[2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)

Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.

CC'ing DHT devs.

Raghavendra/Nithya/Susant, could you take a look?

-Krutika

On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Thank you for your email, mate.

Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.

In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds, the whole cluster hung and became unavailable, so to avoid this I adjusted the ping timeout to 5 seconds.

As for choosing Ganesha over gfapi: VMware does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume. The other volume is mounted using gfapi in the oVirt cluster.

--
Respectfully,
Mahdi A. Mahdi
From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Sunday, March 19, 2017 2:01:49 PM
To: Mahdi Adnan
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

While I'm still going through the logs, I just wanted to point out a couple of things:

1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.

Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?

Will get back with anything else I might find, or more questions if I have any.

-Krutika

On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Thanks mate,

Kindly, check the attachment.

--
Respectfully,
Mahdi A. Mahdi

From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Sunday, March 19, 2017 10:00:22 AM
To: Mahdi Adnan
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

In that case, could you share the ganesha-gfapi logs?

-Krutika

On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

I have two volumes: one is mounted using libgfapi for the oVirt mount, the other one is exported via NFS-Ganesha for VMware, which is the one I'm testing now.

--
Respectfully,
Mahdi A. Mahdi

From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Sunday, March 19, 2017 8:02:19 AM
To: Mahdi Adnan
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Kindly, check the attached new log file. I don't know if it's helpful or not, but I couldn't find the log with the name you just described.

No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?

-Krutika

--
Respectfully,
Mahdi A. Mahdi

From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Saturday, March 18, 2017 6:10:40 PM
To: Mahdi Adnan
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse mount logs? They should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.

-Krutika

On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Hello Krutika,

Kindly, check the attached logs.

--
Respectfully,
Mahdi A. Mahdi

From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Saturday, March 18, 2017 3:29:03 PM
To: Mahdi Adnan
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

Hi Mahdi,

Could you attach the mount, brick and rebalance logs?

-Krutika

On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Hi,

I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.

After the completion of the rebalance, I rebooted the VMs; some of them ran just fine, and others just crashed.
Windows boots to recovery mode, and Linux throws XFS errors and does not boot.

I ran the test again and it happened just as the first time, but I have noticed that only VMs doing disk IO are affected by this bug. The VMs that were powered off started fine, and even the md5 of the disk file did not change after the rebalance.

Can anyone else confirm this?

Volume info:

Volume Name: vmware2
Type: Distributed-Replicate
Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
Status: Started
Snapshot Count: 0
Number of Bricks: 22 x 2 = 44
Transport-type: tcp
Bricks:
Brick1: gluster01:/mnt/disk1/vmware2
Brick2: gluster03:/mnt/disk1/vmware2
Brick3: gluster02:/mnt/disk1/vmware2
Brick4: gluster04:/mnt/disk1/vmware2
Brick5: gluster01:/mnt/disk2/vmware2
Brick6: gluster03:/mnt/disk2/vmware2
Brick7: gluster02:/mnt/disk2/vmware2
Brick8: gluster04:/mnt/disk2/vmware2
Brick9: gluster01:/mnt/disk3/vmware2
Brick10: gluster03:/mnt/disk3/vmware2
Brick11: gluster02:/mnt/disk3/vmware2
Brick12: gluster04:/mnt/disk3/vmware2
Brick13: gluster01:/mnt/disk4/vmware2
Brick14: gluster03:/mnt/disk4/vmware2
Brick15: gluster02:/mnt/disk4/vmware2
Brick16: gluster04:/mnt/disk4/vmware2
Brick17: gluster01:/mnt/disk5/vmware2
Brick18: gluster03:/mnt/disk5/vmware2
Brick19: gluster02:/mnt/disk5/vmware2
Brick20: gluster04:/mnt/disk5/vmware2
Brick21: gluster01:/mnt/disk6/vmware2
Brick22: gluster03:/mnt/disk6/vmware2
Brick23: gluster02:/mnt/disk6/vmware2
Brick24: gluster04:/mnt/disk6/vmware2
Brick25: gluster01:/mnt/disk7/vmware2
Brick26: gluster03:/mnt/disk7/vmware2
Brick27: gluster02:/mnt/disk7/vmware2
Brick28: gluster04:/mnt/disk7/vmware2
Brick29: gluster01:/mnt/disk8/vmware2
Brick30: gluster03:/mnt/disk8/vmware2
Brick31: gluster02:/mnt/disk8/vmware2
Brick32: gluster04:/mnt/disk8/vmware2
Brick33: gluster01:/mnt/disk9/vmware2
Brick34: gluster03:/mnt/disk9/vmware2
Brick35: gluster02:/mnt/disk9/vmware2
Brick36: gluster04:/mnt/disk9/vmware2
Brick37: gluster01:/mnt/disk10/vmware2
Brick38: gluster03:/mnt/disk10/vmware2
Brick39: gluster02:/mnt/disk10/vmware2
Brick40: gluster04:/mnt/disk10/vmware2
Brick41: gluster01:/mnt/disk11/vmware2
Brick42: gluster03:/mnt/disk11/vmware2
Brick43: gluster02:/mnt/disk11/vmware2
Brick44: gluster04:/mnt/disk11/vmware2
Options Reconfigured:
cluster.server-quorum-type: server
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard: on
cluster.data-self-heal-algorithm: full
features.cache-invalidation: on
ganesha.enable: on
features.shard-block-size: 256MB
client.event-threads: 2
server.event-threads: 2
cluster.favorite-child-policy: size
storage.build-pgfid: off
network.ping-timeout: 5
cluster.enable-shared-storage: enable
nfs-ganesha: enable
cluster.server-quorum-ratio: 51%

Adding bricks:
gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2

Starting fix-layout:
gluster volume rebalance vmware2 fix-layout start

Starting rebalance:
gluster volume rebalance vmware2 start

--
Respectfully,
Mahdi A. Mahdi

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
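A note on the sharding arithmetic behind the volume options above: with features.shard-block-size set to 256MB, a VM image is kept as a base file plus numbered shard files under the hidden .shard directory on the bricks, and every write is mapped to shard indices by its offset. The following is a minimal Python sketch of that mapping, for illustration only; it is not taken from the GlusterFS sources, and the naming convention mentioned in the comments is only how sharded files are commonly laid out on bricks.

    # Illustrative sketch of the offset-to-shard arithmetic, not GlusterFS source.
    # With features.shard-block-size: 256MB (as on vmware2 above), byte N of a VM
    # image belongs to shard index N // (256 MiB). Index 0 stays in the base file;
    # higher indices appear as <base-file-gfid>.<index> under .shard on the bricks.

    SHARD_BLOCK_SIZE = 256 * 2**20   # 256MB, from the volume options above


    def shards_touched(offset, length, shard_size=SHARD_BLOCK_SIZE):
        """Return the shard indices a write of `length` bytes at `offset` spans."""
        first = offset // shard_size
        last = (offset + length - 1) // shard_size
        return list(range(first, last + 1))


    if __name__ == "__main__":
        print(shards_touched(10 * 2**30, 2**20))              # [40]  (1MB write at a 10GB offset)
        print(shards_touched(SHARD_BLOCK_SIZE - 4096, 8192))  # [0, 1] (write straddling a shard boundary)

This offset-to-shard mapping is the step the RCA later in the thread builds on.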
Based on what I know of the workflow, there is no update. There is no bug report in Bugzilla, so there are no patches in review for it.

On 03/27/2017 10:59 AM, Mahdi Adnan wrote:
> Hi,
>
> Do you guys have any update regarding this issue?
>
> --
> Respectfully,
> Mahdi A. Mahdi
Pranith Kumar Karampuri
2017-Mar-28 18:19 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
> Hi,
>
> Do you guys have any update regarding this issue?
>
> --
> Respectfully,
> Mahdi A. Mahdi

I do not actively work on this issue, so I do not have an accurate update, but what I heard from Krutika and Raghavendra (who works on DHT) is this: Krutika debugged it initially and found that the issue seems more likely to be in DHT. Satheesaran, who helped us recreate the issue in the lab, found that just fix-layout, without rebalance, also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.
--
Pranith
Krutika Dhananjay
2017-Apr-03 08:22 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
So Raghavendra has an RCA for this issue. Copy-pasting his comment here:

<RCA>

Following is a rough algorithm of shard_writev:

1. Based on the offset, calculate the shards touched by the current write.
2. Look for inodes corresponding to these shard files in the itable.
3. If one or more inodes are missing from the itable, issue an mknod for the corresponding shard files and ignore EEXIST in the cbk.
4. Resume writes on the respective shards.

Now, imagine a write which falls on an existing "shard_file". For the sake of discussion, let's consider a distribute of three subvols - s1, s2, s3.

1. "shard_file" hashes to subvolume s2 and is present on s2.
2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard" is fixed to include s4 and the hash ranges are changed.
3. A write that touches "shard_file" is issued.
4. The inode for "shard_file" is not present in the itable after a graph switch, and features/shard issues an mknod.
5. With the new layout of .shard, let's say "shard_file" hashes to s3, and mknod (shard_file) on s3 succeeds. But the shard_file is already present on s2.

So we have two files, on two different subvols of DHT, representing the same shard, and this will lead to corruption.

</RCA>

Raghavendra will be sending out a patch in DHT to fix this issue.

-Krutika
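To make the race described in the RCA concrete, here is a small, self-contained Python simulation. It is illustrative only: the layout table, subvolume names, shard name, and inode table are toy stand-ins, not GlusterFS code. A shard created under the old layout lands on s2; after add-brick and fix-layout the same name hashes to s3; a client whose inode table was emptied by the graph switch re-creates the shard there and writes to it, leaving two diverging copies.

    # Toy model of the DHT/shard race described in the RCA above. Not GlusterFS
    # code: the layout table, subvolume names, and inode table are simplified
    # stand-ins for the real hashing and caching.

    def shard_writev(shard_name, layout, bricks, itable, data):
        """Rough analogue of steps 2-4 of shard_writev in the RCA."""
        if shard_name not in itable:                 # inode missing from the itable ...
            target = layout[shard_name]              # ... so mknod on whichever subvol
            bricks.setdefault(target, {}).setdefault(shard_name, b"")  # it hashes to now
            itable[shard_name] = target              # (an EEXIST here would be ignored)
        subvol = itable[shard_name]
        bricks[subvol][shard_name] = data            # resume the write on that subvol


    bricks, itable = {}, {}
    shard = ".shard/a68ce411.1"                      # one shard of a VM image (name made up)

    old_layout = {shard: "s2"}                       # with subvols s1-s3, the shard hashes to s2
    shard_writev(shard, old_layout, bricks, itable, b"old data")

    # add-brick s4 + fix-layout: the hash ranges of .shard change,
    # and the same shard name now hashes to s3.
    new_layout = {shard: "s3"}
    itable.clear()                                   # graph switch: cached inodes are gone

    shard_writev(shard, new_layout, bricks, itable, b"new data")

    copies = {sv: files[shard] for sv, files in bricks.items() if shard in files}
    print(copies)   # {'s2': b'old data', 's3': b'new data'} -- two diverging copies of one shard

Note that no data migration is needed to hit this, which matches Satheesaran's observation above that fix-layout alone was enough to corrupt images.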