Pranith Kumar Karampuri
2017-Apr-27 10:58 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
I am not a DHT developer, so some of what I say could be a little wrong, but
this is what I gather. I think they found two classes of bugs in DHT:

1) Graceful fop failover when rebalance is in progress is missing for some
fops, which leads to VM pauses. I see that https://review.gluster.org/17085
got merged on the 24th on master for this, and I see patches are posted for
3.8.x for this one.

2) I think there is some work that needs to be done for dht_[f]xattrop. I
believe this is the next step that is underway.

On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:

> Updates on this critical bug ?
>
> On 18 Apr 2017 at 8:24 PM, "Gandalf Corvotempesta"
> <gandalf.corvotempesta at gmail.com> wrote:
>
>> Any update ?
>> In addition, if this is a different bug but the "workflow" is the same
>> as the previous one, how is it possible that fixing the previous bug
>> triggered this new one ?
>>
>> Is it possible to have some details ?
>>
>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
>> > Nope. This is a different bug.
>> >
>> > -Krutika
>> >
>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
>> > <gandalf.corvotempesta at gmail.com> wrote:
>> >>
>> >> This is good news.
>> >> Is this related to the previously fixed bug?
>> >>
>> >> On 3 Apr 2017 at 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com>
>> >> wrote:
>> >>>
>> >>> So Raghavendra has an RCA for this issue.
>> >>>
>> >>> Copy-pasting his comment here:
>> >>>
>> >>> <RCA>
>> >>>
>> >>> Following is a rough algorithm of shard_writev:
>> >>>
>> >>> 1. Based on the offset, calculate the shards touched by the current write.
>> >>> 2. Look for inodes corresponding to these shard files in the itable.
>> >>> 3. If one or more inodes are missing from the itable, issue mknod for the
>> >>>    corresponding shard files and ignore EEXIST in the cbk.
>> >>> 4. Resume writes on the respective shards.
>> >>>
>> >>> Now, imagine a write which falls on an existing "shard_file". For the
>> >>> sake of discussion let's consider a distribute of three subvols - s1, s2, s3.
>> >>>
>> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>> >>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard"
>> >>>    is fixed to include s4 and the hash ranges are changed.
>> >>> 3. A write that touches "shard_file" is issued.
>> >>> 4. The inode for "shard_file" is not present in the itable after a graph
>> >>>    switch and features/shard issues an mknod.
>> >>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and
>> >>>    mknod (shard_file) on s3 succeeds. But the shard_file is already
>> >>>    present on s2.
>> >>>
>> >>> So, we have two files on two different subvols of DHT representing the
>> >>> same shard, and this will lead to corruption.
>> >>>
>> >>> </RCA>
>> >>>
>> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
>> >>>
>> >>> -Krutika
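To make the RCA above concrete, here is a small illustrative sketch in Python. It is not GlusterFS code: toy_hash, pick_subvol and the shard name are simplified stand-ins, and the only detail taken from the thread is the 256MB shard-block-size shown in the volume info further down. It sketches step 1 of shard_writev (mapping a write's offset to shard indices) and why adding a subvolume and fixing the layout can make an existing shard's name hash to a different subvolume, which is the condition that produces the duplicate shard described above.

    import hashlib

    SHARD_SIZE = 256 * 1024 * 1024  # features.shard-block-size reported for this volume

    def shards_touched(offset, length):
        # Step 1 of shard_writev in the RCA: which shard indices does this write cover?
        first = offset // SHARD_SIZE
        last = (offset + length - 1) // SHARD_SIZE
        return list(range(first, last + 1))

    def toy_hash(name):
        # Stand-in for DHT's real name hash; only the "name -> number" idea matters here.
        return int(hashlib.md5(name.encode()).hexdigest()[:8], 16)

    def pick_subvol(name, subvols):
        # Split the 32-bit hash space evenly among subvols, as a stand-in for the
        # layout ranges assigned to the subvolumes of .shard.
        width = 2 ** 32 // len(subvols)
        return subvols[min(toy_hash(name) // width, len(subvols) - 1)]

    shard_name = "example-gfid.7"  # hypothetical entry name under .shard
    print(shards_touched(offset=7 * SHARD_SIZE, length=1048576))  # -> [7]
    print(pick_subvol(shard_name, ["s1", "s2", "s3"]))        # layout before fix-layout
    print(pick_subvol(shard_name, ["s1", "s2", "s3", "s4"]))  # layout after adding s4

If the last two calls return different subvolumes, an mknod issued after the layout change lands somewhere other than where the shard already exists, giving the two copies of the same shard that the RCA describes.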
>> >>>
>> >>>
>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>> >>> <pkarampu at redhat.com> wrote:
>> >>>>
>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan
>> >>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Do you guys have any update regarding this issue ?
>> >>>>
>> >>>> I do not actively work on this issue, so I do not have an accurate
>> >>>> update, but what I heard from Krutika and Raghavendra (who works on
>> >>>> DHT) is: Krutika debugged initially and found that the issue seems more
>> >>>> likely to be in DHT. Satheesaran, who helped us recreate this issue in
>> >>>> the lab, found that just fix-layout without rebalance also caused the
>> >>>> corruption 1 out of 3 times. Raghavendra came up with a possible RCA for
>> >>>> why this can happen. Raghavendra (CCed) would be the right person to
>> >>>> provide an accurate update.
>> >>>>>
>> >>>>> --
>> >>>>> Respectfully
>> >>>>> Mahdi A. Mahdi
>> >>>>>
>> >>>>> ________________________________
>> >>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>> >>>>> To: Mahdi Adnan
>> >>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>> >>>>> gluster-users at gluster.org List
>> >>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>> >>>>> seeking his help in debugging this. It will be easier that way.
>> >>>>>
>> >>>>> -Krutika
>> >>>>>
>> >>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan
>> >>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Hello and thank you for your email.
>> >>>>>> Actually no, I didn't check the GFIDs of the VMs.
>> >>>>>> If this will help, I can set up a new test cluster and get all the
>> >>>>>> data you need.
>> >>>>>>
>> >>>>>> Get Outlook for Android
>> >>>>>>
>> >>>>>> From: Nithya Balachandran
>> >>>>>> Sent: Monday, March 20, 20:57
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>> To: Krutika Dhananjay
>> >>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>> >>>>>> gluster-users at gluster.org List
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Do you know the GFIDs of the VM images which were corrupted?
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Nithya
>> >>>>>>
>> >>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> I looked at the logs.
>> >>>>>>
>> >>>>>> From the time the new graph (since the add-brick command you shared,
>> >>>>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards
>> >>>>>> in nfs-gfapi.log), I see the following kinds of errors:
>> >>>>>>
>> >>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas,
>> >>>>>> which protocol/client converts to ESTALE. I am guessing these entries
>> >>>>>> got migrated to other subvolumes, leading to 'No such file or directory'
>> >>>>>> errors.
>> >>>>>>
>> >>>>>> DHT and thereafter shard get the same error code and log the following:
>> >>>>>>
>> >>>>>> [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>> >>>>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>> >>>>>> <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file
>> >>>>>> on vmware2-dht [Stale file handle]
>> >>>>>> [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>> >>>>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>> >>>>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>> >>>>>>
>> >>>>>> which is fine.
>> >>>>>> 2. The other kind are from AFR logging of possible split-brain, which
>> >>>>>> I suppose are harmless too:
>> >>>>>>
>> >>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008]
>> >>>>>> [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
>> >>>>>> subvolume -1 found with event generation 2 for gfid
>> >>>>>> 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>> >>>>>>
>> >>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO
>> >>>>>> while rebalance is running (as opposed to those that remained powered
>> >>>>>> off), rebalance + IO could be causing some issues.
>> >>>>>>
>> >>>>>> CC'ing DHT devs.
>> >>>>>>
>> >>>>>> Raghavendra/Nithya/Susant,
>> >>>>>>
>> >>>>>> Could you take a look?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Thank you for your email mate.
>> >>>>>>
>> >>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this
>> >>>>>> cluster is all flash.
>> >>>>>>
>> >>>>>> In version 3.7.x I had issues with ping timeout: if one host went down
>> >>>>>> for a few seconds, the whole cluster hung and became unavailable, so to
>> >>>>>> avoid this I adjusted the ping timeout to 5 seconds.
>> >>>>>>
>> >>>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster
>> >>>>>> (FUSE or gfapi), so I'm stuck with NFS for this volume.
>> >>>>>>
>> >>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> While I'm still going through the logs, I just wanted to point out a
>> >>>>>> couple of things:
>> >>>>>>
>> >>>>>> 1. It is recommended that you use 3-way replication (replica count 3)
>> >>>>>> for the VM store use case.
>> >>>>>>
>> >>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it
>> >>>>>> to 30.
>> >>>>>>
>> >>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>> >>>>>>
>> >>>>>> Will get back with anything else I might find or more questions if I
>> >>>>>> have any.
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Thanks mate,
>> >>>>>>
>> >>>>>> Kindly, check the attachment.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> In that case could you share the ganesha-gfapi logs?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount;
>> >>>>>> the other one is exported via NFS-Ganesha for VMware, which is the one
>> >>>>>> I'm testing now.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Kindly, check the attached new log file. I don't know if it's helpful
>> >>>>>> or not, but I couldn't find the log with the name you just described.
>> >>>>>>
>> >>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>> >>>>>> NFS?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Saturday, March 18, 2017 6:10:40 PM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the
>> >>>>>> fuse mount logs? It should be right under the /var/log/glusterfs/
>> >>>>>> directory, named after the mount point name, only hyphenated.
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Hello Krutika,
>> >>>>>>
>> >>>>>> Kindly, check the attached logs.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> Hi Mahdi,
>> >>>>>>
>> >>>>>> Could you attach mount, brick and rebalance logs?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick
>> >>>>>> procedure on a volume containing a few VMs.
>> >>>>>>
>> >>>>>> After the completion of rebalance, I rebooted the VMs; some of them
>> >>>>>> ran just fine, and others just crashed.
>> >>>>>>
>> >>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does
>> >>>>>> not boot.
>> >>>>>>
>> >>>>>> I ran the test again and it happened just as the first time, but I
>> >>>>>> have noticed that only VMs doing disk IO are affected by this bug.
>> >>>>>>
>> >>>>>> The VMs in powered-off mode started fine, and even the md5 of the disk
>> >>>>>> file did not change after the rebalance.
>> >>>>>>
>> >>>>>> Can anyone else confirm this ?
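As an aside on Nithya's earlier question about the GFIDs of the corrupted images: assuming direct access to a brick, a file's GFID can usually be read from its trusted.gfid extended attribute on the brick, along the lines of

    getfattr -n trusted.gfid -e hex /mnt/disk1/vmware2/<path-to-image>

where the path after the brick mount point is only illustrative.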
>> >>>>>>
>> >>>>>> Volume info:
>> >>>>>>
>> >>>>>> Volume Name: vmware2
>> >>>>>> Type: Distributed-Replicate
>> >>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>> >>>>>> Status: Started
>> >>>>>> Snapshot Count: 0
>> >>>>>> Number of Bricks: 22 x 2 = 44
>> >>>>>> Transport-type: tcp
>> >>>>>> Bricks:
>> >>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>> >>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>> >>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>> >>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>> >>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>> >>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>> >>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>> >>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>> >>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>> >>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>> >>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>> >>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>> >>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>> >>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>> >>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>> >>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>> >>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>> >>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>> >>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>> >>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>> >>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>> >>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>> >>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>> >>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>> >>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>> >>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>> >>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>> >>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>> >>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>> >>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>> >>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>> >>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>> >>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>> >>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>> >>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>> >>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>> >>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>> >>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>> >>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>> >>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>> >>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>> >>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>> >>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>> >>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>> >>>>>> Options Reconfigured:
>> >>>>>> cluster.server-quorum-type: server
>> >>>>>> nfs.disable: on
>> >>>>>> performance.readdir-ahead: on
>> >>>>>> transport.address-family: inet
>> >>>>>> performance.quick-read: off
>> >>>>>> performance.read-ahead: off
>> >>>>>> performance.io-cache: off
>> >>>>>> performance.stat-prefetch: off
>> >>>>>> cluster.eager-lock: enable
>> >>>>>> network.remote-dio: enable
>> >>>>>> features.shard: on
>> >>>>>> cluster.data-self-heal-algorithm: full
>> >>>>>> features.cache-invalidation: on
>> >>>>>> ganesha.enable: on
>> >>>>>> features.shard-block-size: 256MB
>> >>>>>> client.event-threads: 2
>> >>>>>> server.event-threads: 2
>> >>>>>> cluster.favorite-child-policy: size
>> >>>>>> storage.build-pgfid: off
>> >>>>>> network.ping-timeout: 5
>> >>>>>> cluster.enable-shared-storage: enable
>> >>>>>> nfs-ganesha: enable
>> >>>>>> cluster.server-quorum-ratio: 51%
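Note that network.ping-timeout is still 5 in the options above, while Krutika recommended raising it to 30 earlier in the thread; assuming the same volume name, that change would presumably be applied with

    gluster volume set vmware2 network.ping-timeout 30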
>> >>>>>>
>> >>>>>> Adding bricks:
>> >>>>>>
>> >>>>>> gluster volume add-brick vmware2 replica 2
>> >>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>> >>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>> >>>>>>
>> >>>>>> Starting fix-layout:
>> >>>>>>
>> >>>>>> gluster volume rebalance vmware2 fix-layout start
>> >>>>>>
>> >>>>>> Starting rebalance:
>> >>>>>>
>> >>>>>> gluster volume rebalance vmware2 start
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Gluster-users mailing list
>> >>>>>> Gluster-users at gluster.org
>> >>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Gluster-users mailing list
>> >>>>> Gluster-users at gluster.org
>> >>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>>>
>> >>>> --
>> >>>> Pranith
>> >>>
>> >>> _______________________________________________
>> >>> Gluster-users mailing list
>> >>> Gluster-users at gluster.org
>> >>> http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Pranith
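For anyone retracing the add-brick / fix-layout / rebalance sequence shown above, the progress and any failures of the rebalance can be monitored while it runs with the standard status command, for example

    gluster volume rebalance vmware2 status

alongside the rebalance log that was requested earlier in the thread.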
Gandalf Corvotempesta
2017-Apr-27 11:00 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
I think we are talking about a different bug.

On 27 Apr 2017 at 12:58 PM, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
wrote:

> I am not a DHT developer, so some of what I say could be a little wrong,
> but this is what I gather. I think they found two classes of bugs in DHT:
>
> 1) Graceful fop failover when rebalance is in progress is missing for some
> fops, which leads to VM pauses. I see that https://review.gluster.org/17085
> got merged on the 24th on master for this, and I see patches are posted for
> 3.8.x for this one.
>
> 2) I think there is some work that needs to be done for dht_[f]xattrop. I
> believe this is the next step that is underway.