On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:> > > On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> >> >> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >> dgossage at carouselchecks.com> wrote: >> >>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com> >>> wrote: >>> >>>> Could you also share the glustershd logs? >>>> >>> >>> I'll get them when I get to work sure >>> >> >>> >>>> >>>> I tried the same steps that you mentioned multiple times, but heal is >>>> running to completion without any issues. >>>> >>>> It must be said that 'heal full' traverses the files and directories in >>>> a depth-first order and does heals also in the same order. But if it gets >>>> interrupted in the middle (say because self-heal-daemon was either >>>> intentionally or unintentionally brought offline and then brought back up), >>>> self-heal will only pick up the entries that are so far marked as >>>> new-entries that need heal which it will find in indices/xattrop directory. >>>> What this means is that those files and directories that were not visited >>>> during the crawl, will remain untouched and unhealed in this second >>>> iteration of heal, unless you execute a 'heal-full' again. >>>> >>> >>> So should it start healing shards as it crawls or not until after it >>> crawls the entire .shard directory? At the pace it was going that could be >>> a week with one node appearing in the cluster but with no shard files if >>> anything tries to access a file on that node. From my experience other day >>> telling it to heal full again did nothing regardless of node used. >>> >> > Crawl is started from '/' of the volume. Whenever self-heal detects during > the crawl that a file or directory is present in some brick(s) and absent > in others, it creates the file on the bricks where it is absent and marks > the fact that the file or directory might need data/entry and metadata heal > too (this also means that an index is created under > .glusterfs/indices/xattrop of the src bricks). And the data/entry and > metadata heal are picked up and done in >the background with the help of these indices.>Looking at my 3rd node as example i find nearly an exact same number of files in xattrop dir as reported by heal count at time I brought down node2 to try and alleviate read io errors that seemed to occur from what I was guessing as attempts to use the node with no shards for reads. Also attached are the glustershd logs from the 3 nodes, along with the test node i tried yesterday with same results.> > >>> >>>> My suspicion is that this is what happened on your setup. Could you >>>> confirm if that was the case? >>>> >>> >>> Brick was brought online with force start then a full heal launched. >>> Hours later after it became evident that it was not adding new files to >>> heal I did try restarting self-heal daemon and relaunching full heal again. >>> But this was after the heal had basically already failed to work as >>> intended. >>> >> >> OK. How did you figure it was not adding any new files? I need to know >> what places you were monitoring to come to this conclusion. >> >> -Krutika >> >> >>> >>> >>>> As for those logs, I did manager to do something that caused these >>>> warning messages you shared earlier to appear in my client and server logs. >>>> Although these logs are annoying and a bit scary too, they didn't do >>>> any harm to the data in my volume. 
Why they appear just after a brick is >>>> replaced and under no other circumstances is something I'm still >>>> investigating. >>>> >>>> But for future, it would be good to follow the steps Anuradha gave as >>>> that would allow self-heal to at least detect that it has some repairing to >>>> do whenever it is restarted whether intentionally or otherwise. >>>> >>> >>> I followed those steps as described on my test box and ended up with >>> exact same outcome of adding shards at an agonizing slow pace and no >>> creation of .shard directory or heals on shard directory. Directories >>> visible from mount healed quickly. This was with one VM so it has only 800 >>> shards as well. After hours at work it had added a total of 33 shards to >>> be healed. I sent those logs yesterday as well though not the glustershd. >>> >>> Does replace-brick command copy files in same manner? For these >>> purposes I am contemplating just skipping the heal route. >>> >>> >>>> -Krutika >>>> >>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>> dgossage at carouselchecks.com> wrote: >>>> >>>>> attached brick and client logs from test machine where same behavior >>>>> occurred not sure if anything new is there. its still on 3.8.2 >>>>> >>>>> Number of Bricks: 1 x 3 = 3 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>> Options Reconfigured: >>>>> cluster.locking-scheme: granular >>>>> performance.strict-o-direct: off >>>>> features.shard-block-size: 64MB >>>>> features.shard: on >>>>> server.allow-insecure: on >>>>> storage.owner-uid: 36 >>>>> storage.owner-gid: 36 >>>>> cluster.server-quorum-type: server >>>>> cluster.quorum-type: auto >>>>> network.remote-dio: on >>>>> cluster.eager-lock: enable >>>>> performance.stat-prefetch: off >>>>> performance.io-cache: off >>>>> performance.quick-read: off >>>>> cluster.self-heal-window-size: 1024 >>>>> cluster.background-self-heal-count: 16 >>>>> nfs.enable-ino32: off >>>>> nfs.addr-namelookup: off >>>>> nfs.disable: on >>>>> performance.read-ahead: off >>>>> performance.readdir-ahead: on >>>>> cluster.granular-entry-heal: on >>>>> >>>>> >>>>> >>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, >>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>> > >>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com> >>>>>>> wrote: >>>>>>> > >>>>>>> > > Response inline. >>>>>>> > > >>>>>>> > > ----- Original Message ----- >>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>> Gluster-users at gluster.org> >>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>> > > > >>>>>>> > > > Could you attach both client and brick logs? 
Meanwhile I will >>>>>>> try these >>>>>>> > > steps >>>>>>> > > > out on my machines and see if it is easily recreatable. >>>>>>> > > > >>>>>>> > > > -Krutika >>>>>>> > > > >>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>> > > dgossage at carouselchecks.com >>>>>>> > > > > wrote: >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>> > > > >>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>> > > > Options Reconfigured: >>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>> > > > cluster.self-heal-daemon: on >>>>>>> > > > cluster.locking-scheme: granular >>>>>>> > > > features.shard-block-size: 64MB >>>>>>> > > > features.shard: on >>>>>>> > > > performance.readdir-ahead: on >>>>>>> > > > storage.owner-uid: 36 >>>>>>> > > > storage.owner-gid: 36 >>>>>>> > > > performance.quick-read: off >>>>>>> > > > performance.read-ahead: off >>>>>>> > > > performance.io-cache: off >>>>>>> > > > performance.stat-prefetch: on >>>>>>> > > > cluster.eager-lock: enable >>>>>>> > > > network.remote-dio: enable >>>>>>> > > > cluster.quorum-type: auto >>>>>>> > > > cluster.server-quorum-type: server >>>>>>> > > > server.allow-insecure: on >>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>> > > > performance.strict-write-ordering: off >>>>>>> > > > nfs.disable: on >>>>>>> > > > nfs.addr-namelookup: off >>>>>>> > > > nfs.enable-ino32: off >>>>>>> > > > cluster.granular-entry-heal: on >>>>>>> > > > >>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>> > > > Following steps detailed in previous recommendations began >>>>>>> proces of >>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>> > > > >>>>>>> > > > 1) kill pid of brick >>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>> > > > 3) recreate directory of brick >>>>>>> > > > 4) gluster volume start <> force >>>>>>> > > > 5) gluster volume heal <> full >>>>>>> > > Hi, >>>>>>> > > >>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in >>>>>>> full heal. >>>>>>> > > Better safe than sorry ;) >>>>>>> > > Instead I'd suggest the following steps: >>>>>>> > > >>>>>>> > > Currently I brought the node down by systemctl stop glusterd as >>>>>>> I was >>>>>>> > getting sporadic io issues and a few VM's paused so hoping that >>>>>>> will help. >>>>>>> > I may wait to do this till around 4PM when most work is done in >>>>>>> case it >>>>>>> > shoots load up. >>>>>>> > >>>>>>> > >>>>>>> > > 1) kill pid of brick >>>>>>> > > 2) to configuring of brick that you need >>>>>>> > > 3) recreate brick dir >>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>> > > a) create a dummy non existent dir under / of mount. >>>>>>> > > >>>>>>> > >>>>>>> > so if noee 2 is down brick, pick node for example 3 and make a >>>>>>> test dir >>>>>>> > under its brick directory that doesnt exist on 2 or should I be >>>>>>> dong this >>>>>>> > over a gluster mount? >>>>>>> You should be doing this over gluster mount. >>>>>>> > >>>>>>> > > b) set a non existent extended attribute on / of mount. >>>>>>> > > >>>>>>> > >>>>>>> > Could you give me an example of an attribute to set? I've read a >>>>>>> tad on >>>>>>> > this, and looked up attributes but haven't set any yet myself. >>>>>>> > >>>>>>> Sure. 
setfattr -n "user.some-name" -v "some-value" <path-to-mount> >>>>>>> > Doing these steps will ensure that heal happens only from updated >>>>>>> brick to >>>>>>> > > down brick. >>>>>>> > > 5) gluster v start <> force >>>>>>> > > 6) gluster v heal <> >>>>>>> > > >>>>>>> > >>>>>>> > Will it matter if somewhere in gluster the full heal command was >>>>>>> run other >>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>> > >>>>>>> full heal will stop once the crawl is done. So if you want to >>>>>>> trigger heal again, >>>>>>> run gluster v heal <>. Actually even brick up or volume start force >>>>>>> should >>>>>>> trigger the heal. >>>>>>> >>>>>> >>>>>> Did this on test bed today. its one server with 3 bricks on same >>>>>> machine so take that for what its worth. also it still runs 3.8.2. Maybe >>>>>> ill update and re-run test. >>>>>> >>>>>> killed brick >>>>>> deleted brick dir >>>>>> recreated brick dir >>>>>> created fake dir on gluster mount >>>>>> set suggested fake attribute on it >>>>>> ran volume start <> force >>>>>> >>>>>> looked at files it said needed healing and it was just 8 shards that >>>>>> were modified for few minutes I ran through steps >>>>>> >>>>>> gave it few minutes and it stayed same >>>>>> ran gluster volume <> heal >>>>>> >>>>>> it healed all the directories and files you can see over mount >>>>>> including fakedir. >>>>>> >>>>>> same issue for shards though. it adds more shards to heal at glacier >>>>>> pace. slight jump in speed if I stat every file and dir in VM running but >>>>>> not all shards. >>>>>> >>>>>> It started with 8 shards to heal and is now only at 33 out of 800 and >>>>>> probably wont finish adding for few days at rate it goes. >>>>>> >>>>>> >>>>>> >>>>>>> > > >>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB data. >>>>>>> Load was >>>>>>> > > little >>>>>>> > > > heavy but nothing shocking. >>>>>>> > > > >>>>>>> > > > About an hour after node 1 finished I began same process on >>>>>>> node2. Heal >>>>>>> > > > proces kicked in as before and the files in directories >>>>>>> visible from >>>>>>> > > mount >>>>>>> > > > and .glusterfs healed in short time. Then it began crawl of >>>>>>> .shard adding >>>>>>> > > > those files to heal count at which point the entire proces >>>>>>> ground to a >>>>>>> > > halt >>>>>>> > > > basically. After 48 hours out of 19k shards it has added 5900 >>>>>>> to heal >>>>>>> > > list. >>>>>>> > > > Load on all 3 machnes is negligible. It was suggested to >>>>>>> change this >>>>>>> > > value >>>>>>> > > > to full cluster.data-self-heal-algorithm and restart volume >>>>>>> which I >>>>>>> > > did. No >>>>>>> > > > efffect. Tried relaunching heal no effect, despite any node >>>>>>> picked. I >>>>>>> > > > started each VM and performed a stat of all files from within >>>>>>> it, or a >>>>>>> > > full >>>>>>> > > > virus scan and that seemed to cause short small spikes in >>>>>>> shards added, >>>>>>> > > but >>>>>>> > > > not by much. Logs are showing no real messages indicating >>>>>>> anything is >>>>>>> > > going >>>>>>> > > > on. I get hits to brick log on occasion of null lookups making >>>>>>> me think >>>>>>> > > its >>>>>>> > > > not really crawling shards directory but waiting for a shard >>>>>>> lookup to >>>>>>> > > add >>>>>>> > > > it. I'll get following in brick log but not constant and >>>>>>> sometime >>>>>>> > > multiple >>>>>>> > > > for same shard. 
>>>>>>> > > > >>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no >>>>>>> resolution >>>>>>> > > type >>>>>>> > > > for (null) (LOOKUP) >>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: >>>>>>> 12591783: >>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> >>>>>>> (Invalid >>>>>>> > > > argument) [Invalid argument] >>>>>>> > > > >>>>>>> > > > This one repeated about 30 times in row then nothing for 10 >>>>>>> minutes then >>>>>>> > > one >>>>>>> > > > hit for one different shard by itself. >>>>>>> > > > >>>>>>> > > > How can I determine if Heal is actually running? How can I >>>>>>> kill it or >>>>>>> > > force >>>>>>> > > > restart? Does node I start it from determine which directory >>>>>>> gets >>>>>>> > > crawled to >>>>>>> > > > determine heals? >>>>>>> > > > >>>>>>> > > > David Gossage >>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>> > > > Office 708.613.2284 >>>>>>> > > > >>>>>>> > > > _______________________________________________ >>>>>>> > > > Gluster-users mailing list >>>>>>> > > > Gluster-users at gluster.org >>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > _______________________________________________ >>>>>>> > > > Gluster-users mailing list >>>>>>> > > > Gluster-users at gluster.org >>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>> > > >>>>>>> > > -- >>>>>>> > > Thanks, >>>>>>> > > Anuradha. >>>>>>> > > >>>>>>> > >>>>>>> >>>>>>> -- >>>>>>> Thanks, >>>>>>> Anuradha. >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-node1 Type: application/octet-stream Size: 322716 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-node2.gz Type: application/x-gzip Size: 645489 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0002.gz> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-node3.gz Type: application/x-gzip Size: 296635 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0003.gz> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-testnode Type: application/octet-stream Size: 20910 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0003.obj>
On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgossage at carouselchecks.com> wrote:> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> >> >> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> >> wrote: >> >>> >>> >>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>> dgossage at carouselchecks.com> wrote: >>> >>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com >>>> > wrote: >>>> >>>>> Could you also share the glustershd logs? >>>>> >>>> >>>> I'll get them when I get to work sure >>>> >>> >>>> >>>>> >>>>> I tried the same steps that you mentioned multiple times, but heal is >>>>> running to completion without any issues. >>>>> >>>>> It must be said that 'heal full' traverses the files and directories >>>>> in a depth-first order and does heals also in the same order. But if it >>>>> gets interrupted in the middle (say because self-heal-daemon was either >>>>> intentionally or unintentionally brought offline and then brought back up), >>>>> self-heal will only pick up the entries that are so far marked as >>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>> What this means is that those files and directories that were not visited >>>>> during the crawl, will remain untouched and unhealed in this second >>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>> >>>> >>>> So should it start healing shards as it crawls or not until after it >>>> crawls the entire .shard directory? At the pace it was going that could be >>>> a week with one node appearing in the cluster but with no shard files if >>>> anything tries to access a file on that node. From my experience other day >>>> telling it to heal full again did nothing regardless of node used. >>>> >>> >> Crawl is started from '/' of the volume. Whenever self-heal detects >> during the crawl that a file or directory is present in some brick(s) and >> absent in others, it creates the file on the bricks where it is absent and >> marks the fact that the file or directory might need data/entry and >> metadata heal too (this also means that an index is created under >> .glusterfs/indices/xattrop of the src bricks). And the data/entry and >> metadata heal are picked up and done in >> > the background with the help of these indices. >> > > Looking at my 3rd node as example i find nearly an exact same number of > files in xattrop dir as reported by heal count at time I brought down node2 > to try and alleviate read io errors that seemed to occur from what I was > guessing as attempts to use the node with no shards for reads. > > Also attached are the glustershd logs from the 3 nodes, along with the > test node i tried yesterday with same results. >Is it possible you just need to spam the heal full command? Wait for a certain amount of time for it to time out? The test server that I did yesterday that stopped at listing 33 shards then healing none of them stlll had 33 shards in list this morning. I issued another heal full and it jumped up and found the missing shards. On the one hand its reassuring that if I just spam the command enough eventually it will heal. It's also disconcerting that if I spam the command enough times the heal will start. I can't test if same behavior would occur on live node as I expect if it did kick in heals I'd have 12 hours of high load during copy again perhaps. But I can test if it happens after last shift. 
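For reference, the heal tracking I have been basing this on is roughly along these lines (volume name and brick path are the ones from the volume info quoted earlier, so treat the exact paths as examples; the xattrop directory also holds a base xattrop-<uuid> entry, so that count is approximate):

# entries the self-heal daemon still considers pending
gluster volume heal GLUSTER1 info
gluster volume heal GLUSTER1 statistics heal-count

# pending-heal indices on a source brick
ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l

# re-trigger the crawl
gluster volume heal GLUSTER1 full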
Though I lost track of how many times I tried restarting heal full over Saturday and Sunday when it looked to be doing nothing from all heal tracking commands documented.>> >>>> >>>>> My suspicion is that this is what happened on your setup. Could you >>>>> confirm if that was the case? >>>>> >>>> >>>> Brick was brought online with force start then a full heal launched. >>>> Hours later after it became evident that it was not adding new files to >>>> heal I did try restarting self-heal daemon and relaunching full heal again. >>>> But this was after the heal had basically already failed to work as >>>> intended. >>>> >>> >>> OK. How did you figure it was not adding any new files? I need to know >>> what places you were monitoring to come to this conclusion. >>> >>> -Krutika >>> >>> >>>> >>>> >>>>> As for those logs, I did manager to do something that caused these >>>>> warning messages you shared earlier to appear in my client and server logs. >>>>> Although these logs are annoying and a bit scary too, they didn't do >>>>> any harm to the data in my volume. Why they appear just after a brick is >>>>> replaced and under no other circumstances is something I'm still >>>>> investigating. >>>>> >>>>> But for future, it would be good to follow the steps Anuradha gave as >>>>> that would allow self-heal to at least detect that it has some repairing to >>>>> do whenever it is restarted whether intentionally or otherwise. >>>>> >>>> >>>> I followed those steps as described on my test box and ended up with >>>> exact same outcome of adding shards at an agonizing slow pace and no >>>> creation of .shard directory or heals on shard directory. Directories >>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>> shards as well. After hours at work it had added a total of 33 shards to >>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>> >>>> Does replace-brick command copy files in same manner? For these >>>> purposes I am contemplating just skipping the heal route. >>>> >>>> >>>>> -Krutika >>>>> >>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> attached brick and client logs from test machine where same behavior >>>>>> occurred not sure if anything new is there. 
its still on 3.8.2 >>>>>> >>>>>> Number of Bricks: 1 x 3 = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>> Options Reconfigured: >>>>>> cluster.locking-scheme: granular >>>>>> performance.strict-o-direct: off >>>>>> features.shard-block-size: 64MB >>>>>> features.shard: on >>>>>> server.allow-insecure: on >>>>>> storage.owner-uid: 36 >>>>>> storage.owner-gid: 36 >>>>>> cluster.server-quorum-type: server >>>>>> cluster.quorum-type: auto >>>>>> network.remote-dio: on >>>>>> cluster.eager-lock: enable >>>>>> performance.stat-prefetch: off >>>>>> performance.io-cache: off >>>>>> performance.quick-read: off >>>>>> cluster.self-heal-window-size: 1024 >>>>>> cluster.background-self-heal-count: 16 >>>>>> nfs.enable-ino32: off >>>>>> nfs.addr-namelookup: off >>>>>> nfs.disable: on >>>>>> performance.read-ahead: off >>>>>> performance.readdir-ahead: on >>>>>> cluster.granular-entry-heal: on >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, >>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > >>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>> atalur at redhat.com> wrote: >>>>>>>> > >>>>>>>> > > Response inline. >>>>>>>> > > >>>>>>>> > > ----- Original Message ----- >>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>> Gluster-users at gluster.org> >>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > > > >>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will >>>>>>>> try these >>>>>>>> > > steps >>>>>>>> > > > out on my machines and see if it is easily recreatable. 
>>>>>>>> > > > >>>>>>>> > > > -Krutika >>>>>>>> > > > >>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>> > > dgossage at carouselchecks.com >>>>>>>> > > > > wrote: >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>> > > > >>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Options Reconfigured: >>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>> > > > features.shard: on >>>>>>>> > > > performance.readdir-ahead: on >>>>>>>> > > > storage.owner-uid: 36 >>>>>>>> > > > storage.owner-gid: 36 >>>>>>>> > > > performance.quick-read: off >>>>>>>> > > > performance.read-ahead: off >>>>>>>> > > > performance.io-cache: off >>>>>>>> > > > performance.stat-prefetch: on >>>>>>>> > > > cluster.eager-lock: enable >>>>>>>> > > > network.remote-dio: enable >>>>>>>> > > > cluster.quorum-type: auto >>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>> > > > server.allow-insecure: on >>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>> > > > nfs.disable: on >>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>> > > > nfs.enable-ino32: off >>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>> > > > >>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>>> > > > Following steps detailed in previous recommendations began >>>>>>>> proces of >>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>> > > > >>>>>>>> > > > 1) kill pid of brick >>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>> > > > 3) recreate directory of brick >>>>>>>> > > > 4) gluster volume start <> force >>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>> > > Hi, >>>>>>>> > > >>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in >>>>>>>> full heal. >>>>>>>> > > Better safe than sorry ;) >>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>> > > >>>>>>>> > > Currently I brought the node down by systemctl stop glusterd as >>>>>>>> I was >>>>>>>> > getting sporadic io issues and a few VM's paused so hoping that >>>>>>>> will help. >>>>>>>> > I may wait to do this till around 4PM when most work is done in >>>>>>>> case it >>>>>>>> > shoots load up. >>>>>>>> > >>>>>>>> > >>>>>>>> > > 1) kill pid of brick >>>>>>>> > > 2) to configuring of brick that you need >>>>>>>> > > 3) recreate brick dir >>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>> > > a) create a dummy non existent dir under / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > so if noee 2 is down brick, pick node for example 3 and make a >>>>>>>> test dir >>>>>>>> > under its brick directory that doesnt exist on 2 or should I be >>>>>>>> dong this >>>>>>>> > over a gluster mount? >>>>>>>> You should be doing this over gluster mount. >>>>>>>> > >>>>>>>> > > b) set a non existent extended attribute on / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > Could you give me an example of an attribute to set? I've read >>>>>>>> a tad on >>>>>>>> > this, and looked up attributes but haven't set any yet myself. >>>>>>>> > >>>>>>>> Sure. 
setfattr -n "user.some-name" -v "some-value" <path-to-mount> >>>>>>>> > Doing these steps will ensure that heal happens only from updated >>>>>>>> brick to >>>>>>>> > > down brick. >>>>>>>> > > 5) gluster v start <> force >>>>>>>> > > 6) gluster v heal <> >>>>>>>> > > >>>>>>>> > >>>>>>>> > Will it matter if somewhere in gluster the full heal command was >>>>>>>> run other >>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>> > >>>>>>>> full heal will stop once the crawl is done. So if you want to >>>>>>>> trigger heal again, >>>>>>>> run gluster v heal <>. Actually even brick up or volume start force >>>>>>>> should >>>>>>>> trigger the heal. >>>>>>>> >>>>>>> >>>>>>> Did this on test bed today. its one server with 3 bricks on same >>>>>>> machine so take that for what its worth. also it still runs 3.8.2. Maybe >>>>>>> ill update and re-run test. >>>>>>> >>>>>>> killed brick >>>>>>> deleted brick dir >>>>>>> recreated brick dir >>>>>>> created fake dir on gluster mount >>>>>>> set suggested fake attribute on it >>>>>>> ran volume start <> force >>>>>>> >>>>>>> looked at files it said needed healing and it was just 8 shards that >>>>>>> were modified for few minutes I ran through steps >>>>>>> >>>>>>> gave it few minutes and it stayed same >>>>>>> ran gluster volume <> heal >>>>>>> >>>>>>> it healed all the directories and files you can see over mount >>>>>>> including fakedir. >>>>>>> >>>>>>> same issue for shards though. it adds more shards to heal at >>>>>>> glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>> running but not all shards. >>>>>>> >>>>>>> It started with 8 shards to heal and is now only at 33 out of 800 >>>>>>> and probably wont finish adding for few days at rate it goes. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> > > >>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB data. >>>>>>>> Load was >>>>>>>> > > little >>>>>>>> > > > heavy but nothing shocking. >>>>>>>> > > > >>>>>>>> > > > About an hour after node 1 finished I began same process on >>>>>>>> node2. Heal >>>>>>>> > > > proces kicked in as before and the files in directories >>>>>>>> visible from >>>>>>>> > > mount >>>>>>>> > > > and .glusterfs healed in short time. Then it began crawl of >>>>>>>> .shard adding >>>>>>>> > > > those files to heal count at which point the entire proces >>>>>>>> ground to a >>>>>>>> > > halt >>>>>>>> > > > basically. After 48 hours out of 19k shards it has added 5900 >>>>>>>> to heal >>>>>>>> > > list. >>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested to >>>>>>>> change this >>>>>>>> > > value >>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart volume >>>>>>>> which I >>>>>>>> > > did. No >>>>>>>> > > > efffect. Tried relaunching heal no effect, despite any node >>>>>>>> picked. I >>>>>>>> > > > started each VM and performed a stat of all files from within >>>>>>>> it, or a >>>>>>>> > > full >>>>>>>> > > > virus scan and that seemed to cause short small spikes in >>>>>>>> shards added, >>>>>>>> > > but >>>>>>>> > > > not by much. Logs are showing no real messages indicating >>>>>>>> anything is >>>>>>>> > > going >>>>>>>> > > > on. I get hits to brick log on occasion of null lookups >>>>>>>> making me think >>>>>>>> > > its >>>>>>>> > > > not really crawling shards directory but waiting for a shard >>>>>>>> lookup to >>>>>>>> > > add >>>>>>>> > > > it. I'll get following in brick log but not constant and >>>>>>>> sometime >>>>>>>> > > multiple >>>>>>>> > > > for same shard. 
>>>>>>>> > > > >>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no >>>>>>>> resolution >>>>>>>> > > type >>>>>>>> > > > for (null) (LOOKUP) >>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: >>>>>>>> 12591783: >>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>> ==> (Invalid >>>>>>>> > > > argument) [Invalid argument] >>>>>>>> > > > >>>>>>>> > > > This one repeated about 30 times in row then nothing for 10 >>>>>>>> minutes then >>>>>>>> > > one >>>>>>>> > > > hit for one different shard by itself. >>>>>>>> > > > >>>>>>>> > > > How can I determine if Heal is actually running? How can I >>>>>>>> kill it or >>>>>>>> > > force >>>>>>>> > > > restart? Does node I start it from determine which directory >>>>>>>> gets >>>>>>>> > > crawled to >>>>>>>> > > > determine heals? >>>>>>>> > > > >>>>>>>> > > > David Gossage >>>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>>> > > > Office 708.613.2284 >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > >>>>>>>> > > -- >>>>>>>> > > Thanks, >>>>>>>> > > Anuradha. >>>>>>>> > > >>>>>>>> > >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks, >>>>>>>> Anuradha. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/35fd16da/attachment.html>
On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgossage at carouselchecks.com> wrote:> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> >> >> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> >> wrote: >> >>> >>> >>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>> dgossage at carouselchecks.com> wrote: >>> >>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com >>>> > wrote: >>>> >>>>> Could you also share the glustershd logs? >>>>> >>>> >>>> I'll get them when I get to work sure >>>> >>> >>>> >>>>> >>>>> I tried the same steps that you mentioned multiple times, but heal is >>>>> running to completion without any issues. >>>>> >>>>> It must be said that 'heal full' traverses the files and directories >>>>> in a depth-first order and does heals also in the same order. But if it >>>>> gets interrupted in the middle (say because self-heal-daemon was either >>>>> intentionally or unintentionally brought offline and then brought back up), >>>>> self-heal will only pick up the entries that are so far marked as >>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>> What this means is that those files and directories that were not visited >>>>> during the crawl, will remain untouched and unhealed in this second >>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>> >>>> >>>> So should it start healing shards as it crawls or not until after it >>>> crawls the entire .shard directory? At the pace it was going that could be >>>> a week with one node appearing in the cluster but with no shard files if >>>> anything tries to access a file on that node. From my experience other day >>>> telling it to heal full again did nothing regardless of node used. >>>> >>> >> Crawl is started from '/' of the volume. Whenever self-heal detects >> during the crawl that a file or directory is present in some brick(s) and >> absent in others, it creates the file on the bricks where it is absent and >> marks the fact that the file or directory might need data/entry and >> metadata heal too (this also means that an index is created under >> .glusterfs/indices/xattrop of the src bricks). And the data/entry and >> metadata heal are picked up and done in >> > the background with the help of these indices. >> > > Looking at my 3rd node as example i find nearly an exact same number of > files in xattrop dir as reported by heal count at time I brought down node2 > to try and alleviate read io errors that seemed to occur from what I was > guessing as attempts to use the node with no shards for reads. > > Also attached are the glustershd logs from the 3 nodes, along with the > test node i tried yesterday with same results. >Looking at my own logs I notice that a full sweep was only ever recorded in glustershd.log on 2nd node with missing directory. I believe I should have found a sweep begun on every node correct? 
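(To check this I am just grepping the self-heal daemon log on each node for the sweep messages, assuming the default log location, e.g.:

grep 'full sweep on subvol' /var/log/glusterfs/glustershd.log

which matches both the "starting full sweep" and "finished full sweep" entries.)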
On my test dev, when it did work, I do see that:

[2016-08-30 13:56:25.223333] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
[2016-08-30 13:56:25.223522] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
[2016-08-30 13:56:25.224616] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
[2016-08-30 14:18:48.333740] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
[2016-08-30 14:18:48.356008] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-1
[2016-08-30 14:18:49.637811] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-0

Whereas, looking at the past few days on the 3 prod nodes, I only found the following, on my 2nd node:

[2016-08-27 01:26:42.638772] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 11:37:01.732366] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
[2016-08-27 12:58:34.597228] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 12:59:28.041173] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
[2016-08-27 20:03:42.560188] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 20:03:44.278274] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
[2016-08-27 21:00:42.603315] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 21:00:46.148674] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1

> >> >>>> >>>>> My suspicion is that this is what happened on your setup. Could you >>>>> confirm if that was the case? >>>> >>>> Brick was brought online with force start then a full heal launched. >>>> Hours later after it became evident that it was not adding new files to >>>> heal I did try restarting self-heal daemon and relaunching full heal again. >>>> But this was after the heal had basically already failed to work as >>>> intended. >>> >>> OK. How did you figure it was not adding any new files? I need to know >>> what places you were monitoring to come to this conclusion. >>> >>> -Krutika >>> >>> >>>> >>>> >>>>> As for those logs, I did manage to do something that caused these >>>>> warning messages you shared earlier to appear in my client and server logs. >>>>> Although these logs are annoying and a bit scary too, they didn't do >>>>> any harm to the data in my volume. Why they appear just after a brick is >>>>> replaced and under no other circumstances is something I'm still >>>>> investigating.
>>>>> >>>>> But for future, it would be good to follow the steps Anuradha gave as >>>>> that would allow self-heal to at least detect that it has some repairing to >>>>> do whenever it is restarted whether intentionally or otherwise. >>>>> >>>> >>>> I followed those steps as described on my test box and ended up with >>>> exact same outcome of adding shards at an agonizing slow pace and no >>>> creation of .shard directory or heals on shard directory. Directories >>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>> shards as well. After hours at work it had added a total of 33 shards to >>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>> >>>> Does replace-brick command copy files in same manner? For these >>>> purposes I am contemplating just skipping the heal route. >>>> >>>> >>>>> -Krutika >>>>> >>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> attached brick and client logs from test machine where same behavior >>>>>> occurred not sure if anything new is there. its still on 3.8.2 >>>>>> >>>>>> Number of Bricks: 1 x 3 = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>> Options Reconfigured: >>>>>> cluster.locking-scheme: granular >>>>>> performance.strict-o-direct: off >>>>>> features.shard-block-size: 64MB >>>>>> features.shard: on >>>>>> server.allow-insecure: on >>>>>> storage.owner-uid: 36 >>>>>> storage.owner-gid: 36 >>>>>> cluster.server-quorum-type: server >>>>>> cluster.quorum-type: auto >>>>>> network.remote-dio: on >>>>>> cluster.eager-lock: enable >>>>>> performance.stat-prefetch: off >>>>>> performance.io-cache: off >>>>>> performance.quick-read: off >>>>>> cluster.self-heal-window-size: 1024 >>>>>> cluster.background-self-heal-count: 16 >>>>>> nfs.enable-ino32: off >>>>>> nfs.addr-namelookup: off >>>>>> nfs.disable: on >>>>>> performance.read-ahead: off >>>>>> performance.readdir-ahead: on >>>>>> cluster.granular-entry-heal: on >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, >>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > >>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>> atalur at redhat.com> wrote: >>>>>>>> > >>>>>>>> > > Response inline. >>>>>>>> > > >>>>>>>> > > ----- Original Message ----- >>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>> Gluster-users at gluster.org> >>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > > > >>>>>>>> > > > Could you attach both client and brick logs? 
Meanwhile I will >>>>>>>> try these >>>>>>>> > > steps >>>>>>>> > > > out on my machines and see if it is easily recreatable. >>>>>>>> > > > >>>>>>>> > > > -Krutika >>>>>>>> > > > >>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>> > > dgossage at carouselchecks.com >>>>>>>> > > > > wrote: >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>> > > > >>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Options Reconfigured: >>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>> > > > features.shard: on >>>>>>>> > > > performance.readdir-ahead: on >>>>>>>> > > > storage.owner-uid: 36 >>>>>>>> > > > storage.owner-gid: 36 >>>>>>>> > > > performance.quick-read: off >>>>>>>> > > > performance.read-ahead: off >>>>>>>> > > > performance.io-cache: off >>>>>>>> > > > performance.stat-prefetch: on >>>>>>>> > > > cluster.eager-lock: enable >>>>>>>> > > > network.remote-dio: enable >>>>>>>> > > > cluster.quorum-type: auto >>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>> > > > server.allow-insecure: on >>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>> > > > nfs.disable: on >>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>> > > > nfs.enable-ino32: off >>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>> > > > >>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>>> > > > Following steps detailed in previous recommendations began >>>>>>>> proces of >>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>> > > > >>>>>>>> > > > 1) kill pid of brick >>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>> > > > 3) recreate directory of brick >>>>>>>> > > > 4) gluster volume start <> force >>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>> > > Hi, >>>>>>>> > > >>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in >>>>>>>> full heal. >>>>>>>> > > Better safe than sorry ;) >>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>> > > >>>>>>>> > > Currently I brought the node down by systemctl stop glusterd as >>>>>>>> I was >>>>>>>> > getting sporadic io issues and a few VM's paused so hoping that >>>>>>>> will help. >>>>>>>> > I may wait to do this till around 4PM when most work is done in >>>>>>>> case it >>>>>>>> > shoots load up. >>>>>>>> > >>>>>>>> > >>>>>>>> > > 1) kill pid of brick >>>>>>>> > > 2) to configuring of brick that you need >>>>>>>> > > 3) recreate brick dir >>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>> > > a) create a dummy non existent dir under / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > so if noee 2 is down brick, pick node for example 3 and make a >>>>>>>> test dir >>>>>>>> > under its brick directory that doesnt exist on 2 or should I be >>>>>>>> dong this >>>>>>>> > over a gluster mount? >>>>>>>> You should be doing this over gluster mount. >>>>>>>> > >>>>>>>> > > b) set a non existent extended attribute on / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > Could you give me an example of an attribute to set? I've read >>>>>>>> a tad on >>>>>>>> > this, and looked up attributes but haven't set any yet myself. 
>>>>>>>> > >>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount> >>>>>>>> > Doing these steps will ensure that heal happens only from updated >>>>>>>> brick to >>>>>>>> > > down brick. >>>>>>>> > > 5) gluster v start <> force >>>>>>>> > > 6) gluster v heal <> >>>>>>>> > > >>>>>>>> > >>>>>>>> > Will it matter if somewhere in gluster the full heal command was >>>>>>>> run other >>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>> > >>>>>>>> full heal will stop once the crawl is done. So if you want to >>>>>>>> trigger heal again, >>>>>>>> run gluster v heal <>. Actually even brick up or volume start force >>>>>>>> should >>>>>>>> trigger the heal. >>>>>>>> >>>>>>> >>>>>>> Did this on test bed today. its one server with 3 bricks on same >>>>>>> machine so take that for what its worth. also it still runs 3.8.2. Maybe >>>>>>> ill update and re-run test. >>>>>>> >>>>>>> killed brick >>>>>>> deleted brick dir >>>>>>> recreated brick dir >>>>>>> created fake dir on gluster mount >>>>>>> set suggested fake attribute on it >>>>>>> ran volume start <> force >>>>>>> >>>>>>> looked at files it said needed healing and it was just 8 shards that >>>>>>> were modified for few minutes I ran through steps >>>>>>> >>>>>>> gave it few minutes and it stayed same >>>>>>> ran gluster volume <> heal >>>>>>> >>>>>>> it healed all the directories and files you can see over mount >>>>>>> including fakedir. >>>>>>> >>>>>>> same issue for shards though. it adds more shards to heal at >>>>>>> glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>> running but not all shards. >>>>>>> >>>>>>> It started with 8 shards to heal and is now only at 33 out of 800 >>>>>>> and probably wont finish adding for few days at rate it goes. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> > > >>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB data. >>>>>>>> Load was >>>>>>>> > > little >>>>>>>> > > > heavy but nothing shocking. >>>>>>>> > > > >>>>>>>> > > > About an hour after node 1 finished I began same process on >>>>>>>> node2. Heal >>>>>>>> > > > proces kicked in as before and the files in directories >>>>>>>> visible from >>>>>>>> > > mount >>>>>>>> > > > and .glusterfs healed in short time. Then it began crawl of >>>>>>>> .shard adding >>>>>>>> > > > those files to heal count at which point the entire proces >>>>>>>> ground to a >>>>>>>> > > halt >>>>>>>> > > > basically. After 48 hours out of 19k shards it has added 5900 >>>>>>>> to heal >>>>>>>> > > list. >>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested to >>>>>>>> change this >>>>>>>> > > value >>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart volume >>>>>>>> which I >>>>>>>> > > did. No >>>>>>>> > > > efffect. Tried relaunching heal no effect, despite any node >>>>>>>> picked. I >>>>>>>> > > > started each VM and performed a stat of all files from within >>>>>>>> it, or a >>>>>>>> > > full >>>>>>>> > > > virus scan and that seemed to cause short small spikes in >>>>>>>> shards added, >>>>>>>> > > but >>>>>>>> > > > not by much. Logs are showing no real messages indicating >>>>>>>> anything is >>>>>>>> > > going >>>>>>>> > > > on. I get hits to brick log on occasion of null lookups >>>>>>>> making me think >>>>>>>> > > its >>>>>>>> > > > not really crawling shards directory but waiting for a shard >>>>>>>> lookup to >>>>>>>> > > add >>>>>>>> > > > it. I'll get following in brick log but not constant and >>>>>>>> sometime >>>>>>>> > > multiple >>>>>>>> > > > for same shard. 
>>>>>>>> > > > >>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no >>>>>>>> resolution >>>>>>>> > > type >>>>>>>> > > > for (null) (LOOKUP) >>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: >>>>>>>> 12591783: >>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>> ==> (Invalid >>>>>>>> > > > argument) [Invalid argument] >>>>>>>> > > > >>>>>>>> > > > This one repeated about 30 times in row then nothing for 10 >>>>>>>> minutes then >>>>>>>> > > one >>>>>>>> > > > hit for one different shard by itself. >>>>>>>> > > > >>>>>>>> > > > How can I determine if Heal is actually running? How can I >>>>>>>> kill it or >>>>>>>> > > force >>>>>>>> > > > restart? Does node I start it from determine which directory >>>>>>>> gets >>>>>>>> > > crawled to >>>>>>>> > > > determine heals? >>>>>>>> > > > >>>>>>>> > > > David Gossage >>>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>>> > > > Office 708.613.2284 >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > >>>>>>>> > > -- >>>>>>>> > > Thanks, >>>>>>>> > > Anuradha. >>>>>>>> > > >>>>>>>> > >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks, >>>>>>>> Anuradha. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/4af75814/attachment.html>