Anand Babu Periasamy
2009-Jan-05 10:30 UTC
[Gluster-users] [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
Christopher, the main issue with self-heal is its complexity. Handling self-healing logic in a non-blocking asynchronous code path is difficult. Replicating a missing file sounds simple, but holding off a lookup call, initiating a new series of calls to heal the file, and then resuming normal operation is tricky. Many of the bugs we faced in 1.3 were related to self-heal. We have handled most of these cases over a period of time. Self-healing is decent now, but not good enough. We feel that it has only complicated the code base. It is hard to test and maintain this part of the code base.

The plan is to drop the self-heal code altogether once the active healing tool gets ready. Unlike self-healing, this active healing can be run by the user on a mounted file system (online) any time. By moving the code out of the file system into a tool (that is synchronous and linear), we can implement sophisticated healing techniques.

The code is not in the repository yet. Hopefully it will be ready for use in a month. You can simply turn off self-heal and run this utility while the file system is mounted.

List-hacking is an internal list, mostly junk :). It is an internal company list. We don't discuss technical / architectural matters there; those are mostly done over phone and in-person meetings. We do want to actively involve the community right from the design phase. A mailing list is cumbersome and slow for interactively brainstorming design discussions. We can once in a while organize IRC sessions for this purpose.

--
Anand Babu

Swank iest wrote:
> Well,
>
> I guess this is getting outside of the bug. I suppose you are going to
> mark it as not going to fix?
>
> I'm trying to put gluster into production right now, so may I ask:
>
> 1) What are the current issues with self-heal that require a full
> re-write? Is there a place in the Wiki or elsewhere where it's being
> documented?
> 2) May I see the new code? I must not be looking in the correct place
> in TLA?
> 3) If it's not written yet, may I be included in the design discussion?
> (As I haven't put gluster into production yet, now would be a good time
> to know if it's not going to work in the near future.)
> 4) May I be placed on the list-hacking at zresearch.com mailing list, please?
>
> Christopher.
>
> > Date: Mon, 5 Jan 2009 01:36:14 -0800
> > From: ab at zresearch.com
> > To: krishna at zresearch.com
> > CC: swankier at msn.com; list-hacking at zresearch.com
> > Subject: Re: [List-hacking] [bug #25207] an rm of a file should not
> > cause that file to be replicated with afr self-heal.
> >
> > Krishna, leave it as is. Once self-heal ensures that the volumes are
> > intact, rm will remove both the copies anyway. It is inefficient, but
> > optimizing it in the current framework will be hacky.
> >
> > Swankier, we are replacing the current self-healing framework with an
> > active healing tool. We can take care of it then.
> >
> >
> > Krishna Srinivas wrote:
> >> The current self-heal logic is built into the lookup of a file; a lookup
> >> is issued just before any file operation on a file, so the lookup call
> >> does not know whether an open or an rm is going to be done on the file.
> >> Will get back to you if we can do anything about this, i.e. to save the
> >> redundant copy of the file when it is going to be rm'ed.
> >>
> >> Krishna
> >>
> >> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY at gnu.org> wrote:
> >>> Follow-up Comment #2, bug #25207 (project gluster):
> >>>
> >>> I am:
> >>>
> >>> 1) delete file from posix system beneath afr on one side
> >>> 2) run rm on gluster file system
> >>>
> >>> file is then replicated followed by deletion
> >>>
> >>> _______________________________________________________
> >>>
> >>> Reply to this item at:
> >>>
> >>> <http://savannah.nongnu.org/bugs/?25207>
> >
> > --
> > Anand Babu Periasamy
> > GPG Key ID: 0x62E15A31
> > Blog [http://ab.freeshell.org]
> > GlusterFS [http://www.gluster.org]
> > The GNU Operating System [http://www.gnu.org]

--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
GlusterFS [http://www.gluster.org]
The GNU Operating System [http://www.gnu.org]
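For readers trying to follow why an rm still triggers a heal, here is a minimal sketch of the lookup-driven behaviour Krishna describes in the quoted text above. This is illustrative Python, not GlusterFS source; the Brick class and the function names are invented for the example.

    # Illustrative sketch only -- not GlusterFS code. Brick, lookup and unlink
    # are invented names that model the behaviour described in the thread.

    class Brick:
        def __init__(self, name):
            self.name, self.files = name, {}
        def exists(self, path): return path in self.files
        def read(self, path): return self.files[path]
        def write(self, path, data): self.files[path] = data
        def remove(self, path): self.files.pop(path, None)

    def lookup(path, bricks):
        """Runs before every file operation; it cannot know which fop follows."""
        present = [b for b in bricks if b.exists(path)]
        missing = [b for b in bricks if not b.exists(path)]
        for target in missing:
            if present:                      # heal first, even if an rm is next
                target.write(path, present[0].read(path))
                print("healed %s onto %s" % (path, target.name))

    def unlink(path, bricks):
        lookup(path, bricks)                 # issued just before the operation
        for b in bricks:
            b.remove(path)                   # the freshly healed copy goes too

    b1, b2 = Brick("brick1"), Brick("brick2")
    b1.write("/f", b"data")                  # copy missing on brick2 (bug #25207 setup)
    unlink("/f", [b1, b2])                   # prints: healed /f onto brick2

Because lookup runs unconditionally before the fop, the missing copy is recreated and then immediately deleted, which is exactly the redundant work the bug report complains about.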
Swank iest
2009-Jan-05 11:37 UTC
[Gluster-users] Rant... WAS: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
It's a shame zresearch does not care to include the community in design. Am I mistakenly under the impression that gluster is an Open Source project?

For instance, you may find there is a large portion of the community who will feel that removing the file system's ability to heal itself is a bad thing. I, for example, would find having to manually monitor the state of my clustered file system a rather expensive task. I do, however, appreciate that it is a hard problem to solve.

I also believe that:

1) Being told that FreeBSD is only supported with version 7.0 and only with glusterfs 1.4 (which isn't released) is a bad thing. Where is the stable code base? Has development stopped on 1.3? I feel pressure to be running 1.4, but it's not released yet.

2) Being told that 1.4 release candidates are not a good "framework" to be solving problems in is scary. If 1.4 isn't the correct place, where is? Is there a 1.5 that hasn't been made public yet? Is the AFR self-heal code going to be ripped out of 1.4? When will it be ripped out? I thought there was going to be a 1.4 release soon. If 1.3 isn't stable, and 1.4 isn't a good framework, what should someone use in production? Can only code that has been contracted from zresearch be used in production? How much does this cost?

3) Talking about features in a public forum may lead to a better end result. For instance, it may lead to feedback such as:

AFR is broken in a number of ways right now:

1) AFR blocks on self-heal. ls -lR will not return until the heal is complete. On large directories, this will make many applications break in wonderfully weird ways. I'm imagining users of web applications that have files backed on gluster clicking refresh for 30 minutes.

2) AFR self-heal is incredibly slow. I have tracked this down to the use of 4kb "chunks" being sent at a time. The explanation for this is to allow "sparse file replication". However, the additional TCP overhead that using such small chunks causes means that self-heal will run at speeds of less than 1 MB/s in my environment (I'm attempting to run gluster over a VPN between data centers). I believe that the chunk size should be tied to the TCP window size. I have set the 4kb size to 131072 in my environment to get things to work a bit better (however, without aggregation of small files, there is still an unnecessary amount of TCP overhead, which causes small files to be replicated really slowly). (A rough calculation of this throughput ceiling follows the list below.)

3) AFR only lists files that exist on the first brick listed in the AFR configuration. This can lead to really awkward situations where a file doesn't exist on the first brick but does on subsequent bricks. Now, it has been explained to me that this occurs because AFR does not require a metadata server. In fact, this was one of the draws of gluster for me (not having to find some way to make the metadata server highly available). I did not understand (from any of the documentation available) that it's not that gluster doesn't *require* a metadata server, it's that it doesn't solve the namespace problem at all.

4) AFR does not work reliably above unify or DHT. It crashes a lot. Now, I can understand that gluster was not designed to operate in this fashion; however, I cannot think of any other way to put live data into a gluster file system. (Read this as: it would not be my final config, but without real-time replication of data into my "proper" config, I would need to turn off live servers for days if not weeks to move the data around by hand. If I were to move data around by hand, why would I need a replicated file system?)
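A rough, back-of-the-envelope check of the chunk-size effect described in point 2, assuming a latency-bound transfer with one chunk in flight per round trip and a 30 ms VPN round-trip time. Both figures are assumptions chosen for illustration, not measurements from this thread.

    # Back-of-the-envelope only: assumed 30 ms RTT, one chunk in flight at a time.
    def heal_ceiling(chunk_bytes, rtt_seconds):
        return chunk_bytes / rtt_seconds      # bytes per second upper bound

    rtt = 0.030                               # assumed VPN round-trip time
    for chunk in (4 * 1024, 131072):
        print("%7d-byte chunks -> ~%.2f MB/s ceiling"
              % (chunk, heal_ceiling(chunk, rtt) / 1e6))
    #   4096-byte chunks -> ~0.14 MB/s ceiling
    # 131072-byte chunks -> ~4.37 MB/s ceiling

With pipelining or larger TCP windows the real numbers would differ, but this shows why raising the chunk size helps so much on a high-latency link.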
If it were the case that gluster is not designed to solve these problems, perhaps that should be listed in the documentation somewhere rather than instructions on how to do it (perhaps this is already the case with the 1.4 documentation?). Preferably, we could just fix the problems that make it impossible.

Now it's really naive of me to even attempt a design of a working system, but if I were to try, I would break AFR into three code paths:

1) WRITE

On write, files are written to all available bricks. Writes for bricks that are not available are queued until they become available again.

2) READ

On read, lookups happen on all bricks. If a file doesn't exist on a particular brick, it is added to the queue for replication. The file is returned from a valid brick. (This is complicated by a server not being available when a delete occurs: if it comes back up after a deletion and that file is requested, the file would be replicated again.) This would, of course, not scale linearly with the addition of bricks.

3) REPLICATION

Process the queue. I don't know where this queue should exist, but replication ought to occur in its own thread/process, independent of read/write. Somewhere into this could be added code to "balance" files across bricks (should only a certain number of bricks be required for a file; example: 5 bricks, but only two bricks require the file).

</rant>

Is there an automated build process for arch somewhere? If not, I would be willing to build one for the project so that developers would be warned of build errors as were introduced and fixed for FreeBSD recently. It would be a convenient place to add unit tests as well.

Christopher Owen.
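A minimal sketch, in Python, of the write/read/replication split Christopher proposes above, including the delete/resurrection caveat he notes. This is purely illustrative of the idea, not an actual or planned GlusterFS translator; every class and method name here is invented for the example.

    from collections import deque

    class Brick:
        def __init__(self, name, online=True):
            self.name, self.online, self.files = name, online, {}

    class ReplicaSet:
        def __init__(self, bricks):
            self.bricks = bricks
            self.queue = deque()              # (path, target brick) pairs to replay

        def write(self, path, data):
            for b in self.bricks:
                if b.online:
                    b.files[path] = data
                else:
                    self.queue.append((path, b))   # replay once the brick returns

        def read(self, path):
            holders = [b for b in self.bricks if b.online and path in b.files]
            for b in self.bricks:             # schedule missing copies for later
                if path not in b.files:
                    self.queue.append((path, b))
            # caveat from the proposal: without delete tombstones, a brick that
            # was down during an rm causes the file to be replicated again here
            return holders[0].files[path] if holders else None

        def replicate(self):
            """Meant to run in its own thread/process, independent of read/write."""
            pending = deque()
            while self.queue:
                path, target = self.queue.popleft()
                source = next((b for b in self.bricks
                               if b.online and path in b.files and b is not target),
                              None)
                if source is None or not target.online:
                    pending.append((path, target))   # try again on a later pass
                else:
                    target.files[path] = source.files[path]
            self.queue.extend(pending)

    # Tiny usage example
    b1, b2 = Brick("b1"), Brick("b2", online=False)
    rs = ReplicaSet([b1, b2])
    rs.write("/f", b"data")                   # b2 is down, so /f is queued for it
    b2.online = True
    rs.replicate()                            # b2 now has /f as well
    print(sorted(b.name for b in rs.bricks if "/f" in b.files))   # ['b1', 'b2']

The interesting design choice is that the queue, not the lookup path, carries all healing work, which is essentially what the external healing tool discussed in this thread also aims for.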
At 02:30 AM 1/5/2009, Anand Babu Periasamy wrote:
> Plan is to drop self-heal code all together once the active healing
> tool gets ready. Unlike self-healing, this active healing can be run
> by the user on a mounted file system (online) any time. By moving the
> code out of the file system, into a tool (that is synchronous and
> linear), we can implement sophisticated healing techniques.
>
> Code is not in the repository yet. Hopefully in a month, it will be
> ready for use. You can simply turn off self-heal and run this utility
> while the file system is mounted.

I realize this is perhaps a bit premature, but am I to understand you'll be doing away with auto self-healing in replicate? This seems to eliminate much of the value of gluster's AFR component. If we have to manually heal with some tool, there's always a risk of a data integrity problem while this healing process is being executed after a server interruption.

If it's going to be optional to turn on/off, that's fine, I suppose, but please, if you're considering removing this feature altogether, reconsider. Unless this active healing tool is something that would be run automatically anytime there's a disconnect between AFR servers.

While I certainly do realize that the self-heal code is a HUGE performance issue as it's currently written (at least that's what I'm noticing on my servers), its function is necessary to make AFR useful.
Anand Babu Periasamy
2009-Jan-05 13:23 UTC
[Gluster-users] Rant... WAS: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
Swank iest wrote:
> It's a shame zresearch does not care to include the community in design.
> Am I mistakenly under the impression that gluster is an Open Source
> project?

Maybe I miscommunicated. Only the intricate implementation details do we discuss in person / over the phone. We do include the community and strongly value its feedback. If you go through the mailing list and IRC archives you will see a number of architectural discussions. Even the public roadmap page has a placeholder for suggestions. The project itself is hosted under Savannah. The source is under the GPLv3 license. We are trying our best with the limited resources we have.

> For instance, you may find there is a large portion of the community who
> will feel that removing the file system's ability to heal itself is a
> bad thing. I, for example, would find having to manually monitor the
> state of my clustered file system a rather expensive task.
>
> I do, however, appreciate that it is a hard problem to solve.

We are not removing any ability, we are replacing it with a better one. The self-healing code will be re-implemented in a synchronous model through an external tool. Currently it is the most complicated / unstable code inside the file system. Stability is the #1 priority for everyone. You can launch this tool through a cron job. We are also planning to add "daemon mode" support to receive notifications for real-time handling of events (active healing).

> I also believe that
>
> 1) Being told that FreeBSD is only supported with version 7.0 and only
> with glusterfs 1.4 (which isn't released) is a bad thing. Where is the
> stable code base? Has development stopped on 1.3? I feel pressure to
> be running 1.4, but it's not released yet.

Yes, only critical bug fixes happen on 1.3. Release 2.0 (formerly 1.4) should happen this month. It is already more stable than 1.3.

> 2) Being told that 1.4 release candidates are not a good "framework" to
> be solving problems in is scary. If 1.4 isn't the correct place, where
> is? Is there a 1.5 that hasn't been made public yet? Is the AFR
> self-heal code going to be ripped out of 1.4? When will it be ripped
> out? I thought there was going to be a 1.4 release soon. If 1.3 isn't
> stable, and 1.4 isn't a good framework, what should someone use in
> production? Can only code that has been contracted from zresearch be
> used in production? How much does this cost?

The self-heal code will not be removed until it is replaced with a better alternative. The next 2.0 release will still have self-heal turned on by default. Once we feel that glusterfs-heal is ready, we will turn self-heal off by default. We will not remove features without discussing it with the community.

GlusterFS code is the same for both commercial and gratis users. We do not hold any code as proprietary. Commercial users pay for the subscription package, which is support and service for GlusterFS. We deploy, hand-hold and maintain. (Similar to Red Hat, except we don't restrict redistribution of binaries.)

> 3) Talking about features in a public forum may lead to a better end
> result. For instance it may lead to feedback such as:

We always do that. The healing tool is already on the roadmap. It was not supposed to be introduced in this release, but we are planning to make it available as part of a 2.0.x minor release instead of waiting for 2.1. This discussion came up because you requested an optimization that requires a hacky implementation. I won't complicate the current self-heal design any further.
It is easily achievable using the new design.

> AFR is broken in a number of ways right now
>
> 1) AFR blocks on self-heal. ls -lR will not return until the heal is
> complete. On large directories, this will make many applications break
> in wonderfully weird ways. I'm imagining users of web applications that
> have files backed on gluster clicking refresh for 30 minutes.
>
> 2) AFR self-heal is incredibly slow. I have tracked this down to the
> use of 4kb "chunks" being sent at a time. The explanation for this is
> to allow "sparse file replication". However, the additional TCP overhead
> that using such small chunks causes means that self-heal will run at
> speeds of less than 1 MB/s in my environment (I'm attempting to run
> gluster over a VPN between data centers.) I believe that the chunk size
> should be tied to the TCP window size. I have set the 4kb size to 131072
> in my environment to get things to work a bit better (however, without
> aggregation of small files, there is still an unnecessary amount of TCP
> overhead which causes small files to be replicated really slowly.)

This was one of the reasons to implement a healing tool: it gives more control to the user. Currently it is hard for the user to track when and how the healing happens. The 4KB chunk healing is fixable; I will look into it.

I really appreciate your feedback and in-depth details. Your bug reports are also very useful. The more you contribute, the more attention you will gain :).

> 3) AFR only lists files that exist on the first brick listed in the AFR
> configuration. This can lead to really awkward situations where a file
> doesn't exist on the first brick but does on subsequent bricks. Now,
> it has been explained to me that this occurs because AFR does not require
> a metadata server. In fact, this was one of the draws of gluster to me
> (not having to find some way to make the metadata server highly
> available.) I did not understand (from any of the documentation
> available) that it's not that gluster doesn't *require* a metadata
> server, it's that it doesn't solve the namespace problem at all.

AFR uses two-phase commit for atomic write operations. For read operations it load-balances across the volumes. What you are asking for is an atomic read/readdir from all the volumes, verifying that they are the same. We can implement that, but it will impact performance. GlusterFS does not have a metadata server even for file-level (distributed hash) or block-level (stripe) distribution.

> 4) AFR does not work reliably above unify or DHT. It crashes a lot.
> Now, I can understand that gluster was not designed to operate in this
> fashion, however, I cannot think of any other way to put live data into
> a gluster file system. (read this as, it would not be my final config,
> but without having real-time replication of data into my "proper"
> config... I would need to turn off live servers for days if not weeks to
> move the data around by hand. If I were to move data around by hand,
> why would I need a replicated file system?) If it were the case that
> gluster is not designed to solve these problems, perhaps that should be
> listed in the documentation somewhere rather than instructions on how to
> do it (perhaps this is already the case with the 1.4 documentation?).
> Preferably, we could just fix the problems that cause it to not be possible.

AFR is very much intended to work with DHT or Unify. We will look into your bug reports.

> Now it's really naive of me to even attempt a design of a working
> system, but if I were to try...
> I would break AFR into three code paths
>
> 1) WRITE
>
> on write, files are written to all available bricks. Bricks that are
> not available are queued until they become available again.
>
> 2) READ
>
> on read, lookups happen on all bricks. If a file doesn't exist on a
> particular brick, it is added to the queue for replication. The file is
> returned from a valid brick. (this is complicated by a server not being
> available when a delete occurs. If it comes back up after a deletion and
> that file is requested, that file would be replicated again.) This would,
> of course, not scale linearly with the addition of bricks.
>
> 3) REPLICATION
>
> Process the queue. I don't know where this queue should exist. But
> replication ought to occur in its own thread/process independent of
> read/write. Somewhere into this could be added code to "balance" files
> across bricks (should a certain number of bricks only be required for a
> file. example: 5 bricks, but only two bricks require the file.)
>
> </rant>

Queuing of writes from multiple clients has a lot of coherency issues; it is a complicated design. We have thought of implementing a spare-volume concept for this purpose. I will discuss it with you when the time is right.

> Is there an automated build process for arch somewhere? If not, I would
> be willing to build one for the project so that developers would be
> warned of build errors as were introduced and fixed for FreeBSD
> recently. It would be a convenient place to add unit tests as well.
>
> Christopher Owen.

An automated build for FreeBSD? We don't even have an in-house FreeBSD server. It would be a big help for us. Thanks a lot. Happy hacking!

--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
GlusterFS [http://www.gluster.org]
The GNU Operating System [http://www.gnu.org]
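Picking up the two-phase-commit point from Anand's reply above, here is a simplified, hypothetical illustration of the general pattern (mark intent on every replica, write, then clear the mark). It is not AFR's actual wire protocol or on-disk changelog format; all names are invented for the sketch.

    # Hypothetical illustration of a two-phase replicated write; not AFR's
    # real implementation.

    class Replica:
        def __init__(self, name):
            self.name, self.data, self.pending = name, {}, set()

    def replicated_write(path, data, replicas):
        # phase 1: mark the write as pending on every reachable replica
        for r in replicas:
            r.pending.add(path)
        # perform the write itself
        for r in replicas:
            r.data[path] = data
        # phase 2: clear the pending mark only after all writes succeeded
        for r in replicas:
            r.pending.discard(path)

    def read(path, replicas):
        # reads can be load-balanced across replicas, picking any clean copy
        for r in replicas:
            if path in r.data and path not in r.pending:
                return r.data[path]
        return None

    r1, r2 = Replica("r1"), Replica("r2")
    replicated_write("/f", b"hello", [r1, r2])
    print(read("/f", [r1, r2]))               # b'hello'

A pending mark left behind by a crash between the two phases is what tells a later heal which copies are suspect; that bookkeeping, rather than the write itself, is where most of the complexity discussed in this thread lives.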
Nope, we are just implementing a better approach to healing. BTW, "afr" will be renamed to "replicate" (and still aliased as "afr" for backward compatibility).

--
Anand Babu

Keith Freedman wrote:
> I realize this is perhaps a bit premature, but am I to understand you'll
> be doing away with auto self-healing in replicate?
> this seems to eliminate much of the value of glusters AFR component.
> if we have to manually heal with some tool, there's always a risk of a
> data integrity problem while this healing process is being executed
> after a server interruption.
>
> if it's going to be optional to turn on/off, that's fine, I suppose, but
> please, if you're considering removing this feature altogether,
> reconsider. Unless this active healing tool is something that would be
> run automatically anytime there's a disconnect between AFR servers.
>
> While I certainly do realize that the self-heal code is a HUGE
> performance issue as it's currently written (at least that's what I'm
> noticing on my servers), its function is necessary to make the AFR
> useful.
John Leach
2009-Jan-05 16:15 UTC
[Gluster-users] [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
On Mon, 2009-01-05 at 02:30 -0800, Anand Babu Periasamy wrote:
> Plan is to drop self-heal code all together once the active healing tool
> gets ready. Unlike self-healing, this active healing can be run by the
> user on a mounted file system (online) any time. By moving the code out
> of the file system, into a tool (that is synchronous and linear), we can
> implement sophisticated healing techniques.

Hi Anand,

the active healing tool sounds good - I'm hoping the more sophisticated healing techniques might include rsync-style sync :)

The dropping of self-heal looks to be worrying a few people - maybe you can elaborate a little (I'm assuming it's not as bad as it sounds). For example, with afr/replicate but without self-healing, what will the behaviour of the cluster be when a brick is stopped, a file updated and then the brick restarted? Will Gluster serve the most "recent" file available (from the other bricks) until the active healing tool is run to update the first brick (then allowing full read balancing)?

Thanks,

John.
--
Serious Rails Hosting: http://www.brightbox.co.uk
Gordan Bobic
2009-Jan-05 17:55 UTC
[Gluster-users] [Gluster-devel] Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
Maybe I'm missing something here, but if you take self-healing out of AFR, then surely that makes the system completely useless and no better than running rsync every 5 minutes. Since that can't be right, what am I missing?

Gordan

Anand Babu Periasamy wrote:
> Plan is to drop self-heal code all together once the active healing tool
> gets ready. Unlike self-healing, this active healing can be run by the
> user on a mounted file system (online) any time. By moving the code out
> of the file system, into a tool (that is synchronous and linear), we can
> implement sophisticated healing techniques.
>
> Code is not in the repository yet. Hopefully in a month, it will be
> ready for use. You can simply turn off self-heal and run this utility
> while the file system is mounted.
Basavanagowda Kanur
2009-Jan-06 05:27 UTC
[Gluster-users] [Gluster-devel] Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
The HEAL tool will monitor glusterfs in the same way AFR currently does. The only difference is that HEAL is a separate process. HEAL will contain all the functionality of self-heal (inside AFR as it exists today).

On Mon, Jan 5, 2009 at 11:25 PM, Gordan Bobic <gordan at bobich.net> wrote:
> Maybe I'm missing something here, but if you take self-healing out of AFR,
> then surely that makes the system completely useless and no better than
> running rsync every 5 minutes. Since that can't be right, what am I missing?
>
> Gordan

--
gowda
Keith,

Assuming that you are using one of the recent releases from 1.4, the 2nd server's glusterfs should not have hung; it should have timed out. Can you easily reproduce this problem?

Krishna

On Tue, Jan 6, 2009 at 12:14 PM, Keith Freedman <freedman at freeformit.com> wrote:
> ok, so I ran into another afr problem tonight.
>
> I have 2 servers afr-ing each other.
> one of them had a conniption and was in some strange half-working state.
> the other one was working fine.
>
> I rebooted the half-working one and gluster hung on the other one.
> it failed to time out as expected, and just sat there until the other
> machine was pingable. then it realized it was there but not accepting
> connections (since it wasn't up enough to mount the filesystem
> yet). At this point the good server started moving along ok.
>
> so it seems that AFR does the right thing when the IP address is
> connectable but the AFR process or port isn't responding, but if the
> machine/ip is completely down, then it just hangs seemingly forever.
>
> ??
I'm running glusterfs--mainline--3.0--patch-824, which I think is pretty new.

I can possibly reproduce parts of the problem. They're production servers, so I'll have to do it during low-usage times. I possibly can tonight.

Keith

At 07:03 AM 1/6/2009, Krishna Srinivas wrote:
> Keith,
> Assuming that you are using one of the recent releases from 1.4, 2nd
> server's glusterfs should not have got hung and should have timed out.
> Can you easily reproduce this problem?
> Krishna