Hi again folks,

Sorry to bug you with another newbie question: with regard to file replication, does GlusterFS repair a damaged file *only* when someone tries to read it?

For example, let's say the filesystem is supposed to maintain 3 copies of a file and one of the copies is lost or removed from the system for whatever reason. Will the missing copy be created only the first time someone reads the file?

Thanks again for your help!
At 11:52 AM 1/14/2009, Gluster Novice wrote:

> Sorry to bug you with another newbie question: with regard to file
> replication, does GlusterFS repair a damaged file *only* when someone
> tries to read it?

Yes (or, I believe, when the directory containing the missing file is read).

> For example, let's say the filesystem is supposed to maintain 3 copies
> of a file and one of the copies is lost or removed from the system for
> whatever reason. Will the missing copy be created only the first time
> someone reads the file?

If you need to ensure synchronicity after a failure, there is a find command in the wiki that will force auto-healing of the whole filesystem:

    find . -exec head -1 {} \; > /dev/null

(I think that's it; it may not be syntactically valid.)

> Thanks again for your help!
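For anyone wanting to run that full-filesystem heal trigger, here is a minimal sketch, assuming the replicated volume is mounted via the Gluster client at /mnt/gluster (the mount point is an assumption; substitute your own). Reading the first line of every regular file forces a lookup and read on each one, which makes AFR self-heal any missing or stale copy:

    # Sketch only: /mnt/gluster is a hypothetical client mount point.
    cd /mnt/gluster
    # -type f skips directories; the redirect just discards the output.
    find . -type f -exec head -n 1 {} \; > /dev/null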
In the latest AFR code, the healing code is in the lookup call flow, not in open. Basically, a 'lookup' is done just before any kind of access to a file (stat, open, chmod, chown, rm, rename), so AFR heals the file when you try to access it. An "ls -lR", for example, will trigger a heal of the entire directory structure.

Krishna

On Thu, Jan 15, 2009 at 9:35 AM, Keith Freedman <freedman at freeformit.com> wrote:
> At 11:52 AM 1/14/2009, Gluster Novice wrote:
>> With regard to file replication, does GlusterFS repair a damaged
>> file *only* when someone tries to read it?
>
> Yes (or, I believe, when the directory containing the missing file is read).
>
> If you need to ensure synchronicity after a failure, there is a find
> command in the wiki that will force auto-healing of the whole filesystem:
>
>     find . -exec head -1 {} \; > /dev/null
>
> (I think that's it; it may not be syntactically valid.)
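A minimal sketch of the behavior Krishna describes, assuming a client mount at /mnt/gluster (both the mount point and the file path below are hypothetical): any operation that issues a lookup is enough, so a recursive listing heals the whole tree while a single stat heals just one file.

    # ls -lR stats every entry; each stat issues a lookup, and each lookup
    # triggers AFR self-heal wherever a copy is missing or stale.
    ls -lR /mnt/gluster > /dev/null

    # A single file can be healed on its own by any access, e.g. a stat.
    stat /mnt/gluster/path/to/file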
At 03:22 AM 1/16/2009, Krishna Srinivas wrote:

> In the latest AFR code, the healing code is in the lookup call flow, not
> in open. Basically, a 'lookup' is done just before any kind of access to
> a file (stat, open, chmod, chown, rm, rename), so AFR heals the file when
> you try to access it. An "ls -lR", for example, will trigger a heal of
> the entire directory structure.

This behavior is extremely handy from the perspective of data integrity; however, it's disastrous for I/O performance from an application's point of view. The idea that an application should have to wait while files completely unrelated to its needs are being auto-healed is an unnecessary one. There's got to be a way to handle this. The replication should happen in the background: gluster should be smart enough to first auto-heal the file in question, return control to the requesting process, and then continue healing in the background. In the case of a directory, the listing can be returned without actually copying over all the files therein; that should be a relatively quick operation.

Let's take the example case of a repository for large video files, each one being 1GB. I have a server down for a few hours, during which time 300 of these files have been updated. Now all I need to know is which ones changed recently (say, ls -alrtu | tail -5). Do I block waiting for 300GB of data to be transferred when I only need a directory listing? Similarly, if I get a request for just one of those files, do I have to wait for 300GB of data to move around before I can get access to the only 1GB that matters at that time?

If this is only temporary until the new healing methodology previously discussed on the list is in place, I suppose it's liveable, but if this is the way it's going to continue to work, I can't imagine it being useful in any practical real-world situation with either large directories or large files and a normal level of file updates/modifications.

Keith
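For the "which files changed recently" example above, a rough sketch (the /mnt/gluster mount point and the 4-hour window are assumptions). Note that, with the lookup-time healing described earlier, every entry these commands stat can still block on its own heal before the listing returns:

    # Five most recently touched entries, as in the example above
    # (/mnt/gluster is a hypothetical mount point).
    ls -alrtu /mnt/gluster | tail -5

    # Files modified in the last 4 hours (240 minutes), without reading
    # their contents; the stat alone may still trigger a per-file heal.
    find /mnt/gluster -type f -mmin -240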
We will be working on background file sync, and will at least give it as a configurable option.

Avati

On Jan 16, 2009 3:41 AM, "Keith Freedman" <freedman at freeformit.com> wrote:

> At 03:22 AM 1/16/2009, Krishna Srinivas wrote:
>> In the latest AFR code, the healing code is in the looku...
>
> This behavior is extremely handy from the perspective of data integrity;
> however, it's disastrous for I/O performance from an application's point
> of view. ...