I've seen the various FAQs on the website saying there isn't metadata or caching going on, but then I can't get my head around how AFR knows what to do when a brick goes offline and subsequently returns. There must be some record of what changes were made. Is this on disk or in memory?

Martin Peacock
Systems Development Officer
Mater Misericordiae University Hospital
+353 (0)1 803 2333
2008/10/22 martin <mpeacock at mater.ie>:
> I've seen the various FAQs on the website saying there isn't metadata or
> caching going on, but then I can't get my head around how AFR knows what to
> do when a brick goes offline and subsequently returns. There must be some
> record of what changes were made. Is this on disk or in memory?

AFR uses a database-like journal to keep track of changes. When a node comes
back up, there will be a record on the other nodes saying that *some* changes
are pending on the node that was down. AFR then simply copies the entire file
and/or creates files on the node that came back up.

Vikas Gorur
--
Engineer - Z Research
http://gluster.org/
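For anyone who wants to see that record on disk: it is kept as extended
attributes on the files in the backend export directories. A minimal sketch
(assuming a release that stores the pending-change counters under the
trusted.afr.* attribute namespace; the exact attribute names vary between
versions, so check what your build actually sets):

    # On one of the servers, look at a file directly on the brick,
    # not through the GlusterFS mount point. Needs the attr package
    # (getfattr) and root to see the trusted.* namespace.
    getfattr -d -m . -e hex /data/export/some/file

    # Illustrative output only -- names and values will differ:
    # trusted.afr.brick2=0x000000010000000000000000
    # A non-zero counter means AFR still considers changes pending
    # against that subvolume and will self-heal the file on access.

If the attribute names differ on your version, getfattr -d -m . will still
dump whatever GlusterFS has set, so it is easy to check.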
At 05:15 AM 10/22/2008, Vikas Gorur wrote:
> AFR uses a database-like journal to keep track of changes. When a node comes
> back up, there will be a record on the other nodes saying that *some* changes
> are pending on the node that was down. AFR then simply copies the entire file
> and/or creates files on the node that came back up.

It's important to note that it "copies the entire file", so if you have VERY
large files this will create a lot of bandwidth and can take some time :)
2008/10/22 Keith Freedman <freedman at freeformit.com>:
> It's important to note that it "copies the entire file", so if you have VERY
> large files this will create a lot of bandwidth and can take some time :)

Yes, that's a valid point. We hope to remedy it by making AFR use the
rsync algorithm to sync the files.

Vikas Gorur
--
Engineer - Z Research
http://gluster.org/
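For anyone curious what that would buy, the delta-transfer idea is the same
one the rsync tool uses. A quick local illustration (the file names and paths
here are made up; it just compares two copies of the same large file, one of
which has been modified slightly):

    # --no-whole-file forces the delta algorithm even for local paths;
    # --stats then reports 'Literal data' (what would have to cross the
    # wire) versus 'Matched data' (what the receiver already had).
    rsync --inplace --no-whole-file --stats /tmp/bigfile.updated /tmp/bigfile.stale

Only the 'Literal data' portion needs to be sent, which for a large file with
small changes is a fraction of a full copy.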
Thanks for the prompt reply, folks.

> AFR uses a database-like journal to keep track of changes. When a node
> comes back up, there will be a record on the other nodes saying that *some*
> changes are pending on the node that was down. AFR then simply copies the
> entire file and/or creates files on the node that came back up.

So is that the point of the lazy healing? When a node is defined as
'dirty', any file access is verified with the 'other' node? What then
determines when the pair of nodes is 'clean'? Is that the responsibility
of the surviving node?

My case is this - I have 2 nodes with 10,000,000+ files AFR'd for disaster
tolerance. I'm developing an SOP for restoration after an event, but am
working through the consequences of aftershock events, and I am not clear
on this at present.

Thanks

Martin
At 11:47 PM 10/22/2008, martin wrote:
> So is that the point of the lazy healing? When a node is defined as
> 'dirty', any file access is verified with the 'other' node? What then
> determines when the pair of nodes is 'clean'? Is that the responsibility
> of the surviving node?
>
> My case is this - I have 2 nodes with 10,000,000+ files AFR'd for disaster
> tolerance. I'm developing an SOP for restoration after an event, but am
> working through the consequences of aftershock events, and I am not clear
> on this at present.

Here is how I think things would have to work for your case: basically,
you'd have to use one of the "find" commands to force auto-healing on the
DR volume after a service interruption. The number of files wouldn't be the
bottleneck here, but rather the number of directories (they'll correct me
if I'm wrong).

Recently I went through this with a server. It was offline for about 24
hours, and I ran the find command from the wiki; it took about 20 minutes
to run through a 40 GB volume (there weren't many updates; if you have a
lot of updates it'll obviously take longer). It seemed that large
directories are a performance issue at run time, but they seem to make
things faster when you're recovering.

Keith
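For reference, the form of the find command usually suggested for this (the
exact wording on the wiki has changed over time, so treat the mount point and
options here as an assumption to adapt to your own setup):

    # Run on a client, against the GlusterFS mount point, not the backend
    # export directory. Reading the first byte of every file forces AFR
    # to check and, if needed, self-heal each one.
    find /mnt/glusterfs -type f -exec head -c 1 {} \; > /dev/null

    # Directories are checked when they are listed, so a recursive
    # listing covers those as well:
    ls -lR /mnt/glusterfs > /dev/null

With 10,000,000+ files, running it from several clients against different
subdirectories in parallel may shorten the total wall-clock time, although
as noted above the directory count tends to dominate.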