Hi!

I have a trivial problem with self-healing. Maybe somebody will be able
to tell me what I am doing wrong, and why the files do not heal as I expect.

Configuration:

Servers: two nodes A, B
---------
volume posix
  type storage/posix
  option directory /ext3/glusterfs13/brick
end-volume

volume brick
  type features/posix-locks
  option mandatory on
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.brick.allow *
  option auth.ip.brick-ns.allow *
  subvolumes brick
end-volume
--------

Client: C
-------
volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host A
  option remote-subvolume brick
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host B
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes brick1 brick2
end-volume

volume iot
  type performance/io-threads
  subvolumes afr
  option thread-count 8
end-volume
-------

Scenario:

1. mount the remote afr brick on C
2. do some ops
3. stop server A (to simulate a machine failure)
4. wait some time so clock skew between A and B is not an issue
5. write file X to the gluster mount on C
6. start server A
7. wait for C to reconnect to A
8. wait some time so clock skew between A and B is not an issue
9. touch, read, stat, write to file X, ls the directory containing X
   (all on the gluster mount on C)

And here is the problem: whatever I do, I cannot make file X appear on
the backend fs of brick A, which was down when file X was created. Help
is really appreciated.

PS. I discussed a similar auto-healing problem on gluster-devel some time
ago, and then it magically worked once, so I stopped thinking about it.
Today I see it again, and as we are planning to use glusterfs in
production soon, the auto-heal functionality is crucial.

Regards, Lukasz Osipiuk.

--
Łukasz Osipiuk
mailto: lukasz at osipiuk.net
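PS2. To make the scenario concrete, this is roughly what I do from the
shell. The spec file paths, mount point and the way the server is stopped
below are simplified examples, not my exact commands:

# on client C: mount the afr volume using the client spec shown above
glusterfs -f /etc/glusterfs/client.vol /mnt/gluster

# on server A: stop the glusterfs server process to simulate a failure
# (however you normally stop it; killall is just for illustration)
killall glusterfsd

# on client C, while A is down: create file X on the gluster mount
echo "some data" > /mnt/gluster/X

# on server A: start the server again with the same spec file
glusterfsd -f /etc/glusterfs/server.vol

# on client C, after it has reconnected to A: try to trigger self-heal
touch /mnt/gluster/X
cat /mnt/gluster/X > /dev/null
stat /mnt/gluster/X
ls -l /mnt/gluster/

# on server A: X still does not appear on the backend filesystem
ls -l /ext3/glusterfs13/brick/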
I forgot one thing,

Software version is 1.3.12

glusterfs --version
glusterfs 1.3.12 built on Nov 7 2008 18:57:06
Repository revision: glusterfs--mainline--2.5--patch-797
Copyright (c) 2006, 2007, 2008 Z RESEARCH Inc. <http://www.zresearch.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.

--
Łukasz Osipiuk
mailto: lukasz at osipiuk.net
Krishna Srinivas
2008-Nov-09 20:21 UTC
[Gluster-users] Still problem with trivial self heal
Lukasz,

Do you remain in the same "pwd" when A goes down and comes back up? If
so, can you "cd" out of the glusterfs mount point and "cd" back in again,
and see whether that fixes the problem?

Thanks
Krishna
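P.S. In concrete terms, something like this on client C (the mount point
below is only an example, use whatever you mount on):

cd /                    # step out of the glusterfs mount point entirely
cd /mnt/gluster         # step back in, which should force a fresh lookup
                        # of the directory that contains X
ls -l                   # list the directory, then access the file itself
stat X

# afterwards, on server A, check whether X has been created on the
# backend filesystem of the brick:
ls -l /ext3/glusterfs13/brick/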
Krishna Srinivas
2008-Nov-10 12:15 UTC
[Gluster-users] Still problem with trivial self heal
Lukasz,

That behavior is a bug. It will be fixed in coming releases.

Regards
Krishna

On Mon, Nov 10, 2008 at 5:18 PM, Łukasz Osipiuk <lukasz at osipiuk.net> wrote:
> 2008/11/9 Krishna Srinivas <krishna at zresearch.com>:
>> Lukasz,
>> Do you remain in the same "pwd" when A goes down and comes back up? If
>> so, can you "cd" out of the glusterfs mount point and "cd" back in
>> again, and see whether that fixes the problem?
>
> Hmm - it helped :)
>
> Can I rely on such behaviour? Would the directory be healed if some
> other process on the same or another machine still used it as its
> working directory?
>
> Thanks a lot, Lukasz
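As for relying on it in the meantime: the "cd" trick appears to help
because leaving and re-entering the mount point forces a fresh lookup of
the directory, so you should not need to depend on the working directory
of any particular process. Walking the tree from the mount point should
trigger the same lookups from anywhere. A rough sketch (the mount point
below is only an example; the backend path is the one from your spec file):

cd /                              # make sure this shell is not inside the mount
ls -lR /mnt/gluster > /dev/null   # walk the tree so every directory and file
                                  # is looked up afresh after A comes back

# then check the backend filesystem on server A:
ls -lR /ext3/glusterfs13/brick/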