Raghavendra Gowdappa
2015-Sep-04 07:13 UTC
[Gluster-users] [posix-compliance] unlink and access to file through open fd
All,

Posix allows access to a file through open fds even if the name associated with the file is deleted. While this works for glusterfs in most cases, there are some corner cases where we fail.

1. Reboot of brick:
===================

With the reboot of a brick, the fd is lost. unlink would've deleted both the gfid and path links to the file, and we would lose the file. As a solution, perhaps we should create a hardlink to the file (say in .glusterfs) which gets deleted only when the last fd is closed?

2. Graph switch:
================

The issue is captured in bz 1259995 [1]. Pasting the content from the bz verbatim:

Consider the following sequence of operations:
1. fd = open ("/mnt/glusterfs/file");
2. unlink ("/mnt/glusterfs/file");
3. Do a graph switch, let's say by adding a new brick to the volume.
4. Migration of the fd to the new graph fails. This is because, as part of migration, we do a lookup and open. But the lookup fails as the file is already deleted, hence migration fails and the fd is marked bad.

In fact, this test case is already present in our regression tests, though the test only checks whether the fd is marked bad. The expectation behind filing this bug is that migration should succeed. This is possible since there is an fd opened on the brick through the old graph, which can be duped using the dup syscall.

Of course, the solution outlined here doesn't cover the case where the file is not present on a brick at all. For example, a new brick was added to a replica set and that new brick doesn't contain the file. Now, since the file is deleted, how does replica heal that file to the new brick, etc.?

But at least this can be solved for those cases where the file was present on a brick and an fd was already opened.

3. Open-behind and unlink from a different client:
==================================================

While open-behind handles unlink from the same client (through which the open was performed), if unlink and open are done from two different clients, the file is lost. I cannot think of any good solution for this.

I wanted to know whether these problems are real enough to channel our efforts into fixing them. Comments are welcome in terms of solutions or other possible scenarios which can lead to this issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1259995

regards,
Raghavendra.
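For reference, a minimal, self-contained C sketch of the POSIX behaviour under discussion: the file stays accessible through the open fd after unlink(), and the fd can still be duplicated with dup() - the property the graph-switch fix above would rely on. The path is the illustrative one from the example; this runs against any local or glusterfs mount.

/* Demonstrates POSIX unlink semantics with an open fd.
   Compile: cc -o unlink_fd unlink_fd.c */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/glusterfs/file";   /* illustrative path */
    char buf[8];
    struct stat st;

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (unlink(path) < 0) { perror("unlink"); return 1; }

    /* The name is gone, but I/O through the open fd still succeeds. */
    if (write(fd, "hello", 5) != 5)
        perror("write");
    if (fstat(fd, &st) == 0)
        printf("st_nlink after unlink: %ld\n", (long) st.st_nlink); /* 0 */

    /* The open fd can still be duplicated even though the name is gone. */
    int fd2 = dup(fd);
    if (fd2 < 0) { perror("dup"); return 1; }

    if (pread(fd2, buf, 5, 0) == 5)
        printf("read back via duped fd: %.5s\n", buf);

    close(fd2);
    close(fd);
    return 0;
}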
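And a hypothetical sketch of the ".glusterfs hardlink" idea from case 1. The helper names and anchor path below are invented for illustration - this is not actual posix-xlator code: on open, the brick adds an extra hardlink so the inode survives unlink of the user-visible name; the last close drops the anchor.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* On open: create an extra hardlink under .glusterfs so the inode
   survives an unlink() of the user-visible name. */
static int brick_open_and_anchor(const char *path, const char *anchor)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    /* anchor would be something like <brick>/.glusterfs/<gfid> */
    if (link(path, anchor) < 0 && errno != EEXIST)
        perror("link");
    return fd;
}

/* On last close: release the fd and drop the anchor link; only then
   is the inode freed if the name was already unlinked. */
static void brick_last_close(int fd, const char *anchor)
{
    close(fd);
    if (unlink(anchor) < 0)
        perror("unlink anchor");
}

int main(void)
{
    const char *anchor = "/bricks/b1/.glusterfs/anchor"; /* invented path */
    int fd = brick_open_and_anchor("/bricks/b1/file", anchor);
    if (fd >= 0)
        brick_last_close(fd, anchor);
    return 0;
}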
Raghavendra Bhat
2015-Sep-04 07:41 UTC
[Gluster-users] [posix-compliance] unlink and access to file through open fd
On 09/04/2015 12:43 PM, Raghavendra Gowdappa wrote:
> [...]
>
> 2. Graph switch:
> ================
>
> The issue is captured in bz 1259995 [1]. Pasting the content from the bz verbatim:
>
> Consider the following sequence of operations:
> 1. fd = open ("/mnt/glusterfs/file");
> 2. unlink ("/mnt/glusterfs/file");
> 3. Do a graph switch, let's say by adding a new brick to the volume.
> 4. Migration of the fd to the new graph fails. This is because, as part of
> migration, we do a lookup and open. But the lookup fails as the file is
> already deleted, hence migration fails and the fd is marked bad.
>
> [...]

Du,

For this 2nd example (where the file is opened, unlinked, and a graph switch happens), there was a patch submitted long back:

http://review.gluster.org/#/c/5428/

Regards,
Raghavendra Bhat
Prashanth Pai
2015-Sep-04 09:05 UTC
[Gluster-users] [Gluster-devel] [posix-compliance] unlink and access to file through open fd
----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: gluster-devel at gluster.org
> Cc: gluster-users at gluster.org
> Sent: Friday, September 4, 2015 12:43:09 PM
> Subject: [Gluster-devel] [posix-compliance] unlink and access to file through open fd
>
> [...]
>
> 3. Open-behind and unlink from a different client:
> ==================================================
>
> While open-behind handles unlink from the same client (through which the
> open was performed), if unlink and open are done from two different
> clients, the file is lost. I cannot think of any good solution for this.

We *may* have hit this once earlier, when we had multiple instances of the object-expirer daemon deleting a huge number of objects (files). This was only observed at scale - deleting a million objects.

Our user-space application flow was roughly as follows:

fd = open(...)
s = fstat(fd)
fgetxattr(fd, ...)

In our case, open() and fstat() succeeded but fgetxattr() failed with ENOENT (many times with ESTALE too), probably because some other client had already done an unlink() on the file name.

Is this behavior normal?

@Thiago: Remember this one?

http://paste.openstack.org/show/357414/
https://gist.github.com/thiagodasilva/491e405a3385f0e85cc9
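For context, a rough C equivalent of the flow described above; the mount path and xattr name are illustrative, not from the original report. In the reported runs, open() and fstat() succeeded while fgetxattr() returned ENOENT (and sometimes ESTALE):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/glusterfs/objects/obj1"; /* illustrative */
    char value[256];
    struct stat st;

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* The call observed to fail with ENOENT/ESTALE when another client
       had already unlinked the name. The xattr name is made up. */
    ssize_t n = fgetxattr(fd, "user.example.metadata", value, sizeof(value));
    if (n < 0)
        fprintf(stderr, "fgetxattr: %s\n", strerror(errno));
    else
        printf("xattr value: %zd bytes\n", n);

    close(fd);
    return 0;
}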