Vladimir V. Saveliev
2008-Feb-15 12:54 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
Hello,

At the last rabbit meeting in Moscow we agreed that, with subtree locks (http://arch.lustre.org/index.php?title=Sub_Tree_Locks), any use of ".." on a client requires path re-validation. The following example shows the details:

1. A client C1 holds an ordinary lock on an object O1 (it did chdir(/a/b/c/d/e); O1 is the inode of /a/b/c/d/e). C1 is now idle.
2. Another client C2 does ls -ld /a/b/c/d/e; the MD server sends a BAST to C1, and C1 cancels its lock on O1.
3. C2 is no longer interested in O1, so it drops the lock.
4. Yet another client C3 acquires a subtree lock on /a/b and caches, and possibly changes (if under WBC), objects under /a/b, including /a/b/c/d/e (the object O1). The key issue is that the MDS neither remembers O1 on C1 nor keeps information about objects cached by a client under a subtree lock.
5. Now C1 continues with stat("."). It sees that its lock on O1 has been canceled, so it goes to the MD server and acquires a lock on O1.

Now we have: an up-to-date O1 is on C3, and the MDS has a request for O1 from C1 but cannot easily determine whether O1 is under any subtree lock. To find out whether a lock conflict exists, we need a special procedure, referred to as path re-validation.

The main task of path re-validation is to look for a subtree lock held on an ancestor of the object. While this is probably doable, path re-validation is not going to be very efficient (especially in the case of CMD). I can provide more details if necessary.

However, it looks like it is possible to avoid path re-validation completely.

The problem appears when clients request locks on objects directly, without doing a downward lookup through the directory structure. This happens, for example, when clients directly access components of their current working directories (CWDs). If a client cancels locks on such objects (either due to a BAST or voluntarily), it has to go through path re-validation later.

Objects that a client may access directly appear as the result of a normal downward lookup.
Therefore, they were locked, and their locks can be canceled. That is the point where we can take care of future accesses without re-validation: on canceling the lock of a directly accessible object, we inform the DLM that ordinary locking has to be used for that object. That prevents the object from being cached under a subtree lock.

The problem with this scheme is determining which objects are directly accessible. But wouldn't solving that be worth it, given that it may let us avoid path re-validation altogether?

Any comments are welcome.

Best regards,
Vladimir
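To make the race concrete, here is a toy in-memory model of steps 1-5 and of the proposed marking. It is a sketch under stated assumptions, not Lustre code: `ToyMDS`, `Node`, `ordinary_only`, `run_scenario`, and the other names are hypothetical, a "lock" is just a dict entry, and whether an object is "directly accessible" is passed in as a flag, since deciding that is exactly the open problem noted above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """A namespace object in a toy model (not a Lustre data structure)."""
    name: str
    parent: Optional["Node"] = None
    # Hypothetical flag for the proposal: once set, the object must be covered
    # by an ordinary per-object lock, never cached under a subtree lock.
    ordinary_only: bool = False


@dataclass
class ToyMDS:
    # Ordinary per-object locks: object name -> holding client.
    locks: Dict[str, str] = field(default_factory=dict)
    # Subtree locks: name of the subtree root -> holding client.
    subtree_locks: Dict[str, str] = field(default_factory=dict)

    def direct_conflict(self, obj: Node, client: str) -> Optional[str]:
        """What a request by fid can cheaply check: the object's own lock only."""
        holder = self.locks.get(obj.name)
        return holder if holder not in (None, client) else None

    def revalidate_path(self, obj: Node, client: str) -> Optional[str]:
        """Path re-validation: walk every ancestor looking for a foreign subtree lock.

        One step per path component; with CMD each ancestor may live on a
        different server, which is why this is expected to be slow.
        """
        node: Optional[Node] = obj
        while node is not None:
            holder = self.subtree_locks.get(node.name)
            if holder is not None and holder != client:
                return holder
            node = node.parent
        return None

    def cancel(self, obj: Node, client: str, directly_accessible: bool) -> None:
        """Cancel the client's ordinary lock; per the proposal, mark CWD-like objects."""
        if self.locks.get(obj.name) == client:
            del self.locks[obj.name]
        if directly_accessible:
            obj.ordinary_only = True

    def cache_under_subtree(self, obj: Node, client: str) -> None:
        """An STL holder caches obj. Marked objects get an ordinary lock instead;
        unmarked ones leave no per-object trace on the MDS (step 4's key issue)."""
        if obj.ordinary_only:
            self.locks[obj.name] = client


def build_path(path: str) -> Dict[str, Node]:
    """Create a Node for every component of an absolute path."""
    nodes: Dict[str, Node] = {"/": Node("/")}
    parent = nodes["/"]
    prefix = ""
    for part in path.strip("/").split("/"):
        prefix += "/" + part
        parent = nodes.setdefault(prefix, Node(prefix, parent))
    return nodes


def run_scenario(mark_cwd: bool) -> Tuple[Optional[str], Optional[str]]:
    """Replay steps 1-5; return (direct check, path re-validation) for C1's request."""
    o1 = build_path("/a/b/c/d/e")["/a/b/c/d/e"]
    mds = ToyMDS()
    mds.locks[o1.name] = "C1"                           # 1: C1 did chdir, holds a lock on O1
    mds.cancel(o1, "C1", directly_accessible=mark_cwd)  # 2-3: BAST from C2's ls -ld
    mds.subtree_locks["/a/b"] = "C3"                    # 4: C3 takes a subtree lock on /a/b
    mds.cache_under_subtree(o1, "C3")                   #    ...and caches O1
    # 5: C1 does stat(".") and asks for O1 directly.
    return mds.direct_conflict(o1, "C1"), mds.revalidate_path(o1, "C1")


print(run_scenario(mark_cwd=False))  # (None, 'C3'): only the ancestor walk finds C3
print(run_scenario(mark_cwd=True))   # ('C3', 'C3'): the cheap direct check suffices
```

Without the mark, the cheap per-object check misses C3 and only the ancestor walk finds it; with the mark, C3 had to take an ordinary lock on O1, so the direct check sees the conflict. The sketch does not address how many objects end up marked or when the mark could be cleared.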
Alex Zhuravlev
2008-Feb-21 13:30 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
Hi,

a couple of comments inline...

Vladimir V. Saveliev wrote:
> The example shows the details:
>
> 1. A client C1 holds ordinary lock on an object O1 (it did
> chdir(/a/b/c/d/e), O1 is inode of /a/b/c/d/e). C1 is idle now.

chdir doesn't return any lock. Should it?

[...]
> The problem appears when clients request locks on objects directly,
> without doing downward lookup through a directory structure.
> This happens, for example, when clients access directly components of
> current working directories (CWDs).
> If a client cancels locks on such objects (either due to a BAST or
> voluntary) - it has to go through the path re-validation later.
>
> Objects to which a client may access directly appear in result of normal
> downward lookup.
> Therefore, they were locked, and their locks can be
> canceled. That is the point where we can take care about future accesses
> without re-validation.
> On canceling a lock of directly accessible object we have to inform DLM
> that the ordinary locking has to be used for that object. That will
> prevent the object from getting cached under a subtree lock.

1) there may be thousands of such objects (many processes on many nodes)
2) it's not clear when to re-enable this

thanks, Alex
Peter J Braam
2008-Feb-23 03:15 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
I'd like to suggest that we perhaps find, right away, the right primitives for getcwd to return a reasonably correct pathname in Lustre. I believe this is the simplest case in which I have seen pathname revalidation matter. In the context of that example, the subtree lock discussion might gain more clarity.

I would also like to note that I had a discussion with Linus at one of the kernel workshops in Ottawa, perhaps 4-5 years ago. At first Linus attacked the idea of using file identifiers; he suggested that doing everything with pathnames was better (which is what InterMezzo did). When we explained to him that this requires locking all parents, he began to see the problems we had with it and understood the locking at the fid/name level that we use in Lustre. I found little resistance when I mentioned to him that, for this model, the VFS does not have a correct implementation of getcwd unless the dcache is kept current.

UCSC, which has received funding from the National Labs and has now been turned into a peta-scale I/O institute, has, I believe, produced more results on file systems implemented with pathnames. Some things are beautiful and easy with pathnames, but others get really ugly, and so far I don't see this displacing the fid ideas that govern NFS, AFS and Lustre.

- Peter -
Alexander Zarochentsev
2008-Feb-24 21:01 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
Hi,

On 15 February 2008 15:54:22 Vladimir V. Saveliev wrote:
> Hello
[...]
>
> The problem appears when clients request locks on objects directly,
> without doing downward lookup through a directory structure.
> This happens, for example, when clients access directly components of
> current working directories (CWDs).
> If a client cancels locks on such objects (either due to a BAST or
> voluntary) - it has to go through the path re-validation later.
>
> Objects to which a client may access directly appear in result of
> normal downward lookup. Therefore, they were locked, and their locks
> can be canceled. That is the point where we can take care about
> future accesses without re-validation.

What should we do if a lookup loses all its locks due to a conflict with an STL holder? Of course, the parent lock can be correctly re-acquired, but the lookup result may be incorrect: the resulting lock may be taken on an STL-cached object.

Here is a more detailed example. Suppose the parent lock in a lookup step has been lost before a lock on the child was acquired. If we don't want to perform path re-validation, we have to inform any STL holder that the child is not STL-lockable. I think the problem is even worse: any ancestor of the revoked parent should be not STL-lockable.

While the lookup (C1) holds at least one lock, that lock is a barrier for an STL holder, which can't go past it into the subtree. Once C1 loses all its locks, an STL holder may leak into the subtree, cache those locks, and cause locking correctness problems.

I suggest that any STL holder switch to ordinary-lock mode when going past that parent dir. The parent dir would carry a marker: "well, STLs don't behave well under this dir, please use ordinary locks instead".

> On canceling a lock of directly accessible object we have to inform
> DLM that the ordinary locking has to be used for that object. That
> will prevent the object from getting cached under a subtree lock.
Thanks.
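Alexander's final suggestion (a marker on the parent directory that makes an STL holder switch to ordinary locks below it) can be sketched as a traversal rule. This is an illustration under assumptions: `Dir`, `stl_unsafe`, and `cache_subtree` are hypothetical names, the marked directory itself is also treated as ordinary-locked here, and the sketch neither models the stronger ancestor condition mentioned above nor decides when the marker is cleared.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dir:
    """A directory in a toy namespace (hypothetical, not a Lustre structure)."""
    name: str
    children: List["Dir"] = field(default_factory=list)
    # Hypothetical marker set where a lookup lost all its locks:
    # "STLs don't behave well under this dir, use ordinary locks instead".
    stl_unsafe: bool = False


def cache_subtree(root: Dir) -> Dict[str, str]:
    """Decide how an STL holder covers each object below `root`.

    "stl" means the object is covered only by the subtree lock (the MDS keeps
    no per-object record); "ordinary" means the holder takes an explicit
    per-object lock, which other clients' requests can conflict with directly.
    """
    modes: Dict[str, str] = {}

    def walk(node: Dir, ordinary: bool) -> None:
        # Once the walk passes a marked directory, it stays in ordinary mode.
        ordinary = ordinary or node.stl_unsafe
        modes[node.name] = "ordinary" if ordinary else "stl"
        for child in node.children:
            walk(child, ordinary)

    walk(root, False)
    return modes


# C1's lookup lost its lock on /a/b/c/d, so that directory carries the marker.
e = Dir("/a/b/c/d/e")
d = Dir("/a/b/c/d", [e], stl_unsafe=True)
b = Dir("/a/b", [Dir("/a/b/c", [d])])
print(cache_subtree(b))
# {'/a/b': 'stl', '/a/b/c': 'stl', '/a/b/c/d': 'ordinary', '/a/b/c/d/e': 'ordinary'}
```

Because every object at or below the marked directory gets an explicit per-object lock, C1's later request for one of them can be checked against the ordinary lock table without walking up the path.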
Vladimir V. Saveliev
2008-Feb-28 19:01 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
Hello,

On Fri, 2008-02-22 at 20:15 -0700, Peter J Braam wrote:
> I'd like to make a suggestion to perhaps immediately find the right
> primitives for getcwd to return a reasonably correct pathname in
> Lustre. I believe this is the simplest case where I have seen pathname
> revalidation being important. In the context of that example the
> subtree lock discussion might gain more clarity.

Can we have something like th

When a client cancels a lock on an object on CWD, server sets NULL mode lock for the object. That NULL mode lock indicates that the object is
Vladimir V. Saveliev
2008-Feb-28 19:05 UTC
[Lustre-devel] please ignore previous mail, it was sent accidentally Re: subtree locks and path re-validation avoidance
Oops. Sorry, I did not complete the mail.
Vladimir V. Saveliev
2008-Feb-29 15:05 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
Hello,

On Fri, 2008-02-22 at 20:15 -0700, Peter J Braam wrote:
> I'd like to make a suggestion to perhaps immediately find the right
> primitives for getcwd to return a reasonably correct pathname in
> Lustre.

Peter, could you say a bit more about that? Currently there is nothing a filesystem can do in Linux's getcwd: it simply returns the instantaneous dcache path from "." to "/".

Best regards,
Vladimir
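The behaviour Vladimir describes (getcwd reads the path straight out of the dcache) can be illustrated with a simplified model. This is not kernel code: `Dentry` and `dcache_getcwd` are hypothetical stand-ins, and the real Linux implementation also deals with mount points and concurrent renames, none of which is modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dentry:
    """Toy stand-in for a cached directory entry: a name plus a parent pointer."""
    name: str
    parent: Optional["Dentry"] = None  # None marks the root


def dcache_getcwd(cwd: Dentry) -> str:
    """Build the path from "." up to "/" using only cached parent pointers.

    No filesystem method is called, so if another client renamed a
    component and the local cache was not updated, the result is stale.
    """
    parts: List[str] = []
    node = cwd
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))


root = Dentry("/")
a = Dentry("a", root)
c = Dentry("c", Dentry("b", a))
print(dcache_getcwd(c))     # /a/b/c
print(dcache_getcwd(root))  # /
```

The filesystem never gets a say in this walk, which is why the result is only as current as the client's dcache.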
Peter Braam
2008-Mar-01 17:39 UTC
[Lustre-devel] subtree locks and path re-validation avoidance
Hi,

On 2/29/08 8:05 AM, "Vladimir V. Saveliev" <Vladimir.Saveliev at Sun.COM> wrote:
> Hello
>
> On Fri, 2008-02-22 at 20:15 -0700, Peter J Braam wrote:
>> I'd like to make a suggestion to perhaps immediately find the right
>> primitives for getcwd to return a reasonably correct pathname in
>> Lustre.
>
> Peter, would you say a bit more about that:
>
> currently, there is nothing a filesystem can do in linux's getcwd. It
> simply returns instant dcache path from "." to "/".

Yes. So the question is what new dentry methods we might add so that the dcache can call into the FS to validate the path. The second question is whether these would be useful for revalidating subtree lock paths.

- Peter -

_______________________________________________
Lustre-devel mailing list
Lustre-devel at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-devel
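Peter's first question can be sketched as a getcwd variant that calls a filesystem hook on each component before trusting it. Everything here is hypothetical: the `Revalidate` hook type, `validated_getcwd`, and the in-memory "server" table are illustrations, not existing VFS or Lustre interfaces. Linux's dentry operations do include `d_revalidate`, but it is used during path lookup rather than by getcwd.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity hashing, so dentries can be dict keys
class Dentry:
    name: str
    parent: Optional["Dentry"] = None  # None marks the root


# Hypothetical per-dentry hook supplied by the filesystem: return the
# authoritative (name, parent) for a dentry, e.g. after taking a lock on the
# metadata server. Not an existing VFS method.
Revalidate = Callable[[Dentry], Tuple[str, Optional[Dentry]]]


def validated_getcwd(cwd: Dentry, revalidate: Revalidate) -> str:
    """Walk from "." to "/", asking the FS to confirm every component."""
    parts: List[str] = []
    node = cwd
    while True:
        name, parent = revalidate(node)        # one FS call per component
        node.name, node.parent = name, parent  # refresh the cached entry
        if parent is None:
            return "/" + "/".join(reversed(parts))
        parts.append(name)
        node = parent


root = Dentry("/")
a = Dentry("a", root)
c = Dentry("c", a)
# Another client renamed /a/c to /a/z; the local cache still says "c".
server: Dict[Dentry, Tuple[str, Optional[Dentry]]] = {
    root: ("/", None), a: ("a", root), c: ("z", a)}
print(validated_getcwd(c, lambda d: server[d]))  # /a/z
print(c.name)                                    # z (cache refreshed)
```

The loop costs one hook call per path component, the same per-component walk that makes path re-validation for subtree locks expensive; whether a single hook could serve both purposes is Peter's second question.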