Hello!

We discussed a bit of this in Beijing last week, but decided to continue the discussion via email.

So, I think it is a given that we do not want to revoke a subtree lock every time somebody steps through it, because that would be too costly in a lot of cases.

Anyway, here is what I have in mind.

STL locks could be granted by the server regardless of whether they were requested by the client or not.

We would require clients to provide a lock "cookie" with every operation they perform; in the normal case that would be a handle they have on a parent directory. This cookie should allow a way to find out what server the cookie originates from (needed for CMD support).

For the case of a different client stepping into an area covered by an STL lock, this client would get the STL lock's cookie and would start presenting it for all subsequent operations (along with a special flag meaning that the client is not the one operating within the STL). When the server receives a request with a cookie that turns out to be for an STL lock, a callback is made to that lock (if necessary, through another server in the CMD case), and information about the currently-accessed fid and access mode is included. The client where the callback ends up will do the necessary writeout of the object content: flush dirty data in the case of a file, or flush any metadata changes in the case of a directory (needed for the metadata writeback cache; this would be a server no-op for r/o access to directories before WBC is implemented). In addition, if the operation is modifying, the STL-holding client would have to release the STL lock and would have the choice of either completely flushing its cache for the subtree protected by the STL, or obtaining STLs for parts of the tree below the STL and retaining its cache for those subtrees. Additionally, for r/o access the STL-holding client would have the extra choices of doing nothing (besides the cache writeout/flush for the object content) or allowing the server to issue a lock on that fid, in which case the client would first flush its own cache for the entire subtree starting at that fid.

If the lock cookie presented by the accessing client is determined to be invalid (rogue client, or the lock was already released), a reverse lookup is performed up the tree (possibly crossing MDT boundaries) by the server, in search of a lock already granted to a client or the root of the tree, whichever is met first. If during this lookup a lock is met and it happens to be an STL lock, its cookie is returned to the client along with an indication of the STL lock's presence; otherwise normal operation with normal lock granting occurs.

When a client gets an STL lock for itself, it also performs all subsequent operations by presenting the STL lock handle. It might get a reply from the server indicating that the entry being accessed is "shared" (determined by the server as an opened file, or an inode on which there are locks granted to other clients) and a normal lock (or, in case this area of the tree is covered by somebody else's STL, that STL's cookie) if needed. All metadata cached on behalf of an STL lock is marked as such in the client's cache.

This approach allows for a dynamically growing STL tree with the ability to cut it at any level (by the presence of a lock in some part of the tree). Initially, after being issued, an STL lock would span from the root of the subtree it was issued on down to any points where other clients might have cached information (or, if no other clients hold locks there, the entire subtree), and then there is the possibility of cutting some sub-subtrees out of the subtree protected by the STL. This also allows for nested STLs held by different clients.

One important thing that needs to be done in this scenario is that we must ensure any process with its CWD on Lustre has a lock on that directory if possible (of course we cannot refuse to revoke this lock if other clients want to modify the directory content). This would allow us to avoid costly reverse lookups to find out whether we are under any STL lock when we operate from a CWD on Lustre (the STL lock would simply be cut at the CWD point by the normal lock).

We would need to implement cross-MDT lock callbacks.

I think it is safe to depend on clients to provide locks, since if they don't, or provide invalid ones, we can find this out (and we can couple locks with some secure tokens if needed, too). The only downside is that rogue clients would be able to slow down servers by making them do all the reverse lookups (though if we simply refuse to speak with clients that present, on a non-root inode of the FS, invalid locks that were never present in the system, that should be somewhat mitigated).

The other alternative is to mark every server dentry with an STL marker during traversal, but in that case recovery after a server restart becomes somewhat problematic, so I do not think this is a good idea.

Bye,
    Oleg
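The server-side cookie check and reverse lookup described above could look roughly like the following sketch. This is a minimal illustration in Python, not Lustre code; the names (Node, Lock, handle_request, send_callback) are invented for the example and gloss over CMD, recovery and lock modes.

```python
# Minimal sketch of the server-side cookie check and reverse lookup
# described above. Names (Node, Lock, handle_request, ...) are
# hypothetical illustrations, not actual Lustre structures or APIs.

class Lock:
    def __init__(self, cookie, is_stl, holder):
        self.cookie = cookie      # opaque handle the client presents
        self.is_stl = is_stl      # True for a subtree (STL) lock
        self.holder = holder      # client that was granted the lock

class Node:
    """One namespace entry (directory or file) on the MDT."""
    def __init__(self, fid, parent=None):
        self.fid = fid
        self.parent = parent      # None for the filesystem root
        self.lock = None          # currently granted lock, if any

valid_cookies = {}                # cookie -> Lock, as known to the server

def reverse_lookup(node):
    """Walk up the tree until a granted lock or the root is met."""
    while node is not None:
        if node.lock is not None:
            return node.lock
        node = node.parent
    return None                   # reached the root without meeting a lock

def handle_request(node, presented_cookie, modifying):
    lock = valid_cookies.get(presented_cookie)
    if lock is None:
        # Invalid cookie (rogue client or lock already released):
        # do the costly reverse lookup up the tree.
        lock = reverse_lookup(node)

    if lock is not None and lock.is_stl:
        # Callback to the STL holder (possibly via another MDT in CMD),
        # telling it which fid is accessed and in which mode, so it can
        # flush dirty data/metadata and shrink or release the STL.
        send_callback(lock.holder, node.fid, modifying)
        return {"stl_cookie": lock.cookie}   # client presents this from now on

    # No STL in the way: grant a normal lock as usual.
    return {"granted": grant_normal_lock(node)}

def send_callback(holder, fid, modifying):
    print(f"callback to {holder}: fid={fid}, modifying={modifying}")

def grant_normal_lock(node):
    return f"normal-lock-on-{node.fid}"
```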
On Jan 21, 2009 15:49 -0500, Oleg Drokin wrote:
> So, I think it is a given we do not want to revoke a subtree lock every time somebody steps through it, because that will be too costly in a lot of cases.

A few comments that I have from the later discussions:

- you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.

- Alex (I think) suggested that the STL locks would only be on a single directory and its contents, instead of being on an arbitrary-depth sub-tree. While it seems somewhat appealing to have a single lock that covers an entire subtree, the complexity of having to locate and manage arbitrary-depth locks on the MDS might be too high. In most use cases it is pretty rare to have very deep subtrees, and the common case will be a large number of files in a single directory; a subtree lock will serve this use case equally well. Having only a single level of subtree lock would avoid the need to pass cookies to the MDS for anything other than the directory in which names are being looked up.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Hello!

On Jan 26, 2009, at 5:08 AM, Andreas Dilger wrote:
> A few comments that I have from the later discussions:
> - you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.

In fact I see zero benefit for a read-only subtree lock except memory conservation, which should not be such a big issue. Much more important is to reduce the number of RPCs, especially synchronous ones.

> - Alex (I think) suggested that the STL locks would only be on a single directory and its contents, instead of being on an arbitrary-depth sub-tree. While it seems somewhat appealing to have a single lock that covers an entire subtree, the complexity of having to locate and manage arbitrary-depth locks on the MDS might be too high.

That's right.

> Having only a single level of subtree lock would avoid the need to pass cookies to the MDS for anything other than the directory in which names are being looked up.

I had a lengthy call with Eric today, and at the end we came to the conclusion that perhaps an STL is, at the moment, total overkill. What we need is the ability to reduce metadata RPC traffic.

We can start with an implementation that just allows WRITE locks on a directory that would be responsible only for this directory and its content (HELPS: by allowing creates to be aggregated into batches before sending), plus a special "entire file lock" (perhaps implemented as just a WRITE lock on a file): a metadata lock that would guard all file data without obtaining any locks from the OSTs (it would be revoked by an open from another client, and would perhaps need to support glimpses too).

The WRITE directory lock only helps us to aggregate metadata RPCs if we just created the empty directory OR if we have the entire list of entries in that directory. If we do not have the entire directory content, we must issue a synchronous create RPC to avoid cases where we locally create a file that already exists in that dir, for example. So perhaps in a lot of cases obtaining a write lock on a dir would need to be followed by some sort of bulk directory read (a readdir+ of sorts). This is also not always feasible, as I can imagine there could be directories much bigger than what we would like to cache, in which case we would need to resort to one-by-one creates.

Another important thing we would need is lock conversion (downconversion and try-up conversion) so that we do not lose our entire cached directory content after a conflicting ls comes in and we write it out. (We do not care all that much about writing out the entire content of the dirty metadata cache at this point, since we still achieve the aggregation and asynchronous creation; even just asynchronous creation would help.)

Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.

This WRITE metadata lock is in fact a reduced subset of the STL lock without any of its advanced features, but perhaps easier to implement because of that.

Bye,
    Oleg
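As a rough illustration of the aggregation this WRITE directory lock is meant to enable, the sketch below (hypothetical Python; CachedDir, flush_creates and the batch size are invented, not Lustre interfaces) queues creates locally only when the full directory listing is cached, and falls back to synchronous create RPCs otherwise, as described in the mail above.

```python
# Illustrative sketch only: how a client might batch creates under a
# directory WRITE lock. Class and function names are hypothetical.

class CachedDir:
    def __init__(self, fid, have_write_lock, have_full_listing):
        self.fid = fid
        self.have_write_lock = have_write_lock      # WRITE lock on the directory
        self.have_full_listing = have_full_listing  # entire entry list cached
        self.entries = set()                        # names known to exist
        self.pending_creates = []                   # creates not yet sent

BATCH_SIZE = 64

def create(d, name, send_rpc):
    """Create 'name' in directory 'd'; batch if it is safe to do so."""
    can_batch = d.have_write_lock and d.have_full_listing
    if not can_batch:
        # Without the full listing we cannot rule out a name collision
        # locally, so the create must be a synchronous RPC.
        return send_rpc([("create", d.fid, name)])

    if name in d.entries:
        raise FileExistsError(name)                 # resolved locally, no RPC

    d.entries.add(name)
    d.pending_creates.append(("create", d.fid, name))
    if len(d.pending_creates) >= BATCH_SIZE:
        flush_creates(d, send_rpc)                  # one RPC for many creates

def flush_creates(d, send_rpc):
    """Called on batch overflow, lock cancellation, or fsync."""
    if d.pending_creates:
        send_rpc(d.pending_creates)
        d.pending_creates = []

# Example: 200 creates turn into 4 RPCs instead of 200 synchronous ones.
if __name__ == "__main__":
    rpcs = []
    d = CachedDir(fid="0x1234", have_write_lock=True, have_full_listing=True)
    for i in range(200):
        create(d, f"file{i}", rpcs.append)
    flush_creates(d, rpcs.append)
    print(f"{len(rpcs)} RPCs sent")
```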
On Jan 27, 2009 23:39 -0500, Oleg Drokin wrote:
> On Jan 26, 2009, at 5:08 AM, Andreas Dilger wrote:
>> A few comments that I have from the later discussions:
>> - you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.
>
> In fact I see zero benefit for a read-only subtree lock except memory conservation, which should not be such a big issue. Much more important is to reduce the number of RPCs, especially synchronous ones.

Memory conservation on the server is very important. If there are 100k clients and a DLM lock is 2kB in size, then we are looking at 200MB for each lock given to all clients. With an MDS having, say, 32GB of RAM, we would consume all of the server RAM with only 160 locks per client.

>> - Alex (I think) suggested that the STL locks would only be on a single directory and its contents, instead of being on an arbitrary-depth sub-tree. While it seems somewhat appealing to have a single lock that covers an entire subtree, the complexity of having to locate and manage arbitrary-depth locks on the MDS might be too high.
>
> That's right.
>
>> Having only a single level of subtree lock would avoid the need to pass cookies to the MDS for anything other than the directory in which names are being looked up.
>
> I had a lengthy call with Eric today, and at the end we came to the conclusion that perhaps an STL is, at the moment, total overkill.
>
> What we need is the ability to reduce metadata RPC traffic.

And to reduce memory usage for read locks on the server. Having a READ STL for cases like read-mostly directories (/usr/bin, /usr/lib, ~/bin) can avoid many thousands/millions of locks and their RPCs.

> We can start with an implementation that just allows WRITE locks on a directory that would be responsible only for this directory and its content (HELPS: by allowing creates to be aggregated into batches before sending), plus a special "entire file lock" (perhaps implemented as just a WRITE lock on a file): a metadata lock that would guard all file data without obtaining any locks from the OSTs (it would be revoked by an open from another client, and would perhaps need to support glimpses too).

Well, if the client will generate the layout on the newly-created files, or will request the layout (LOV EA) lock on the files it wants exclusive access to, this is essentially the "entire file lock" you need. For existing files the client holding the layout lock needs to cancel the OST extent locks first, to ensure they flush their cache.

> The WRITE directory lock only helps us to aggregate metadata RPCs if we just created the empty directory OR if we have the entire list of entries in that directory. If we do not have the entire directory content, we must issue a synchronous create RPC to avoid cases where we locally create a file that already exists in that dir, for example. So perhaps in a lot of cases obtaining a write lock on a dir would need to be followed by some sort of bulk directory read (a readdir+ of sorts). This is also not always feasible, as I can imagine there could be directories much bigger than what we would like to cache, in which case we would need to resort to one-by-one creates.
>
> Another important thing we would need is lock conversion (downconversion and try-up conversion) so that we do not lose our entire cached directory content after a conflicting ls comes in and we write it out. (We do not care all that much about writing out the entire content of the dirty metadata cache at this point, since we still achieve the aggregation and asynchronous creation; even just asynchronous creation would help.)

We also want to have lock conversion for regular files (write->read) and for the layout lock bit (so clients can drop the LOV EA lock without dropping the LOOKUP or UPDATE bits).

> Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.

Well, I'm not sure how much batching we will get from this, since it will be completely non-deterministic whether multiple independent client requests can be grouped into a single RPC.

> This WRITE metadata lock is in fact a reduced subset of the STL lock without any of its advanced features, but perhaps easier to implement because of that.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
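The lock conversion mentioned here can be pictured as manipulating mode and inode bits within an already granted lock. The toy model below is a sketch under invented names (the bit constants and InodeLock class are not the actual Lustre definitions, only loosely modeled on the LOOKUP/UPDATE/layout bits discussed above).

```python
# Toy model of the lock conversions discussed above: dropping single
# inode bits (e.g. the layout bit) and downgrading write->read without
# cancelling the whole lock. Bit names and the InodeLock class are
# invented for illustration; they are not the actual Lustre definitions.

LOOKUP = 0x1      # name -> fid lookups
UPDATE = 0x2      # directory/attribute updates
LAYOUT = 0x4      # file layout (LOV EA)

class InodeLock:
    def __init__(self, mode, bits):
        self.mode = mode          # "read" or "write"
        self.bits = bits

    def drop_bits(self, bits):
        """Give back some bits (e.g. LAYOUT) and keep the rest cached."""
        self.bits &= ~bits

    def downconvert(self):
        """write -> read: keep the cached state readable, allow other readers."""
        if self.mode == "write":
            self.mode = "read"

# A client holding LOOKUP|UPDATE|LAYOUT in write mode can drop just the
# layout bit and downgrade, instead of losing its whole cached directory.
lock = InodeLock("write", LOOKUP | UPDATE | LAYOUT)
lock.drop_bits(LAYOUT)
lock.downconvert()
assert lock.mode == "read" and lock.bits == LOOKUP | UPDATE
```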
Hello!

On Feb 2, 2009, at 5:50 PM, Andreas Dilger wrote:
> On Jan 27, 2009 23:39 -0500, Oleg Drokin wrote:
>> On Jan 26, 2009, at 5:08 AM, Andreas Dilger wrote:
>>> A few comments that I have from the later discussions:
>>> - you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.
>>
>> In fact I see zero benefit for a read-only subtree lock except memory conservation, which should not be such a big issue. Much more important is to reduce the number of RPCs, especially synchronous ones.
>
> Memory conservation on the server is very important. If there are 100k clients and a DLM lock is 2kB in size, then we are looking at 200MB for each lock given to all clients. With an MDS having, say, 32GB of RAM, we would consume all of the server RAM with only 160 locks per client.

Well, you are of course right, and at a certain scale we do indeed need to consider the memory conservation effect as well.

>> We can start with an implementation that just allows WRITE locks on a directory that would be responsible only for this directory and its content (HELPS: by allowing creates to be aggregated into batches before sending), plus a special "entire file lock" (perhaps implemented as just a WRITE lock on a file): a metadata lock that would guard all file data without obtaining any locks from the OSTs (it would be revoked by an open from another client, and would perhaps need to support glimpses too).
>
> Well, if the client will generate the layout on the newly-created files, or will request the layout (LOV EA) lock on the files it wants exclusive access to, this is essentially the "entire file lock" you need. For existing files the client holding the layout lock needs to cancel the OST extent locks first, to ensure they flush their cache.

This is fine as one of the ideas, but it would not work all that nicely in all possible use cases. Suppose we wanted a read-only lock like this too, for example.

>> Another important thing we would need is lock conversion (downconversion and try-up conversion) so that we do not lose our entire cached directory content after a conflicting ls comes in and we write it out. (We do not care all that much about writing out the entire content of the dirty metadata cache at this point, since we still achieve the aggregation and asynchronous creation; even just asynchronous creation would help.)
>
> We also want to have lock conversion for regular files (write->read) and for the layout lock bit (so clients can drop the LOV EA lock without dropping the LOOKUP or UPDATE bits).

Yes, absolutely.

>> Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.
>
> Well, I'm not sure how much batching we will get from this, since it will be completely non-deterministic whether multiple independent client requests can be grouped into a single RPC.

There would be a lot of batching in many common use cases like "untar a file" or "create working files for applications, all in the same dir/dir tree".

From the above, my conclusion is that we do not necessarily need subtree locks for an efficient metadata write cache, but we do need them for other scenarios (memory conservation). There are some similarities in the functionality too, but also some differences.

One particular complexity I see with multiple read-only STLs is that every modifying metadata operation would need to traverse the metadata tree all the way back to the root of the fs in order to notify all possible clients holding STL locks about the change about to be made.

Bye,
    Oleg
On Feb 03, 2009 01:24 -0500, Oleg Drokin wrote:
>>> Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.
>>
>> Well, I'm not sure how much batching we will get from this, since it will be completely non-deterministic whether multiple independent client requests can be grouped into a single RPC.
>
> There would be a lot of batching in many common use cases like "untar a file" or "create working files for applications, all in the same dir/dir tree".

Maybe I misunderstand, but all of this batching is in the case of a single client that is doing operations to send to the MDS. What I was thinking would be a rare case is batching from the server to the client when e.g. a bunch of clients independently open a bunch of files that are in a directory for which a client holds an STL. In the latter case, since all of the RPCs are coming from different clients, it is much harder for the server to group them together into a single RPC to send to the STL client.

> From the above, my conclusion is that we do not necessarily need subtree locks for an efficient metadata write cache, but we do need them for other scenarios (memory conservation). There are some similarities in the functionality too, but also some differences.
>
> One particular complexity I see with multiple read-only STLs is that every modifying metadata operation would need to traverse the metadata tree all the way back to the root of the fs in order to notify all possible clients holding STL locks about the change about to be made.

Sorry, I was only considering the case of a 1-deep STL (e.g. a DIR lock, not the arbitrary-depth STL you originally described). In that case, there is no requirement for more than a single level of STL to be checked/cancelled if a client is doing some modifying operation therein. This is no different than e.g. if a bunch of clients are holding the LOOKUP lock on a directory that has a new entry added to it.

Eric also had a proposal that the DIR lock would be a "hash extent" lock instead of a single bit, so that it would be possible (via lock conversion) to avoid cancelling all of the entries cached on a client when a single new file is being added. Only the hash range of the entry being added would need to be removed from the lock, either via a 3-piece lock split (the middle extent being cancelled) or via a 2-piece lock split (the smallest extent being cancelled).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
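A rough model of the hash-extent split proposed here might look like the sketch below. It is a hypothetical Python illustration: the HashExtentLock class, the split policy and name_hash are invented for the example and are not the actual Lustre DLM extent-lock code.

```python
# Illustrative model of the "hash extent" directory lock split described
# above: adding one entry carves only its hash out of the held range.

from zlib import crc32

def name_hash(name):
    return crc32(name.encode()) & 0xffffffff

class HashExtentLock:
    """A lock on a range of directory-entry hashes held by one client."""
    def __init__(self, start, end):
        self.ranges = [(start, end)]      # list of held [start, end] ranges

    def punch(self, h, three_piece=True):
        """Remove hash value h from the held ranges when a new entry with
        that hash is added by another client."""
        new_ranges = []
        for (s, e) in self.ranges:
            if not (s <= h <= e):
                new_ranges.append((s, e))
            elif three_piece:
                # 3-piece split: keep both sides, cancel only the middle.
                if s <= h - 1:
                    new_ranges.append((s, h - 1))
                if h + 1 <= e:
                    new_ranges.append((h + 1, e))
            else:
                # 2-piece split: cancel the smaller side together with h.
                left, right = (s, h - 1), (h + 1, e)
                keep = left if (h - s) >= (e - h) else right
                if keep[0] <= keep[1]:
                    new_ranges.append(keep)
        self.ranges = new_ranges

# A client caching the whole directory holds [0, 2^32-1]; another client
# creating "newfile" only invalidates that one hash, not the whole cache.
lock = HashExtentLock(0, 0xffffffff)
lock.punch(name_hash("newfile"))
print(lock.ranges)    # two ranges remain; cached entries stay valid
```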
Hello!

On Feb 3, 2009, at 4:04 AM, Andreas Dilger wrote:
>> There would be a lot of batching in many common use cases like "untar a file" or "create working files for applications, all in the same dir/dir tree".
>
> Maybe I misunderstand, but all of this batching is in the case of a single client that is doing operations to send to the MDS. What I was thinking would be a rare case is batching from the server to the client when e.g. a bunch of clients independently open a bunch of files that are in a directory for which a client holds an STL.

Right. I am speaking about aggregation at the client level to send batched RPCs to the server (e.g. tons of creates).

> In the latter case, since all of the RPCs are coming from different clients, it is much harder for the server to group them together into a single RPC to send to the STL client.

Indeed, this is much harder (but still possible if it is just one client that does the readdir+ and we do a batched glimpse to a client holding some locks on files in that dir).

>> From the above, my conclusion is that we do not necessarily need subtree locks for an efficient metadata write cache, but we do need them for other scenarios (memory conservation). There are some similarities in the functionality too, but also some differences.
>>
>> One particular complexity I see with multiple read-only STLs is that every modifying metadata operation would need to traverse the metadata tree all the way back to the root of the fs in order to notify all possible clients holding STL locks about the change about to be made.
>
> Sorry, I was only considering the case of a 1-deep STL (e.g. a DIR lock, not the arbitrary-depth STL you originally described). In that case, there is no requirement for more than a single level of STL to be checked/cancelled if a client is doing some modifying operation therein. This is no different than e.g. if a bunch of clients are holding the LOOKUP lock on a directory that has a new entry added to it.

The problem in this case then becomes that if we operate within a tree 16 entries deep, we have consumed 10% of our lock capacity (getting a lock on every subdir in the process: 16 of the ~160 locks per client from your memory estimate). If we have several apps going on, then even more.

> Eric also had a proposal that the DIR lock would be a "hash extent" lock instead of a single bit, so that it would be possible (via lock conversion) to avoid cancelling all of the entries cached on a client when a single new file is being added. Only the hash range of the entry being added would need to be removed from the lock, either via a 3-piece lock split (the middle extent being cancelled) or via a 2-piece lock split (the smallest extent being cancelled).

Yes, this is also possible and would be beneficial even with a WRITE lock on a dir. But this really is a completely orthogonal issue.

Bye,
    Oleg
Oleg Drokin writes:
> For the case of a different client stepping into an area covered by an STL lock, this client would get the STL lock's cookie and would start presenting it for all subsequent operations (along with a special flag meaning that the client is not the one operating within the STL).

How is it determined that a given point in a namespace is covered by an STL lock? E.g., client A holds an STL on /a, and client B accesses /a/b/c/f (where /a/b/c is a working directory of some process on B)? This looks especially problematic in the CMD case.

Nikita.
Hello!

On Feb 3, 2009, at 10:01 AM, Nikita Danilov wrote:
>> For the case of a different client stepping into an area covered by an STL lock, this client would get the STL lock's cookie and would start presenting it for all subsequent operations (along with a special flag meaning that the client is not the one operating within the STL).
>
> How is it determined that a given point in a namespace is covered by an STL lock? E.g., client A holds an STL on /a, and client B accesses /a/b/c/f (where /a/b/c is a working directory of some process on B)? This looks especially problematic in the CMD case.

When client B looks up /a during its path traversal, it will get the lock cookie of the STL lock and will start presenting it with further lookups. If /a/b/c became a working dir of a process on B before the STL on /a was granted, then /a/b/c has a normal lock for client B and the STL does not cover that subtree.

Also see the other discussion on this topic here, since in the end we might end up not implementing the entire STL idea.

Bye,
    Oleg
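To make the traversal described in this reply concrete, here is a small sketch (hypothetical Python; lookup_path, the reply fields and the fake server are invented, not real Lustre client code) of a client walking a path component by component and switching to the STL cookie as soon as one lookup reply reports an STL.

```python
# Sketch only: client-side path traversal presenting lock cookies, as in
# the exchange above. Names and reply fields are invented.

def lookup_path(path, mds_lookup, root_cookie):
    """Resolve 'path' one component at a time, presenting either our own
    parent-directory lock cookie or, once discovered, an STL cookie."""
    cookie = root_cookie          # handle on the parent dir (here: the root)
    under_stl = False             # True once we operate under someone's STL

    fid = "ROOT"
    for name in path.strip("/").split("/"):
        # Every lookup carries the current cookie; the flag tells the
        # server we are not the STL holder ourselves.
        reply = mds_lookup(parent=fid, name=name,
                           cookie=cookie, not_stl_holder=under_stl)
        fid = reply["fid"]
        if reply.get("stl_cookie"):
            # /a (say) is covered by another client's STL: from now on we
            # present that STL's cookie with all further lookups.
            cookie = reply["stl_cookie"]
            under_stl = True
        elif reply.get("lock_cookie"):
            # A normal lock granted to us cuts the STL at this point
            # (e.g. /a/b/c was already our CWD before the STL was granted).
            cookie = reply["lock_cookie"]
            under_stl = False
    return fid, cookie, under_stl

# Example with a fake server where /a is covered by client A's STL:
def fake_mds_lookup(parent, name, cookie, not_stl_holder):
    reply = {"fid": f"{parent}/{name}"}
    if name == "a":
        reply["stl_cookie"] = "STL-cookie-of-client-A"
    return reply

print(lookup_path("/a/b/c/f", fake_mds_lookup, root_cookie="root-handle"))
```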
Oleg Drokin writes:
> On Feb 3, 2009, at 10:01 AM, Nikita Danilov wrote:
>> How is it determined that a given point in a namespace is covered by an STL lock? E.g., client A holds an STL on /a, and client B accesses /a/b/c/f (where /a/b/c is a working directory of some process on B)? This looks especially problematic in the CMD case.
>
> When client B looks up /a during its path traversal, it will get the lock cookie of the STL lock and will start presenting it with further lookups. If /a/b/c became a working dir of a process on B before the STL on /a was granted, then /a/b/c has a normal lock for client B and the STL does not cover that subtree.

Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.

> Also see the other discussion on this topic here, since in the end we might end up not implementing the entire STL idea.

Nikita.
Hello!

On Feb 3, 2009, at 2:12 PM, Nikita Danilov wrote:
>> When client B looks up /a during its path traversal, it will get the lock cookie of the STL lock and will start presenting it with further lookups. If /a/b/c became a working dir of a process on B before the STL on /a was granted, then /a/b/c has a normal lock for client B and the STL does not cover that subtree.
>
> Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.

Yes. We do this with locks. If the lock is invalid, we are forced to back-traverse the path until we meet any client-visible lock or the root of the filesystem.

Bye,
    Oleg
Oleg Drokin writes:
> On Feb 3, 2009, at 2:12 PM, Nikita Danilov wrote:
>> Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.
>
> Yes. We do this with locks.

Hm.. I don't think we currently keep locks on the working directories.

> If the lock is invalid, we are forced to back-traverse the path until we meet any client-visible lock or the root of the filesystem.

I just thought about another interesting use case. Imagine client C0 holding a lock on /a/b/f, and C1 holding an STL lock on /D. Now client C2 does mv /a /D. C2 crosses the STL boundary, gets notified about the STL, gets the cookie, etc. But now C0's lock on /a/b/f has become a lock on /D/a/b/f --- under an STL.

Nikita.
Hello!

On Feb 4, 2009, at 9:39 AM, Nikita Danilov wrote:
>> On Feb 3, 2009, at 2:12 PM, Nikita Danilov wrote:
>>> Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.
>>
>> Yes. We do this with locks.
>
> Hm.. I don't think we currently keep locks on the working directories.

Well, we do, because we get them during lookup. That does not mean we hold these locks permanently, of course.

>> If the lock is invalid, we are forced to back-traverse the path until we meet any client-visible lock or the root of the filesystem.
>
> I just thought about another interesting use case. Imagine client C0 holding a lock on /a/b/f, and C1 holding an STL lock on /D. Now client C2 does mv /a /D. C2 crosses the STL boundary, gets notified about the STL, gets the cookie, etc. But now C0's lock on /a/b/f has become a lock on /D/a/b/f --- under an STL.

That's fine. The STL is limited by the locks below it. When the STL-holding client gets a callback about a modification in /D (a bad example, actually, since by my idea any modification in /D would then require the STL lock to go away, so let's suppose the rename was to /D/d1/), that is, a callback about the modification of /D/d1, the STL holder basically has these choices:

1. Get rid of the STL, which avoids the whole problem. OR
2. Flush its own cache of /D/d1 and everything in that subtree, and allow locks there to be granted to other clients.

Now the STL holder knows nothing about /D/d1 anymore, and when it needs to do something there again, it will start doing lookups there (RPCs to the server) under the STL until it reaches the lock from C2, at which point the STL's reach is stopped in that subtree.

Bye,
    Oleg
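As a closing illustration of the choices listed in this last reply, here is a sketch (hypothetical Python; SubtreeLock, on_blocking_callback and the flush helpers are invented names, not Lustre APIs) of a callback handler on the STL-holding client that either drops the STL or carves the affected subtree out of it.

```python
# Sketch of how an STL-holding client might handle a blocking callback
# for a fid inside its subtree, following the two choices above (plus the
# read-only case from the first mail). All names are hypothetical.

class SubtreeLock:
    def __init__(self, root_fid):
        self.root_fid = root_fid
        self.excluded = set()     # subtrees carved out of the STL
        self.valid = True

    def on_blocking_callback(self, fid, modifying, keep_rest_of_cache=True):
        """Server callback: another client accesses 'fid' under our STL."""
        # Always make the accessed object consistent on the server first:
        # dirty file data, or cached directory updates in the WBC case.
        flush_object(fid)

        if not modifying:
            # Read-only access: we may do nothing more, or allow the server
            # to grant a lock on fid after flushing that subtree.
            return "kept"

        if keep_rest_of_cache:
            # Choice 2: flush and give up only the affected subtree; the
            # STL no longer covers it, and later accesses there go back to
            # per-lookup RPCs until another client's lock is reached.
            flush_subtree(fid)
            self.excluded.add(fid)
            return "carved-out"

        # Choice 1: release the whole STL and drop the cache it protected.
        flush_subtree(self.root_fid)
        self.valid = False
        return "released"

def flush_object(fid):
    print(f"flush dirty state of {fid}")

def flush_subtree(fid):
    print(f"flush cached subtree under {fid}")

# Example: a rename lands in /D/d1 while we hold an STL on /D.
stl = SubtreeLock("fid-of-/D")
print(stl.on_blocking_callback("fid-of-/D/d1", modifying=True))
```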