Hello!

We discussed a bit of this in Beijing last week, but decided to continue the discussion via email.

So, I think it is a given that we do not want to revoke a subtree lock every time somebody steps through it, because that would be too costly in a lot of cases.

Anyway, here is what I have in mind.

STL locks could be granted by the server regardless of whether they were requested by the client or not.

We would require clients to provide a lock "cookie" with every operation they perform; in the normal case that would be a handle they have on a parent directory. This cookie should allow a way to find out what server the cookie originates from (needed for CMD support).

For the case of a different client stepping into an area covered by an STL lock, this client would get the STL lock's cookie and would start presenting it for all subsequent operations (along with a special flag meaning that the client is not the one operating within the STL). When the server receives a request with a cookie that turns out to be for an STL lock, a callback is made to that lock (if necessary, through another server in the CMD case), and information about the currently-accessed fid and access mode is included. The client where the callback ends up will do the necessary writeout of the object content: flush dirty data in the case of a file, or flush any metadata changes in the case of a directory (needed for the metadata writeback cache; this would be a server no-op for r/o access to directories before WBC is implemented). In addition, if the operation is modifying, the STL-holding client would have to release the STL lock and would have the choice of either completely flushing its cache for the subtree protected by the STL, or obtaining STLs for parts of the tree below the STL and retaining its cache for those subtrees. Additionally, for r/o access the STL-holding client would have the extra choices of doing nothing (besides the cache writeout/flush for the object content) or allowing the server to issue a lock on that fid, in which case the client would first flush its own cache for the entire subtree starting at that fid.

If the lock cookie presented by the accessing client is determined to be invalid (rogue client, or the lock was already released), a reverse lookup is performed up the tree (possibly crossing MDT boundaries) by the server, in search of a lock already granted to a client or the root of the tree, whichever is met first. If during this lookup a lock is met and it happens to be an STL lock, its cookie is returned to the client along with an indication of the STL lock's presence; otherwise normal operation with normal lock granting occurs.

When a client gets an STL lock for itself, it also performs all subsequent operations by presenting the STL lock handle. It might get a reply from the server indicating that the entry being accessed is "shared" (determined by the server as an opened file, or an inode on which there are locks granted to other clients) and a normal lock (or, in case this area of the tree is covered by somebody else's STL, that STL's cookie) if needed. All metadata cached on behalf of an STL lock is marked as such in the client's cache.

This approach allows for a dynamically growing STL tree with the ability to cut it at any level (by the presence of a lock in some part of the tree). Initially, after being issued, an STL lock would span from the root of the subtree it was issued on down to any points where other clients might have cached information (or, if no other clients hold locks there, the entire subtree), and then there is the possibility of cutting some sub-subtrees out of the subtree protected by the STL. This also allows for nested STLs held by different clients.

One important thing that needs to be done in this scenario is that we must ensure any process with its CWD on Lustre has a lock on that directory if possible (of course we cannot refuse to revoke this lock if other clients want to modify the directory content). This would allow us to avoid costly reverse lookups to find out whether we are under any STL lock when we operate from a CWD on Lustre (the STL lock would simply be cut at the CWD point by the normal lock).

We would need to implement cross-MDT lock callbacks.

I think it is safe to depend on clients to provide locks, since if they don't, or provide invalid ones, we can find this out (and we can couple locks with some secure tokens if needed, too). The only downside is that rogue clients would be able to slow down servers by making them do all the reverse lookups (though if we simply refuse to speak with clients that present, on a non-root inode of the FS, invalid locks that were never present in the system, that should be somewhat mitigated).

The other alternative is to mark every server dentry with an STL marker during traversal, but in that case recovery after a server restart becomes somewhat problematic, so I do not think this is a good idea.

Bye,
    Oleg
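The server-side cookie check and reverse lookup described above could look roughly like the following sketch. This is a minimal illustration in Python, not Lustre code; the names (Node, Lock, handle_request, send_callback) are invented for the example and gloss over CMD, recovery and lock modes.

```python
# Minimal sketch of the server-side cookie check and reverse lookup
# described above. Names (Node, Lock, handle_request, ...) are
# hypothetical illustrations, not actual Lustre structures or APIs.

class Lock:
    def __init__(self, cookie, is_stl, holder):
        self.cookie = cookie      # opaque handle the client presents
        self.is_stl = is_stl      # True for a subtree (STL) lock
        self.holder = holder      # client that was granted the lock

class Node:
    """One namespace entry (directory or file) on the MDT."""
    def __init__(self, fid, parent=None):
        self.fid = fid
        self.parent = parent      # None for the filesystem root
        self.lock = None          # currently granted lock, if any

valid_cookies = {}                # cookie -> Lock, as known to the server

def reverse_lookup(node):
    """Walk up the tree until a granted lock or the root is met."""
    while node is not None:
        if node.lock is not None:
            return node.lock
        node = node.parent
    return None                   # reached the root without meeting a lock

def handle_request(node, presented_cookie, modifying):
    lock = valid_cookies.get(presented_cookie)
    if lock is None:
        # Invalid cookie (rogue client or lock already released):
        # do the costly reverse lookup up the tree.
        lock = reverse_lookup(node)

    if lock is not None and lock.is_stl:
        # Callback to the STL holder (possibly via another MDT in CMD),
        # telling it which fid is accessed and in which mode, so it can
        # flush dirty data/metadata and shrink or release the STL.
        send_callback(lock.holder, node.fid, modifying)
        return {"stl_cookie": lock.cookie}   # client presents this from now on

    # No STL in the way: grant a normal lock as usual.
    return {"granted": grant_normal_lock(node)}

def send_callback(holder, fid, modifying):
    print(f"callback to {holder}: fid={fid}, modifying={modifying}")

def grant_normal_lock(node):
    return f"normal-lock-on-{node.fid}"
```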
On Jan 21, 2009 15:49 -0500, Oleg Drokin wrote:
> So, I think it is a given we do not want to revoke a subtree lock every time somebody steps through it, because that will be too costly in a lot of cases.

A few comments that I have from the later discussions:

- you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.

- Alex (I think) suggested that the STL locks would only be on a single directory and its contents, instead of being on an arbitrary-depth sub-tree. While it seems somewhat appealing to have a single lock that covers an entire subtree, the complexity of having to locate and manage arbitrary-depth locks on the MDS might be too high. In most use cases it is pretty rare to have very deep subtrees, and the common case will be a large number of files in a single directory; a subtree lock will serve this use case equally well. Having only a single level of subtree lock would avoid the need to pass cookies to the MDS for anything other than the directory in which names are being looked up.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Hello!

On Jan 26, 2009, at 5:08 AM, Andreas Dilger wrote:
> A few comments that I have from the later discussions:
> - you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.

In fact I see zero benefit for a read-only subtree lock except memory conservation, which should not be such a big issue. Much more important is to reduce the number of RPCs, especially synchronous ones.

> - Alex (I think) suggested that the STL locks would only be on a single directory and its contents, instead of being on an arbitrary-depth sub-tree. While it seems somewhat appealing to have a single lock that covers an entire subtree, the complexity of having to locate and manage arbitrary-depth locks on the MDS might be too high.

That's right.

> Having only a single level of subtree lock would avoid the need to pass cookies to the MDS for anything other than the directory in which names are being looked up.

I had a lengthy call with Eric today, and at the end we came to the conclusion that perhaps an STL is, at the moment, total overkill. What we need is the ability to reduce metadata RPC traffic.

We can start with an implementation that just allows WRITE locks on a directory that would be responsible only for this directory and its content (HELPS: by allowing creates to be aggregated into batches before sending), plus a special "entire file lock" (perhaps implemented as just a WRITE lock on a file): a metadata lock that would guard all file data without obtaining any locks from the OSTs (it would be revoked by an open from another client, and would perhaps need to support glimpses too).

The WRITE directory lock only helps us to aggregate metadata RPCs if we just created the empty directory OR if we have the entire list of entries in that directory. If we do not have the entire directory content, we must issue a synchronous create RPC to avoid cases where we locally create a file that already exists in that dir, for example. So perhaps in a lot of cases obtaining a write lock on a dir would need to be followed by some sort of bulk directory read (a readdir+ of sorts). This is also not always feasible, as I can imagine there could be directories much bigger than what we would like to cache, in which case we would need to resort to one-by-one creates.

Another important thing we would need is lock conversion (downconversion and try-up conversion) so that we do not lose our entire cached directory content after a conflicting ls comes in and we write it out. (We do not care all that much about writing out the entire content of the dirty metadata cache at this point, since we still achieve the aggregation and asynchronous creation; even just asynchronous creation would help.)

Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.

This WRITE metadata lock is in fact a reduced subset of the STL lock without any of its advanced features, but perhaps easier to implement because of that.

Bye,
    Oleg
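As a rough illustration of the aggregation this WRITE directory lock is meant to enable, the sketch below (hypothetical Python; CachedDir, flush_creates and the batch size are invented, not Lustre interfaces) queues creates locally only when the full directory listing is cached, and falls back to synchronous create RPCs otherwise, as described in the mail above.

```python
# Illustrative sketch only: how a client might batch creates under a
# directory WRITE lock. Class and function names are hypothetical.

class CachedDir:
    def __init__(self, fid, have_write_lock, have_full_listing):
        self.fid = fid
        self.have_write_lock = have_write_lock      # WRITE lock on the directory
        self.have_full_listing = have_full_listing  # entire entry list cached
        self.entries = set()                        # names known to exist
        self.pending_creates = []                   # creates not yet sent

BATCH_SIZE = 64

def create(d, name, send_rpc):
    """Create 'name' in directory 'd'; batch if it is safe to do so."""
    can_batch = d.have_write_lock and d.have_full_listing
    if not can_batch:
        # Without the full listing we cannot rule out a name collision
        # locally, so the create must be a synchronous RPC.
        return send_rpc([("create", d.fid, name)])

    if name in d.entries:
        raise FileExistsError(name)                 # resolved locally, no RPC

    d.entries.add(name)
    d.pending_creates.append(("create", d.fid, name))
    if len(d.pending_creates) >= BATCH_SIZE:
        flush_creates(d, send_rpc)                  # one RPC for many creates

def flush_creates(d, send_rpc):
    """Called on batch overflow, lock cancellation, or fsync."""
    if d.pending_creates:
        send_rpc(d.pending_creates)
        d.pending_creates = []

# Example: 200 creates turn into 4 RPCs instead of 200 synchronous ones.
if __name__ == "__main__":
    rpcs = []
    d = CachedDir(fid="0x1234", have_write_lock=True, have_full_listing=True)
    for i in range(200):
        create(d, f"file{i}", rpcs.append)
    flush_creates(d, rpcs.append)
    print(f"{len(rpcs)} RPCs sent")
```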
On Jan 27, 2009 23:39 -0500, Oleg Drokin wrote:
> On Jan 26, 2009, at 5:08 AM, Andreas Dilger wrote:
>> A few comments that I have from the later discussions:
>> - you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.
>
> In fact I see zero benefit for a read-only subtree lock except memory conservation, which should not be such a big issue. Much more important is to reduce the number of RPCs, especially synchronous ones.

Memory conservation on the server is very important. If there are 100k clients and a DLM lock is 2kB in size, then we are looking at 200MB for each lock given to all clients. With an MDS having, say, 32GB of RAM, we would consume all of the server RAM with only 160 locks per client.

>> - Alex (I think) suggested that the STL locks would only be on a single directory and its contents, instead of being on an arbitrary-depth sub-tree. While it seems somewhat appealing to have a single lock that covers an entire subtree, the complexity of having to locate and manage arbitrary-depth locks on the MDS might be too high.
>
> That's right.
>
>> Having only a single level of subtree lock would avoid the need to pass cookies to the MDS for anything other than the directory in which names are being looked up.
>
> I had a lengthy call with Eric today, and at the end we came to the conclusion that perhaps an STL is, at the moment, total overkill.
>
> What we need is the ability to reduce metadata RPC traffic.

And to reduce memory usage for read locks on the server. Having a READ STL for cases like read-mostly directories (/usr/bin, /usr/lib, ~/bin) can avoid many thousands/millions of locks and their RPCs.

> We can start with an implementation that just allows WRITE locks on a directory that would be responsible only for this directory and its content (HELPS: by allowing creates to be aggregated into batches before sending), plus a special "entire file lock" (perhaps implemented as just a WRITE lock on a file): a metadata lock that would guard all file data without obtaining any locks from the OSTs (it would be revoked by an open from another client, and would perhaps need to support glimpses too).

Well, if the client will generate the layout on the newly-created files, or will request the layout (LOV EA) lock on the files it wants exclusive access to, this is essentially the "entire file lock" you need. For existing files the client holding the layout lock needs to cancel the OST extent locks first, to ensure they flush their cache.

> The WRITE directory lock only helps us to aggregate metadata RPCs if we just created the empty directory OR if we have the entire list of entries in that directory. If we do not have the entire directory content, we must issue a synchronous create RPC to avoid cases where we locally create a file that already exists in that dir, for example. So perhaps in a lot of cases obtaining a write lock on a dir would need to be followed by some sort of bulk directory read (a readdir+ of sorts). This is also not always feasible, as I can imagine there could be directories much bigger than what we would like to cache, in which case we would need to resort to one-by-one creates.
>
> Another important thing we would need is lock conversion (downconversion and try-up conversion) so that we do not lose our entire cached directory content after a conflicting ls comes in and we write it out. (We do not care all that much about writing out the entire content of the dirty metadata cache at this point, since we still achieve the aggregation and asynchronous creation; even just asynchronous creation would help.)

We also want to have lock conversion for regular files (write->read) and for the layout lock bit (so clients can drop the LOV EA lock without dropping the LOOKUP or UPDATE bits).

> Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.

Well, I'm not sure how much batching we will get from this, since it will be completely non-deterministic whether multiple independent client requests can be grouped into a single RPC.

> This WRITE metadata lock is in fact a reduced subset of the STL lock without any of its advanced features, but perhaps easier to implement because of that.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
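The lock conversion mentioned here can be pictured as manipulating mode and inode bits within an already granted lock. The toy model below is a sketch under invented names (the bit constants and InodeLock class are not the actual Lustre definitions, only loosely modeled on the LOOKUP/UPDATE/layout bits discussed above).

```python
# Toy model of the lock conversions discussed above: dropping single
# inode bits (e.g. the layout bit) and downgrading write->read without
# cancelling the whole lock. Bit names and the InodeLock class are
# invented for illustration; they are not the actual Lustre definitions.

LOOKUP = 0x1      # name -> fid lookups
UPDATE = 0x2      # directory/attribute updates
LAYOUT = 0x4      # file layout (LOV EA)

class InodeLock:
    def __init__(self, mode, bits):
        self.mode = mode          # "read" or "write"
        self.bits = bits

    def drop_bits(self, bits):
        """Give back some bits (e.g. LAYOUT) and keep the rest cached."""
        self.bits &= ~bits

    def downconvert(self):
        """write -> read: keep the cached state readable, allow other readers."""
        if self.mode == "write":
            self.mode = "read"

# A client holding LOOKUP|UPDATE|LAYOUT in write mode can drop just the
# layout bit and downgrade, instead of losing its whole cached directory.
lock = InodeLock("write", LOOKUP | UPDATE | LAYOUT)
lock.drop_bits(LAYOUT)
lock.downconvert()
assert lock.mode == "read" and lock.bits == LOOKUP | UPDATE
```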
Hello!

On Feb 2, 2009, at 5:50 PM, Andreas Dilger wrote:
> On Jan 27, 2009 23:39 -0500, Oleg Drokin wrote:
>> On Jan 26, 2009, at 5:08 AM, Andreas Dilger wrote:
>>> A few comments that I have from the later discussions:
>>> - you previously mentioned that only a single client would be able to hold a subtree lock. I think it is critical that multiple clients be able to get read subtree locks on the same directory. This would be very important for uses like many clients and a shared read-mostly directory like /usr/bin or /usr/lib.
>>
>> In fact I see zero benefit for a read-only subtree lock except memory conservation, which should not be such a big issue. Much more important is to reduce the number of RPCs, especially synchronous ones.
>
> Memory conservation on the server is very important. If there are 100k clients and a DLM lock is 2kB in size, then we are looking at 200MB for each lock given to all clients. With an MDS having, say, 32GB of RAM, we would consume all of the server RAM with only 160 locks per client.

Well, you are of course right, and at a certain scale we do indeed need to consider the memory conservation effect as well.

>> We can start with an implementation that just allows WRITE locks on a directory that would be responsible only for this directory and its content (HELPS: by allowing creates to be aggregated into batches before sending), plus a special "entire file lock" (perhaps implemented as just a WRITE lock on a file): a metadata lock that would guard all file data without obtaining any locks from the OSTs (it would be revoked by an open from another client, and would perhaps need to support glimpses too).
>
> Well, if the client will generate the layout on the newly-created files, or will request the layout (LOV EA) lock on the files it wants exclusive access to, this is essentially the "entire file lock" you need. For existing files the client holding the layout lock needs to cancel the OST extent locks first, to ensure they flush their cache.

This is fine as one of the ideas, but it would not work all that nicely in all possible use cases. Suppose we wanted a read-only lock like this too, for example.

>> Another important thing we would need is lock conversion (downconversion and try-up conversion) so that we do not lose our entire cached directory content after a conflicting ls comes in and we write it out. (We do not care all that much about writing out the entire content of the dirty metadata cache at this point, since we still achieve the aggregation and asynchronous creation; even just asynchronous creation would help.)
>
> We also want to have lock conversion for regular files (write->read) and for the layout lock bit (so clients can drop the LOV EA lock without dropping the LOOKUP or UPDATE bits).

Yes, absolutely.

>> Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.
>
> Well, I'm not sure how much batching we will get from this, since it will be completely non-deterministic whether multiple independent client requests can be grouped into a single RPC.

There would be a lot of batching in many common use cases like "untar a file" or "create working files for applications, all in the same dir/dir tree".

From the above, my conclusion is that we do not necessarily need subtree locks for an efficient metadata write cache, but we do need them for other scenarios (memory conservation). There are some similarities in the functionality too, but also some differences.

One particular complexity I see with multiple read-only STLs is that every modifying metadata operation would need to traverse the metadata tree all the way back to the root of the fs in order to notify all possible clients holding STL locks about the change about to be made.

Bye,
    Oleg
On Feb 03, 2009 01:24 -0500, Oleg Drokin wrote:
>>> Perhaps another useful addition would be to deliver multiple blocking and glimpse callbacks from the server to the client in a single RPC (as a result of a readdir+ sort of operation inside a dir where many files have the "entire file lock"); we already have aggregated cancels in the other direction.
>>
>> Well, I'm not sure how much batching we will get from this, since it will be completely non-deterministic whether multiple independent client requests can be grouped into a single RPC.
>
> There would be a lot of batching in many common use cases like "untar a file" or "create working files for applications, all in the same dir/dir tree".

Maybe I misunderstand, but all of this batching is in the case of a single client that is doing operations to send to the MDS. What I was thinking would be a rare case is batching from the server to the client when e.g. a bunch of clients independently open a bunch of files that are in a directory for which a client holds an STL. In the latter case, since all of the RPCs are coming from different clients, it is much harder for the server to group them together into a single RPC to send to the STL client.

> From the above, my conclusion is that we do not necessarily need subtree locks for an efficient metadata write cache, but we do need them for other scenarios (memory conservation). There are some similarities in the functionality too, but also some differences.
>
> One particular complexity I see with multiple read-only STLs is that every modifying metadata operation would need to traverse the metadata tree all the way back to the root of the fs in order to notify all possible clients holding STL locks about the change about to be made.

Sorry, I was only considering the case of a 1-deep STL (e.g. a DIR lock, not the arbitrary-depth STL you originally described). In that case, there is no requirement for more than a single level of STL to be checked/cancelled if a client is doing some modifying operation therein. This is no different than e.g. if a bunch of clients are holding the LOOKUP lock on a directory that has a new entry added to it.

Eric also had a proposal that the DIR lock would be a "hash extent" lock instead of a single bit, so that it would be possible (via lock conversion) to avoid cancelling all of the entries cached on a client when a single new file is being added. Only the hash range of the entry being added would need to be removed from the lock, either via a 3-piece lock split (the middle extent being cancelled) or via a 2-piece lock split (the smallest extent being cancelled).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
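A rough model of the hash-extent split proposed here might look like the sketch below. It is a hypothetical Python illustration: the HashExtentLock class, the split policy and name_hash are invented for the example and are not the actual Lustre DLM extent-lock code.

```python
# Illustrative model of the "hash extent" directory lock split described
# above: adding one entry carves only its hash out of the held range.

from zlib import crc32

def name_hash(name):
    return crc32(name.encode()) & 0xffffffff

class HashExtentLock:
    """A lock on a range of directory-entry hashes held by one client."""
    def __init__(self, start, end):
        self.ranges = [(start, end)]      # list of held [start, end] ranges

    def punch(self, h, three_piece=True):
        """Remove hash value h from the held ranges when a new entry with
        that hash is added by another client."""
        new_ranges = []
        for (s, e) in self.ranges:
            if not (s <= h <= e):
                new_ranges.append((s, e))
            elif three_piece:
                # 3-piece split: keep both sides, cancel only the middle.
                if s <= h - 1:
                    new_ranges.append((s, h - 1))
                if h + 1 <= e:
                    new_ranges.append((h + 1, e))
            else:
                # 2-piece split: cancel the smaller side together with h.
                left, right = (s, h - 1), (h + 1, e)
                keep = left if (h - s) >= (e - h) else right
                if keep[0] <= keep[1]:
                    new_ranges.append(keep)
        self.ranges = new_ranges

# A client caching the whole directory holds [0, 2^32-1]; another client
# creating "newfile" only invalidates that one hash, not the whole cache.
lock = HashExtentLock(0, 0xffffffff)
lock.punch(name_hash("newfile"))
print(lock.ranges)    # two ranges remain; cached entries stay valid
```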
Hello!

On Feb 3, 2009, at 4:04 AM, Andreas Dilger wrote:
>> There would be a lot of batching in many common use cases like "untar a file" or "create working files for applications, all in the same dir/dir tree".
>
> Maybe I misunderstand, but all of this batching is in the case of a single client that is doing operations to send to the MDS. What I was thinking would be a rare case is batching from the server to the client when e.g. a bunch of clients independently open a bunch of files that are in a directory for which a client holds an STL.

Right. I am speaking about aggregation at the client level to send batched RPCs to the server (e.g. tons of creates).

> In the latter case, since all of the RPCs are coming from different clients, it is much harder for the server to group them together into a single RPC to send to the STL client.

Indeed, this is much harder (but still possible if it is just one client that does the readdir+ and we do a batched glimpse to a client holding some locks on files in that dir).

>> From the above, my conclusion is that we do not necessarily need subtree locks for an efficient metadata write cache, but we do need them for other scenarios (memory conservation). There are some similarities in the functionality too, but also some differences.
>>
>> One particular complexity I see with multiple read-only STLs is that every modifying metadata operation would need to traverse the metadata tree all the way back to the root of the fs in order to notify all possible clients holding STL locks about the change about to be made.
>
> Sorry, I was only considering the case of a 1-deep STL (e.g. a DIR lock, not the arbitrary-depth STL you originally described). In that case, there is no requirement for more than a single level of STL to be checked/cancelled if a client is doing some modifying operation therein. This is no different than e.g. if a bunch of clients are holding the LOOKUP lock on a directory that has a new entry added to it.

The problem in this case then becomes that if we operate within a tree 16 entries deep, we have consumed 10% of our lock capacity (getting a lock on every subdir in the process: 16 of the ~160 locks per client from your memory estimate). If we have several apps going on, then even more.

> Eric also had a proposal that the DIR lock would be a "hash extent" lock instead of a single bit, so that it would be possible (via lock conversion) to avoid cancelling all of the entries cached on a client when a single new file is being added. Only the hash range of the entry being added would need to be removed from the lock, either via a 3-piece lock split (the middle extent being cancelled) or via a 2-piece lock split (the smallest extent being cancelled).

Yes, this is also possible and would be beneficial even with a WRITE lock on a dir. But this really is a completely orthogonal issue.

Bye,
    Oleg
Oleg Drokin writes:
> For the case of a different client stepping into an area covered by an STL lock, this client would get the STL lock's cookie and would start presenting it for all subsequent operations (along with a special flag meaning that the client is not the one operating within the STL).

How is it determined that a given point in a namespace is covered by an STL lock? E.g., client A holds an STL on /a, and client B accesses /a/b/c/f (where /a/b/c is a working directory of some process on B)? This looks especially problematic in the CMD case.

Nikita.
Hello!

On Feb 3, 2009, at 10:01 AM, Nikita Danilov wrote:
>> For the case of a different client stepping into an area covered by an STL lock, this client would get the STL lock's cookie and would start presenting it for all subsequent operations (along with a special flag meaning that the client is not the one operating within the STL).
>
> How is it determined that a given point in a namespace is covered by an STL lock? E.g., client A holds an STL on /a, and client B accesses /a/b/c/f (where /a/b/c is a working directory of some process on B)? This looks especially problematic in the CMD case.

When client B looks up /a during its path traversal, it will get the lock cookie of the STL lock and will start presenting it with further lookups. If /a/b/c became a working dir of a process on B before the STL on /a was granted, then /a/b/c has a normal lock for client B and the STL does not cover that subtree.

Also see the other discussion on this topic here, since in the end we might end up not implementing the entire STL idea.

Bye,
    Oleg
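To make the traversal described in this reply concrete, here is a small sketch (hypothetical Python; lookup_path, the reply fields and the fake server are invented, not real Lustre client code) of a client walking a path component by component and switching to the STL cookie as soon as one lookup reply reports an STL.

```python
# Sketch only: client-side path traversal presenting lock cookies, as in
# the exchange above. Names and reply fields are invented.

def lookup_path(path, mds_lookup, root_cookie):
    """Resolve 'path' one component at a time, presenting either our own
    parent-directory lock cookie or, once discovered, an STL cookie."""
    cookie = root_cookie          # handle on the parent dir (here: the root)
    under_stl = False             # True once we operate under someone's STL

    fid = "ROOT"
    for name in path.strip("/").split("/"):
        # Every lookup carries the current cookie; the flag tells the
        # server we are not the STL holder ourselves.
        reply = mds_lookup(parent=fid, name=name,
                           cookie=cookie, not_stl_holder=under_stl)
        fid = reply["fid"]
        if reply.get("stl_cookie"):
            # /a (say) is covered by another client's STL: from now on we
            # present that STL's cookie with all further lookups.
            cookie = reply["stl_cookie"]
            under_stl = True
        elif reply.get("lock_cookie"):
            # A normal lock granted to us cuts the STL at this point
            # (e.g. /a/b/c was already our CWD before the STL was granted).
            cookie = reply["lock_cookie"]
            under_stl = False
    return fid, cookie, under_stl

# Example with a fake server where /a is covered by client A's STL:
def fake_mds_lookup(parent, name, cookie, not_stl_holder):
    reply = {"fid": f"{parent}/{name}"}
    if name == "a":
        reply["stl_cookie"] = "STL-cookie-of-client-A"
    return reply

print(lookup_path("/a/b/c/f", fake_mds_lookup, root_cookie="root-handle"))
```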
Oleg Drokin writes:
> On Feb 3, 2009, at 10:01 AM, Nikita Danilov wrote:
>> How is it determined that a given point in a namespace is covered by an STL lock? E.g., client A holds an STL on /a, and client B accesses /a/b/c/f (where /a/b/c is a working directory of some process on B)? This looks especially problematic in the CMD case.
>
> When client B looks up /a during its path traversal, it will get the lock cookie of the STL lock and will start presenting it with further lookups. If /a/b/c became a working dir of a process on B before the STL on /a was granted, then /a/b/c has a normal lock for client B and the STL does not cover that subtree.

Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.

> Also see the other discussion on this topic here, since in the end we might end up not implementing the entire STL idea.

Nikita.
Hello!

On Feb 3, 2009, at 2:12 PM, Nikita Danilov wrote:
>> When client B looks up /a during its path traversal, it will get the lock cookie of the STL lock and will start presenting it with further lookups. If /a/b/c became a working dir of a process on B before the STL on /a was granted, then /a/b/c has a normal lock for client B and the STL does not cover that subtree.
>
> Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.

Yes. We do this with locks. If the lock is invalid, we are forced to back-traverse the path until we meet any client-visible lock or the root of the filesystem.

Bye,
    Oleg
Oleg Drokin writes:
> On Feb 3, 2009, at 2:12 PM, Nikita Danilov wrote:
>> Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.
>
> Yes. We do this with locks.

Hm.. I don't think we currently keep locks on the working directories.

> If the lock is invalid, we are forced to back-traverse the path until we meet any client-visible lock or the root of the filesystem.

I just thought about another interesting use case. Imagine client C0 holding a lock on /a/b/f, and C1 holding an STL lock on /D. Now client C2 does mv /a /D. C2 crosses the STL boundary, gets notified about the STL, gets the cookie, etc. But now C0's lock on /a/b/f has become a lock on /D/a/b/f --- under an STL.

Nikita.
Hello!

On Feb 4, 2009, at 9:39 AM, Nikita Danilov wrote:
>> On Feb 3, 2009, at 2:12 PM, Nikita Danilov wrote:
>>> Yes, this is the case I meant. So we have to track (and recover) current directories for all client processes.
>>
>> Yes. We do this with locks.
>
> Hm.. I don't think we currently keep locks on the working directories.

Well, we do, because we get them during lookup. That does not mean we hold these locks permanently, of course.

>> If the lock is invalid, we are forced to back-traverse the path until we meet any client-visible lock or the root of the filesystem.
>
> I just thought about another interesting use case. Imagine client C0 holding a lock on /a/b/f, and C1 holding an STL lock on /D. Now client C2 does mv /a /D. C2 crosses the STL boundary, gets notified about the STL, gets the cookie, etc. But now C0's lock on /a/b/f has become a lock on /D/a/b/f --- under an STL.

That's fine. The STL is limited by the locks below it. When the STL-holding client gets a callback about a modification in /D (a bad example, actually, since by my idea any modification in /D would then require the STL lock to go away, so let's suppose the rename was to /D/d1/), that is, a callback about the modification of /D/d1, the STL holder basically has these choices:

1. Get rid of the STL, which avoids the whole problem. OR
2. Flush its own cache of /D/d1 and everything in that subtree, and allow locks there to be granted to other clients.

Now the STL holder knows nothing about /D/d1 anymore, and when it needs to do something there again, it will start doing lookups there (RPCs to the server) under the STL until it reaches the lock from C2, at which point the STL's reach is stopped in that subtree.

Bye,
    Oleg
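As a closing illustration of the choices listed in this last reply, here is a sketch (hypothetical Python; SubtreeLock, on_blocking_callback and the flush helpers are invented names, not Lustre APIs) of a callback handler on the STL-holding client that either drops the STL or carves the affected subtree out of it.

```python
# Sketch of how an STL-holding client might handle a blocking callback
# for a fid inside its subtree, following the two choices above (plus the
# read-only case from the first mail). All names are hypothetical.

class SubtreeLock:
    def __init__(self, root_fid):
        self.root_fid = root_fid
        self.excluded = set()     # subtrees carved out of the STL
        self.valid = True

    def on_blocking_callback(self, fid, modifying, keep_rest_of_cache=True):
        """Server callback: another client accesses 'fid' under our STL."""
        # Always make the accessed object consistent on the server first:
        # dirty file data, or cached directory updates in the WBC case.
        flush_object(fid)

        if not modifying:
            # Read-only access: we may do nothing more, or allow the server
            # to grant a lock on fid after flushing that subtree.
            return "kept"

        if keep_rest_of_cache:
            # Choice 2: flush and give up only the affected subtree; the
            # STL no longer covers it, and later accesses there go back to
            # per-lookup RPCs until another client's lock is reached.
            flush_subtree(fid)
            self.excluded.add(fid)
            return "carved-out"

        # Choice 1: release the whole STL and drop the cache it protected.
        flush_subtree(self.root_fid)
        self.valid = False
        return "released"

def flush_object(fid):
    print(f"flush dirty state of {fid}")

def flush_subtree(fid):
    print(f"flush cached subtree under {fid}")

# Example: a rename lands in /D/d1 while we hold an STL on /D.
stl = SubtreeLock("fid-of-/D")
print(stl.on_blocking_callback("fid-of-/D/d1", modifying=True))
```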