Hello!

While doing some large-scale testing at ORNL, an interesting pattern came up.
Suppose we are running a large-scale IOR test on a shared file.  Some unlucky
client does its writing at the highest offset (or, at the beginning, was
unlucky enough to grab the whole-object PW lock).  As the other clients do
their writes, they first do glimpses to find out the file size.  Those
glimpses now turn into thousands of glimpse requests to that poor client,
and many of them actually arrive in parallel.

So I was thinking - perhaps it would be nice for filter_intent_policy() to
check whether any glimpse requests are already in flight to that client for
the same lock, and if there are, just wait for that request to return and
use the data from there?  Of course, a potential caveat here is that we have
no way to tell whether the request had reached the client by the time we
started our processing, so we might get size data that is a bit stale, but
I wonder whether this is critical enough in our case?

Any ideas?

Bye,
    Oleg
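P.S. In case a concrete sketch helps, below is a rough, purely illustrative
userspace model of the coalescing I have in mind.  It is not the actual
filter_intent_policy() code, and the names (glimpse_state,
send_glimpse_ast_and_wait()) are invented for the example; the only point is
that later glimpsers wait on a glimpse that is already in flight and reuse
its possibly slightly stale size instead of sending their own AST.

/*
 * Purely illustrative userspace sketch, NOT the real Lustre
 * filter_intent_policy().  glimpse_state and send_glimpse_ast_and_wait()
 * are hypothetical names invented for this example.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct glimpse_state {
        pthread_mutex_t lock;
        pthread_cond_t  done;
        bool            in_flight;   /* glimpse AST sent, reply not back yet */
        uint64_t        last_size;   /* size reported by the latest reply */
};

/* Stand-in for the real RPC: pretend the lock-holding client reported
 * this size.  In reality this would send the glimpse AST and block. */
static uint64_t send_glimpse_ast_and_wait(void *lock_handle)
{
        (void)lock_handle;
        return 1048576;              /* dummy size, for the sketch only */
}

static uint64_t glimpse_coalesced(struct glimpse_state *gs, void *lock_handle)
{
        uint64_t size;

        pthread_mutex_lock(&gs->lock);
        if (gs->in_flight) {
                /* Somebody already asked this client about this lock;
                 * wait for that reply and reuse its answer. */
                while (gs->in_flight)
                        pthread_cond_wait(&gs->done, &gs->lock);
                size = gs->last_size;        /* may be slightly stale */
                pthread_mutex_unlock(&gs->lock);
                return size;
        }
        gs->in_flight = true;
        pthread_mutex_unlock(&gs->lock);

        size = send_glimpse_ast_and_wait(lock_handle);

        pthread_mutex_lock(&gs->lock);
        gs->last_size = size;
        gs->in_flight = false;
        pthread_cond_broadcast(&gs->done);   /* wake all waiting glimpsers */
        pthread_mutex_unlock(&gs->lock);
        return size;
}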
Oleg,

Reduction (as you describe) is an absolutely essential strategy to achieve
scalability.  So without having checked through all the details of your
idea, it feels like absolutely the right solution.

If your idea changes not only latencies but also significant orderings,
I'll feel less secure.  Have you thought it all through?

    Cheers,
              Eric
Hello!

On Feb 2, 2008, at 1:39 AM, Eric Barton wrote:

> Reduction (as you describe) is an absolutely essential strategy to
> achieve scalability.  So without having checked through all the
> details of your idea, it feels like absolutely the right solution.

Yes, I am just afraid that slightly stale size data might be reported,
and I am not yet sure how critical that is for us.

> If your idea changes not only latencies but also significant orderings,
> I'll feel less secure.  Have you thought it all through?

There are no ordering changes other than what I would consider a perfectly
normal race.  Let me explain with a control flow.

client3 holds the highest write lock; clients 1 and 2 do glimpses.

Right now (racy scenario):

client1 sends glimpse -> server on its behalf sends a glimpse AST to
  client3 -> client3 receives the AST and sends a reply
client2 sends glimpse -> server on its behalf sends a glimpse AST to
  client3 ->
client3 writes some more data
glimpse on behalf of client2 arrives at client3 -> client3 sends in the
  updated size
server now sends different sizes back to client1 and client2.

If we consolidate, it will look like this:

client1 sends glimpse -> server on its behalf sends a glimpse AST to
  client3 -> client3 receives the AST and sends a reply
client2 sends glimpse -> server notices there is already a glimpse AST in
  flight to client3 and waits for it
client3 writes some more data
server receives the reply and sends identical sizes to client1 and client2.

I tend to think this is a normal race (much like two processes doing stat
on a local file while a third process writes to it constantly: if the two
stats are done closely together, they may return identical sizes), but I
just want to check that there are no objections to it.

Bye,
    Oleg
On Feb 02, 2008  02:00 -0500, Oleg Drokin wrote:
> On Feb 2, 2008, at 1:39 AM, Eric Barton wrote:
> > Reduction (as you describe) is an absolutely essential strategy to
> > achieve scalability.  So without having checked through all the
> > details of your idea, it feels like absolutely the right solution.
>
> Yes, I am just afraid that slightly stale size data might be reported,
> and I am not yet sure how critical that is for us.

A glimpse enqueue is already inherently racy, so I don't think this is
a serious problem.

> > If your idea changes not only latencies but also significant orderings,
> > I'll feel less secure.  Have you thought it all through?
>
> There are no ordering changes other than what I would consider a
> perfectly normal race.  Let me explain with a control flow.
> [...]
> I tend to think this is a normal race (much like two processes doing
> stat on a local file while a third process writes to it constantly:
> if the two stats are done closely together, they may return identical
> sizes), but I just want to check that there are no objections to it.

The uses of glimpse are only in a few cases:
- ls -l (inherently racy)
- read
- write

In the read and write cases the glimpse return is used to determine
whether the file size is before or after the given client's lock.
Since the client that is getting the glimpse is holding the EOF lock,
this cannot change the relative positions of the locks.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
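P.S. For illustration only (hypothetical names, not Lustre source): the
read/write path essentially only cares which side of its own lock extent
the glimpsed size falls on, so a slightly stale size that stays on the same
side of the lock leads to the same decision.

/* Illustrative sketch only; hypothetical names, not Lustre source. */
#include <stdint.h>
#include <stdio.h>

struct extent { uint64_t start; uint64_t end; };    /* this client's lock */

enum size_vs_lock { SIZE_BEFORE_LOCK, SIZE_INSIDE_OR_AFTER_LOCK };

/* All the caller really needs: is the glimpsed size before our lock
 * (everything under the lock is past EOF) or not? */
static enum size_vs_lock classify_size(uint64_t glimpsed_size,
                                       const struct extent *lck)
{
        return glimpsed_size <= lck->start ? SIZE_BEFORE_LOCK
                                           : SIZE_INSIDE_OR_AFTER_LOCK;
}

int main(void)
{
        struct extent lck = { .start = 1 << 20, .end = (2 << 20) - 1 };

        /* Two slightly different glimpsed sizes on the same side of the
         * lock produce the same classification. */
        printf("%d %d\n",
               classify_size(4096, &lck), classify_size(8192, &lck));
        return 0;
}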
Oleg Drokin writes:
> While doing some large-scale testing at ORNL, an interesting pattern
> came up.  Suppose we are running a large-scale IOR test on a shared
> file.  Some unlucky client does its writing at the highest offset (or,
> at the beginning, was unlucky enough to grab the whole-object PW lock).
> As the other clients do their writes, they first do glimpses to find
> out the file size.  Those glimpses now turn into thousands of glimpse
> requests to that poor client, and many of them actually arrive in
> parallel.

I think there is a way to further reduce the number of glimpses sent in
the read/write paths.

In a situation when the client is doing a read and the region being read
ends in a certain stripe, it is often enough to send a glimpse request
only to the OST where that stripe is located.  Indeed, if the stripe is
not a hole, then the size of that stripe alone is sufficient to determine
whether we are in a short-read situation or not.

Of course, if the stripe happens to end in a hole, the client needs to
send more glimpses, but in the most common case of a non-sparse file,
1 RPC seems to be enough.

Nikita.
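P.S. For illustration only (hypothetical names, not Lustre code), a minimal
sketch of the stripe arithmetic under a plain RAID-0 layout: find which OST
object holds the last byte of the read region and glimpse that OST first;
only if its object ends in a hole would the client have to fall back to
glimpsing the others.

/* Illustrative sketch only; hypothetical names, not Lustre code.
 * Assumes a plain RAID-0 layout: stripe_size bytes per chunk, spread
 * round-robin over stripe_count OST objects. */
#include <stdint.h>
#include <stdio.h>

struct layout {
        uint64_t stripe_size;     /* bytes per stripe chunk */
        uint32_t stripe_count;    /* number of OST objects */
};

/* OST (stripe) index that contains file offset 'off'. */
static uint32_t stripe_of_offset(const struct layout *lo, uint64_t off)
{
        return (uint32_t)((off / lo->stripe_size) % lo->stripe_count);
}

/* Which single OST to glimpse first for a read of [start, start + count).
 * If that OST's object turns out to end in a hole, the client would still
 * need to glimpse the remaining stripes to decide on a short read. */
static uint32_t glimpse_target_for_read(const struct layout *lo,
                                        uint64_t start, uint64_t count)
{
        return stripe_of_offset(lo, start + count - 1);
}

int main(void)
{
        struct layout lo = { .stripe_size = 1 << 20, .stripe_count = 4 };

        /* A read ending just past the 5 MB mark lands in stripe 5 % 4 == 1. */
        printf("%u\n", (unsigned)glimpse_target_for_read(&lo, 4u << 20,
                                                         (1 << 20) + 7));
        return 0;
}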