Hello!

While doing some large-scale testing at ORNL, an interesting pattern came up.
Suppose we are running a large-scale IOR test on a shared file.  Some unlucky
client does its writing at the highest offset (or, at the beginning, was
unlucky enough to grab the whole-object PW lock).  As the other clients do
their writes, they first do glimpses to find out the file size.  Those
glimpses now turn into thousands of glimpse requests to that poor client,
and many of them actually arrive in parallel.

So I was thinking - perhaps it would be nice for filter_intent_policy() to
check whether any glimpse requests are already in flight to that client for
the same lock, and if there are, just wait for that request to return and
use the data from there?  Of course, a potential caveat here is that we have
no way to tell whether the request had reached the client by the time we
started our processing, so we might get size data that is a bit stale, but
I wonder whether this is critical enough in our case?

Any ideas?

Bye,
    Oleg
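P.S. In case a concrete sketch helps, below is a rough, purely illustrative
userspace model of the coalescing I have in mind.  It is not the actual
filter_intent_policy() code, and the names (glimpse_state,
send_glimpse_ast_and_wait()) are invented for the example; the only point is
that later glimpsers wait on a glimpse that is already in flight and reuse
its possibly slightly stale size instead of sending their own AST.

/*
 * Purely illustrative userspace sketch, NOT the real Lustre
 * filter_intent_policy().  glimpse_state and send_glimpse_ast_and_wait()
 * are hypothetical names invented for this example.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct glimpse_state {
        pthread_mutex_t lock;
        pthread_cond_t  done;
        bool            in_flight;   /* glimpse AST sent, reply not back yet */
        uint64_t        last_size;   /* size reported by the latest reply */
};

/* Stand-in for the real RPC: pretend the lock-holding client reported
 * this size.  In reality this would send the glimpse AST and block. */
static uint64_t send_glimpse_ast_and_wait(void *lock_handle)
{
        (void)lock_handle;
        return 1048576;              /* dummy size, for the sketch only */
}

static uint64_t glimpse_coalesced(struct glimpse_state *gs, void *lock_handle)
{
        uint64_t size;

        pthread_mutex_lock(&gs->lock);
        if (gs->in_flight) {
                /* Somebody already asked this client about this lock;
                 * wait for that reply and reuse its answer. */
                while (gs->in_flight)
                        pthread_cond_wait(&gs->done, &gs->lock);
                size = gs->last_size;        /* may be slightly stale */
                pthread_mutex_unlock(&gs->lock);
                return size;
        }
        gs->in_flight = true;
        pthread_mutex_unlock(&gs->lock);

        size = send_glimpse_ast_and_wait(lock_handle);

        pthread_mutex_lock(&gs->lock);
        gs->last_size = size;
        gs->in_flight = false;
        pthread_cond_broadcast(&gs->done);   /* wake all waiting glimpsers */
        pthread_mutex_unlock(&gs->lock);
        return size;
}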
Oleg,

Reduction (as you describe) is an absolutely essential strategy to achieve
scalability.  So without having checked through all the details of your
idea, it feels like absolutely the right solution.

If your idea changes not only latencies but also significant orderings,
I'll feel less secure.  Have you thought it all through?

    Cheers,
              Eric
Hello!

On Feb 2, 2008, at 1:39 AM, Eric Barton wrote:

> Reduction (as you describe) is an absolutely essential strategy to
> achieve scalability.  So without having checked through all the
> details of your idea, it feels like absolutely the right solution.

Yes, I am just afraid that slightly stale size data might be reported,
and I am not yet sure how critical that is for us.

> If your idea changes not only latencies but also significant orderings,
> I'll feel less secure.  Have you thought it all through?

There are no ordering changes other than what I would consider a perfectly
normal race.  Let me explain with a control flow.

client3 holds the highest write lock; clients 1 and 2 do glimpses.

Right now (racy scenario):

client1 sends glimpse -> server on its behalf sends a glimpse AST to
  client3 -> client3 receives the AST and sends a reply
client2 sends glimpse -> server on its behalf sends a glimpse AST to
  client3 ->
client3 writes some more data
glimpse on behalf of client2 arrives at client3 -> client3 sends in the
  updated size
server now sends different sizes back to client1 and client2.

If we consolidate, it will look like this:

client1 sends glimpse -> server on its behalf sends a glimpse AST to
  client3 -> client3 receives the AST and sends a reply
client2 sends glimpse -> server notices there is already a glimpse AST in
  flight to client3 and waits for it
client3 writes some more data
server receives the reply and sends identical sizes to client1 and client2.

I tend to think this is a normal race (much like two processes doing stat
on a local file while a third process writes to it constantly: if the two
stats are done closely together, they may return identical sizes), but I
just want to check that there are no objections to it.

Bye,
    Oleg
On Feb 02, 2008  02:00 -0500, Oleg Drokin wrote:
> On Feb 2, 2008, at 1:39 AM, Eric Barton wrote:
> > Reduction (as you describe) is an absolutely essential strategy to
> > achieve scalability.  So without having checked through all the
> > details of your idea, it feels like absolutely the right solution.
>
> Yes, I am just afraid that slightly stale size data might be reported,
> and I am not yet sure how critical that is for us.

A glimpse enqueue is already inherently racy, so I don't think this is
a serious problem.

> > If your idea changes not only latencies but also significant orderings,
> > I'll feel less secure.  Have you thought it all through?
>
> There are no ordering changes other than what I would consider a
> perfectly normal race.  Let me explain with a control flow.
> [...]
> I tend to think this is a normal race (much like two processes doing
> stat on a local file while a third process writes to it constantly:
> if the two stats are done closely together, they may return identical
> sizes), but I just want to check that there are no objections to it.

The uses of glimpse are only in a few cases:
- ls -l (inherently racy)
- read
- write

In the read and write cases the glimpse return is used to determine
whether the file size is before or after the given client's lock.
Since the client that is getting the glimpse is holding the EOF lock,
this cannot change the relative positions of the locks.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
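P.S. For illustration only (hypothetical names, not Lustre source): the
read/write path essentially only cares which side of its own lock extent
the glimpsed size falls on, so a slightly stale size that stays on the same
side of the lock leads to the same decision.

/* Illustrative sketch only; hypothetical names, not Lustre source. */
#include <stdint.h>
#include <stdio.h>

struct extent { uint64_t start; uint64_t end; };    /* this client's lock */

enum size_vs_lock { SIZE_BEFORE_LOCK, SIZE_INSIDE_OR_AFTER_LOCK };

/* All the caller really needs: is the glimpsed size before our lock
 * (everything under the lock is past EOF) or not? */
static enum size_vs_lock classify_size(uint64_t glimpsed_size,
                                       const struct extent *lck)
{
        return glimpsed_size <= lck->start ? SIZE_BEFORE_LOCK
                                           : SIZE_INSIDE_OR_AFTER_LOCK;
}

int main(void)
{
        struct extent lck = { .start = 1 << 20, .end = (2 << 20) - 1 };

        /* Two slightly different glimpsed sizes on the same side of the
         * lock produce the same classification. */
        printf("%d %d\n",
               classify_size(4096, &lck), classify_size(8192, &lck));
        return 0;
}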
Oleg Drokin writes:
> While doing some large-scale testing at ORNL, an interesting pattern
> came up.  Suppose we are running a large-scale IOR test on a shared
> file.  Some unlucky client does its writing at the highest offset (or,
> at the beginning, was unlucky enough to grab the whole-object PW lock).
> As the other clients do their writes, they first do glimpses to find
> out the file size.  Those glimpses now turn into thousands of glimpse
> requests to that poor client, and many of them actually arrive in
> parallel.

I think there is a way to further reduce the number of glimpses sent in
the read/write paths.

In a situation when the client is doing a read and the region being read
ends in a certain stripe, it is often enough to send a glimpse request
only to the OST where that stripe is located.  Indeed, if the stripe is
not a hole, then the size of that stripe alone is sufficient to determine
whether we are in a short-read situation or not.

Of course, if the stripe happens to end in a hole, the client needs to
send more glimpses, but in the most common case of a non-sparse file,
1 RPC seems to be enough.

Nikita.
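P.S. For illustration only (hypothetical names, not Lustre code), a minimal
sketch of the stripe arithmetic under a plain RAID-0 layout: find which OST
object holds the last byte of the read region and glimpse that OST first;
only if its object ends in a hole would the client have to fall back to
glimpsing the others.

/* Illustrative sketch only; hypothetical names, not Lustre code.
 * Assumes a plain RAID-0 layout: stripe_size bytes per chunk, spread
 * round-robin over stripe_count OST objects. */
#include <stdint.h>
#include <stdio.h>

struct layout {
        uint64_t stripe_size;     /* bytes per stripe chunk */
        uint32_t stripe_count;    /* number of OST objects */
};

/* OST (stripe) index that contains file offset 'off'. */
static uint32_t stripe_of_offset(const struct layout *lo, uint64_t off)
{
        return (uint32_t)((off / lo->stripe_size) % lo->stripe_count);
}

/* Which single OST to glimpse first for a read of [start, start + count).
 * If that OST's object turns out to end in a hole, the client would still
 * need to glimpse the remaining stripes to decide on a short read. */
static uint32_t glimpse_target_for_read(const struct layout *lo,
                                        uint64_t start, uint64_t count)
{
        return stripe_of_offset(lo, start + count - 1);
}

int main(void)
{
        struct layout lo = { .stripe_size = 1 << 20, .stripe_count = 4 };

        /* A read ending just past the 5 MB mark lands in stripe 5 % 4 == 1. */
        printf("%u\n", (unsigned)glimpse_target_for_read(&lo, 4u << 20,
                                                         (1 << 20) + 7));
        return 0;
}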