thr3ads.net - Lustre devel - [Lustre-devel] Oleg an Eric - Supporting >512 OSTs for Striping [Feb 2009]

Oleg Drokin

2009-Feb-05 23:18 UTC

[Lustre-devel] Oleg an Eric - Supporting >512 OSTs for Striping

Hello!

    Adding Lustre-devel to CC.

On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
> it is probably worthwhile to do a code audit to see if there are many/any
> "for each stripe" kind of operations that could be avoided for such widely
> striped files.  Common operations like lov_merge_lvb() and lov_adjust_kms()
> will become very expensive, and could possibly be optimized in some cases.
I suspect there are enough of them.
When I worked on slow small i/o, I noticed that we call merge_lvb pretty
often needlessly, for example:
 - on partial page update,
 - on refresh_ap (sending write rpc, so for every page),
 - on every ll_readahead call (which means for every page read),
 - every time we do a glimpse,
 - every time after enqueueing an extent lock (even if cached),
 - on every read syscall.

I had a plan on how to fix it that turned out to be more complicated
than I thought.
And in the end it was not the main culprit at the time.
Basically what we need to do is to store an up-to-date merged lvb
somewhere in the inode and update it after every enqueue or lock cancel.
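
A minimal sketch of that caching idea, in C, under loud assumptions: the
struct and function names below are hypothetical and are not the real
Lustre types, and the merge rule (max size, max mtime, summed blocks) is a
simplification of what lov_merge_lvb() does. It only shows the shape of
"merge once, reuse until a stripe changes".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-stripe attributes (not the real ost_lvb). */
struct stripe_lvb {
    uint64_t size;    /* file size as seen through this stripe */
    uint64_t blocks;  /* blocks allocated on this stripe */
    uint64_t mtime;   /* last modification time on this stripe */
};

/* Hypothetical per-inode state holding the cached merge result. */
struct cached_inode {
    struct stripe_lvb *stripes;
    size_t             stripe_count;
    struct stripe_lvb  merged;        /* cached merged attributes */
    int                merged_valid;  /* 0 means a re-merge is needed */
    unsigned long      merge_calls;   /* counts full walks, for illustration */
};

/* The expensive "for each stripe" walk. */
void full_merge(struct cached_inode *ci)
{
    struct stripe_lvb m = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < ci->stripe_count; i++) {
        const struct stripe_lvb *s = &ci->stripes[i];

        if (s->size > m.size)
            m.size = s->size;
        if (s->mtime > m.mtime)
            m.mtime = s->mtime;
        m.blocks += s->blocks;
    }
    ci->merged = m;
    ci->merged_valid = 1;
    ci->merge_calls++;
}

/* To be called after an enqueue or lock cancel changed one stripe. */
void stripe_changed(struct cached_inode *ci, size_t idx,
                    const struct stripe_lvb *new_lvb)
{
    ci->stripes[idx] = *new_lvb;
    ci->merged_valid = 0;  /* invalidate; the next reader re-merges */
}

/* Readers (read syscall, readahead, ...) use the cache when it is valid. */
const struct stripe_lvb *get_merged(struct cached_inode *ci)
{
    if (!ci->merged_valid)
        full_merge(ci);
    return &ci->merged;
}
```

With this shape, repeated get_merged() calls between lock events cost a
single flag check instead of a walk over every stripe. Locking around the
cached value is left out of the sketch.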

This is only relevant to the b1_x codebase. I see that in HEAD, with the
new io rewrite code, the number of calls to merge_lvb is dramatically lower
(only for glimpses), though potentially some cpu could be saved by only
merging after changes actually occurred.
> Similarly, we might consider to do MDS-originated object destroys for
> such files (or all files) instead of sending huge RPC with cookies to
> the client (~84kB reply).  These could be batched on unlink commit, and
> would also avoid the "inodes with destroyed objects" bug previously
> discussed.
Do you only think of this as a way to cut the maximum RPC reply size on
MDS?

Bye,
     Oleg

Andreas Dilger

2009-Feb-06 00:01 UTC


On Feb 05, 2009  18:18 -0500, Oleg Drokin wrote:
> On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
>> we might consider to do MDS-originated object destroys for
>> such files (or all files) instead of sending huge RPC with cookies to
>> the client (~84kB reply).  These could be batched on unlink commit,
>> and would also avoid the "inodes with destroyed objects" bug previously
>> discussed.
>
> Do you only think of this as a way to cut the maximum RPC reply size on
> MDS?
Yes, if we don't have to return cookies to the unlinking client then
reply size will be ~ (24 * num_stripes) instead of (56 * num_stripes).
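
The arithmetic can be sketched in C as follows. The two per-stripe byte
counts come from the estimate above; the function name and the stripe
counts used in the note after the code are assumptions for illustration,
and any fixed per-reply overhead is ignored.

```c
#include <stddef.h>

/* Approximate per-stripe reply bytes, as quoted in this thread. */
enum {
    BYTES_PER_STRIPE_NO_COOKIES   = 24,
    BYTES_PER_STRIPE_WITH_COOKIES = 56
};

/* Hypothetical helper: approximate variable part of the unlink reply. */
size_t reply_bytes(size_t num_stripes, int with_cookies)
{
    size_t per_stripe = with_cookies ? BYTES_PER_STRIPE_WITH_COOKIES
                                     : BYTES_PER_STRIPE_NO_COOKIES;

    return num_stripes * per_stripe;
}
```

For a hypothetical 1500-stripe file this gives 84000 bytes with cookies
and 36000 bytes without, which is in line with the ~84kB figure quoted
earlier.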

With SOM it may be that the client will have to pass cookies again,
though I'm not sure if that is for every OST in the file, or
only OSTs that the client wrote to.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
