Oleg Drokin
2009-Feb-05 23:18 UTC
[Lustre-devel] Oleg an Eric - Supporting >512 OSTs for Striping
Hello!
Adding Lustre-devel to CC.
On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
> it is probably worthwhile to do a code audit to see if there are many/any
> "for each stripe" kind of operations that could be avoided for such widely
> striped files. Common operations like lov_merge_lvb() and lov_adjust_kms()
> will become very expensive, and could possibly be optimized in some cases.
I suspect there are enough of them.
When I worked on slow small i/o, I noticed that we do merge_lvb pretty
often, and needlessly. Basically, we do it:
- on partial page update,
- on refresh_ap (sending write rpc, so for every page),
- for every ll_readahead call (which means for every page read),
- every time we do a glimpse,
- every time after enqueueing an extent lock (even if cached),
- on every read syscall.
I had a plan on how to fix it that turned out to be more complicated
than I thought.
And in the end it was not the main culprit at the time.
Basically what we need to do is to store an up-to-date merged lvb
somewhere in the inode and update it after every enqueue or lock cancel.
This is only relevant to the b1_x codebase. I see that in HEAD, with the
new io rewrite code, the number of calls to merge_lvb is dramatically
lower (only for glimpses), though potentially some cpu could be saved by
only merging after changes actually occurred.
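The caching idea above (keep a merged lvb in the inode, refresh it only after an enqueue or lock cancel) can be sketched roughly as follows. This is a minimal illustration, not the actual Lustre code: struct stripe_lvb, struct inode_lvb_cache, stripe_lvb_changed() and get_merged_lvb() are hypothetical names, and the merge rule (max size, max mtime, summed blocks) is a simplification that assumes each stripe already reports a file-level size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STRIPES 8 /* tiny limit, for illustration only */

/* Hypothetical, simplified per-stripe lock value block. */
struct stripe_lvb {
    uint64_t size;   /* file size as seen through this stripe (assumed) */
    uint64_t blocks; /* blocks allocated on this stripe's object */
    uint64_t mtime;  /* last modification time on this stripe */
};

/* Hypothetical per-inode cache of the merged value. */
struct inode_lvb_cache {
    struct stripe_lvb stripes[MAX_STRIPES];
    size_t            stripe_count;
    struct stripe_lvb merged;      /* last merged result */
    bool              valid;       /* false after any stripe changed */
    unsigned          merge_calls; /* counts full "for each stripe" passes */
};

/* The expensive pass over every stripe. */
static void full_merge(struct inode_lvb_cache *c)
{
    struct stripe_lvb m = { 0, 0, 0 };

    for (size_t i = 0; i < c->stripe_count; i++) {
        if (c->stripes[i].size > m.size)
            m.size = c->stripes[i].size;
        if (c->stripes[i].mtime > m.mtime)
            m.mtime = c->stripes[i].mtime;
        m.blocks += c->stripes[i].blocks;
    }
    c->merged = m;
    c->valid = true;
    c->merge_calls++;
}

/* To be called after an enqueue or lock cancel updates one stripe. */
static void stripe_lvb_changed(struct inode_lvb_cache *c, size_t idx,
                               struct stripe_lvb lvb)
{
    if (idx >= c->stripe_count)
        return;
    c->stripes[idx] = lvb;
    c->valid = false; /* merged value is stale now */
}

/* Cheap for per-page callers: merges only if something changed. */
static const struct stripe_lvb *get_merged_lvb(struct inode_lvb_cache *c)
{
    if (!c->valid)
        full_merge(c);
    return &c->merged;
}
```

With this shape, the per-page and per-syscall callers only pay for a flag check, and the cost of walking all stripes is paid once per actual lock change.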
> Similarly, we might consider to do MDS-originated object destroys for
> such files (or all files) instead of sending huge RPC with cookies to
> the client (~84kB reply). These could be batched on unlink commit, and
> would also avoid the "inodes with destroyed objects" bug previously
> discussed.
Do you only think of this as a way to cut the maximum RPC reply size
on MDS?
Bye,
Oleg
Andreas Dilger
2009-Feb-06 00:01 UTC
[Lustre-devel] Oleg an Eric - Supporting >512 OSTs for Striping
On Feb 05, 2009 18:18 -0500, Oleg Drokin wrote:
> On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
>> we might consider to do MDS-originated object destroys for
>> such files (or all files) instead of sending huge RPC with cookies to
>> the client (~84kB reply). These could be batched on unlink commit,
>> and would also avoid the "inodes with destroyed objects" bug previously
>> discussed.
>
> Do you only think of this as a way to cut the maximum RPC reply size on
> MDS?
Yes, if we don't have to return cookies to the unlinking client then
reply size will be ~ (24 * num_stripes) instead of (56 * num_stripes).
With SOM it may be that the client will have to pass cookies again,
though I'm not sure if that is for every OST in the file, or only OSTs
that the client wrote to.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
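To put numbers on the reply-size estimate above, here is a small sketch that only multiplies out the two per-stripe figures from the message (56 bytes with cookies, 24 bytes without). The function name reply_bytes() and the constants are illustrative, and fixed RPC header overhead is ignored. The stripe count of 1536 is an assumption, chosen only because 56 * 1536 = 86016 bytes (84 KiB) reproduces the "~84kB" figure quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-stripe reply cost in bytes, taken from the figures in the thread. */
enum {
    BYTES_PER_STRIPE_WITH_COOKIES = 56,
    BYTES_PER_STRIPE_NO_COOKIES   = 24
};

/* Approximate unlink reply payload; ignores fixed header overhead. */
static uint64_t reply_bytes(uint32_t num_stripes, bool with_cookies)
{
    uint64_t per_stripe = with_cookies ? BYTES_PER_STRIPE_WITH_COOKIES
                                       : BYTES_PER_STRIPE_NO_COOKIES;

    return per_stripe * num_stripes;
}
```

For the assumed 1536 stripes this gives 86016 bytes with cookies and 36864 bytes (36 KiB) without, so dropping the cookies cuts the per-stripe part of the reply by a bit more than half.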