Oleg Drokin
2009-Feb-05 23:18 UTC
[Lustre-devel] Oleg an Eric - Supporting >512 OSTs for Striping
Hello!

Adding Lustre-devel to CC.

On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
> it is probably worthwhile to do a code audit to see if there are many/any
> "for each stripe" kind of operations that could be avoided for such widely
> striped files. Common operations like lov_merge_lvb() and lov_adjust_kms()
> will become very expensive, and could possibly be optimized in some cases.

I suspect there are enough of them. When I worked on slow small i/o, I noticed that we call merge_lvb needlessly often, for example. Basically: on partial page update, on refresh_ap (sending write rpc - for every page), for every ll_readahead call (which means for every page read), every time we do a glimpse, every time after enqueueing an extent lock (even if cached), and on every read syscall.

I had a plan on how to fix it that turned out to be more complicated than I thought, and in the end it was not the main culprit at the time. Basically what we need to do is to store an up-to-date merged lvb somewhere in the inode and update it after every enqueue or lock cancel.

This is only relevant to the b1_x codebase. I see that in HEAD, with the new io rewrite code, the number of calls to merge_lvb is dramatically lower (only for glimpses), though potentially some cpu could be saved by only merging after changes have actually occurred.

> Similarly, we might consider to do MDS-originated object destroys for
> such files (or all files) instead of sending huge RPC with cookies to
> the client (~84kB reply). These could be batched on unlink commit, and
> would also avoid the "inodes with destroyed objects" bug previously
> discussed.

Do you only think of this as a way to cut the maximum RPC reply size on MDS?

Bye,
    Oleg
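
[Editor's sketch, not part of the original message.] The idea above of keeping an up-to-date merged lvb in the inode and re-merging only after a change can be illustrated with a small model. This is not Lustre code: the names StripeLvb, CachedLvb, update_stripe and merged are hypothetical, the layout is assumed to be plain RAID0 striping, and only size, blocks and mtime are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StripeLvb:
    """Hypothetical per-object attributes, as one OST would report them."""
    size: int = 0    # object size in bytes
    blocks: int = 0  # blocks allocated for the object
    mtime: int = 0   # last modification time


def stripe_to_file_size(obj_size: int, stripe_idx: int,
                        stripe_size: int, stripe_count: int) -> int:
    """File size implied by one object's size, assuming RAID0 striping."""
    if obj_size == 0:
        return 0
    # Locate the last byte of the object, then map it to a file offset.
    unit, offset = divmod(obj_size - 1, stripe_size)
    return (unit * stripe_count + stripe_idx) * stripe_size + offset + 1


class CachedLvb:
    """Keeps a merged lvb and recomputes it only after a real change."""

    def __init__(self, stripe_count: int, stripe_size: int) -> None:
        self.stripe_size = stripe_size
        self.stripes: List[StripeLvb] = [StripeLvb() for _ in range(stripe_count)]
        self._merged: Optional[StripeLvb] = None
        self.full_merges = 0  # counts the O(stripe_count) passes actually done

    def update_stripe(self, idx: int, lvb: StripeLvb) -> None:
        """Call after an enqueue or lock cancel for stripe idx."""
        if self.stripes[idx] != lvb:
            self.stripes[idx] = lvb
            self._merged = None  # invalidate only when something changed

    def merged(self) -> StripeLvb:
        """Return the file-level lvb, walking all stripes only if stale."""
        if self._merged is None:
            self.full_merges += 1
            count = len(self.stripes)
            self._merged = StripeLvb(
                size=max(stripe_to_file_size(s.size, i, self.stripe_size, count)
                         for i, s in enumerate(self.stripes)),
                blocks=sum(s.blocks for s in self.stripes),
                mtime=max(s.mtime for s in self.stripes),
            )
        return self._merged
```

In this model, a read path that calls merged() once per page pays for a full walk over the stripes only after update_stripe() has recorded a real change, which is the saving that matters for very widely striped files.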
Andreas Dilger
2009-Feb-06 00:01 UTC
[Lustre-devel] Oleg an Eric - Supporting >512 OSTs for Striping
On Feb 05, 2009 18:18 -0500, Oleg Drokin wrote:
> On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
>> we might consider to do MDS-originated object destroys for
>> such files (or all files) instead of sending huge RPC with cookies to
>> the client (~84kB reply). These could be batched on unlink commit,
>> and would also avoid the "inodes with destroyed objects" bug previously
>> discussed.
>
> Do you only think of this as a way to cut the maximum RPC reply size on
> MDS?

Yes, if we don't have to return cookies to the unlinking client then reply size will be ~ (24 * num_stripes) instead of (56 * num_stripes). With SOM it may be that the client will have to pass cookies again, though I'm not sure if that is for every OST in the file, or only OSTs that the client wrote to.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
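
[Editor's sketch, not part of the original message.] The reply-size estimate above is simple arithmetic, worked through here for a few stripe counts. The per-stripe figures of 56 and 24 bytes come from the message. The function name unlink_reply_bytes and the stripe counts are illustrative assumptions, and fixed reply overhead is ignored.

```python
# Per-stripe reply payload, in bytes, as quoted in the message above.
BYTES_PER_STRIPE_WITH_COOKIES = 56
BYTES_PER_STRIPE_WITHOUT_COOKIES = 24


def unlink_reply_bytes(num_stripes: int, with_cookies: bool) -> int:
    """Approximate per-stripe payload of an unlink reply (headers ignored)."""
    per_stripe = (BYTES_PER_STRIPE_WITH_COOKIES if with_cookies
                  else BYTES_PER_STRIPE_WITHOUT_COOKIES)
    return per_stripe * num_stripes


if __name__ == "__main__":
    # Stripe counts here are illustrative, not taken from the thread.
    for stripes in (160, 512, 1500):
        big = unlink_reply_bytes(stripes, with_cookies=True)
        small = unlink_reply_bytes(stripes, with_cookies=False)
        print(f"{stripes:5d} stripes: {big:6d} bytes with cookies, "
              f"{small:6d} bytes without")
```

At 1500 stripes the cookie-carrying payload comes to 84000 bytes, which is consistent with the ~84kB figure quoted earlier in the thread; without cookies the same file would need 36000 bytes.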