Since I work mostly with Lustre over a WAN I'd definitely like to see the larger readdir RPCs in Lustre to save on RTTs between clients and the MDS. I'm just looking for some input to make sure I'm looking at the right changes. I know Andreas had mentioned an issue with larger pages to me at LUG this year.

From the description I'm guessing the approach would be to request additional pages in ll_dir_readpage, similar to how read ahead is handled in ll_readpage. mdc_readpage also needs to be changed to handle multiple pages and prep each page for bulk. I think this makes the most sense because, at least by default, I see getdents64 only being called with a 4k buffer. Otherwise it might make sense to change the ll_readdir functions to handle more pages and ll_get_dir_page to request additional pages and call read_cache_pages instead.

Jeremy
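For what it's worth, the second alternative could look roughly like the sketch below. This is only an illustration under stated assumptions, not a tested patch: LL_DIR_RA_PAGES and ll_dir_blk_filler are hypothetical names, and the batching into a single RPC would actually have to happen inside the filler (send one multi-page MDS_READPAGE for the first page of the window, then find the remaining pages already up to date).

/* Illustration only; assumes <linux/fs.h>, <linux/list.h>, <linux/pagemap.h>.
 * LL_DIR_RA_PAGES and ll_dir_blk_filler are made-up names. */
static int ll_get_dir_pages_sketch(struct inode *dir, pgoff_t start)
{
        struct address_space *mapping = dir->i_mapping;
        LIST_HEAD(page_pool);
        struct page *page, *tmp;
        int i, rc;

        /* Queue a window of directory pages, the same way
         * __do_page_cache_readahead() queues data pages for a regular file. */
        for (i = 0; i < LL_DIR_RA_PAGES; i++) {
                page = page_cache_alloc_cold(mapping);
                if (page == NULL)
                        break;
                page->index = start + i;
                list_add(&page->lru, &page_pool);
        }

        /* read_cache_pages() inserts each queued page into the page cache and
         * calls the filler on it; the filler is where the single large readdir
         * RPC for the whole window would be issued. */
        rc = read_cache_pages(mapping, &page_pool, ll_dir_blk_filler, dir);

        /* Defensive cleanup in case any pages were left on the list. */
        list_for_each_entry_safe(page, tmp, &page_pool, lru) {
                list_del(&page->lru);
                page_cache_release(page);
        }
        return rc;
}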
According to bug 17833 (https://bugzilla.lustre.org/show_bug.cgi?id=17833) comment #0, as Andreas pointed out, the existing Lustre framework already mostly supports bulk readdir RPCs on the server side. The main work to be done is on the client side, to make llite/MDC read multiple pages per readdir() RPC instead of a single page. Your first idea is quite likely the way to implement it; it is relatively simple and efficient.

On the other hand, large readdir RPCs are the basis of another metadata read performance improvement feature, "readdir+", which is quite useful for "ls -l" on large directories and reduces lookup/getattr RPCs as much as possible. With that feature, the MDS packs more of each directory entry's attributes (not only name/ino as readdir does currently, but also mode/owner, etc.) into the "readdir+" reply. That means fewer directory entries fit in one "readdir+" page than in a traditional readdir page, so more pages have to be sent back to the client. Without large readdir RPCs, the advantage of "readdir+" would be discounted.

Cheers,
Nasf

On 9/28/10 9:14 PM, Jeremy Filizetti wrote:
> Since I work mostly with Lustre over a WAN I'd definitely like to see
> the larger readdir RPCs in Lustre to save on RTTs between clients and
> the MDS. I'm just looking for some input to make sure I'm looking at
> the right changes. I know Andreas had mentioned an issue with larger
> pages to me at LUG this year.
>
> From the description I'm guessing the approach would be to request
> additional pages in ll_dir_readpage, similar to how read ahead is
> handled in ll_readpage. mdc_readpage also needs to be changed to
> handle multiple pages and prep each page for bulk. I think this makes
> the most sense because at least by default I see getdents64 only being
> called with a 4k buffer. Otherwise it might make sense to change the
> ll_readdir functions to handle more pages and ll_get_dir_page to
> request additional pages and call read_cache_pages instead.
>
> Jeremy
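To make the density argument concrete, here is a simplified comparison of a plain readdir entry versus a hypothetical "readdir+" entry, with a back-of-envelope count of how many fit in one 4 KiB page. These structs are purely illustrative guesses, not the real Lustre on-wire layout.

/* Simplified illustration only; not the actual wire format. */
struct dirent_plain {                   /* roughly what readdir carries today */
        __u64 ino;                      /* identity (a FID on the wire) */
        __u64 hash;                     /* directory offset/hash cookie */
        __u16 reclen;
        __u16 namelen;
        char  name[];                   /* variable-length name */
};

struct dirent_plus {                    /* hypothetical "readdir+" entry */
        struct dirent_plain base;
        __u32 mode;                     /* attributes packed by the MDS so */
        __u32 uid;                      /* "ls -l" needs no per-file getattr */
        __u32 gid;
        __u64 size;
        __u64 mtime;
};

/* Back of the envelope: with ~20 bytes of header plus a ~16-byte name, a 4 KiB
 * page holds on the order of 110 plain entries; add ~28 bytes of attributes per
 * entry and that drops to roughly 65.  Same directory, noticeably more pages,
 * hence the dependency on large readdir RPCs. */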
> On the other hand, large readdir RPCs are the basis of another metadata read
> performance improvement feature, "readdir+", which is quite useful for "ls -l"
> on large directories and reduces lookup/getattr RPCs as much as possible. With
> that feature, the MDS packs more of each directory entry's attributes (not
> only name/ino as readdir does currently, but also mode/owner, etc.) into the
> "readdir+" reply. That means fewer directory entries fit in one "readdir+"
> page than in a traditional readdir page, so more pages have to be sent back
> to the client. Without large readdir RPCs, the advantage of "readdir+" would
> be discounted.

I'd be interested in working on this as well, but probably as a separate effort, since SOM isn't in 1.8 and that's my main focus. In our testing, SOM had significant benefits over the WAN, and I'd expect even better from readdir+. I have tried Oleg's patch for asynchronous ll_glimpse_size, but oddly I've seen somewhat erratic performance, where at times it was worse than statahead and synchronous ll_glimpse_size.

Jeremy
On 9/30/10 3:01 AM, Jeremy Filizetti wrote:
> I'd be interested in working on this as well, but probably as a separate
> effort, since SOM isn't in 1.8 and that's my main focus. In our testing,
> SOM had significant benefits over the WAN, and I'd expect even better from
> readdir+. I have tried Oleg's patch for asynchronous ll_glimpse_size, but
> oddly I've seen somewhat erratic performance, where at times it was worse
> than statahead and synchronous ll_glimpse_size.
>
> Jeremy

Yes, SOM is another important feature for metadata read performance; it helps by bypassing the glimpse RPC between the client and the OSS. Engineers from the Lustre Group have worked on it for some time; hopefully it can be released soon.

As for asynchronous ll_glimpse_size occasionally causing bad performance, one possible reason is this: the glimpse RPC may fail to obtain the extent lock(s) if someone else is using them, so the file size obtained by the asynchronous glimpse is already invalid by the time it is actually used, which means the caller ("stat") has to send a synchronous glimpse again. Anyway, I have not studied that patch, so it is just a guess.

Cheers,
Nasf
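A minimal sketch of that failure mode, under the stated guess: ll_async_glimpse_valid() is a made-up helper, the ll_glimpse_size() signature is simplified, and this is not Oleg's actual patch.

/* Hypothetical sketch of the fallback path described above. */
static int ll_stat_sketch(struct inode *inode, struct kstat *stat)
{
        int rc;

        /* Statahead fired an asynchronous glimpse earlier.  If the extent
         * lock could not be granted because another client held a conflicting
         * lock, the size it cached may already be stale by the time stat()
         * actually runs. */
        if (!ll_async_glimpse_valid(inode)) {
                /* Fall back to a synchronous glimpse: one extra round trip,
                 * exactly the latency the asynchronous path was meant to hide.
                 * Over a WAN this fallback can make the asynchronous path look
                 * worse than statahead plus a synchronous glimpse from the
                 * start. */
                rc = ll_glimpse_size(inode);
                if (rc != 0)
                        return rc;
        }

        generic_fillattr(inode, stat);  /* fill struct kstat from i_size etc. */
        return 0;
}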
I've put together a small patch to modify ll_dir_readpage and mdc_readpage to read extra pages (if available) with each RPC. It is posted under the bug (https://bugzilla.lustre.org/show_bug.cgi?id=17833). I see you were the original assignee when you were at Sun/Oracle.

Jeremy

2010/9/29 Fan Yong <yong.fan at whamcloud.com>
> Yes, SOM is another important feature for metadata read performance; it
> helps by bypassing the glimpse RPC between the client and the OSS. Engineers
> from the Lustre Group have worked on it for some time; hopefully it can be
> released soon.
>
> As for asynchronous ll_glimpse_size occasionally causing bad performance,
> one possible reason is this: the glimpse RPC may fail to obtain the extent
> lock(s) if someone else is using them, so the file size obtained by the
> asynchronous glimpse is already invalid by the time it is actually used,
> which means the caller ("stat") has to send a synchronous glimpse again.
> Anyway, I have not studied that patch, so it is just a guess.
>
> Cheers,
> Nasf
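For anyone following along without opening the bug, the MDC half of such a change would presumably look something like the sketch below. This is a guess at the shape, not the posted patch; the pages/npages parameters are assumed to be handed down from ll_dir_readpage, and error handling is abbreviated.

/* Sketch only; the real patch is attached to bug 17833 and may differ. */
static int mdc_readpage_bulk_sketch(struct ptlrpc_request *req,
                                    struct page **pages, int npages)
{
        struct ptlrpc_bulk_desc *desc;
        int i;

        /* Size the bulk descriptor for the whole batch instead of a single
         * page, so all of the directory pages come back in one MDS_READPAGE
         * RPC. */
        desc = ptlrpc_prep_bulk_imp(req, npages, BULK_PUT_SINK,
                                    MDS_BULK_PORTAL);
        if (desc == NULL)
                return -ENOMEM;

        /* Register every page of the batch as a sink for the bulk PUT. */
        for (i = 0; i < npages; i++)
                ptlrpc_prep_bulk_page(desc, pages[i], 0, CFS_PAGE_SIZE);

        return 0;
}

The request body would presumably also need its size field bumped to npages * CFS_PAGE_SIZE so the MDS knows how much to fill, assuming the server side already copes with multi-page bulk as comment #0 of the bug suggests.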
On 10/14/10 12:04 AM, Jeremy Filizetti wrote:
> I've put together a small patch to modify ll_dir_readpage and
> mdc_readpage to read extra pages (if available) with each RPC. It is
> posted under the bug
> (https://bugzilla.lustre.org/show_bug.cgi?id=17833). I see you were
> the original assignee when you were at Sun/Oracle.

Thanks, I will study the patch. I have added Lsy, who is also interested in it, to the bug's CC list.

--
Nasf