patrice.lucas@cea.fr
2007-Mar-12 02:04 UTC
[Lustre-devel] [Bug 11817] New: superblock lock contention on multiprocess client
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11817

We see contention on the Lustre superblock lock (the spinlock ll_lock of the ll_sb_info structure) when many processes on the same client read on the same Lustre FS.

* Is it necessary to share this lock between three different purposes: 1) ll_stats /proc statistics, 2) ll_ra_info read-ahead accounting, 3) the ll_pglist llap LRU list and its associated counters such as ll_pglist_gen and ll_async_page_count?

* In the function ll_removepage, llap_from_page is called with 0==UNKNOWN_ORIGIN as the origin parameter. Why not use the flag LLAP_ORIGIN_REMOVEPAGE as the origin parameter in this case? This flag prevents llap_from_page from moving the llap in the llap LRU list, since the page is being removed, and therefore prevents it from taking the Lustre superblock lock at all.

* Due to the superblock lock contention, dealing with the llap LRU list costs a lot of time. This list seems to be used only for cleaning, by the llap_shrink_cache function called when we go beyond the llap_async_page_max counter. Is it necessary to keep a feature that costs so much time? Does llite absolutely need to be able to delete llap pages on its own initiative? If we want to keep this feature, is it possible to improve the LRU structure?

Here are some tests comparing read bandwidth on a multiprocess client (each task tries to fill preallocated memory by reading its own file; figures in MB/s):

nb reading tasks on client  :    2     4     8    12    16
lustre 1.4.6.1              :  400   780  1150   700   500
with LLAP_ORIGIN_REMOVEPAGE :  400   780  1150  1000   600
without the llap LRU list   :  400   780  1150  1400  1400