patrice.lucas@cea.fr
2007-Mar-14 07:44 UTC
[Lustre-devel] [Bug 11817] superblock lock contention on multiprocess client
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11817

Thank you for the previously attached patch. It tries to fix contention on the portals_handle lists (we will try to test it). But bug 11817 is not about contention on the portals_handle list. It is about contention on the llap list in the Lustre client superblock structure.

You could try the same kind of RCU list mechanism to relieve the contention on this superblock lock. It might help, but I am not confident that it will. We see contention on the llap list only from list-mutation primitives: adding new entries at the tail, moving existing entries to the tail, and deleting old entries. An RCU list lets list-traversal primitives run concurrently with list-mutation primitives, but it does not let two list-mutation primitives run concurrently. So if we use the RCU model for the llap list in the Lustre superblock, it might help a bit, but all modifications will still need to take ll_lock to avoid racing, and we will still see the contention.

The only function that might benefit from an RCU llap list is llap_shrink_cache. It could use RCU list-traversal primitives and run concurrently with the addition of new entries. But this function is called very rarely. In the previously described test, llap_shrink_cache is never called! Only list-mutation primitives are used on the llap list.

Besides trying to improve the structure of the llap LRU list, I raised two other points when creating this bug.

First, would it make sense to split the ll_lock in the Lustre client superblock into separate locks for its three independent purposes: /proc statistics, read-ahead accounting, and llap LRU list access?
Then, would it make sense to pass the LLAP_ORIGIN_REMOVEPAGE flag, instead of 0==UNKNOWN_ORIGIN, as the origin parameter when calling llap_from_page from ll_removepage?

Thank you in advance for your answers.
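
To make the lock-splitting question concrete, here is a minimal userspace sketch. It is not Lustre code: the names struct sb_info, struct entry, stats_lock, ra_lock, lru_lock and all the functions are hypothetical, and pthread mutexes stand in for kernel spinlocks. It only illustrates the idea that the three purposes can each take their own lock, while the three list mutations (add at tail, move to tail, delete) still serialize on the list lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one cached page entry (an "llap"). */
struct entry {
    struct entry *prev, *next;
    int id;
};

/* Hypothetical stand-in for the client superblock info, with the
 * single lock split into one lock per purpose. */
struct sb_info {
    pthread_mutex_t stats_lock;   /* protects stat_hits (/proc statistics) */
    pthread_mutex_t ra_lock;      /* protects ra_pages (read-ahead accounting) */
    pthread_mutex_t lru_lock;     /* protects lru and lru_len (LRU list) */
    unsigned long stat_hits;
    unsigned long ra_pages;
    struct entry lru;             /* sentinel: lru.next is oldest, lru.prev is newest */
    size_t lru_len;
};

void sb_init(struct sb_info *sbi)
{
    pthread_mutex_init(&sbi->stats_lock, NULL);
    pthread_mutex_init(&sbi->ra_lock, NULL);
    pthread_mutex_init(&sbi->lru_lock, NULL);
    sbi->stat_hits = 0;
    sbi->ra_pages = 0;
    sbi->lru.prev = sbi->lru.next = &sbi->lru;
    sbi->lru.id = -1;             /* id reported when the list is empty */
    sbi->lru_len = 0;
}

/* Helpers: the caller must hold lru_lock. */
static void unlink_entry(struct entry *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void link_tail(struct sb_info *sbi, struct entry *e)
{
    e->prev = sbi->lru.prev;
    e->next = &sbi->lru;
    sbi->lru.prev->next = e;
    sbi->lru.prev = e;
}

/* The three list mutations. Each one takes lru_lock, so they still
 * serialize against each other, with or without lock splitting. */
void lru_add_tail(struct sb_info *sbi, struct entry *e)
{
    assert(e != NULL);
    pthread_mutex_lock(&sbi->lru_lock);
    link_tail(sbi, e);
    sbi->lru_len++;
    pthread_mutex_unlock(&sbi->lru_lock);
}

void lru_touch(struct sb_info *sbi, struct entry *e)
{
    pthread_mutex_lock(&sbi->lru_lock);
    unlink_entry(e);
    link_tail(sbi, e);
    pthread_mutex_unlock(&sbi->lru_lock);
}

void lru_del(struct sb_info *sbi, struct entry *e)
{
    pthread_mutex_lock(&sbi->lru_lock);
    unlink_entry(e);
    sbi->lru_len--;
    pthread_mutex_unlock(&sbi->lru_lock);
}

int lru_oldest_id(struct sb_info *sbi)
{
    pthread_mutex_lock(&sbi->lru_lock);
    int id = sbi->lru.next->id;   /* sentinel id (-1) if the list is empty */
    pthread_mutex_unlock(&sbi->lru_lock);
    return id;
}

/* Statistics and read-ahead accounting no longer touch lru_lock. */
void stats_hit(struct sb_info *sbi)
{
    pthread_mutex_lock(&sbi->stats_lock);
    sbi->stat_hits++;
    pthread_mutex_unlock(&sbi->stats_lock);
}

void ra_account(struct sb_info *sbi, unsigned long pages)
{
    pthread_mutex_lock(&sbi->ra_lock);
    sbi->ra_pages += pages;
    pthread_mutex_unlock(&sbi->ra_lock);
}
```

In this sketch, a thread updating statistics or read-ahead counters never waits on a thread that is mutating the list. It does nothing for two threads that both mutate the list, which is the case the test described above exercises.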
patrice.lucas@cea.fr
2007-Mar-15 08:56 UTC
[Lustre-devel] [Bug 11817] superblock lock contention on multiprocess client
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11817

Can you confirm that the LRU cache of Lustre pages is no longer needed with the 2.6 kernel?

As Andreas asked, I am attaching three patches:
1) removing the read-ahead statistics
2) using the LLAP_ORIGIN_REMOVEPAGE flag
3) disabling the LRU cache of Lustre pages
(My patch was exactly the same as the one you propose. Taking exactly the same approach to disabling the LRU cache, I removed the same piece of code.)

I made these modifications to quickly test some ideas around the contention. These patches are not production-level code ...