pascal.deveze at bull.net
2010-Jan-14 14:22 UTC
[Lustre-discuss] adio lustre patch: calculation of "avail_cb_nodes"
Hi,

In this mail I report my investigation into the calculation of "avail_cb_nodes" in the subroutine ADIOI_LUSTRE_Get_striping_info() in the file ad_lustre_aggregate.c.

The goal of this calculation is to make the best choice for the number of processes that will write to the common file during the "two phase IO write". For Lustre, to avoid extent lock conflicts, each OST must always be accessed by the same client or clients (one or more). Each client must access only one OST. If that last condition cannot be met, then each client must access the minimum number of OSTs.

The parameter "nprocs_for_coll" is the number of processes that might write, so "avail_cb_nodes" must be less than or equal to that number.
The parameter "stripe_count" gives the number of OSTs that store the common file.
The parameter "CO" gives the maximum number of clients each OST is allowed to serve (by default it is set to 1).

I ran tests with different values of "nprocs_for_coll", "stripe_count" (hint called striping_factor) and "CO" (hint called romio_lustre_co_ratio). I got strange values of "avail_cb_nodes", e.g.:

- With stripe_count=18, nprocs_for_coll=6, CO=1 the calculation gives avail_cb_nodes=3
  The value 6 would be better.
- With stripe_count=15, nprocs_for_coll=4, CO=1 the calculation gives avail_cb_nodes=4
  The right value is 3 (4 is bad because an OST will be accessed by all processes).
- With stripe_count=28, nprocs_for_coll=57, CO=3 the calculation gives avail_cb_nodes=42
  The right value is 56 (42 is bad because a client will access 2 different OSTs).

I propose a new algorithm to calculate "avail_cb_nodes" in the first attached file. I also attach a small command, "avail_cb_nodes.c", that you can use to see where the differences are: "avail_cb_nodes 18 6 1" gives the result for stripe_count=18, nprocs_for_coll=6, CO=1. If one of the parameters is null, the command loops and displays many combinations. You will see a lot of differences.

I hope this will help.
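To make the rules above concrete, here is a minimal sketch in C. It is not the attached patch, only one possible selection rule that is consistent with the three examples. It assumes a simple round-robin model in which stripe k is stored on OST (k % stripe_count) and written by aggregator (k % n); the function names are hypothetical and do not exist in ROMIO.

```c
#include <assert.h>

/* Greatest common divisor (Euclid), for positive integers. */
int gcd_int(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Assumed model: stripe k lives on OST (k % stripe_count) and is written by
 * aggregator (k % n). Under that model, each aggregator touches
 * stripe_count / gcd(stripe_count, n) distinct OSTs. */
int osts_per_client(int stripe_count, int n)
{
    return stripe_count / gcd_int(stripe_count, n);
}

/* Under the same model, each OST is touched by n / gcd(stripe_count, n)
 * distinct aggregators. */
int clients_per_ost(int stripe_count, int n)
{
    return n / gcd_int(stripe_count, n);
}

/* Hypothetical selection rule (an illustration, not the attached patch). */
int pick_avail_cb_nodes(int stripe_count, int nprocs_for_coll, int co)
{
    assert(stripe_count > 0 && nprocs_for_coll > 0 && co > 0);

    if (nprocs_for_coll >= stripe_count) {
        /* Enough processes: give every OST the same number of dedicated
         * clients, at most CO, so each client touches exactly one OST. */
        int per_ost = nprocs_for_coll / stripe_count;
        if (per_ost > co)
            per_ost = co;
        return stripe_count * per_ost;
    }

    /* Fewer processes than OSTs: take the largest divisor of stripe_count
     * that fits, so each OST has a single client and each client touches
     * stripe_count / n OSTs. n = 1 always divides, so the loop ends. */
    int n = nprocs_for_coll;
    while (stripe_count % n != 0)
        n--;
    return n;
}
```

With this sketch, pick_avail_cb_nodes(18, 6, 1) returns 6, pick_avail_cb_nodes(15, 4, 1) returns 3, and pick_avail_cb_nodes(28, 57, 3) returns 56. The two helper functions show why the other values are bad under the assumed model: clients_per_ost(15, 4) is 4 (every process touches every OST), and osts_per_client(28, 42) is 2.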
I also see an issue with when "avail_cb_nodes" is calculated. Today, ADIOI_LUSTRE_Get_striping_info() is called for each "two phase IO write", and it will also be called for each "two phase IO read" once that is available. For my part, I think this could be done only once (or perhaps also each time the parameter "romio_lustre_co_ratio" is changed). "avail_cb_nodes" could be a field in the struct ADIOI_hints_struct (below the co_ratio field) or in the struct ADIOI_FileD. I do not see any reason to redo that calculation each time. Am I missing something?

Best regards,
Pascal

(See attached file: patch-for-avail_cb_nodes.txt)
(See attached file: avail_cb_nodes.c)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-for-avail_cb_nodes.txt
Type: application/octet-stream
Size: 4318 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100114/c9c380b7/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: avail_cb_nodes.c
Type: application/octet-stream
Size: 4237 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100114/c9c380b7/attachment-0001.obj