vitaly@clusterfs.com
2007-Jan-05 13:22 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9225 is patch      |1                             |0
Attachment #9225 is private    |1                             |0
Attachment #9225 Flag          |review?(vitaly@clusterfs.com) |review-

(From update of attachment 9225)

1. ldlm_run_bl_ast_work(), ldlm_run_cp_ast_work(): @arg is shared between all the requests, which are sent out at ptlrpc_set_wait(), and the interpreter is called when a request's reply is received. By that time @arg holds the data of the last prepared request, so arg->lock_handle is wrong (see the sketch after this comment).

2. ldlm_server_blocking_ast(): error handling is missing for the if (unlikely(instant_cancel)) {} case. It still looks needed, because we may fail to send the request and want to evict the client.

3. ldlm_server_blocking_ast(): ptlrpc_req_finished(req) is missing in the same if (unlikely(instant_cancel)) {} path.

4. The HLD says that if the number of locks is huge, no more than some maximum number of requests will be sent in parallel. Is this missing, or considered unimportant?
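A minimal standalone sketch of the problem in point 1 (an editorial illustration with invented names, not Lustre code): every request added to the set points at the same arg structure, so by the time any reply's interpret callback runs, a per-lock field kept in that shared structure only holds whatever the last prepared request wrote there.

#include <stdio.h>

#define NREQ 3

struct shared_arg {
        int lock_handle;                /* per-lock data wrongly kept in the shared arg */
};

struct fake_request {
        int lock_handle;                /* per-request copy: where such data belongs */
        struct shared_arg *arg;         /* shared by every request in the set */
};

/* interpret callback: runs once per reply, after all requests were prepared */
static void interpret(struct fake_request *req)
{
        printf("reply for lock %d: shared arg says %d, request says %d\n",
               req->lock_handle, req->arg->lock_handle, req->lock_handle);
}

int main(void)
{
        struct shared_arg arg;
        struct fake_request set[NREQ];
        int i;

        /* prepare phase: each AST overwrites the shared field */
        for (i = 0; i < NREQ; i++) {
                set[i].lock_handle = 100 + i;
                set[i].arg = &arg;
                arg.lock_handle = 100 + i;      /* last writer wins */
        }

        /* reply phase: the shared field is stale for all but the last lock */
        for (i = 0; i < NREQ; i++)
                interpret(&set[i]);

        return 0;
}

The fix direction is the one the later patches in this thread take: keep only genuinely shared state (the request set, the restart flag, the callback type) in the shared structure, and let anything per-lock travel with the request itself.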
deen@clusterfs.com
2007-Jan-08 06:47 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9225 is obsolete   |0                             |1

Created an attachment (id=9296)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9296&action=view)

Fixes after Vitaly's CODEINSP.

I am not sure about the parallel callback limit. Yes, the HLD says that "we _can_ split it into several request sets with reasonable size and send them serializely", but there is nothing about this in the DLD. If we want to implement this feature, we should change the DLD first and then the code. Vitaly, Oleg?
green@clusterfs.com
2007-Jan-11 16:15 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9296 Flag          |review?(green@clusterfs.com)  |review-

(From update of attachment 9296)

> static int reprocess_one_queue(struct ldlm_resource *res, void *closure)
>Index: lustre/ldlm/ldlm_lockd.c
>===================================================================
>RCS file: /cvsroot/cfs/lustre-core/ldlm/ldlm_lockd.c,v
>retrieving revision 1.149.2.18.2.31.16.22
>diff -u -p -r1.149.2.18.2.31.16.22 ldlm_lockd.c
>--- lustre/ldlm/ldlm_lockd.c    15 Dec 2006 22:21:56 -0000    1.149.2.18.2.31.16.22
>+++ lustre/ldlm/ldlm_lockd.c    8 Jan 2007 13:37:28 -0000
>@@ -478,10 +478,44 @@ static int ldlm_handle_ast_error(struct
>         return rc;
> }
>
>+static int ldlm_cb_interpret(struct ptlrpc_request *req, void *data, int rc)
>+{
>+        struct ldlm_cb_set_arg *arg = (struct ldlm_cb_set_arg *)data;
>+        struct ldlm_lock *lock;
>+        struct ldlm_reply *reply;
>+        ENTRY;
>+
>+        LASSERT(data != NULL);
>+
>+        ptlrpc_req_finished(req);

So you free the request...

>+        if (rc != 0) {
>+                reply = lustre_swab_repbuf(req, DLM_LOCKREPLY_OFF,
>+                                           sizeof(*reply),
>+                                           lustre_swab_ldlm_reply);

...and then access it? Also, what if there is no reply because there was a transmission error? You need to get the lock handle from the request itself, not from the reply.

And then we free the request again in ptlrpc_set_destroy(). I do not see you taking any extra reference on the request, so there is no need to free it in the interpret callback.
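An editorial sketch of the shape the callback could take after these comments (this is not the committed attachment): ldlm_lock_from_bl_request() is a placeholder name for "recover the lock from the request side rather than the reply", and the ldlm_handle_ast_error() call assumes the lock/request/rc/AST-type arguments used by the existing error path in ldlm_lockd.c.

static int ldlm_cb_interpret(struct ptlrpc_request *req, void *data, int rc)
{
        struct ldlm_cb_set_arg *arg = data;
        struct ldlm_lock *lock;
        ENTRY;

        LASSERT(data != NULL);

        if (rc != 0) {
                /* There may be no reply at all (transmission error), so the
                 * lock handle must come from the request, not the reply. */
                lock = ldlm_lock_from_bl_request(req);    /* placeholder name */
                if (lock != NULL) {
                        /* evict the client that failed the AST; assumed
                         * signature of the existing error path */
                        rc = ldlm_handle_ast_error(lock, req, rc, "callback");
                        if (rc == -ERESTART)
                                atomic_set(&arg->restart, 1);
                        LDLM_LOCK_PUT(lock);
                }
        }

        /* No ptlrpc_req_finished(req) here: no extra reference was taken,
         * and ptlrpc_set_destroy() frees every request in the set. */
        RETURN(0);
}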
deen@clusterfs.com
2007-Jan-13 17:03 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9296 is obsolete   |0                             |1

Created an attachment (id=9338)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9338&action=view)

Corrections in callback func after green's codeinsp.
green@clusterfs.com
2007-Jan-14 17:10 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9338 Flag          |review?(green@clusterfs.com)  |review+

(From update of attachment 9338)

Looks good. You need a special test developed to exercise the error path in ldlm_cb_interpret, to make sure clients are actually being evicted as expected now.

Now, about limiting the number of ASTs: I think the change is simple enough not to require another round of DLD/CODE. Basically, what you need is to count the ASTs added to the set in ldlm_run_*_ast_work, and once the count grows to a certain value, just do set_wait/set_destroy and then create the set again (you need to carry the "restart" value over this). Maybe some code reuse between those two functions won't hurt as well; a sketch of such a helper follows.
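An editorial sketch of what such a shared helper could look like (the real implementation is in attachment 9359, which is not quoted in this thread; the field names follow the struct ldlm_cb_set_arg usage visible in the later review):

static void ldlm_send_and_maybe_create_set(struct ldlm_cb_set_arg *arg,
                                           int do_create)
{
        ENTRY;

        /* Send every queued AST and wait for the replies; the interpret
         * callback records any -ERESTART condition in arg->restart. */
        ptlrpc_set_wait(arg->set);
        ptlrpc_set_destroy(arg->set);

        /* Start a fresh set only when more ASTs remain in @rpc_list. */
        if (do_create)
                arg->set = ptlrpc_prep_set();

        EXIT;
}

Because the restart flag lives in @arg rather than in the set, it survives each set_wait/set_destroy cycle, which is the "inherit the restart value over this" part.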
deen@clusterfs.com
2007-Jan-17 06:11 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9338 is obsolete   |0                             |1

Created an attachment (id=9359)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9359&action=view)

Adding AST limit to parallel callbacks code.
green@clusterfs.com
2007-Jan-17 12:08 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9359 Flag          |review?(green@clusterfs.com)  |review-

(From update of attachment 9359)

> int ldlm_run_bl_ast_work(struct list_head *rpc_list)
> {
>+        struct ldlm_cb_set_arg arg;
>         struct list_head *tmp, *pos;
>         struct ldlm_lock_desc d;
>-        int rc = 0, retval = 0;
>+        int ast_count;
>+        int rc = 0;
>         ENTRY;
>
>+        arg.set = ptlrpc_prep_set();
>+        atomic_set(&arg.restart, 0);
>+        arg.type = LDLM_BL_CALLBACK;
>+
>+        ast_count = 0;
>         list_for_each_safe(tmp, pos, rpc_list) {
>                 struct ldlm_lock *lock =
>                         list_entry(tmp, struct ldlm_lock, l_bl_ast);
>@@ -1051,24 +1076,38 @@ int ldlm_run_bl_ast_work(struct list_hea
>
>                 LDLM_LOCK_PUT(lock->l_blocking_lock);
>                 lock->l_blocking_lock = NULL;
>-                rc = lock->l_blocking_ast(lock, &d, NULL, LDLM_CB_BLOCKING);
>-
>-                if (rc == -ERESTART)
>-                        retval = rc;
>-                else if (rc)
>-                        CDEBUG(D_DLMTRACE, "Failed AST - should clean & "
>-                               "disconnect client\n");
>+                rc = lock->l_blocking_ast(lock, &d, (void *)&arg,
>+                                          LDLM_CB_BLOCKING);
>                 LDLM_LOCK_PUT(lock);
>+                ast_count++;
>+
>+                /* Send the request set if it exceeds the PARALLEL_AST_LIMIT,
>+                 * and create a new set for requests that remained in
>+                 * @rpc_list */
>+                if (unlikely(ast_count == PARALLEL_AST_LIMIT)) {
>+                        ldlm_send_and_maybe_create_set(&arg, 1);
>+                        ast_count = 0;
>+                }
>         }
>-        RETURN(retval);
>+
>+        if (ast_count > 0)

Note that if you get here with count 0 (e.g. if there were exactly PARALLEL_AST_LIMIT ASTs to send), then you leak the memory allocated to the request set.

>+                ldlm_send_and_maybe_create_set(&arg, 0);
>+
>+        RETURN(atomic_read(&arg.restart) ? -ERESTART : 0);
> }
>
> int ldlm_run_cp_ast_work(struct list_head *rpc_list)

Same as above wrt leaking memory.
deen@clusterfs.com
2007-Jan-20 06:29 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9359 is obsolete   |0                             |1

Created an attachment (id=9391)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9391&action=view)

Fix the request-set memory leak.
green@clusterfs.com
2007-Jan-23 12:10 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9391 Flag          |review?(green@clusterfs.com)  |review+

(From update of attachment 9391)

>+        if (ast_count > 0)
>+                ldlm_send_and_maybe_create_set(&arg, 0);
>+        else
>+                /* In case when number of ASTs is multiply of
>+                 * PARALLEL_AST_LIMIT or @rpc_list was initially empty,
>+                 * @arg.set must be destroyed here, otherwise we get
>+                 * write memory leaking. */
>+                ptlrpc_set_destroy(arg.set);

I was thinking along the lines that you could just drop the if in front of the ldlm_send_and_maybe_create_set() call (see the sketch below), but this will also do, of course.

About sanityN.sh, test 29: if it exercises that AST error path, then yes, we can use it.
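For illustration only, the alternative mentioned above could look like the fragment below. It assumes ptlrpc_set_wait() simply returns when the set is empty, so the unconditional call both sends whatever is queued and destroys @arg.set, making the explicit else branch unnecessary.

        /* tail of ldlm_run_bl_ast_work()/ldlm_run_cp_ast_work(): sketch of
         * the "drop the if" variant, always flush and destroy the set */
        ldlm_send_and_maybe_create_set(&arg, 0);

        RETURN(atomic_read(&arg.restart) ? -ERESTART : 0);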