vitaly@clusterfs.com
2007-Jan-05 13:22 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9225 is patch      |1                             |0
Attachment #9225 is private    |1                             |0
Attachment #9225 Flag          |review?(vitaly@clusterfs.com) |review-

(From update of attachment 9225)

1. ldlm_run_bl_ast_work(), ldlm_run_cp_ast_work(): @arg is shared between all the requests, which are sent out at ptlrpc_set_wait(), and the interpreter is called when a request's reply is received. By that time @arg holds the data of the last prepared request, so arg->lock_handle is wrong (see the sketch after this comment).

2. ldlm_server_blocking_ast(): error handling is missing for the if (unlikely(instant_cancel)) {} case. It still looks needed, because we may fail to send the request and want to evict the client.

3. ldlm_server_blocking_ast(): ptlrpc_req_finished(req) is missing in the same if (unlikely(instant_cancel)) {} path.

4. The HLD says that if the number of locks is huge, no more than some maximum number of requests will be sent in parallel. Is this missing, or considered unimportant?
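A minimal standalone sketch of the problem in point 1 (an editorial illustration with invented names, not Lustre code): every request added to the set points at the same arg structure, so by the time any reply's interpret callback runs, a per-lock field kept in that shared structure only holds whatever the last prepared request wrote there.

#include <stdio.h>

#define NREQ 3

struct shared_arg {
        int lock_handle;                /* per-lock data wrongly kept in the shared arg */
};

struct fake_request {
        int lock_handle;                /* per-request copy: where such data belongs */
        struct shared_arg *arg;         /* shared by every request in the set */
};

/* interpret callback: runs once per reply, after all requests were prepared */
static void interpret(struct fake_request *req)
{
        printf("reply for lock %d: shared arg says %d, request says %d\n",
               req->lock_handle, req->arg->lock_handle, req->lock_handle);
}

int main(void)
{
        struct shared_arg arg;
        struct fake_request set[NREQ];
        int i;

        /* prepare phase: each AST overwrites the shared field */
        for (i = 0; i < NREQ; i++) {
                set[i].lock_handle = 100 + i;
                set[i].arg = &arg;
                arg.lock_handle = 100 + i;      /* last writer wins */
        }

        /* reply phase: the shared field is stale for all but the last lock */
        for (i = 0; i < NREQ; i++)
                interpret(&set[i]);

        return 0;
}

The fix direction is the one the later patches in this thread take: keep only genuinely shared state (the request set, the restart flag, the callback type) in the shared structure, and let anything per-lock travel with the request itself.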
deen@clusterfs.com
2007-Jan-08 06:47 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9225 is obsolete   |0                             |1

Created an attachment (id=9296)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9296&action=view)

Fixes after Vitaly's CODEINSP.

I am not sure about the parallel callback limit. Yes, the HLD says that "we _can_ split it into several request sets with reasonable size and send them serializely", but there is nothing about this in the DLD. If we want to implement this feature, we should change the DLD first and then the code. Vitaly, Oleg?
green@clusterfs.com
2007-Jan-11 16:15 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9296 Flag          |review?(green@clusterfs.com)  |review-

(From update of attachment 9296)

> static int reprocess_one_queue(struct ldlm_resource *res, void *closure)
>Index: lustre/ldlm/ldlm_lockd.c
>===================================================================
>RCS file: /cvsroot/cfs/lustre-core/ldlm/ldlm_lockd.c,v
>retrieving revision 1.149.2.18.2.31.16.22
>diff -u -p -r1.149.2.18.2.31.16.22 ldlm_lockd.c
>--- lustre/ldlm/ldlm_lockd.c    15 Dec 2006 22:21:56 -0000    1.149.2.18.2.31.16.22
>+++ lustre/ldlm/ldlm_lockd.c    8 Jan 2007 13:37:28 -0000
>@@ -478,10 +478,44 @@ static int ldlm_handle_ast_error(struct
>         return rc;
> }
>
>+static int ldlm_cb_interpret(struct ptlrpc_request *req, void *data, int rc)
>+{
>+        struct ldlm_cb_set_arg *arg = (struct ldlm_cb_set_arg *)data;
>+        struct ldlm_lock *lock;
>+        struct ldlm_reply *reply;
>+        ENTRY;
>+
>+        LASSERT(data != NULL);
>+
>+        ptlrpc_req_finished(req);

So you free the request...

>+        if (rc != 0) {
>+                reply = lustre_swab_repbuf(req, DLM_LOCKREPLY_OFF,
>+                                           sizeof(*reply),
>+                                           lustre_swab_ldlm_reply);

...and then access it? Also, what if there is no reply because there was a transmission error? You need to get the lock handle from the request itself, not from the reply.

And then we free the request again in ptlrpc_set_destroy(). I do not see you taking any extra reference on the request, so there is no need to free it in the interpret callback.
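An editorial sketch of the shape the callback could take after these comments (this is not the committed attachment): ldlm_lock_from_bl_request() is a placeholder name for "recover the lock from the request side rather than the reply", and the ldlm_handle_ast_error() call assumes the lock/request/rc/AST-type arguments used by the existing error path in ldlm_lockd.c.

static int ldlm_cb_interpret(struct ptlrpc_request *req, void *data, int rc)
{
        struct ldlm_cb_set_arg *arg = data;
        struct ldlm_lock *lock;
        ENTRY;

        LASSERT(data != NULL);

        if (rc != 0) {
                /* There may be no reply at all (transmission error), so the
                 * lock handle must come from the request, not the reply. */
                lock = ldlm_lock_from_bl_request(req);    /* placeholder name */
                if (lock != NULL) {
                        /* evict the client that failed the AST; assumed
                         * signature of the existing error path */
                        rc = ldlm_handle_ast_error(lock, req, rc, "callback");
                        if (rc == -ERESTART)
                                atomic_set(&arg->restart, 1);
                        LDLM_LOCK_PUT(lock);
                }
        }

        /* No ptlrpc_req_finished(req) here: no extra reference was taken,
         * and ptlrpc_set_destroy() frees every request in the set. */
        RETURN(0);
}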
deen@clusterfs.com
2007-Jan-13 17:03 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9296 is obsolete   |0                             |1

Created an attachment (id=9338)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9338&action=view)

Corrections in callback func after green's codeinsp.
green@clusterfs.com
2007-Jan-14 17:10 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9338 Flag          |review?(green@clusterfs.com)  |review+

(From update of attachment 9338)

Looks good. You need a special test developed to exercise the error path in ldlm_cb_interpret, to make sure clients are actually being evicted as expected now.

Now, about limiting the number of ASTs: I think the change is simple enough not to require another round of DLD/CODE. Basically, what you need is to count the ASTs added to the set in ldlm_run_*_ast_work, and once the count grows to a certain value, just do set_wait/set_destroy and then create the set again (you need to carry the "restart" value over this). Maybe some code reuse between those two functions won't hurt as well; a sketch of such a helper follows.
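An editorial sketch of what such a shared helper could look like (the real implementation is in attachment 9359, which is not quoted in this thread; the field names follow the struct ldlm_cb_set_arg usage visible in the later review):

static void ldlm_send_and_maybe_create_set(struct ldlm_cb_set_arg *arg,
                                           int do_create)
{
        ENTRY;

        /* Send every queued AST and wait for the replies; the interpret
         * callback records any -ERESTART condition in arg->restart. */
        ptlrpc_set_wait(arg->set);
        ptlrpc_set_destroy(arg->set);

        /* Start a fresh set only when more ASTs remain in @rpc_list. */
        if (do_create)
                arg->set = ptlrpc_prep_set();

        EXIT;
}

Because the restart flag lives in @arg rather than in the set, it survives each set_wait/set_destroy cycle, which is the "inherit the restart value over this" part.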
deen@clusterfs.com
2007-Jan-17 06:11 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9338 is obsolete   |0                             |1

Created an attachment (id=9359)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9359&action=view)

Adding AST limit to parallel callbacks code.
green@clusterfs.com
2007-Jan-17 12:08 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9359 Flag          |review?(green@clusterfs.com)  |review-

(From update of attachment 9359)

> int ldlm_run_bl_ast_work(struct list_head *rpc_list)
> {
>+        struct ldlm_cb_set_arg arg;
>         struct list_head *tmp, *pos;
>         struct ldlm_lock_desc d;
>-        int rc = 0, retval = 0;
>+        int ast_count;
>+        int rc = 0;
>         ENTRY;
>
>+        arg.set = ptlrpc_prep_set();
>+        atomic_set(&arg.restart, 0);
>+        arg.type = LDLM_BL_CALLBACK;
>+
>+        ast_count = 0;
>         list_for_each_safe(tmp, pos, rpc_list) {
>                 struct ldlm_lock *lock =
>                         list_entry(tmp, struct ldlm_lock, l_bl_ast);
>@@ -1051,24 +1076,38 @@ int ldlm_run_bl_ast_work(struct list_hea
>
>                 LDLM_LOCK_PUT(lock->l_blocking_lock);
>                 lock->l_blocking_lock = NULL;
>-                rc = lock->l_blocking_ast(lock, &d, NULL, LDLM_CB_BLOCKING);
>-
>-                if (rc == -ERESTART)
>-                        retval = rc;
>-                else if (rc)
>-                        CDEBUG(D_DLMTRACE, "Failed AST - should clean & "
>-                               "disconnect client\n");
>+                rc = lock->l_blocking_ast(lock, &d, (void *)&arg,
>+                                          LDLM_CB_BLOCKING);
>                 LDLM_LOCK_PUT(lock);
>+                ast_count++;
>+
>+                /* Send the request set if it exceeds the PARALLEL_AST_LIMIT,
>+                 * and create a new set for requests that remained in
>+                 * @rpc_list */
>+                if (unlikely(ast_count == PARALLEL_AST_LIMIT)) {
>+                        ldlm_send_and_maybe_create_set(&arg, 1);
>+                        ast_count = 0;
>+                }
>         }
>-        RETURN(retval);
>+
>+        if (ast_count > 0)

Note that if you get here with count 0 (e.g. if there were exactly PARALLEL_AST_LIMIT ASTs to send), then you leak the memory allocated to the request set.

>+                ldlm_send_and_maybe_create_set(&arg, 0);
>+
>+        RETURN(atomic_read(&arg.restart) ? -ERESTART : 0);
> }
>
> int ldlm_run_cp_ast_work(struct list_head *rpc_list)

Same as above wrt leaking memory.
deen@clusterfs.com
2007-Jan-20 06:29 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9359 is obsolete   |0                             |1

Created an attachment (id=9391)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9391&action=view)

Fix the request-set memory leak.
green@clusterfs.com
2007-Jan-23 12:10 UTC
[Lustre-devel] [Bug 11301] parallel lock callbacks
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11301

           What                |Removed                       |Added
----------------------------------------------------------------------------
Attachment #9391 Flag          |review?(green@clusterfs.com)  |review+

(From update of attachment 9391)

>+        if (ast_count > 0)
>+                ldlm_send_and_maybe_create_set(&arg, 0);
>+        else
>+                /* In case when number of ASTs is multiply of
>+                 * PARALLEL_AST_LIMIT or @rpc_list was initially empty,
>+                 * @arg.set must be destroyed here, otherwise we get
>+                 * write memory leaking. */
>+                ptlrpc_set_destroy(arg.set);

I was thinking along the lines that you could just drop the if in front of the ldlm_send_and_maybe_create_set() call (see the sketch below), but this will also do, of course.

About sanityN.sh, test 29: if it exercises that AST error path, then yes, we can use it.
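For illustration only, the alternative mentioned above could look like the fragment below. It assumes ptlrpc_set_wait() simply returns when the set is empty, so the unconditional call both sends whatever is queued and destroys @arg.set, making the explicit else branch unnecessary.

        /* tail of ldlm_run_bl_ast_work()/ldlm_run_cp_ast_work(): sketch of
         * the "drop the if" variant, always flush and destroy the set */
        ldlm_send_and_maybe_create_set(&arg, 0);

        RETURN(atomic_read(&arg.restart) ? -ERESTART : 0);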