behlendorf1@llnl.gov
2007-Feb-05 14:59 UTC
[Lustre-devel] [Bug 11327] Assertion in target_handle_connect
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11327

Created an attachment (id=9511) --> (https://bugzilla.lustre.org/attachment.cgi?id=9511&action=view)
Kernel dk log
behlendorf1@llnl.gov
2007-Feb-05 19:09 UTC
[Lustre-devel] [Bug 11327] Assertion in target_handle_connect
Created an attachment (id=9514) --> (https://bugzilla.lustre.org/attachment.cgi?id=9514&action=view)
proposed patch against 1.4.8-5chaos

Proposed fix to resolve the connection/eviction race. Here's the basic race; full details can be seen in the attached kernel dk log.

    ping_evictor                       ll_ost_io_77
    ---------------------------------------------------------------------
    - ping_evictor_main()
      - timeout, evict client
      - exp->exp_failed = 1
                                       - target_handle_connect()
                                         - valid (but failed) export found
                                           in target->obd_exports
                                         - exp->exp_connecting = 1
    - class_disconnect destroys
      cookie
                                       - export = class_conn2export(&conn)
                                         lookup now fails: no cookie
                                       - LASSERT(export != NULL)

The proposed fix checks whether exp_failed is set on the export when it is found in target->obd_exports. If it is set, we bail out, because this export will soon be destroyed. The patch also ensures that if an export is successfully found and exp_failed is not set, then exp_connecting is set under the exp_lock spinlock. This can then be used to prevent class_fail_export() from setting exp_failed and destroying the export while the connect is in progress.
behlendorf1@llnl.gov
2007-Feb-05 19:16 UTC
[Lustre-devel] [Bug 11327] Assertion in target_handle_connect
               What    |Removed |Added
----------------------------------------------------------------------------
    Attachment #9514 is|0       |1
               obsolete|        |

Created an attachment (id=9515) --> (https://bugzilla.lustre.org/attachment.cgi?id=9515&action=view)
proposed patch against 1.4.8-5chaos

Drat, I knew as soon as I attached it to the bug I'd spot some little mistake. Here's a new version which ensures exp_failed doesn't get set in the connecting case.
adilger@clusterfs.com
2007-Feb-06 16:07 UTC
[Lustre-devel] [Bug 11327] Assertion in target_handle_connect
               What             |Removed                       |Added
----------------------------------------------------------------------------
    Attachment #9515 Flag       |review?(adilger@clusterfs.com)|review-

(From update of attachment 9515)

> void class_fail_export(struct obd_export *exp)
> {
>-        int rc, already_failed;
>+        int rc, already_failed, connecting;
>
>         spin_lock(&exp->exp_lock);
>         already_failed = exp->exp_failed;
>-        exp->exp_failed = 1;
>+        connecting = exp->exp_connecting;
>+        if (!connecting)
>+                exp->exp_failed = 1;
>         spin_unlock(&exp->exp_lock);

The only objection that I have here is that this appears to allow a reconnecting client to stifle the effort of the server to evict the client. I can understand that if the reconnection has already started and it is holding the locks then we can't really interrupt it. However, I wonder if we shouldn't mark the export failed anyways and handle this more gracefully in target_handle_connect() instead of LBUGging?

What about just returning -ENOTCONN where we currently LBUG if we can't look up the connection handle? If that lookup succeeds, but the export is failed after that, the current code at least has a refcount on the export and it will be destroyed after we (confusingly) return success for the connection and drop the reference held by req->rq_export.
We could also set rq_status == -ENOTCONN right at the end of target_handle_connect() if exp_failed is set, but then there is still a very small window after that check where the client will return success but the export is destroyed.

>@@ -627,22 +627,35 @@ int target_handle_connect(struct ptlrpc_
>+        spin_lock(&export->exp_lock);
>+        if (export->exp_failed) { /* bug 11327 evict/conn race */
>+                spin_unlock(&export->exp_lock);
>+                CWARN("%s: exp %p connect racing with "
>+                      "eviction\n", export->exp_obd->obd_name,
>+                      export);
>+                export = NULL;
>+                rc = -EALREADY;
>+                break;
>+        }

Specifically, this check could be moved to the very end of the "success" path. I'm also not sure if -EALREADY is right here, instead of -ENOTCONN.