thr3ads.net - Lustre devel - [Lustre-devel] lustre 1.8+ issues with automounter [Mar 2011]

Jeremy Filizetti

2011-Mar-04 04:48 UTC

[Lustre-devel] lustre 1.8+ issues with automounter

Ever since we moved from Lustre 1.6.6 to 1.8 I've seen issues with using
the automounter and Lustre.  I've finally got around to looking at what
the issue is, but I'm not quite sure what the correct way to resolve it
is.  I think the issue will remain in 2.0+ but I didn't look closely at
the code.  The issue is that lov_connect, which calls lov_connect_obd, is
an asynchronous connect that does not wait for all OSCs to be connected
before returning.  In the end lustre_fill_super can return before all
OSCs have been set active, so any file operations that caused the
automount may return an error.  Many lov functions check to make sure
the lov_tgt_desc ltd_active flag is 1 or return -EIO.
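
The check described above can be sketched in user space.  This is only a
simplified model, not the Lustre code: struct tgt_model and
tgt_check_active() are hypothetical names standing in for lov_tgt_desc
and the "ltd_active or -EIO" test that the lov functions perform.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct lov_tgt_desc (hypothetical model);
 * only the two flags discussed in the thread are kept. */
struct tgt_model {
        unsigned int active:1;    /* models ltd_active: target is up */
        unsigned int activate:1;  /* models ltd_activate: target should be up */
};

/* Hypothetical helper modelling the check the lov functions make:
 * a target that is missing or not yet active yields -EIO. */
static int tgt_check_active(const struct tgt_model *tgt)
{
        if (tgt == NULL || !tgt->active)
                return -EIO;
        return 0;
}
```

In this model a target with activate = 1 and active = 0 is exactly the
window hit by the automounter: the mount has returned, the connect reply
has not arrived yet, and tgt_check_active() returns -EIO.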

The following patch handles things correctly by waiting until all OSCs
that are set to be activated are active before returning from filling
the super block.  There are a few cases where I'm not sure what the
expected results are with Lustre.  For example, if an OST has not been
mounted, the client will attempt to connect and end up returning -ENODEV
and setting the import_state to LUSTRE_IMP_DISCON.  Without the patch
the client mounts immediately even though the OSC is unavailable; with
it the mount would not return until the user kills the process, the OBD
is set inactive, or the state changes.  To provide the same
functionality, an extra condition would need to be added to the
l_wait_event condition to check that the import state is no longer
connecting.  However, if I do that, I'm not sure things handle failover
nodes correctly.  So what I'm wondering is what the expected actions are
for the different conditions of OSTs.
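
The two wait conditions discussed above can be compared with a small
user-space model.  Everything here is an assumption made for
illustration: the enum values, struct tgt_wait_model and both predicate
functions are hypothetical and only mirror the fields named in this
mail (ltd_activate, ltd_active, imp_deactive and the import state).

```c
#include <stdbool.h>

/* Hypothetical model of the import states mentioned in the mail. */
enum imp_state_model {
        IMP_MODEL_CONNECTING,   /* connect request still in flight */
        IMP_MODEL_DISCON,       /* connect failed, e.g. OST not mounted */
        IMP_MODEL_FULL          /* connection established */
};

/* Hypothetical model of the per-target state the wait loop looks at. */
struct tgt_wait_model {
        bool activate;              /* models ltd_activate */
        bool active;                /* models ltd_active */
        bool deactive;              /* models imp_deactive */
        enum imp_state_model state; /* models the import state */
};

/* Condition as in the posted patch: stop waiting once the target has
 * reached its wanted state or the import was administratively disabled. */
static bool wait_done_patch(const struct tgt_wait_model *t)
{
        return t->activate == t->active || t->deactive;
}

/* Variant with the extra condition: also stop waiting as soon as the
 * import is no longer in the connecting state. */
static bool wait_done_extra(const struct tgt_wait_model *t)
{
        return wait_done_patch(t) || t->state != IMP_MODEL_CONNECTING;
}
```

For a target whose OST is not mounted (activate set, active clear, state
DISCON) the first predicate stays false, so the mount keeps waiting,
while the second becomes true.  Whether giving up at that point is
compatible with a later retry against a failover node is the open
question raised above; the model does not answer it.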

Thanks,
Jeremy

diff --git a/lustre/include/obd.h b/lustre/include/obd.h
index e89805d..3046a5c 100644
--- a/lustre/include/obd.h
+++ b/lustre/include/obd.h
@@ -754,6 +754,8 @@ struct lov_tgt_desc {
         unsigned long       ltd_active:1,/* is this target up for requests */
                             ltd_activate:1,/* should this target be activated */
                             ltd_reap:1;  /* should this target be deleted */
+    cfs_waitq_t         ltd_started; /* waitqueue to notify tgt has been fully started
+                                      * so IO can start */
 };
 
 /* Pool metadata */
@@ -942,6 +944,8 @@ enum obd_notify_event {
         OBD_NOTIFY_ACTIVE,
         /* Device deactivated */
         OBD_NOTIFY_INACTIVE,
+        /* Device disconnected */
+        OBD_NOTIFY_DISCON,
         /* Connect data for import were changed */
         OBD_NOTIFY_OCD,
         /* Sync request */
diff --git a/lustre/lov/lov_obd.c b/lustre/lov/lov_obd.c
index 8b2d848..ff4a04a 100644
--- a/lustre/lov/lov_obd.c
+++ b/lustre/lov/lov_obd.c
@@ -222,7 +222,33 @@ static int lov_notify(struct obd_device *obd, struct obd_device *watched,
                 }
                 /* active event should be pass lov target index as data */
                 data = &rc;
-        }
+        } else if (ev == OBD_NOTIFY_DISCON) {
+        struct lov_tgt_desc *tgt;
+        struct lov_obd *lov = &obd->u.lov;
+        int i;
+
+        LASSERT(watched);
+                if (strcmp(watched->obd_type->typ_name, LUSTRE_OSC_NAME)) {
+                        CERROR("unexpected notification of %s %s!\n",
+                               watched->obd_type->typ_name,
+                               watched->obd_name);
+                        RETURN(-EINVAL);
+        }
+
+        obd_getref(obd);
+        for (i = 0; i < lov->desc.ld_tgt_count; i++) {
+            tgt = lov->lov_tgts[i];
+            if (!tgt || !tgt->ltd_exp)
+                continue;
+
+            if (obd_uuid_equals(&watched->u.cli.cl_target_uuid, &tgt->ltd_uuid)) {
+                cfs_waitq_signal(&lov->lov_tgts[i]->ltd_started);
+                data = &i;
+                break;
+            }
+        }
+        obd_putref(obd);
+    }
 
         /* Pass the notification up the chain. */
         if (watched) {
@@ -424,6 +450,27 @@ static int lov_connect(struct lustre_handle *conn, struct obd_device *obd,
                                obd->obd_name, rc);
                 }
         }
+
+    /* Wait for all the connections to complete before returning so that all
+     * obds are set active that should be.  Otherwise IO that happens immediately
+     * after mount (autofs) could glimpse or touch objects before the connection
+     * is established */
+    for (i = 0; i < lov->desc.ld_tgt_count; i++) {
+        struct l_wait_info lwi = { 0 };
+
+        tgt = lov->lov_tgts[i];
+        if (!tgt || !tgt->ltd_exp || obd_uuid_empty(&tgt->ltd_uuid))
+            continue;
+
+        if (tgt->ltd_activate == tgt->ltd_active)
+            continue;
+
+        CDEBUG(D_CONFIG, "Target %s activate/active %d/%d, waiting on state change\n",
+               tgt->ltd_obd->obd_name, tgt->ltd_activate, tgt->ltd_active);
+
+        l_wait_event(tgt->ltd_started, tgt->ltd_activate == tgt->ltd_active ||
+                     tgt->ltd_obd->u.cli.cl_import->imp_deactive, &lwi);
+    }
         obd_putref(obd);
 
         RETURN(0);
@@ -445,6 +492,9 @@ static int lov_disconnect_obd(struct obd_device *obd, struct lov_tgt_desc *tgt)
                 tgt->ltd_active = 0;
                 lov->desc.ld_active_tgt_count--;
                 tgt->ltd_exp->exp_obd->obd_inactive = 1;
+
+        /* If state change wake up wait queue */
+        cfs_waitq_signal(&tgt->ltd_started);
         }
 
         lov_proc_dir = lprocfs_srch(obd->obd_proc_entry, "target_obds");
@@ -582,6 +632,9 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
         lov->lov_tgts[i]->ltd_qos.ltq_penalty = 0;
 
  out:
+    if (i >= 0)
+        cfs_waitq_signal(&lov->lov_tgts[i]->ltd_started);
+
         obd_putref(obd);
         RETURN(i);
 }
@@ -673,6 +726,8 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
         if (index >= lov->desc.ld_tgt_count)
                 lov->desc.ld_tgt_count = index + 1;
 
+    cfs_waitq_init(&tgt->ltd_started);
+
         mutex_up(&lov->lov_lock);
 
         CDEBUG(D_CONFIG, "idx=%d ltd_gen=%d ld_tgt_count=%d\n",
diff --git a/lustre/osc/osc_request.c b/lustre/osc/osc_request.c
index 7dd8667..cfc6ccf 100644
--- a/lustre/osc/osc_request.c
+++ b/lustre/osc/osc_request.c
@@ -4398,6 +4398,7 @@ static int osc_import_event(struct obd_device *obd,
                 cli->cl_lost_grant = 0;
                 client_obd_list_unlock(&cli->cl_loi_list_lock);
                 ptlrpc_import_setasync(imp, -1);
+        obd_notify_observer(obd, obd, OBD_NOTIFY_DISCON, NULL);
 
                 break;
         }

Alexey Lyashkov

2011-Mar-04 05:47 UTC

[Lustre-devel] lustre 1.8+ issues with automounter

On Mar 4, 2011, at 07:48, Jeremy Filizetti wrote:
> Ever since we moved from Lustre 1.6.6 to 1.8 I've seen issues with using
> the automounter and Lustre.  I've finally got around to looking at what
> the issue is, but I'm not quite sure what the correct way to resolve it
> is.  I think the issue will remain in 2.0+ but I didn't look closely at
> the code.  The issue is that lov_connect which calls lov_connect_obd is
> an asynchronous connect that does not wait for all OSCs to be connected
> before returning.  In the end lustre_fill_super can return before all
> OSCs have been set active so any file operations that caused the
> automount may return an error.  Many lov functions check to make sure
> the lov_tgt_desc ltd_active flag is 1 or return -EIO.

Your patch is wrong in the case where some OSC targets are inaccessible
(in maintenance, or because of network trouble).  In that case lov_connect
will wait forever, which is not the expected behavior.
Can you provide more details about which situation confuses automount?
Or try to move

        err = obd_statfs(obd, &osfs, cfs_time_current_64() - HZ, 0);
        if (err)
                GOTO(out_mdc, err);

from its current location to somewhere after the root FID is fetched.

If the FS is mounted without the lazystatfs option, obd_statfs will block
until all connection requests have finished, so you will get the same
behavior without any changes to the obd_connect() code.

--------------------------------------------
Alexey Lyashkov
alexey_lyashkov at xyratex.com

______________________________________________________________________
This email may contain privileged or confidential information, which should only
be used for the purpose for which it was sent by Xyratex. No further rights or
licenses are granted to use such information. If you are not the intended
recipient of this message, please notify the sender by return and delete it. You
may not use, copy, disclose or rely on the information contained in it.

Internet email is susceptible to data corruption, interception and unauthorised
amendment for which Xyratex does not accept liability. While we have taken
reasonable precautions to ensure that this email is free of viruses, Xyratex
does not accept liability for the presence of any computer viruses in this
email, nor for any losses caused as a result of viruses.

Xyratex Technology Limited (03134912), Registered in England & Wales,
Registered Office, Langstone Road, Havant, Hampshire, PO9 1SA.

The Xyratex group of companies also includes, Xyratex Ltd, registered in
Bermuda, Xyratex International Inc, registered in California, Xyratex (Malaysia)
Sdn Bhd registered in Malaysia, Xyratex Technology (Wuxi) Co Ltd registered in
The People's Republic of China and Xyratex Japan Limited registered in
Japan.
______________________________________________________________________

Jeremy Filizetti

2011-Mar-04 06:12 UTC

[Lustre-devel] lustre 1.8+ issues with automounter

An example is below, with some comments added and a handful of the log
lines removed.  I don't actually have this many OSTs; I just created a
lot of OSTs to easily reproduce the problem in a VM.  autofs is set up to
mount Lustre.  autofs attempted to mount the file system when I typed
"ls -l  /lustre/xen1/tmp/testfile", where testfile is allocated on the
192nd OST IIRC.

The mount is kicked off by the automounter in response to the above command.
00000020:01200004:2:1298954011.295906:0:8398:0:(obd_mount.c:2001:lustre_fill_super())
VFS Op: sb ffff8801e7e22c00
00000020:01000004:2:1298954011.295920:0:8398:0:(obd_mount.c:2015:lustre_fill_super())
Mounting client xen1-client
00000080:00200000:2:1298954011.301889:0:8398:0:(llite_lib.c:1017:ll_fill_super())
VFS Op: sb ffff8801e7e22c00
00000080:01000000:2:1298954011.431273:0:8398:0:(llite_lib.c:1115:ll_fill_super())
Found profile xen1-client: mdc=xen1-MDT0000-mdc osc=xen1-clilov
00000080:00000010:2:1298954011.431274:0:8398:0:(llite_lib.c:1118:ll_fill_super())
kmalloced 'osc': 29 at ffff8801e7efd9a0.
00000080:00000010:2:1298954011.431276:0:8398:0:(llite_lib.c:1124:ll_fill_super())
kmalloced 'mdc': 34 at ffff8801dcb56ec0.
00000080:00000010:2:1298954011.431277:0:8398:0:(llite_lib.c:267:client_common_fill_super())
kmalloced 'data': 72 at ffff8801e9deedc0.
00000080:00100000:2:1298954011.432116:0:8398:0:(llite_lib.c:409:client_common_fill_super())
ocd_connect_flags: 0xe1440478 ocd_version: 17302784 ocd_grant: 0
00020000:01000000:1:1298954011.432928:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0000_UUID active
00020000:01000000:1:1298954011.432977:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0002_UUID active
00020000:01000000:1:1298954011.433025:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0004_UUID active
.
.
.
00020000:01000000:2:1298954011.455806:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0094_UUID active
00020000:01000000:2:1298954011.455924:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0095_UUID active
00020000:01000000:2:1298954011.456042:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0096_UUID active
00020000:01000000:2:1298954011.456161:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0097_UUID active
00020000:01000000:2:1298954011.457417:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0098_UUID active
00000080:00000004:1:1298954011.457543:0:8398:0:(llite_lib.c:467:client_common_fill_super())
rootfid 16:[0x10:0xababf859:0x4000]
00020000:01000000:2:1298954011.457573:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST0099_UUID active
00020000:01000000:2:1298954011.457705:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST009a_UUID active
00000080:00000010:1:1298954011.457830:0:8398:0:(super25.c:57:ll_alloc_inode())
slab-alloced '(lli)': 928 at ffff8801e0de4bc0.
00020000:01000000:2:1298954011.457855:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST009b_UUID active
00000080:00000010:1:1298954011.457938:0:8398:0:(llite_lib.c:528:client_common_fill_super())
kfreed 'data': 72 at ffff8801e9deedc0.
00000080:00000010:1:1298954011.457977:0:8398:0:(llite_lib.c:1151:ll_fill_super())
kfreed 'mdc': 34 at ffff8801dcb56ec0.
00000080:00000010:1:1298954011.457979:0:8398:0:(llite_lib.c:1153:ll_fill_super())
kfreed 'osc': 29 at ffff8801e7efd9a0.
00000080:02000400:1:1298954011.457979:0:8398:0:(llite_lib.c:1157:ll_fill_super())
Client xen1-client has started
00000020:00000004:1:1298954011.457980:0:8398:0:(obd_mount.c:2053:lustre_fill_super())
Mount 192.168.66.2@tcp8:/xen1 complete

We have just returned from filling the super block, so the file system is
now accessible, but as you can see from the lov_set_osc_active messages,
not all OSCs have been set active yet.

00020000:01000000:2:1298954011.457981:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST009c_UUID active
00020000:01000000:2:1298954011.458108:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST009d_UUID active
.
.
.
00020000:01000000:2:1298954011.460053:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00ac_UUID active
00020000:01000000:2:1298954011.460187:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00ad_UUID active
00000080:00000010:1:1298954011.461272:0:8395:0:(super25.c:57:ll_alloc_inode())
slab-alloced '(lli)': 928 at ffff8801e0de4800.
00020000:01000000:2:1298954011.461487:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00ae_UUID active
00000080:00000010:1:1298954011.461589:0:8395:0:(super25.c:57:ll_alloc_inode())
slab-alloced '(lli)': 928 at ffff8801e0de4440.
00000080:00010000:1:1298954011.461624:0:8395:0:(file.c:965:ll_glimpse_size())
Glimpsing inode 218
00000080:00020000:1:1298954011.461636:0:8395:0:(file.c:995:ll_glimpse_size())
obd_enqueue returned rc -5, returning -EIO

Now the glimpse of the inode from above, which is allocated on
xen1-OST00bf, fails: that OSC is not yet active, so the set is empty and
-EIO is returned.

00020000:01000000:2:1298954011.461644:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00af_UUID active
00020000:01000000:2:1298954011.461782:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00b0_UUID active
.
.
.
00020000:01000000:2:1298954011.463766:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00be_UUID active
00020000:01000000:2:1298954011.463911:0:11545:0:(lov_obd.c:570:lov_set_osc_active())
Marking OSC xen1-OST00bf_UUID active

Finally the last OSC is set active.  This is where
client_common_fill_super, ll_fill_super, and lustre_fill_super should
return from the mount syscall, because the file system is now fully
accessible.

I will take a look at your suggestion below tomorrow to see if it will
handle this situation.


Thanks,
Jeremy
> Your patch is wrong in the case where some OSC targets are inaccessible
> (in maintenance, or because of network trouble).
> In that case lov_connect will wait forever, which is not the expected
> behavior.
> Can you provide more details about which situation confuses automount?
> Or try to move
>
>         err = obd_statfs(obd, &osfs, cfs_time_current_64() - HZ, 0);
>         if (err)
>                 GOTO(out_mdc, err);
>
> from its current location to somewhere after the root FID is fetched.
>
> If the FS is mounted without the lazystatfs option, obd_statfs will block
> until all connection requests have finished, so you will get the same
> behavior without any changes to the obd_connect() code.

Alexey Lyashkov

2011-Mar-04 06:21 UTC

[Lustre-devel] lustre 1.8+ issues with automounter

If you can add a "df" call after mounting the Lustre FS, that will also
help.
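
From user space, the "df" idea amounts to issuing one statfs request
against the mount point before any other I/O.  A minimal sketch using
the POSIX statvfs() call follows; touch_statfs() is a hypothetical
helper name, and whether the call really blocks until all OSCs are
connected depends on the mount options (lazystatfs) as described
earlier in the thread, not on anything in this code.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: issue a single statfs request against
 * mount_point, which is what "df" does for each file system.
 * Returns 0 on success and -1 if statvfs() fails. */
static int touch_statfs(const char *mount_point)
{
        struct statvfs st;

        if (statvfs(mount_point, &st) != 0) {
                perror("statvfs");
                return -1;
        }

        printf("%s: %llu of %llu blocks free (fragment size %lu)\n",
               mount_point,
               (unsigned long long)st.f_bfree,
               (unsigned long long)st.f_blocks,
               (unsigned long)st.f_frsize);
        return 0;
}
```

An automount wrapper could call touch_statfs() on the Lustre mount point
(for example the /lustre/xen1 path used in the reproduction above) right
after the mount completes.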

Andreas Dilger

2011-Mar-04 06:39 UTC

[Lustre-devel] lustre 1.8+ issues with automounter

On 2011-03-03, at 9:48 PM, Jeremy Filizetti wrote:
> Ever since we moved from Lustre 1.6.6 to 1.8 I've seen issues with using
> the automounter and Lustre.  I've finally got around to looking at what
> the issue is, but I'm not quite sure what the correct way to resolve it
> is.  I think the issue will remain in 2.0+ but I didn't look closely at
> the code.

Interesting.  I've known about automount problems with Lustre for some
time (a search of the list history would probably find a bunch), but
nobody has ever dug into the root cause.  Thanks for taking the time to
investigate.

>  The issue is that lov_connect which calls lov_connect_obd is
> an asynchronous connect that does not wait for all OSCs to be connected
> before returning.  In the end lustre_fill_super can return before all
> OSCs have been set active so any file operations that caused the
> automount may return an error.  Many lov functions check to make sure
> the lov_tgt_desc ltd_active flag is 1 or return -EIO.

Right.  This is to allow Lustre to operate in "failout" mode (i.e.
never wait for recovery on a down OST, and instead allow the application
to do something else), and/or to let the administrator mark the OST
unavailable via "lctl deactivate" if it is down for some extended period
(major hardware failure, corruption, etc).

> The following patch handles things correctly by waiting until all OSC's
> that are set to be activated are active before returning from filling
> the super block.  There are a few problems that I'm not sure of what the
> expected results are with Lustre.  For example if an OST has not been
> mounted the client will attempt to connect and end up returning -ENODEV
> and setting the import_state as LUSTRE_IMP_DISCON.  Without the patch
> the client mounts immediately even though the OSC is unavailable, with
> it the mount would not return until the user kills the process, the OBD
> is set inactive, or the state changes.

This is done intentionally, so that the client can complete the mount
without waiting for all of the connections, which may take tens of
seconds when there are 100k clients booting at the same time, or may take
a very long time if the OST is down and so block the client boot process
indefinitely.

> To provide the same functionality an extra condition would need to be added
> to the l_wait_event condition to monitor the import state is not connecting.
> However if I do that, I'm not sure things handle failover nodes correctly.
> So what I'm wondering is what are the expected actions for the different
> conditions of OSTs.

I wonder if it makes sense to start the OSCs in "active" mode, and
only mark them inactive if they fail the initial connect request.  I
haven't looked at this code for a long time, so I'm not sure
if this will have some unintended side effects.

For future patch submissions, please follow the Lustre Coding Guidelines at
http://wiki.lustre.org/index.php/Coding_Guidelines
> diff --git a/lustre/include/obd.h b/lustre/include/obd.h
> index e89805d..3046a5c 100644
> --- a/lustre/include/obd.h
> +++ b/lustre/include/obd.h
> @@ -754,6 +754,8 @@ struct lov_tgt_desc {
>         unsigned long       ltd_active:1,/* is this target up for
> requests */
>                             ltd_activate:1,/* should this target be
> activated */
>                             ltd_reap:1;  /* should this target be
> deleted */
> +    cfs_waitq_t         ltd_started; /* waitqueue to notify tgt has
> been fully started
> +                                      * so IO can start */
> };
> 
> /* Pool metadata */
> @@ -942,6 +944,8 @@ enum obd_notify_event {
>         OBD_NOTIFY_ACTIVE,
>         /* Device deactivated */
>         OBD_NOTIFY_INACTIVE,
> +        /* Device disconnected */
> +        OBD_NOTIFY_DISCON,
>         /* Connect data for import were changed */
>         OBD_NOTIFY_OCD,
>         /* Sync request */
> diff --git a/lustre/lov/lov_obd.c b/lustre/lov/lov_obd.c
> index 8b2d848..ff4a04a 100644
> --- a/lustre/lov/lov_obd.c
> +++ b/lustre/lov/lov_obd.c
> @@ -222,7 +222,33 @@ static int lov_notify(struct obd_device *obd,
> struct obd_device *watched,
>                 }
>                 /* active event should be pass lov target index as data */
>                 data = &rc;
> -        }
> +        } else if (ev == OBD_NOTIFY_DISCON) {
> +        struct lov_tgt_desc *tgt;
> +        struct lov_obd *lov = &obd->u.lov;
> +        int i;
> +
> +        LASSERT(watched);
> +                if (strcmp(watched->obd_type->typ_name,
LUSTRE_OSC_NAME)) {
> +                        CERROR("unexpected notification of %s
%s!\n",
> +                               watched->obd_type->typ_name,
> +                               watched->obd_name);
> +                        RETURN(-EINVAL);
> +        }
> +
> +        obd_getref(obd);
> +        for (i = 0; i < lov->desc.ld_tgt_count; i++) {
> +            tgt = lov->lov_tgts[i];
> +            if (!tgt || !tgt->ltd_exp)
> +                continue;
> +
> +            if (obd_uuid_equals(&watched->u.cli.cl_target_uuid,
> &tgt->ltd_uuid)) {
> +               
cfs_waitq_signal(&lov->lov_tgts[i]->ltd_started);
> +                data = &i;
> +                break;
> +            }
> +        }
> +        obd_putref(obd);
> +    }
> 
>         /* Pass the notification up the chain. */
>         if (watched) {
> @@ -424,6 +450,27 @@ static int lov_connect(struct lustre_handle *conn,
> struct obd_device *obd,
>                                obd->obd_name, rc);
>                 }
>         }
> +
> +    /* Wait for all the connections to complete before returning so
> that all
> +         * obds are set active that should be.  Otherwise IO that
> happens immediately
> +     * after mount could (autofs) could glimpse or touch objects before
> the connecction
> +     * is established */
> +    for (i = 0; i < lov->desc.ld_tgt_count; i++) {
> +        struct l_wait_info lwi = { 0 };
> +
> +        tgt = lov->lov_tgts[i];
> +        if (!tgt || !tgt->ltd_exp ||
obd_uuid_empty(&tgt->ltd_uuid))
> +            continue;
> +
> +        if (tgt->ltd_activate == tgt->ltd_active)
> +            continue;
> +
> +        CDEBUG(D_CONFIG, "Target %s activate/active %d/%d, waiting on
> state change\n",
> +               tgt->ltd_obd->obd_name, tgt->ltd_activate,
tgt->ltd_active);
> +
> +        l_wait_event(tgt->ltd_started, tgt->ltd_activate =>
tgt->ltd_active ||
> +                     tgt->ltd_obd->u.cli.cl_import->imp_deactive,
&lwi);
> +    }
>         obd_putref(obd);
> 
>         RETURN(0);
> @@ -445,6 +492,9 @@ static int lov_disconnect_obd(struct obd_device
> *obd, struct lov_tgt_desc *tgt)
>                 tgt->ltd_active = 0;
>                 lov->desc.ld_active_tgt_count--;
>                 tgt->ltd_exp->exp_obd->obd_inactive = 1;
> +
> +        /* If state change wake up wait queue */
> +        cfs_waitq_signal(&tgt->ltd_started);
>         }
> 
>         lov_proc_dir = lprocfs_srch(obd->obd_proc_entry, "target_obds");
> @@ -582,6 +632,9 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
>         lov->lov_tgts[i]->ltd_qos.ltq_penalty = 0;
> 
>  out:
> +    if (i >= 0)
> +        cfs_waitq_signal(&lov->lov_tgts[i]->ltd_started);
> +
>         obd_putref(obd);
>         RETURN(i);
> }
> @@ -673,6 +726,8 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
>         if (index >= lov->desc.ld_tgt_count)
>                 lov->desc.ld_tgt_count = index + 1;
> 
> +    cfs_waitq_init(&tgt->ltd_started);
> +
>         mutex_up(&lov->lov_lock);
> 
>         CDEBUG(D_CONFIG, "idx=%d ltd_gen=%d ld_tgt_count=%d\n",
> diff --git a/lustre/osc/osc_request.c b/lustre/osc/osc_request.c
> index 7dd8667..cfc6ccf 100644
> --- a/lustre/osc/osc_request.c
> +++ b/lustre/osc/osc_request.c
> @@ -4398,6 +4398,7 @@ static int osc_import_event(struct obd_device *obd,
>                 cli->cl_lost_grant = 0;
>                 client_obd_list_unlock(&cli->cl_loi_list_lock);
>                 ptlrpc_import_setasync(imp, -1);
> +        obd_notify_observer(obd, obd, OBD_NOTIFY_DISCON, NULL);
> 
>                 break;
>         }
> 
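To make the wait condition in the lov_connect hunk easier to reason about, here is a minimal user-space sketch of it as a pure predicate. It assumes the intended condition is "desired state equals current state, or the import has been deactivated". The struct and function names (toy_target, toy_target_settled, toy_pending_targets) are hypothetical stand-ins, not Lustre code, and nothing here blocks or sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lov_tgt_desc fields the patch looks at. */
struct toy_target {
    bool configured;    /* target exists and has an export */
    int  activate;      /* desired state (ltd_activate in the patch) */
    int  active;        /* current state (ltd_active in the patch) */
    bool imp_deactive;  /* import has been administratively deactivated */
};

/* True when the connect path would no longer wait on this target. */
bool toy_target_settled(const struct toy_target *tgt)
{
    if (!tgt->configured)
        return true;    /* the loop skips unconfigured targets */
    return tgt->activate == tgt->active || tgt->imp_deactive;
}

/* Number of targets a mount would still be blocked on. */
size_t toy_pending_targets(const struct toy_target *tgts, size_t count)
{
    size_t pending = 0;

    for (size_t i = 0; i < count; i++)
        if (!toy_target_settled(&tgts[i]))
            pending++;
    return pending;
}
```

In this model, a target whose OST never comes up stays pending until it is deactivated. That matches the behaviour described in the original post, where the mount does not return until the OBD is set inactive or its state changes.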

Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.

Alexey Lyashkov

2011-Mar-04 09:22 UTC

[Lustre-devel] lustre 1.8+ issues with automounter

On Mar 4, 2011, at 09:39, Andreas Dilger wrote:
> On 2011-03-03, at 9:48 PM, Jeremy Filizetti wrote:
>> Ever since we moved from Lustre 1.6.6 to 1.8 I've seen issues with using
>> the automounter and Lustre.  I've finally got around to looking at what
>> the issue is, but I'm not quite sure what the correct way to resolve it
>> is.  I think the issue will remain in 2.0+ but I didn't look closely at
>> the code.
>
> Interesting.  I've known about automount problems with Lustre for some
> time (probably a search in the list history would find a bunch), but nobody
> has ever dug into the root cause.  Thanks for taking the time to investigate.

It looks like this is a result of the rq_no_resend flag on the glimpse request:
the request fails (instead of being put on the delayed list) and that error is
returned to the caller.
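
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the ptlrpc code: the types and the toy_submit function are hypothetical, and the error code is chosen arbitrarily for illustration. It shows a request marked no-resend failing immediately while the import is still connecting, whereas an ordinary request is parked on a delayed list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not Lustre's. */
enum toy_import_state { TOY_IMP_CONNECTING, TOY_IMP_FULL };

struct toy_request {
    bool no_resend;              /* stands in for rq_no_resend */
    struct toy_request *next;    /* link on the delayed list */
};

struct toy_import {
    enum toy_import_state state;
    struct toy_request *delayed; /* requests parked until connected */
};

/*
 * Returns 0 if the request can be sent now, 1 if it was parked on the
 * delayed list, or a negative errno if it failed outright.
 */
int toy_submit(struct toy_import *imp, struct toy_request *req)
{
    assert(imp != NULL && req != NULL);

    if (imp->state == TOY_IMP_FULL)
        return 0;

    /* Not connected yet: a no-resend request is failed, not delayed. */
    if (req->no_resend)
        return -EWOULDBLOCK;     /* arbitrary error code for this sketch */

    req->next = imp->delayed;
    imp->delayed = req;
    return 1;
}
```

In this model, a glimpse-style request issued right after an automount-triggered mount gets an error, while the same request succeeds once the import is fully connected.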

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com

