stumped
2008-Apr-30 14:53 UTC
[Lustre-discuss] How long should an MDS failover event block
Greetings,

I'm using 1.6.4.3 and have 2 MDS servers with heartbeat. I have 2 OSSes
(each with one OST). A client mounts the filesystem with

  mount -t lustre mds01@tcp0:mds02@tcp0:/i3_lfs3 /mnt/lustre/i3_lfs3

and everything is great. I then run this on the client:

  while true; do echo "----------- $(date)"; touch $(date +%s); sleep 2; done

I then turn off the power on mds01, and the output of the above script is:

  ----------- Wed Apr 30 08:31:29 MDT 2008
  ----------- Wed Apr 30 08:31:31 MDT 2008
  ----------- Wed Apr 30 08:34:53 MDT 2008
  ----------- Wed Apr 30 08:38:15 MDT 2008
  ----------- Wed Apr 30 08:38:17 MDT 2008

In other words, my client blocks for almost 7 minutes. Is this what I should
expect? Can I reasonably shrink this time? Since some of the time is spent with
the MDT being "recovered" - as shown by

  cat /proc/fs/lustre/mds/i3_lfs3-MDT0000/recovery_status

can I expect this time to grow as the filesystem holds more data (eventually
it'll be about 700TB)?

Also, the MDS gives up the stack trace below; it doesn't look good; should I be
worried? As I said, after 7 minutes or so everything seems to be working
(though I'm not really doing much with the filesystem yet).

Thanks,
JR

Apr 30 08:36:33 mds02 kernel: Lustre: 30256:0:(ldlm_lib.c:747:target_handle_connect()) i3_lfs3-MDT0000: refuse reconnection from b1d47bed-6eca-7ccf-45e8-c39f930b361f@10.200.20.63@tcp to 0xffff810203c6a000; still busy with 3 active RPCs
Apr 30 08:36:33 mds02 kernel: LustreError: 30256:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16) req@ffff810227188e00 x19317/t0 o38->b1d47bed-6eca-7ccf-45e8-c39f930b361f@NET_0x200000ac8143f_UUID:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
Apr 30 08:37:00 mds02 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 30253: it was inactive for 100s
Apr 30 08:37:00 mds02 kernel: Lustre: 0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 30253
Apr 30 08:37:00 mds02 kernel: ll_mdt_12     S ffff810001071780     0 30253      1  30254 30252 (L-TLB)
Apr 30 08:37:00 mds02 kernel:  ffff810201c5b798 0000000000000046 ffff810201c5b7b8 000000000000000a
Apr 30 08:37:00 mds02 kernel:  ffff8102024559d8 ffff810202455790 ffff8102278ad040 00003b4ad4efe62e
Apr 30 08:37:00 mds02 kernel:  00000000000068eb 0000000500000000
Apr 30 08:37:00 mds02 kernel: Call Trace: <ffffffff8013acbf>{lock_timer_base+27} <ffffffff884f0f70>{:ptlrpc:ldlm_expired_completion_wait+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff802d91a1>{schedule_timeout+138} <ffffffff8013b1d8>{process_timeout+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884f23cd>{:ptlrpc:ldlm_completion_ast+861}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884da629>{:ptlrpc:ldlm_lock_enqueue+1369} <ffffffff8012ae07>{default_wake_function+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884d6e8a>{:ptlrpc:ldlm_lock_addref_internal_nolock+58}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884eed90>{:ptlrpc:ldlm_blocking_ast+0} <ffffffff884f0004>{:ptlrpc:ldlm_cli_enqueue_local+1108}
Apr 30 08:37:00 mds02 kernel:        <ffffffff8874ce27>{:mds:mds_fid2locked_dentry+471} <ffffffff884f2070>{:ptlrpc:ldlm_completion_ast+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff8874d5e7>{:mds:mds_getattr_lock+1783} <ffffffff887588a1>{:mds:mds_intent_policy+1489}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884dd7d6>{:ptlrpc:ldlm_resource_putref+438}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884da1c3>{:ptlrpc:ldlm_lock_enqueue+243} <ffffffff884d7bfd>{:ptlrpc:ldlm_lock_create+2477}
Apr 30 08:37:00 mds02 kernel:        <ffffffff8850f1e5>{:ptlrpc:lustre_swab_buf+197} <ffffffff884fa8c0>{:ptlrpc:ldlm_server_completion_ast+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884f6ff7>{:ptlrpc:ldlm_handle_enqueue+3495}
Apr 30 08:37:00 mds02 kernel:        <ffffffff884fae30>{:ptlrpc:ldlm_server_blocking_ast+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff88753fad>{:mds:mds_handle+18141} <ffffffff88479d92>{:obdclass:class_handle2object+210}
Apr 30 08:37:00 mds02 kernel:        <ffffffff88511660>{:ptlrpc:lustre_swab_ptlrpc_body+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff8850f1e5>{:ptlrpc:lustre_swab_buf+197} <ffffffff8013adf9>{__mod_timer+173}
Apr 30 08:37:00 mds02 kernel:        <ffffffff88519327>{:ptlrpc:ptlrpc_main+4903} <ffffffff8012ae07>{default_wake_function+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff8010bb8a>{child_rip+8} <ffffffff88518000>{:ptlrpc:ptlrpc_main+0}
Apr 30 08:37:00 mds02 kernel:        <ffffffff8010bb82>{child_rip+0}
Apr 30 08:37:00 mds02 kernel: LustreError: 30253:0:(ldlm_request.c:64:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1209566120, 100s ago); not entering recovery in server code, just going back to sleep ns: mds-i3_lfs3-MDT0000_UUID lock: ffff8102002cda80/0xdd61b8a90a25a009 lrc: 3/1,0 mode: --/CR res: 1096876033/1115803706 bits 0x3 rrc: 4 type: IBT flags: 4004000 remote: 0x0 expref: -99 pid 30253
Apr 30 08:37:00 mds02 kernel: LustreError: dumping log to /tmp/lustre-log.1209566220.30253
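[Editor's sketch] A minimal way to watch the recovery window from the backup MDS while the client is blocked; the recovery_status path is the one quoted in the post above, and the 5-second polling interval is an arbitrary choice:

  # Run on mds02 after powering off mds01; adjust the MDT name to your filesystem.
  while true; do
      date
      cat /proc/fs/lustre/mds/i3_lfs3-MDT0000/recovery_status 2>/dev/null
      sleep 5
  done

The status file shows the recovery state and how many clients have reconnected, which helps separate "waiting for failover" time from "waiting for recovery to complete" time.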
Andreas Dilger
2008-May-02 22:14 UTC
[Lustre-discuss] How long should an MDS failover event block
On Apr 30, 2008 07:53 -0700, stumped wrote:
> I'm using 1.6.4.3 and have 2 MDS servers with heartbeat.
> I have 2 OSSes (each with one OST). A client mounts the filesystem with
>
>   mount -t lustre mds01@tcp0:mds02@tcp0:/i3_lfs3 /mnt/lustre/i3_lfs3
>
> and everything is great.
>
> I then run this on the client:
>
>   while true; do echo "----------- $(date)"; touch $(date +%s); sleep 2; done
>
> So I turn off the power on mds01 and the output of the above script is:
>
>   ----------- Wed Apr 30 08:31:29 MDT 2008
>   ----------- Wed Apr 30 08:31:31 MDT 2008
>   ----------- Wed Apr 30 08:34:53 MDT 2008
>   ----------- Wed Apr 30 08:38:15 MDT 2008
>   ----------- Wed Apr 30 08:38:17 MDT 2008
>
> In other words, my client blocks for almost 7 minutes.
>
> Is this what I should expect?

Yes, this is pretty reasonable. We are working to reduce this delay in 1.8
with a feature called "adaptive timeouts" that will tune the Lustre recovery
timeout as a function of how busy the servers are. Currently the default
timeout is 100s, and for large clusters (say 500 or more clients) the
recommended timeout is 300s, because under load a server is in essence
undergoing a denial-of-service attack from the clients, and each client might
get 1/1000th of a server's idle performance, which is sometimes hard to
distinguish from server failure.

> Can I reasonably shrink this time? Since some of the time is spent
> with the MDT being "recovered" - as shown by
>   cat /proc/fs/lustre/mds/i3_lfs3-MDT0000/recovery_status
> can I expect this time to grow as the filesystem holds more data
> (eventually it'll be about 700TB)?

The recovery time is not a function of the amount of data on the filesystem.

> Also, the MDS gives up the stack trace below; it doesn't look good; should I
> be worried? As I said, after 7 minutes or so everything seems to be
> working (though I'm not really doing much with the filesystem yet).
>
> Apr 30 08:36:33 mds02 kernel: Lustre: 30256:0:(ldlm_lib.c:747:target_handle_connect()) i3_lfs3-MDT0000: refuse reconnection from b1d47bed-6eca-7ccf-45e8-c39f930b361f@10.200.20.63@tcp to 0xffff810203c6a000; still busy with 3 active RPCs
> Apr 30 08:36:33 mds02 kernel: LustreError: 30256:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16) req@ffff810227188e00 x19317/t0 o38->b1d47bed-6eca-7ccf-45e8-c39f930b361f@NET_0x200000ac8143f_UUID:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
> Apr 30 08:37:00 mds02 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 30253: it was inactive for 100s
> Apr 30 08:37:00 mds02 kernel: LustreError: 30253:0:(ldlm_request.c:64:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1209566120, 100s ago); not entering recovery in server code, just going back to sleep ns: mds-i3_lfs3-MDT0000_UUID lock: ffff8102002cda80/0xdd61b8a90a25a009 lrc: 3/1,0 mode: --/CR res: 1096876033/1115803706 bits 0x3 rrc: 4 type: IBT flags: 4004000 remote: 0x0 expref: -99 pid 30253
> Apr 30 08:37:00 mds02 kernel: LustreError: dumping log to /tmp/lustre-log.1209566220.30253

Probably not ideal, but not fatal. There should be a message after this which
reports that the thread became responsive again after 120s or similar...

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
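[Editor's sketch] As a rough illustration of the timeout tuning Andreas describes, the global obd timeout can be raised as below. This assumes the 1.6-era tunables; the /proc path and the conf_param syntax should be verified against the manual for your release:

  # Temporary, per-node change on every server and client (lost at reboot):
  echo 300 > /proc/sys/lustre/timeout

  # Persistent, filesystem-wide change, run on the MGS node:
  lctl conf_param i3_lfs3.sys.timeout=300

Note the trade-off: a larger timeout reduces spurious evictions on a busy but healthy server, while lengthening the worst-case wait when a server really has died.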