thr3ads.net - Lustre discuss - Mount takes long time after abnormal shutdown of MDS/OSS [May 2013]

Chan Ching Yu, Patrick

2013-May-27 15:00 UTC

Mount takes long time after abnormal shutdown of MDS/OSS

Dilger, Andreas

2013-May-28 00:38 UTC

Re: Mount takes long time after abnormal shutdown of MDS/OSS

On 2013-05-27, at 9:00, "Chan Ching Yu, Patrick"
<cychan-eQCasaxV0mrc+919tysfdA@public.gmane.org>
wrote:

In my testing environment, there is one MDS/OSS server and one Lustre client,
both running CentOS 6.3 with Lustre 2.1.5.

I powered off the MDS/OSS server abnormally while the Lustre filesystem was
still mounted on the Lustre client.
Then I powered off the Lustre client and started the MDS/OSS and the Lustre
client again. However, the Lustre client takes a long time to mount.

This is expected behavior if the clients are shut down after the servers. Since
Lustre clients may hold recovery state after a server crash, the servers, once
restarted, wait for the clients to reconnect and perform recovery. This can
happen relatively quickly if all of the clients are available.

If the clients have been rebooted, the servers will wait for the old clients to
reconnect, but this never happens. New clients are prevented from connecting
during recovery so that they do not modify the filesystem in a way that is
incompatible with what the old clients previously did.

If you know the old clients are not available, you can mount the servers with
"-o abort_recov" to skip this delay.
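The recovery gating described above can be pictured with a small toy model. This is a hypothetical Python sketch, not Lustre's implementation: the class `RecoveringServer` and its methods are invented for illustration, client names are plain strings, and the recovery timeout that eventually ends the wait on a real server is left out. It only shows the rule that old clients are let back in, new clients get -EBUSY until every old client has returned, and aborting recovery drops the old clients that never came back.

```python
import errno


class RecoveringServer:
    """Hypothetical toy model of a restarted server gating connections
    during recovery. Not Lustre code; no timeouts are modelled."""

    def __init__(self, old_clients):
        # Clients that were connected before the crash and may hold
        # recovery state the server must wait for.
        self.pending = set(old_clients)
        self.connected = set()
        self.recovering = bool(self.pending)

    def connect(self, client):
        """Return 0 on success, or -EBUSY if the client must wait."""
        if self.recovering:
            if client in self.pending:
                # An old client reconnects and takes part in recovery.
                self.pending.discard(client)
                self.connected.add(client)
                if not self.pending:
                    self.recovering = False  # every old client is back
                return 0
            # A new client is refused so it cannot change the filesystem
            # in a way that conflicts with what the old clients did.
            return -errno.EBUSY
        self.connected.add(client)
        return 0

    def abort_recovery(self):
        """Model of skipping recovery: forget old clients that never returned."""
        self.pending.clear()
        self.recovering = False


if __name__ == "__main__":
    server = RecoveringServer(old_clients={"old-client"})
    print(server.connect("new-client"))  # -16 on Linux: refused during recovery
    server.abort_recovery()
    print(server.connect("new-client"))  # 0: recovery skipped, new client admitted
```

In the scenario from the original question, "old-client" is the rebooted client that will never reconnect, so without the abort step the new mount keeps being refused.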

The following messages are generated repeatedly on the console:
May 27 22:06:33 node1 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.8.1@tcp. The mds_connect operation failed with -16

-16 is -EBUSY, which means the servers are busy with recovery and are
blocking new clients from connecting until recovery is finished.
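The negative number at the end of such a console message is a negated Linux errno value, which is how -16 reads as -EBUSY. A minimal Python sketch for decoding it is below; the helper name `decode_lustre_error` and the regular expression are assumptions made for illustration, based only on the single message format quoted in this thread. The pattern also accepts an en dash before the number, in case the message was mangled in transit.

```python
import errno
import os
import re

# Hypothetical pattern based on the message quoted in this thread:
# "... The mds_connect operation failed with -16"
# Accepts an ASCII hyphen or an en dash (U+2013) before the digits.
_FAILED_RE = re.compile(r"The (\w+) operation failed with [-\u2013]?(\d+)")


def decode_lustre_error(line):
    """Return (operation, errno_name, description), or None if no match."""
    m = _FAILED_RE.search(line)
    if m is None:
        return None
    operation = m.group(1)
    code = int(m.group(2))  # errno magnitude; the kernel reports it negated
    name = errno.errorcode.get(code, "UNKNOWN")
    return operation, name, os.strerror(code)


if __name__ == "__main__":
    msg = ("LustreError: 11-0: an error occurred while communicating with "
           "192.168.8.1@tcp. The mds_connect operation failed with -16")
    # On Linux: ('mds_connect', 'EBUSY', 'Device or resource busy')
    print(decode_lustre_error(msg))
```

The description string comes from the C library, so its exact wording can differ between platforms.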

Thanks.
CY

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss