thr3ads.net - Lustre discuss - [Lustre-discuss] Luster access locking up login nodes [May 2008]

Brock Palen

2008-May-16 19:48 UTC

[Lustre-discuss] Luster access locking up login nodes

I have seen this behavior a few times.
Under heavy IO lustre will just stop and dmesg will have the following:

LustreError: 3976:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 000001012ce12000
LustreError: 11-0: an error occurred while communicating with 141.212.30.184@tcp. The mds_statfs operation failed with -107
LustreError: Skipped 1 previous similar message
Lustre: nobackup-MDT0000-mdc-00000100e9e9ac00: Connection to service nobackup-MDT0000 via nid 141.212.30.184@tcp was lost; in progress operations using this service will wait for recovery to complete.
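
The negative numbers in these messages are Linux errno values: -5 is EIO and -107 is ENOTCONN. For anyone who wants to watch clients for this pattern, here is a minimal sketch that pulls the two message types out of dmesg text. The function name and the regular expressions are illustrative only, written against the lines quoted above; they are not part of Lustre.

```python
import re

# Linux errno values seen in the log above (the kernel reports them negated).
ERRNO_NAMES = {5: "EIO", 107: "ENOTCONN"}

# Hypothetical patterns, written to match the two message shapes quoted above.
BULK_RE = re.compile(r"client_bulk_callback\(\)\) event type\s+(\d+), status (-?\d+)")
OP_RE = re.compile(r"The (\w+) operation failed with (-?\d+)")


def find_lustre_errors(text):
    """Return (kind, detail, errno_name) tuples for matching messages."""
    # Pasted logs are often wrapped; collapse whitespace so the
    # patterns still match across line breaks.
    flat = " ".join(text.split())
    hits = []
    for m in BULK_RE.finditer(flat):
        code = abs(int(m.group(2)))
        hits.append(("bulk", "event type " + m.group(1),
                     ERRNO_NAMES.get(code, str(code))))
    for m in OP_RE.finditer(flat):
        code = abs(int(m.group(2)))
        hits.append(("rpc", m.group(1), ERRNO_NAMES.get(code, str(code))))
    return hits
```

Feeding it the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`) gives a list that a monitoring script could alert on.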


No network connection issues between the login nodes.
When this happens the client does not recover until we reboot the
node.  This does happen at times on the compute nodes, but I see it
most on login hosts.

If I just go to the lustre mount and try to ls it, it will hang
forever.  Many times when lustre screws up it recovers, but more and
more it does not, and we see these bulk errors followed by mds errors.

We are using lustre 1.6.x


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985

Brian J. Murrell

2008-May-16 20:13 UTC

[Lustre-discuss] Luster access locking up login nodes

On Fri, 2008-05-16 at 15:48 -0400, Brock Palen wrote:
> I have seen this behavior a few times.
> Under heavy IO lustre will just stop and dmesg will have the following:

Review the list archives for statahead problems.

b.

Brock Palen

2008-May-16 20:24 UTC

[Lustre-discuss] Luster access locking up login nodes

Ahh, I didn't realize this was related to that.  Good to know a fix is in the
works (2 x4500's on the way, so we have made a commitment to lustre).

How would I make this option the default on boot?  I don't see an
llite module on the clients.
I can pdsh to all the clients, but machines do get rebooted sometimes.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On May 16, 2008, at 4:13 PM, Brian J. Murrell wrote:
> On Fri, 2008-05-16 at 15:48 -0400, Brock Palen wrote:
>> I have seen this behavior a few times.
>> Under heavy IO lustre will just stop and dmesg will have the
>> following:
>
> Review the list archives for statahead problems.
>
> b.

Brian J. Murrell

2008-May-16 20:28 UTC

[Lustre-discuss] Luster access locking up login nodes

On Fri, 2008-05-16 at 16:24 -0400, Brock Palen wrote:
> Ahh didn't realize this was related to that.

Sure looks like it from what you posted.

> How would I make this option the default on boot?

initscript.

b.
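
The thread does not name the option being discussed. As a minimal sketch of what such an initscript step could do, the following assumes the tunable is the per-mount `statahead_max` file under `/proc/fs/lustre/llite/` (the workaround usually mentioned in the statahead threads) and that writing 0 to it disables statahead. The function name and the `root` parameter are illustrative, not anything shipped with Lustre.

```python
import glob
import os


def disable_statahead(root="/proc/fs/lustre/llite"):
    """Write 0 to every <root>/*/statahead_max and return the paths written.

    Assumption: statahead_max is the relevant tunable on this Lustre version.
    """
    written = []
    for path in sorted(glob.glob(os.path.join(root, "*", "statahead_max"))):
        with open(path, "w") as f:
            f.write("0\n")
        written.append(path)
    return written


if __name__ == "__main__":
    # Run as root, after the Lustre file systems are mounted.
    for p in disable_statahead():
        print("statahead disabled:", p)
```

Called from an init script (or rc.local) that runs after the Lustre mounts, this reapplies the setting on every boot; the same script can be pushed with pdsh to nodes that are already up. If no matching files exist, it does nothing.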
