thr3ads.net - Lustre discuss - [Lustre-discuss] Error when using lfs setstripe on new failover nodes [Aug 2007]

Jeremy Mann

2007-Aug-21 11:48 UTC

[Lustre-discuss] Error when using lfs setstripe on new failover nodes

I have set up our cluster with 19 OSTs and several clients. When I try to
use setstripe, it hangs, and so does any other attempt to use the
filesystem. In /var/log/kern I'm seeing:

Aug 21 12:42:27 bcf kernel: LustreError:
6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
12345-192.168.1.218@tcp, match 45084596 length 928 too big: 792 left, 792
allowed
Aug 21 12:44:07 bcf kernel: LustreError:
2512:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at
1187718147, 100s ago) req@00000101fe3daa00 x45084596/t0
o101->bcffs-MDT0000_UUID@192.168.1.218@tcp:12 lens 464/792 ref 1 fl
Rpc:P/2/0 rc -11/-22
Aug 21 12:44:07 bcf kernel: LustreError:
2512:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4
Aug 21 12:44:07 bcf kernel: LustreError: 2521:0:(dir.c:332:ll_readdir())
error reading dir 17734241/140907266 page 0: rc -4
Aug 21 12:44:17 bcf kernel: LustreError:
6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
12345-192.168.1.218@tcp, match 45085399 length 928 too big: 792 left, 792
allowed
Aug 21 12:45:57 bcf kernel: LustreError:
4705:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at
1187718257, 100s ago) req@000001007e288200 x45085399/t0
o101->bcffs-MDT0000_UUID@192.168.1.218@tcp:12 lens 464/792 ref 1 fl
Rpc:P/0/0 rc 0/-22

This goes on for a few minutes, then everything comes back and gives this
error:

Aug 21 12:45:57 bcf kernel: LustreError:
4705:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4
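
The numeric codes in these messages (-4, -11, -22) look like negated Linux errno values, which is the usual kernel convention. A minimal sketch for decoding them, assuming that convention holds here and that it runs on a Linux host (errno numbering differs on other platforms); `errno_name` is a hypothetical helper, not part of any Lustre tool:

```python
import errno

def errno_name(rc):
    """Map a negative kernel-style return code to its symbolic errno name.

    Assumes rc is a negated errno value, e.g. -4 for EINTR on Linux.
    """
    return errno.errorcode.get(-rc, "unknown")

# Codes taken from the log excerpt above.
for rc in (-4, -11, -22):
    print(rc, errno_name(rc))
```

Read this way, the final `ldlm_cli_enqueue: -4` would be an interrupted call (EINTR) rather than a failure of the setstripe itself.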

I can now use getstripe to see that it worked and it did. I'm just curious
if this was an error caused by the size of the stripe?

lfs setstripe 131072 -1 -1



-- 
Jeremy Mann
jeremy@biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672

Andreas Dilger

2007-Aug-29 14:13 UTC

[Lustre-discuss] Error when using lfs setstripe on new failover nodes

On Aug 21, 2007  12:48 -0500, Jeremy Mann wrote:
> I have set up our cluster with 19 OSTs and several clients. When I try and
> use setstripe, it hangs as well as trying to use the filesystem. In
> /var/log/kern I'm seeing:
When you say in the subject "new failover nodes", does this mean you
have recently added new OSTs to your filesystem?
> Aug 21 12:42:27 bcf kernel: LustreError:
> 6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> 12345-192.168.1.218@tcp, match 45084596 length 928 too big: 792 left, 792
> allowed
> Aug 21 12:44:07 bcf kernel: LustreError:
> 2512:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1187718147, 100s ago) req@00000101fe3daa00 x45084596/t0
> o101->bcffs-MDT0000_UUID@192.168.1.218@tcp:12 lens 464/792 ref 1 fl
> Rpc:P/2/0 rc -11/-22
This means the client didn't set up enough reply space for what the
server returned: the reply was 928 bytes, but the client's buffer only
allowed 792.
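
The size of the shortfall can be read straight out of the LNet message. A minimal sketch that parses the quoted line and reports how many bytes the reply exceeded the buffer by; the regular expression and the `reply_shortfall` helper are illustrative assumptions based only on the message text shown in this thread, not on any Lustre interface:

```python
import re

# Matches the "too big" message as it appears in the log quoted above.
# The group names are labels chosen for this sketch.
PATTERN = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)

def reply_shortfall(message):
    """Return (match_bits, bytes_over_limit), or None if the text does not match."""
    # Syslog wrapped the message across lines, so collapse whitespace first.
    flat = " ".join(message.split())
    m = PATTERN.search(flat)
    if m is None:
        return None
    return int(m["match"]), int(m["length"]) - int(m["allowed"])

LOG = """Aug 21 12:42:27 bcf kernel: LustreError:
6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
12345-192.168.1.218@tcp, match 45084596 length 928 too big: 792 left, 792
allowed"""

print(reply_shortfall(LOG))
```

For the first message this gives a 136-byte shortfall, and the match value 45084596 is the same number as the `x45084596` request that later times out.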
> I can now use getstripe to see that it worked and it did. I'm just curious
> if this was an error caused by the size of the stripe?
> 
> lfs setstripe 131072 -1 -1
It seems probable that using a smaller stripe count (5 or fewer) would
succeed, and remounting the client will probably also fix it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
