Jeremy Mann
2007-Aug-21 11:48 UTC
[Lustre-discuss] Error when using lfs setstripe on new failover nodes
I have set up our cluster with 19 OSTs and several clients. When I try and use setstripe, it hangs as well as trying to use the filesystem. In /var/log/kern I'm seeing:

Aug 21 12:42:27 bcf kernel: LustreError: 6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.1.218@tcp, match 45084596 length 928 too big: 792 left, 792 allowed
Aug 21 12:44:07 bcf kernel: LustreError: 2512:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1187718147, 100s ago) req@00000101fe3daa00 x45084596/t0 o101->bcffs-MDT0000_UUID@192.168.1.218@tcp:12 lens 464/792 ref 1 fl Rpc:P/2/0 rc -11/-22
Aug 21 12:44:07 bcf kernel: LustreError: 2512:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4
Aug 21 12:44:07 bcf kernel: LustreError: 2521:0:(dir.c:332:ll_readdir()) error reading dir 17734241/140907266 page 0: rc -4
Aug 21 12:44:17 bcf kernel: LustreError: 6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.1.218@tcp, match 45085399 length 928 too big: 792 left, 792 allowed
Aug 21 12:45:57 bcf kernel: LustreError: 4705:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1187718257, 100s ago) req@000001007e288200 x45085399/t0 o101->bcffs-MDT0000_UUID@192.168.1.218@tcp:12 lens 464/792 ref 1 fl Rpc:P/0/0 rc 0/-22

This goes on for a few minutes, then everything comes back and gives this error:

Aug 21 12:45:57 bcf kernel: LustreError: 4705:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4

I can now use getstripe to see that it worked and it did. I'm just curious if this was an error caused by the size of the stripe?

lfs setstripe 131072 -1 -1

--
Jeremy Mann
jeremy@biochem.uthscsa.edu
University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672
Andreas Dilger
2007-Aug-29 14:13 UTC
[Lustre-discuss] Error when using lfs setstripe on new failover nodes
On Aug 21, 2007 12:48 -0500, Jeremy Mann wrote:
> I have set up our cluster with 19 OSTs and several clients. When I try and
> use setstripe, it hangs as well as trying to use the filesystem. In
> /var/log/kern I'm seeing:

When you say in the subject "new failover nodes", does this mean you have recently added new OSTs to your filesystem?

> Aug 21 12:42:27 bcf kernel: LustreError:
> 6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> 12345-192.168.1.218@tcp, match 45084596 length 928 too big: 792 left, 792
> allowed
> Aug 21 12:44:07 bcf kernel: LustreError:
> 2512:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1187718147, 100s ago) req@00000101fe3daa00 x45084596/t0
> o101->bcffs-MDT0000_UUID@192.168.1.218@tcp:12 lens 464/792 ref 1 fl
> Rpc:P/2/0 rc -11/-22

This means the client didn't set up enough reply space for what the server returned.

> I can now use getstripe to see that it worked and it did. I'm just curious
> if this was an error caused by the size of the stripe?
>
> lfs setstripe 131072 -1 -1

It seems probable that using a smaller stripe count would succeed (5 or less) and remounting the client will probably also fix it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
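
The numbers in the quoted "too big" message can be read off directly: the incoming packet was 928 bytes while the client allowed 792, so it was 136 bytes larger than the space set aside. The match number 45084596 also reappears as x45084596 in the timeout line that follows, and 792 reappears in its "lens 464/792" field. Below is a minimal Python sketch for pulling these fields out of similar kernel log lines. The function name reply_overflow and the regular expression are illustrative assumptions based only on the message format quoted in this thread; they are not part of Lustre or its tools.

```python
import re

# Assumed message format, taken from the lnet_try_match_md() lines quoted
# in this thread: "match <id> length <n> too big: <n> left, <n> allowed".
TOO_BIG = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)


def reply_overflow(line):
    """Return the sizes from a 'too big' log line, or None if it does not match.

    reply_overflow is a hypothetical helper name used for illustration.
    """
    m = TOO_BIG.search(line)
    if m is None:
        return None
    length = int(m.group("length"))
    allowed = int(m.group("allowed"))
    return {
        "match": int(m.group("match")),
        "length": length,
        "allowed": allowed,
        # Bytes by which the packet exceeded the space the client allowed.
        "overflow": length - allowed,
    }


# Sample line copied from the original post.
sample = (
    "Aug 21 12:42:27 bcf kernel: LustreError: "
    "6942:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from "
    "12345-192.168.1.218@tcp, match 45084596 length 928 too big: "
    "792 left, 792 allowed"
)

if __name__ == "__main__":
    print(reply_overflow(sample))
```

Running the helper over /var/log/kern on a client would show whether every failure overflows by the same amount, which may help when comparing different stripe counts.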