Heiko Schröter
2009-Nov-10 09:42 UTC
[Lustre-discuss] lustre automount mounting problem with listing dir content deeper than mountpoint
Hello,

after fixing the broken hardware, one problem remains. We have been using Lustre with automount for over a year without problems. Since the hardware failure a few days ago (a Gigabit switch and the SATA backplane in one MDS), the following happens.

Client: quadcore1. The Lustre filesystem is mounted under ''/misc/data'' (mountpoint) via automount.

''mount'':

mds1@tcp0:mds2@tcp0:/scia on /misc/data type lustre (rw)

Now doing ''umount /misc/data'':

Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(connection.c:144:ptlrpc_put_connection()) NULL connection
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(connection.c:144:ptlrpc_put_connection()) Skipped 13 previous similar messages
Nov 10 10:14:24 quadcore1 Lustre: client ffff81001606dc00 umount complete

Trying to automount while descending one or more directories deeper than the mountpoint (the client console hangs after this command):

''ls -la /misc/data/OneDirDeeper''

Nov 10 10:14:33 quadcore1 automount[2797]: attempting to mount entry /misc/data
Nov 10 10:14:34 quadcore1 Lustre: Client scia-client has started
Nov 10 10:14:34 quadcore1 automount[2797]: mount(generic): mounted mds1@tcp0:mds2@tcp0:/scia type lustre on /misc/data
Nov 10 10:14:34 quadcore1 automount[2797]: mounted /misc/data
Nov 10 10:14:34 quadcore1 LustreError: 3115:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.16.122@tcp, match 86814 length 1336 too big: 1272 left, 1272 allowed
Nov 10 10:14:34 quadcore1 Lustre: 3115:0:(lib-move.c:1647:lnet_parse_put()) Dropping PUT from 12345-192.168.16.122@tcp portal 10 match 86814 offset 128 length 1336: 2

In a new console (this releases the frozen console above), ''umount /misc/data'':

Nov 10 10:15:38 quadcore1 Lustre: setting import scia-MDT0000_UUID INACTIVE by administrator request
Nov 10 10:15:38 quadcore1 Lustre: Skipped 13 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -4
Nov 10 10:15:38 quadcore1 LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(client.c:716:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8102244c6c00 x86895/t0 o101->scia-MDT0000_UUID@192.168.16.122@tcp:12/10 lens 440/1400 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(client.c:716:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) ldlm_cli_enqueue: -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) Skipped 76 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) lock enqueue: rc: -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) error reading dir 4167519/1275738219 page 6: rc -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) Skipped 2 previous similar messages

There are no messages on the MDS or OSTs related to this. Doing an ''ls -la /misc/data'' works fine, and the Lustre filesystem gets mounted properly on /misc/data. The above scenario is reproducible on all clients. The system works fine when Lustre is mounted statically, or once the mount has been done in the proper way.

lustre-1.6.6
vanilla-2.6.22.19

Thanks and Regards
Heiko
Heiko Schröter
2009-Nov-10 10:01 UTC
[Lustre-discuss] lustre automount mounting problem with listing dir content deeper than mountpoint
Hello,

I just found that it seems to be related to a link problem inside the Lustre directory structure. One of our users is doing this (mountpoint /misc/data):

ln -s /misc/data/index.gz /misc/data/WorkOrSo/index.gz

Then he overwrites the file /misc/data/index.gz, thereby moving it to a different OST. If I delete these links (...../WorkOrSo/), then Lustre gets properly mounted by the automounter, even when diving into deeper dirs in one go. I.e. on an unmounted Lustre (without these links), doing an ''ls -la /misc/data/WorkOrSo/WhateverGoesHere/'' is just fine.

Does Lustre have a problem with _moving_ links, meaning when the file pointed to is moved to a different OST?

Regards
Heiko
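Since deleting the links under WorkOrSo is what makes the automount behave again, a small helper to list such links can be useful. The following is a minimal shell sketch, not a Lustre or autofs tool: the function name find_abs_links is hypothetical, and it assumes a Linux client with find and readlink available. It only lists absolute symlinks whose targets point back into a given prefix (such as the automount point itself); it does not prove that these links are the cause.

```shell
#!/bin/sh
# find_abs_links DIR PREFIX  (hypothetical helper, not part of Lustre)
# Print every symlink under DIR whose stored target is an absolute path
# beginning with PREFIX/, in the form "link -> target".
find_abs_links() {
    dir=$1
    prefix=$2
    # -xdev: stay on the filesystem of DIR; -type l: symlinks only.
    # find does not follow symlinks by default, so targets are not visited.
    find "$dir" -xdev -type l | while IFS= read -r link; do
        # readlink prints the stored target string without resolving it
        target=$(readlink "$link")
        case $target in
            "$prefix"/*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

Usage, on a client where the filesystem is already mounted (e.g. after ''ls -la /misc/data''): find_abs_links /misc/data /misc/data. Relative links such as ../index.gz are deliberately not reported. Note that this walks the whole tree, which can take a while on a large filesystem.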