MDS/OSSs: 1.8.8-wc1_2.6.18_308.4.1.el5_gbc88c4c
Client: 1.8.9-wc1_2.6.32_358.23.2.el6

One (out of hundreds) of our clients has been unable to mount our Lustre file system. We could find no host or network issues. Attempts to mount yielded the following on the client:

mount -t lustre -o localflock 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs /lfs/scratch
mount.lustre: mount 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs at /lfs/scratch failed: Interrupted system call
Error: Failed to mount 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs

with the following syslog messages:

Jun 10 15:21:05 r15a-s40 kernel: Lustre: 1269:0:(o2iblnd_cb.c:1813:kiblnd_close_conn_locked()) Closing conn to 10.13.79.252@o2ib2: error 0(waiting)
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 166-1: MGC10.13.68.1@o2ib: Connection to service MGS via nid 10.13.68.1@o2ib was lost; in progress operations using this service will fail.
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 15c-8: MGC10.13.68.1@o2ib: The configuration from log 'lfs-client' failed (-4). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -4
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(lov_obd.c:1012:lov_cleanup()) lov tgt 1 not cleaned! deathrow=0, lovrc=1
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(lov_obd.c:1012:lov_cleanup()) Skipped 5 previous similar messages
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(lov_obd.c:1012:lov_cleanup()) lov tgt 13 not cleaned! deathrow=1, lovrc=1
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(mdc_request.c:1500:mdc_precleanup()) client import never connected
Jun 10 15:21:05 r15a-s40 kernel: Lustre: MGC10.13.68.1@o2ib: Reactivating import
Jun 10 15:21:05 r15a-s40 kernel: Lustre: MGC10.13.68.1@o2ib: Connection restored to service MGS using nid 10.13.68.1@o2ib.
Jun 10 15:21:05 r15a-s40 kernel: Lustre: client lfs-client(ffff88061e105c00) umount complete
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount (-4)

Nothing noteworthy on the MDS.

After reconfiguring the client with a new IPoIB IP (and hence a new NID), it was able to mount with no problems and is working fine. Additionally, the MDS was rebooted at least once during the time that the client in question was unable to mount, so whatever state is blocking it appears to have been saved persistently, presumably on the MDT.

I'm particularly curious about the "ll_fill_super" message. To what "log" is it referring?

Has anyone seen this before, and does anyone have an idea what we need to clear on the MDS/MDT to allow this client to mount the file system successfully under its original NID?

Thanks,

Charlie Taylor
UF Research Computing