Hi,

I'm having a strange issue and would like to understand it better. With Lustre 1.6.2 over o2ib, some cluster nodes hung in Lustre I/O processes, so I rebooted them. No LBUGs were seen, only RDMA failures. Only the client nodes were rebooted.

After re-mounting the Lustre filesystem, "ls" hangs (traceback is below). But when the Lustre FS is unmounted with "umount -f", ls returns the correct output. Any idea what could be wrong?

I noticed that on the buggy clients, "cat /proc/fs/lustre/ldlm/namespaces/*/lock_count" shows something very different from the output on the "good" clients: only the MGC* lock_count is 1, and all the others are zero. Is there a way to fix this?

Best regards,
Erich

===== traceback of hanging "ls" command ====================================
ls            S 00000000ffffc29c     0  4296   4092                     (NOTLB)
000001007c049958 0000000000000002 000001007e2f3030 ffffffff00000074
       00000101376bc808 0000000039288440 0000010080051000 00000001a02f67d3
       000001007e004800 000000000000205a
Call Trace:
       <ffffffff8013f4a4>{__mod_timer+293}
       <ffffffff80320c33>{schedule_timeout+367}
       <ffffffff8013fed4>{process_timeout+0}
       <ffffffffa030b474>{:ptlrpc:ptlrpc_set_wait+932}
       <ffffffff801335c2>{default_wake_function+0}
       <ffffffffa03096b0>{:ptlrpc:ptlrpc_expired_set+0}
       <ffffffffa0307710>{:ptlrpc:ptlrpc_interrupted_set+0}
       <ffffffffa03096b0>{:ptlrpc:ptlrpc_expired_set+0}
       <ffffffffa0307710>{:ptlrpc:ptlrpc_interrupted_set+0}
       <ffffffffa042637d>{:lustre:ll_glimpse_size+1613}
       <ffffffffa02e0e6a>{:ptlrpc:__ldlm_handle2lock+794}
       <ffffffffa02dc035>{:ptlrpc:lock_res_and_lock+53}
       <ffffffffa02dc035>{:ptlrpc:lock_res_and_lock+53}
       <ffffffffa02dc06f>{:ptlrpc:unlock_res_and_lock+31}
       <ffffffffa02e0aaa>{:ptlrpc:ldlm_lock_decref_internal+746}
       <ffffffffa0424b40>{:lustre:ll_extent_lock_callback+0}
       <ffffffffa02f2960>{:ptlrpc:ldlm_completion_ast+0}
       <ffffffffa0424ee0>{:lustre:ll_glimpse_callback+0}
       <ffffffffa0416c4f>{:lustre:ll_intent_drop_lock+143}
       <ffffffffa0431318>{:lustre:ll_inode_revalidate_it+1528}
       <ffffffffa044f360>{:lustre:ll_mdc_blocking_ast+0}
       <ffffffff8018e106>{dput+55}
       <ffffffff801859eb>{__link_path_walk+3928}
       <ffffffff80185b75>{link_path_walk+179}
       <ffffffffa04313c4>{:lustre:ll_getattr_it+36}
       <ffffffffa04314f5>{:lustre:ll_getattr+53}
       <ffffffff80180257>{vfs_getattr64_it+146}
       <ffffffff80180532>{vfs_lstat64+100}
       <ffffffff8016838d>{handle_mm_fault+354}
       <ffffffff801ea149>{__up_read+16}
       <ffffffff80123991>{do_page_fault+577}
       <ffffffffa0416c70>{:lustre:ll_intent_release+0}
       <ffffffff80180891>{sys_newlstat+17}
       <ffffffff8018a01c>{vfs_readdir+176}
       <ffffffff8018a428>{sys_getdents64+166}
       <ffffffff80110c2d>{error_exit+0}
       <ffffffff8011022a>{system_call+126}
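The lock_count comparison between good and buggy clients can be scripted so both sides are collected the same way. This is a minimal Python sketch, assuming that each directory under /proc/fs/lustre/ldlm/namespaces contains a lock_count file holding a single integer, which is what the cat command above reads. The helper names read_lock_counts and idle_namespaces are hypothetical and not part of Lustre. The sketch only gathers the per-namespace counts and lists the namespaces at zero; it does not interpret them.

```python
import os
from typing import Dict, List

# Path taken from the cat command in the post; it is a parameter so that a
# copied tree (e.g. saved from another client) can be checked the same way.
LDLM_NAMESPACES = "/proc/fs/lustre/ldlm/namespaces"


def read_lock_counts(root: str = LDLM_NAMESPACES) -> Dict[str, int]:
    """Map each LDLM namespace directory name to the integer in its lock_count file."""
    counts: Dict[str, int] = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "lock_count")
        if not os.path.isfile(path):
            continue  # skip entries that have no lock_count file
        with open(path) as f:
            counts[name] = int(f.read().strip())
    return counts


def idle_namespaces(counts: Dict[str, int]) -> List[str]:
    """Names of the namespaces whose lock_count is currently 0."""
    return [name for name, n in counts.items() if n == 0]
```

Running read_lock_counts() on a good client and on a buggy one, then comparing the idle_namespaces() lists, would show the pattern described in the post (only the MGC* namespace holding a lock) in a form that is easy to paste into a reply. Namespace directory names may include per-mount identifiers, so they are compared by eye rather than matched automatically.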