cwalker at fas.harvard.edu
2008-Oct-25 15:03 UTC
[Lustre-discuss] clients hanging -- found existing inode...
Hello,

We're having problems with clients hanging with the following messages on the client:

    Oct 25 07:22:55 herologin1 kernel: LustreError: 16556:0:(osc_request.c:2866:osc_set_data_with_check()) ### inconsistent l_ast_data found ns: circelfs-OST0017-osc-ffff81021d4dbc00 lock: ffff81017dc9a600/0x45756ae33592b057 lrc: 3/1,0 mode: PR/PR res: 690850/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 100000 remote: 0x310e24cdd40141c8 expref: -99 pid: 16385
    Oct 25 07:22:55 herologin1 kernel: LustreError: 16556:0:(osc_request.c:2872:osc_set_data_with_check()) ASSERTION(old_inode->i_state & I_FREEING) failed:Found existing inode ffff810171540638/130588871/1546649987 state 1 in lock: setting data to ffff81015f5275f8/130588869/1546649985
    Oct 25 07:22:55 herologin1 kernel: LustreError: 16556:0:(osc_request.c:2872:osc_set_data_with_check()) LB

followed by the client hanging. Nothing appears on the MDS or on the OSS in question.

These symptoms were reported by another user, but there was no resolution or workaround. We're getting this about once per day on our head nodes. Has anyone had any luck with this issue?

Thanks,
Chris
Andreas Dilger
2008-Oct-27 19:44 UTC
[Lustre-discuss] clients hanging -- found existing inode...
On Oct 25, 2008 11:03 -0400, cwalker at fas.harvard.edu wrote:
> We're having problems with clients hanging with the following messages on the
> client:
>
> Oct 25 07:22:55 herologin1 kernel: LustreError: 16556:0:(osc_request.c:2866:osc_set_data_with_check()) ### inconsistent l_ast_data found ns: circelfs-OST0017-osc-ffff81021d4dbc00 lock: ffff81017dc9a600/0x45756ae33592b057 lrc: 3/1,0 mode: PR/PR res: 690850/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 100000 remote: 0x310e24cdd40141c8 expref: -99 pid: 16385
> Oct 25 07:22:55 herologin1 kernel: LustreError: 16556:0:(osc_request.c:2872:osc_set_data_with_check()) ASSERTION(old_inode->i_state & I_FREEING) failed:Found existing inode ffff810171540638/130588871/1546649987 state 1 in lock: setting data to ffff81015f5275f8/130588869/1546649985
> Oct 25 07:22:55 herologin1 kernel: LustreError: 16556:0:(osc_request.c:2872:osc_set_data_with_check()) LB
>
> followed by the client hanging. Nothing appears on the MDS or on the OSS in
> question. These symptoms were reported by another user, but there was no
> resolution or workaround. We're getting this about once per day on our head
> nodes. Has anyone had any luck with this issue?

This means you may have a corrupted back-end filesystem. Inodes 130588871 and 130588869 are both using the same OST object ID.

If the user knows which files they are accessing, the easiest solution is to just delete those two files. Failing that, you can find the pathnames for these files on the MDS with:

    debugfs -c -R "ncheck 130588871 130588869" /dev/{mdsdev}

and take off the "/ROOT" part of the pathname provided.

If you want to keep a copy of the data (one of the files will likely be corrupted, or a duplicate of the other), copy ONE file on one node and the OTHER file on another node, then delete both of the original files on their respective nodes.
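The lookup step can be scripted. Below is a minimal POSIX sh sketch with two hypothetical helpers (conflicting_inodes and strip_root are illustrative names, not part of Lustre or e2fsprogs). It assumes the assertion message sits on one unwrapped log line, that the middle field of each ADDR/INO/GEN triple is the inode number (as Andreas reads it), and that debugfs prints ncheck results as an "Inode<TAB>Pathname" header followed by tab-separated inode/path lines.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Read LustreError lines on stdin and print the two inode numbers named in
# an osc_set_data_with_check() assertion. Assumes each inode is printed as
# ADDR/INO/GEN and that the whole message is on a single line.
conflicting_inodes() {
    sed -n 's#.*Found existing inode [^/]*/\([0-9]*\)/.*setting data to [^/]*/\([0-9]*\)/.*#\1 \2#p'
}

# Read debugfs "ncheck" output on stdin and print "inode<TAB>path" with the
# leading "/ROOT" component removed, i.e. paths relative to the Lustre
# mount point on a client. Lines whose first field is not a number (such
# as the "Inode<TAB>Pathname" header) are skipped.
strip_root() {
    awk -F '\t' '$1 ~ /^[0-9]+$/ { sub(/^\/ROOT\//, "/", $2); print $1 "\t" $2 }'
}
```

For the messages in this thread, piping the client's log through conflicting_inodes should print "130588871 130588869"; on the MDS, something like `debugfs -c -R "ncheck 130588871 130588869" /dev/{mdsdev} | strip_root` would then list each inode with its client-visible path. Both helpers only parse text, so neither opens the affected files.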
Accessing both files on a single node will trigger this assertion again. The "lfsck" tool will also detect and fix this, but it is much slower than doing it by hand unless there is a large amount of corruption.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.