Hi List,

I was wondering if any of you guys got hit by
https://bugzilla.lustre.org/show_bug.cgi?id=22177

We are running 1.8.2 and over the holidays we had an MDS crash and saw the following show up a couple of times in our logs:

mds kernel: LustreError: 6295:0:(mds_reint.c:1772:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1

There was associated ext fs corruption and difficulty remounting the MDS without crashes. I'm trying to verify that we are in fact running into 22177 before declaring the case closed and ruling out other possibilities. If you ran into this bug, does this sound like what you experienced?

Mark

-- 
Mark Nelson, HPC Systems Administrator
Minnesota Supercomputing Institute
Phone: (612)626-4479
Email: mark at msi.umn.edu
Mark wrote:
> I was wondering if any of you guys got hit by
> https://bugzilla.lustre.org/show_bug.cgi?id=22177
>
> We are running 1.8.2 and over the holidays we had an MDS crash and saw
> the following show up a couple of times in our logs:
>
> mds kernel: LustreError: 6295:0:(mds_reint.c:1772:mds_orphan_add_link())
> ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1
>
> There was associated ext fs corruption and difficulty remounting the MDS
> without crashes. I'm trying to verify that we are in fact running into
> 22177 before declaring the case closed and ruling out other
> possibilities. If you ran into this bug, does this sound like what you
> experienced?

That definitely looks like bug 22177, from the limited information provided.

Note that this issue is fixed in Lustre v1.8.3 (or you can apply the patches from bug 22177).
Hi Rick,

Yeah, it definitely seems like that's what we are hitting, based on the information provided in the bug report. In the short term we are working with our vendor to deploy a patched 1.8.2 to the MDS. Later we'll migrate to 1.8.5, after we can do a bit more testing. We'd like to avoid a forced migration if possible.

Unfortunately our MDS just went down again. I imagine that one of our users must have repeated whatever set of operations brought it down the first time.

Thanks,
Mark

On 01/03/2011 04:50 PM, Rick Grubin wrote:
> That definitely looks like bug 22177, from the limited information provided.
>
> Note that this issue is fixed in Lustre v1.8.3 (or you can apply the
> patches from bug 22177)