Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
LBUG:ed with:

-------------8<--------------------
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
-------------8<--------------------

I also have the lustre-log.1200130781.6198, but it seems to contain
binary data, so I'll supply it only if it's needed.

The following triggered the bug:
- mkdir rfiles
- in rfiles, create 300000 files of random size 0-32k
- rm -rf rfiles &
- sleep 600 (i.e. wait until you get bored and the rm isn't finished)
- rm -rf rfiles &

This suggests that something isn't locked properly, since two
concurrent rm's in a directory definitely shouldn't cause the MDS to
fall over...

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke at hpc2n.umu.se
---------------------------------------------------------------------------
 CHOCOLATE: The other major food group.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
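For anyone trying to reproduce this, the first two steps of the recipe above (make the directory, fill it with files of random size 0-32k) can be scripted. The following is a minimal C sketch assuming a POSIX system; `make_random_files()`, its parameters and the `f%07ld` file naming are hypothetical and not part of Lustre or of the original report. The two concurrent `rm -rf rfiles &` runs are still issued from the shell as described.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not part of Lustre): create DIR if needed and fill it
 * with COUNT files named f0000000, f0000001, ... whose sizes are drawn
 * pseudo-randomly from [0, max_size] bytes.  Returns 0 on success and -1 on
 * any error.  max_size may not exceed 32 KiB, the upper bound in the report. */
static int make_random_files(const char *dir, long count, size_t max_size,
                             unsigned int seed)
{
    static const char payload[32 * 1024]; /* zero-filled; content is irrelevant */
    char path[4096];

    if (max_size > sizeof(payload))
        return -1;
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    srand(seed);
    for (long i = 0; i < count; i++) {
        size_t size = (size_t)rand() % (max_size + 1);
        FILE *f;

        snprintf(path, sizeof(path), "%s/f%07ld", dir, i);
        f = fopen(path, "wb");
        if (f == NULL)
            return -1;
        /* write exactly `size` bytes; a zero-length file needs no write */
        if (size > 0 && fwrite(payload, 1, size, f) != size) {
            fclose(f);
            return -1;
        }
        if (fclose(f) != 0)
            return -1;
    }
    return 0;
}
```

Calling `make_random_files("rfiles", 300000, 32 * 1024, 1)` from a Lustre client mount point would approximate the setup in the report; the fixed seed only makes the size distribution repeatable between attempts.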
On Mon, 14 Jan 2008, Niklas Edmundsson wrote:
> Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
> LBUG:ed with:
<snip>

Ahem. It seems I got a little carried away with grep there and missed
the stack trace. This should be more complete:

---------------8<---------------
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
Jan 12 10:39:40 ll_mdt_22 R running task 0 6198 1 6199 6197 (L-TLB)
Jan 12 10:39:40 343836365b3e343c 0036373832382e32 0000383338373433 0000000000000246
Jan 12 10:39:40 ffff8100f0697560 0000000000000018 343836365b3e303c 3030313132382e32
Jan 12 10:39:40 ffffffffff00205d ffff8100f06976fa ffff8101706976ef ffffffff805172e0
Jan 12 10:39:40 Call Trace:
Jan 12 10:39:40 [<ffffffff80315c71>] vsnprintf+0x5b1/0x630
Jan 12 10:39:40 [<ffffffff8021d470>] physflat_send_IPI_mask+0x0/0x80
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff8022b923>] __wake_up_common+0x43/0x80
Jan 12 10:39:40 [<ffffffff8022b923>] __wake_up_common+0x43/0x80
Jan 12 10:39:40 [<ffffffff8023616e>] printk+0x4e/0x60
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff80256450>] kallsyms_lookup+0xf0/0x230
Jan 12 10:39:40 [<ffffffff80256450>] kallsyms_lookup+0xf0/0x230
Jan 12 10:39:40 [<ffffffff8020b090>] printk_address+0xb0/0xc0
Jan 12 10:39:40 [<ffffffff8023616e>] printk+0x4e/0x60
Jan 12 10:39:40 [<ffffffff80255c2a>] module_text_address+0x3a/0x50
Jan 12 10:39:40 [<ffffffff802491da>] kernel_text_address+0x1a/0x30
Jan 12 10:39:40 [<ffffffff802491da>] kernel_text_address+0x1a/0x30
Jan 12 10:39:40 [<ffffffff8020b4cc>] show_trace+0x21c/0x250
Jan 12 10:39:40 [<ffffffff8020b5ea>] _show_stack+0xea/0x100
Jan 12 10:39:40 [<ffffffff883f3a0a>] :libcfs:lbug_with_loc+0x7a/0xc0
Jan 12 10:39:40 [<ffffffff8871bb01>] :mds:mds_orphan_add_link+0x641/0x7e0
Jan 12 10:39:40 [<ffffffff883cabfd>] :ldiskfs:__ldiskfs_journal_stop+0x2d/0x60
Jan 12 10:39:40 [<ffffffff802cb55b>] dnotify_parent+0x2b/0xa0
Jan 12 10:39:40 [<ffffffff802a81a3>] dput+0x23/0x170
Jan 12 10:39:40 [<ffffffff8871d498>] :mds:mds_reint_unlink+0x17f8/0x25f0
Jan 12 10:39:40 [<ffffffff8850ec47>] :ptlrpc:ptlrpc_prep_set+0x2c7/0x360
Jan 12 10:39:40 [<ffffffff802a81a3>] dput+0x23/0x170
Jan 12 10:39:40 [<ffffffff8870f7b9>] :mds:mds_reint_rec+0x1d9/0x2b0
Jan 12 10:39:40 [<ffffffff887357cc>] :mds:mds_unlink_unpack+0x29c/0x3c0
Jan 12 10:39:40 [<ffffffff884e6f91>] :ptlrpc:ldlm_run_cp_ast_work+0x171/0x200
Jan 12 10:39:40 [<ffffffff88734624>] :mds:mds_update_unpack+0x214/0x2b0
Jan 12 10:39:40 [<ffffffff886ff971>] :mds:mds_reint+0x4b1/0x5a0
Jan 12 10:39:40 [<ffffffff885201cf>] :ptlrpc:lustre_msg_get_version+0x4f/0x100
Jan 12 10:39:40 [<ffffffff8870beea>] :mds:mds_handle+0x2fca/0x5f88
Jan 12 10:39:40 [<ffffffff884ff878>] :ptlrpc:ldlm_cli_cancel+0x298/0x2c0
Jan 12 10:39:40 [<ffffffff802899d0>] __drain_alien_cache+0x60/0x90
Jan 12 10:39:40 [<ffffffff8022e812>] find_busiest_group+0x252/0x6c0
Jan 12 10:39:40 [<ffffffff8848ae45>] :obdclass:class_handle2object+0xd5/0x160
Jan 12 10:39:40 [<ffffffff8851c480>] :ptlrpc:lustre_swab_ptlrpc_body+0x0/0x90
Jan 12 10:39:40 [<ffffffff88521155>] :ptlrpc:lustre_swab_buf+0xc5/0xf0
Jan 12 10:39:40 [<ffffffff8852710a>] :ptlrpc:ptlrpc_server_handle_request+0xc8a/0x1460
Jan 12 10:39:40 [<ffffffff80416d20>] thread_return+0x0/0x100
Jan 12 10:39:40 [<ffffffff8020df9e>] do_gettimeofday+0x5e/0xb0
Jan 12 10:39:40 [<ffffffff883fbf06>] :libcfs:lcw_update_time+0x16/0x100
Jan 12 10:39:40 [<ffffffff8023f309>] lock_timer_base+0x29/0x60
Jan 12 10:39:40 [<ffffffff8023f7f0>] __mod_timer+0xc0/0xf0
Jan 12 10:39:40 [<ffffffff8852933c>] :ptlrpc:ptlrpc_main+0x85c/0x9e0
Jan 12 10:39:40 [<ffffffff8022f490>] default_wake_function+0x0/0x10
Jan 12 10:39:40 [<ffffffff8020ac4c>] child_rip+0xa/0x12
Jan 12 10:39:41 [<ffffffff88528ae0>] :ptlrpc:ptlrpc_main+0x0/0x9e0
Jan 12 10:39:41 [<ffffffff8020ac42>] child_rip+0x0/0x12
Jan 12 10:39:41
Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
---------------8<---------------

<snip>

/Nikke
On Mon, Jan 14, 2008 at 08:02:43AM +0100, Niklas Edmundsson wrote:
> Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
> LBUG:ed with:
>
> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
<snip>

The Debian kernel maintainers have probably merged the ext3_link() patch
that makes it return -ENOENT when inode->i_nlink is 0. Please note that
this patch is included in the RHEL5 kernels (and our RHEL5 series
handles this), but not in the vanilla 2.6.18.8 kernel.
To fix this, you should add ext3-unlink-race.patch to the 2.6.18
ldiskfs series.

Johann
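To make the behaviour Johann describes concrete, here is a minimal user-space model (not kernel code) of a link operation that refuses to give a new name to an inode whose link count is already 0. `struct toy_inode`, `toy_link()` and the `refuse_unlinked` flag are hypothetical names for illustration; the real check, per Johann's description, lives in ext3_link() in kernels that carry the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for an inode (not the kernel's struct inode): only the link
 * count matters for this illustration. */
struct toy_inode {
    unsigned int i_nlink;
};

/* Hypothetical model of adding a hard link.  With refuse_unlinked set, it
 * mimics the patched ext3_link() as Johann describes it: an inode whose link
 * count has already dropped to 0 cannot gain a new name, and the call fails
 * with -ENOENT (negative errno, kernel style).  Otherwise the link succeeds
 * and the count goes up by one. */
static int toy_link(struct toy_inode *inode, bool refuse_unlinked)
{
    if (refuse_unlinked && inode->i_nlink == 0)
        return -ENOENT;
    inode->i_nlink++;
    return 0;
}
```

Judging by the comment in Bernd's mds_orphan_add_link() patch further down the thread ("2.6.21 will refuse to add a link of inode->i_nlink == 0"), the orphan that the MDS links into PENDING can be exactly such an inode, which is why a kernel with this check behaves differently from vanilla 2.6.18.8.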
On Mon, 14 Jan 2008, Johann Lombardi wrote:
> The debian kernel maintainers have probably merged the ext3_link() patch to
> return -ENOENT when inode->i_nlink is equal to 0. Please note that this patch
> is included in the RHEL5 kernels (and our RHEL5 series handles this), but not in
> the 2.6.18.8 vanilla kernel.
> To fix this, you should add ext3-unlink-race.patch to the 2.6.18 ldiskfs series.

Hmm, ext3-unlink-race.patch didn't apply at all, and looking manually I
see no obvious place to apply it.

Diffing the ext3 trees between kernel.org 2.6.18.8 and Debian 2.6.18, I
see no patch that obviously touches ext3_link/ENOENT/i_nlink:

---------------------8<----------------------------
diff -rpu /scratch/linux-2.6.18.8/fs/ext3/dir.c ./dir.c
--- /scratch/linux-2.6.18.8/fs/ext3/dir.c 2007-02-24 00:52:30.000000000 +0100
+++ ./dir.c 2007-12-22 03:24:00.000000000 +0100
@@ -151,6 +151,9 @@ static int ext3_readdir(struct file * fi
                         ext3_error (sb, "ext3_readdir",
                                 "directory #%lu contains a hole at offset %lu",
                                 inode->i_ino, (unsigned long)filp->f_pos);
+                        /* corrupt size?  Maybe no more blocks to read */
+                        if (filp->f_pos > inode->i_blocks << 9)
+                                break;
                         filp->f_pos += sb->s_blocksize - offset;
                         continue;
                 }
diff -rpu /scratch/linux-2.6.18.8/fs/ext3/namei.c ./namei.c
--- /scratch/linux-2.6.18.8/fs/ext3/namei.c 2007-02-24 00:52:30.000000000 +0100
+++ ./namei.c 2007-12-22 03:24:00.000000000 +0100
@@ -551,6 +551,15 @@ static int htree_dirblock_to_tree(struct
                                            dir->i_sb->s_blocksize -
                                            EXT3_DIR_REC_LEN(0));
         for (; de < top; de = ext3_next_entry(de)) {
+                if (!ext3_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
+                                        (block<<EXT3_BLOCK_SIZE_BITS(dir->i_sb))
+                                                +((char *)de - bh->b_data))) {
+                        /* On error, skip the f_pos to the next block. */
+                        dir_file->f_pos = (dir_file->f_pos |
+                                        (dir->i_sb->s_blocksize - 1)) + 1;
+                        brelse (bh);
+                        return count;
+                }
                 ext3fs_dirhash(de->name, de->name_len, hinfo);
                 if ((hinfo->hash < start_hash) ||
                     ((hinfo->hash == start_hash) &&
---------------------8<----------------------------

So I think that this bug is most likely present when using vanilla
kernel.org 2.6.18.8 too...

Thoughts/suggestions?

My gut feeling is that the MDS code is relying on some corner-case
behaviour of ext3, and that this behaviour is changing with newer
kernels...

/Nikke
Hello Niklas,

On Monday 21 January 2008 08:09:35 Niklas Edmundsson wrote:
<snip>
> My gut feeling is that the MDS code is relying on some corner case
> behaviour of ext3, and that this behaviour is changing with newer
> kernels...

Could you try this patch? This is what we are using, and it should be in
Debian's lustre svn.

diff -r a1bf8dcdfe1f lustre/mds/mds_reint.c
--- a/lustre/mds/mds_reint.c Mon Jul 09 17:00:16 2007 +0200
+++ b/lustre/mds/mds_reint.c Mon Jul 09 17:01:04 2007 +0200
@@ -1481,7 +1481,12 @@ static int mds_orphan_add_link(struct md
          * for linking and return real mode back then -bzzz */
         mode = inode->i_mode;
         inode->i_mode = S_IFREG;
+
+        /* 2.6.21 will refuse to add a link of inode->i_nlink == 0 */
+        inode->i_nlink = 1;
         rc = vfs_link(dentry, pending_dir, pending_child);
+        inode->i_nlink--;
+        mark_inode_dirty(inode);
         if (rc)
                 CERROR("error linking orphan %s to PENDING: rc = %d\n",
                        rec->ur_name, rc);

I didn't like the ext3-unlink-race.patch; it removes sanity checks someone
certainly added for good reasons, and therefore I introduced this patch.

Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
On Mon, 21 Jan 2008, Bernd Schubert wrote:
<snip>
> Could you try this patch, this is what we are using and what should be in
> debians lustre svn

I'm building using the pkg-lustre/trunk repository, so I already had
this patch when the MDS bug:ed out.

<snip>
> I didn't like the ext3-unlink-race.patch, it removes sanity checks someone
> certainly added for good reasons and therefore I introduced this patch.

In general it seems to me that a good chunk of the Lustre code depends
on specific versions of ext3 behaviour. Coupled with the fact that
ldiskfs isn't shipped with Lustre but is produced by patching the
kernel's own version of ext3, this seems to produce surprises at
regular intervals. I really can't wait until this gets into userspace,
which, if I have understood correctly, will happen with the Lustre
1.8/ZFS thingie...
/Nikke
On Monday 21 January 2008 13:02:43 Niklas Edmundsson wrote:
> I'm building using the pkg-lustre/trunk repository, so I already had
> this patch when the MDS bug:ed out.

Have you checked that the patch is really applied? I'm not always
closely following Debian's svn, and Goswin, who usually syncs our patches
with Debian's, was rather busy with other things over the last few weeks
(as was I).

[...]
> In general it seems to me that a good chunk of the lustre code depends
> on specific versions of ext3 behaviour, which coupled with the fact
> that it's not shipped with lustre but patched based on the kernel
> version of ext3 only seems to produce surprises at regular intervals.
> I really can't wait until this gets into userspace, which if I have
> understood correctly will happen with the lustre 1.8/ZFS-thingie...

Sure, in userspace it probably will be much easier.

Cheers,
Bernd
On Mon, 21 Jan 2008, Bernd Schubert wrote:
>> I'm building using the pkg-lustre/trunk repository, so I already had
>> this patch when the MDS bug:ed out.
>
> Have you checked the patch is really applied? I'm not always closely following
> debians svn and Goswin who is usually syncing our patches with debians was
> the last weeks rather busy with other things (as was I).

Yup. At least mds_reint.c has the chunk that the patch adds.

/Nikke
Hello,

> > I'm building using the pkg-lustre/trunk repository, so I already had
> > this patch when the MDS bug:ed out.
>
> Have you checked the patch is really applied? I'm not always closely
> following debians svn and Goswin who is usually syncing our patches with
> debians was the last weeks rather busy with other things (as was I).

Yes, this patch should be applied. So this couldn't be the issue.

Greetings
Patrick Winnertz

-- 
Patrick Winnertz
Tel.: +49 (0) 2161 / 4643 - 0

credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
On Monday 21 January 2008 15:27:34 Niklas Edmundsson wrote:
> Yup. At least mds_reint.c has the chunk that the patch adds.

Looking closer into this, I see your error message would be different if
vfs_link() had failed; you would have run into

        CERROR("error linking orphan %s to PENDING: rc = %d\n",
               rec->ur_name, rc);

But now I don't understand what's going on:

        /* 2.6.21 will refuse to add a link of inode->i_nlink == 0 */
        inode->i_nlink = 1;
        rc = vfs_link(dentry, pending_dir, pending_child);

If vfs_link() succeeds, we should have inode->i_nlink == 2 now, but in
your case it seems to be inode->i_nlink == 1. Or did you also get the
"error linking orphan ..." message?

Cheers,
Bernd
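Bernd's reasoning can be checked by tracing the link count through his patch. The sketch below is a hypothetical model, not Lustre code: `nlink_after_workaround()` and `orphan_assertion_holds()` are illustrative names, and it assumes, as Bernd's reasoning implies, that `ASSERTION(inode->i_nlink == 1)` is evaluated after the `inode->i_nlink--` that the patch adds.

```c
#include <stdbool.h>

/* Hypothetical model of the i_nlink bookkeeping around vfs_link() in Bernd's
 * mds_orphan_add_link() patch.  link_incremented says whether the modelled
 * vfs_link() call actually raised the count (a failed link does not). */
static unsigned int nlink_after_workaround(bool link_incremented)
{
    unsigned int nlink = 1;  /* inode->i_nlink = 1;  forced before linking */

    if (link_incremented)
        nlink++;             /* vfs_link() added the PENDING name: 1 -> 2 */
    nlink--;                 /* inode->i_nlink--;    undo the forced bump */
    return nlink;
}

/* Would ASSERTION(inode->i_nlink == 1) hold, assuming it runs after the
 * decrement? */
static bool orphan_assertion_holds(bool link_incremented)
{
    return nlink_after_workaround(link_incremented) == 1;
}
```

Only the branch where the count was not raised ends at 0, matching the reported `dir nlink == 0`. A failed vfs_link() would take that branch but should also have logged the "error linking orphan" CERROR, which is why Bernd asks whether that message appeared.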
On Mon, Jan 21, 2008 at 08:09:35AM +0100, Niklas Edmundsson wrote:
> Hmm, ext3-unlink-race.patch didn't apply at all, and looking manually
> I see no obvious place to apply it to.

OK. Are you able to reproduce the problem with a vanilla 2.6.18.8 kernel
(from kernel.org) and a stock 1.6.4.1 (in particular, without Bernd's
mds_orphan_add_link patch)?

Johann