I've got another odd crash, hoping somebody will have a hint:

Testing various configurations of our lustre/ramdisk stuff, I've come up with a reproducible case where I'm tripping an assertion failure in the generic transaction code. The proximate cause seems to be running iozone with a stripe size of 1MB. The fs is striped 64K, so that ought to be turning into a bunch of parallel writes. I did some digging around and was all excited to find CFS bug 10891, because it looked like it was fixing something like this, but it doesn't seem to make a difference.

This is kernel 2.6.15, mips64, and still running lustre 1.6b7 (because the vanilla kernel support still isn't ready in 1.6.x). I'm hopeful that somebody will recognize this and be able to point me at a fix, perhaps in the real 1.6. TIA...

[4344894.240000] Assertion failure in journal_dirty_metadata() at fs/jbd/transaction.c:1117: "handle->h_buffer_credits > 0"
[4344894.246000] Break instruction in kernel code[#1]:
[4344894.248000] Cpu 4
[4344894.249000] $ 0 : 0000000000000000 000000001480c4e0 0000000000000082 ffffffff80528e88
[4344894.257000] $ 4 : ffffffff80528eb8 ffffffff80530000 ffffffff80530000 ffffffff805f0000
[4344894.261000] $ 8 : ffffffff80530000 ffffffff80530000 ffffffff80530000 ffffffff805f0000
[4344894.265000] $12 : ffffffff805f0000 ffffffff805f0000 ffffffff805f0000 ffffffff805f0000
[4344894.269000] $16 : a8000000e8050c80 a8000000e96b3e88 a8000000e96467e8 a8000000ef4616b0
[4344894.272000] $20 : a8000000fa013640 ffffffff805f0000 0000000000001800 a8000000ef4616b0
[4344894.278000] $24 : ffffffff80530000 ffffffff80530000
[4344894.282000] $28 : a8000000f9ec0000 a8000000f9ecf190 a8000000f9ecf2a0 ffffffff80241138
[4344894.290000] Hi : 0000000000000002
[4344894.292000] Lo : 0000000000000000
[4344894.293000] epc : ffffffff80241138 journal_dirty_metadata+0x170/0x598    Tainted: P
[4344894.297000] ra  : ffffffff80241138 journal_dirty_metadata+0x170/0x598
[4344894.301000] Status: 1480c4e3 KX SX UX KERNEL EXL IE
[4344894.304000] Cause : 00800024
[4344894.305000] PrId  : 040e1301
[4344894.306000] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre lov mdc ksocklnd ksclnd ptlrpc obdclass lvfs ldiskfs lnet libcfs sceth scio scfab
[4344894.315000] Process ll_mdt_19 (pid: 1703, threadinfo=a8000000f9ec0000, task=a8000000f02b4540)
[4344894.319000] Stack : a8000000e96b3e88 ffffffff805f0000 a8000000ef4616b0 a8000000e8e88168
[4344894.326000]         a8000000f9ecf2c0 0000000000000000 c0000000028ec5a8 c0000000028ec474
[4344894.330000]         0000000000000060 ffffffff8023e298 0000000000000040 0000000000000040
[4344894.334000]         00000000000072bf a800000011f3fae0 a800000011cc3f40 c000000000070000
[4344894.338000]         a8000000f9ecf220 c000000002a9bf48 0000000000000000 a8000000ef4e3180
[4344894.341000]         0000000000000000 a8000000f9ecf2c0 a8000000e96b3e30 0000000000000800
[4344894.347000]         a8000000e96c7800 0000000000001800 c0000000028eeef8 0000000000000000
[4344894.351000]         a8000000e96b3e30 0000000000000c00 0000000000000000 0000000000000000
[4344894.357000]         c000000008ecbefc a8000000e9316000 a800000011f3fae0 a8000000fa239700
[4344894.360000]         a8000000f9ecf2c0 c000000002aa08f8 ffffffff00000000 0000000000000005
[4344894.364000]         ...
[4344894.366000] Call Trace:
[4344894.367000]  [<c0000000028ec5a8>] ldiskfs_getblk+0x278/0x668 [ldiskfs]
[4344894.370000]  [<c0000000028ec474>] ldiskfs_getblk+0x144/0x668 [ldiskfs]
[4344894.372000]  [<ffffffff8023e298>] do_get_write_access+0xa40/0xdb8
[4344894.375000]  [<c000000002a9bf48>] llog_lvfs_write_blob+0x4f8/0xc08 [obdclass]
[4344894.378000]  [<c0000000028eeef8>] ldiskfs_bread+0x30/0x1e8 [ldiskfs]
[4344894.381000]  [<c000000008ecbefc>] fsfilt_ldiskfs_write_record+0x22c/0xa68 [fsfilt_ldiskfs]
[4344894.386000]  [<c000000002aa08f8>] llog_lvfs_write_rec+0x1458/0x2918 [obdclass]
[4344894.389000]  [<c000000002a9ba50>] llog_lvfs_write_blob+0x0/0xc08 [obdclass]
[4344894.392000]  [<c000000002a9bf48>] llog_lvfs_write_blob+0x4f8/0xc08 [obdclass]
[4344894.397000]  [<c000000002a9379c>] llog_cat_new_log+0xff4/0x2350 [obdclass]
[4344894.400000]  [<c000000002a9ba50>] llog_lvfs_write_blob+0x0/0xc08 [obdclass]
[4344894.403000]  [<c000000002aa0004>] llog_lvfs_write_rec+0xb64/0x2918 [obdclass]
[4344894.411000]  [<ffffffff802b0000>] kobject_register+0x80/0xa0
[4344894.413000]  [<ffffffff8013b650>] __might_sleep+0x0/0x158
[4344894.415000]  [<ffffffff804513bc>] _spin_unlock_irqrestore+0x14/0x48
[4344894.418000]  [<c000000002a9788c>] llog_cat_current_log+0x82c/0xb98 [obdclass]
[4344894.420000]  [<ffffffff804514c8>] _spin_lock_irqsave+0x30/0x48
[4344894.423000]  [<ffffffff802b3548>] __up_write+0x40/0x2e8
[4344894.425000]  [<c000000002a90000>] llog_process+0x1f30/0x2e80 [obdclass]
[4344894.430000]  [<c000000002a98350>] llog_cat_add_rec+0x1a0/0x1530 [obdclass]
[4344894.433000]  [<c0000000028f0000>] ldiskfs_prepare_write+0x188/0x290 [ldiskfs]
[4344894.436000]  [<c0000000028eae14>] ldiskfs_get_block_handle+0x104/0x12f8 [ldiskfs]
[4344894.441000]  [<c000000002aada54>] llog_obd_origin_add+0xec/0x5c0 [obdclass]
[4344894.444000]  [<ffffffff801ba884>] __find_get_block+0x1bc/0x418
[4344894.447000]  [<c0000000037d06b4>] _ldlm_lock_debug+0x1d4/0x838 [ptlrpc]
[4344894.456000]  [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.459000]  [<c000000008dd0d88>] lov_llog_origin_add+0x168/0x5b8 [lov]
[4344894.461000]  [<ffffffff801bcaa0>] ll_rw_block+0x0/0x450
[4344894.464000]  [<c0000000028ec474>] ldiskfs_getblk+0x144/0x668 [ldiskfs]
[4344894.466000]  [<ffffffff801c0000>] end_buffer_read_sync+0x190/0x228
[4344894.469000]  [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.471000]  [<c000000008e1cc68>] lov_alloc_memmd+0x178/0xf60 [lov]
[4344894.478000]  [<c000000008f70a4c>] mds_llog_origin_add+0x10c/0x430 [mds]
[4344894.481000]  [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.483000]  [<ffffffff802b0000>] kobject_register+0x80/0xa0
[4344894.490000]  [<c000000008f77660>] mds_log_op_unlink+0x19f0/0x3cc0 [mds]
[4344894.493000]  [<c000000008f774bc>] mds_log_op_unlink+0x184c/0x3cc0 [mds]
[4344894.495000]  [<c0000000028ffcc8>] __ldiskfs_journal_stop+0x48/0xa0 [ldiskfs]
[4344894.498000]  [<ffffffff8020baf0>] dnotify_parent+0x60/0x198
[4344894.502000]  [<c0000000028f8b88>] ldiskfs_unlink+0xf8/0x340 [ldiskfs]
[4344894.505000]  [<c000000003874730>] lustre_msg_buf+0x0/0x158 [ptlrpc]
[4344894.508000]  [<c000000003883030>] lustre_msg_buflen+0x0/0x1f8 [ptlrpc]
[4344894.513000]  [<c000000008ffa91c>] mds_reint_unlink+0x3aec/0x65a0 [mds]
[4344894.516000]  [<c000000002a5cf70>] upcall_cache_get_entry+0x14b8/0x1fe8 [lvfs]
[4344894.518000]  [<ffffffff801e3c4c>] dput+0x9c/0x3d0
[4344894.525000]  [<c000000008fd921c>] mds_reint_rec+0x1fc/0x758 [mds]
[4344894.527000]  [<c0000000090498f8>] mds_unlink_unpack+0x1c8/0x838 [mds]
[4344894.530000]  [<c0000000090498cc>] mds_unlink_unpack+0x19c/0x838 [mds]
[4344894.533000]  [<c000000008facf80>] mds_reint+0x568/0xda8 [mds]
[4344894.539000]  [<c000000003870000>] lustre_msg_buf_v2+0x118/0x4d0 [ptlrpc]
[4344894.542000]  [<c000000008fc1830>] mds_handle+0x5ed8/0x12dc0 [mds]
[4344894.545000]  [<ffffffff8010c99c>] do_gettimeofday+0xf4/0x220
[4344894.551000]  [<c000000000065e58>] libcfs_debug_vmsg2+0x330/0xfd0 [libcfs]
[4344894.554000]  [<c000000000065c00>] libcfs_debug_vmsg2+0xd8/0xfd0 [libcfs]
[4344894.557000]  [<c000000003895c24>] ptlrpc_main+0x352c/0x48f8 [ptlrpc]
[4344894.564000]  [<c000000003888fc0>] ptlrpc_retry_rqbds+0x0/0x10 [ptlrpc]
[4344894.567000]  [<c000000003888fc0>] ptlrpc_retry_rqbds+0x0/0x10 [ptlrpc]
[4344894.569000]  [<ffffffff8013cde8>] default_wake_function+0x0/0x20
[4344894.572000]  [<ffffffff80108680>] kernel_thread_helper+0x10/0x18
[4344894.579000]  [<ffffffff80108670>] kernel_thread_helper+0x0/0x18
[4344894.582000]
[4344894.582000]
[4344894.583000] Code: 2407045d 0c051fac 0102402d <0200000d> 8e620008 2442ffff ae620008 de420028 120200b7
From: "John R. Dunning" <jrd@sicortex.com> Date: Wed, 25 Jul 2007 08:06:29 -0400 I''ve got another odd crash, hoping somebody will have a hint: Dug a little deeper, and am more confused: In lustre/ldiskfs/inode.c: struct buffer_head *ldiskfs_getblk(handle_t *handle, struct inode * inode, long block, int create, int * errp) around line 875: dummy.b_blocknr = -1000; around line 880: if (!*errp && buffer_mapped(&dummy)) { struct buffer_head *bh; bh = sb_getblk(inode->i_sb, dummy.b_blocknr); and around line 902 BUFFER_TRACE(bh, "call ldiskfs_journal_dirty_metadata"); err = ldiskfs_journal_dirty_metadata(handle, bh); if (!fatal) fatal = err; It''s in the call to ldiskfs_journal_dirty_metadata that I seem to be blowing the ASSERT. Chasing down the call to sb_getblk, in kernel/fs/buffer.c, around line 1451, I see /* * __getblk will locate (and, if necessary, create) the buffer_head * which corresponds to the passed block_device, block and size. The * returned buffer has its reference count incremented. * * __getblk() cannot fail - it just keeps trying. If you pass it an * illegal block number, __getblk() will happily return a buffer_head * which represents the non-existent block. Very weird. * * __getblk() will lock up the machine if grow_dev_page''s try_to_free_buffers() * attempt is failing. FIXME, perhaps? */ struct buffer_head * __getblk(struct block_device *bdev, sector_t block, int size) { struct buffer_head *bh = __find_get_block(bdev, block, size); might_sleep(); if (bh == NULL) bh = __getblk_slow(bdev, block, size); return bh; }>From the ldiskfs code, it looks to me like we''re intentionally calling__getblk with a bogus block number. But the comment in __getblk pretty clearly states that calling it that way will result in a buffer_head pointing to that illegal block. 
Since we then turn around and pass that resultant buffer_head (returned from sb_getblk) into ldiskfs_journal_dirty_metadata, that looks to me like a plausible way to end up with a busted bh, and hence the assertion failure. Anybody else find this suspect? Am I missing something? TIA...