hello!

i'm using redhat 7.2 with ext3 as my primary fs on kernel 2.4.17 + grsecurity + acl.

after 2-3 days of uptime i'm experiencing problems... i've attached an excerpt from my system logs below. the machine stops responding for a few seconds, and after that it looks like it's in normal operation again. the only problem is the load, which keeps increasing even though the cpu is 99% idle... after a few minutes i cannot log in again...

machine: pentium 200, 72 mb of ram, eide disks (brand new), dcom ethernet cards

what's wrong?

brane

Jan 16 12:37:37 frost kernel: Assertion failure in journal_start() at transaction.c:225: "handle->h_transaction->t_journal == journal"
Jan 16 12:37:37 frost kernel: invalid operand: 0000
Jan 16 12:37:37 frost kernel: CPU: 0
Jan 16 12:37:38 frost kernel: EIP: 0010:[journal_start+79/208] Not tainted
Jan 16 12:37:38 frost kernel: EIP: 0010:[<c015f7ef>] Not tainted
Jan 16 12:37:38 frost kernel: EFLAGS: 00010296
Jan 16 12:37:38 frost kernel: eax: 0000006c ebx: c1788480 ecx: 00000005 edx: 00000000
Jan 16 12:37:38 frost kernel: esi: c3f94000 edi: c3bfbe00 ebp: c11c7c00 esp: c3f95d54
Jan 16 12:37:38 frost kernel: ds: 0018 es: 0018 ss: 0018
Jan 16 12:37:38 frost kernel: Process rm (pid: 30409, stackpage=c3f95000)
Jan 16 12:37:38 frost kernel: Stack: c020b860 c020f769 c020ff56 000000e1 c020d820 c1788480 c1788480 ffffffe2
Jan 16 12:37:38 frost kernel: c3bfbe00 c395a120 c0157be8 c11c7c00 00000001 c01601ae c1788480 c1cdf8b0
Jan 16 12:37:38 frost kernel: 00000000 c3bfbe00 c3d57000 00000001 c014160e c3bfbe00 00000000 00000020
Jan 16 12:37:38 frost kernel: Call Trace: [ext3_dirty_inode+88/208] [do_get_write_access+1198/1232] [__mark_inode_dirty+46/128] [generic_file_write+836/1696] [do_get_write_access+1198/1232]
Jan 16 12:37:38 frost kernel: Call Trace: [<c0157be8>] [<c01601ae>] [<c014160e>] [<c0126494>] [<c01601ae>]
Jan 16 12:37:38 frost kernel: [write_dquot+165/256] [dqput+124/240] [dquot_drop+62/80] [ext3_free_inode+230/1040] [ext3_mark_iloc_dirty+36/80] [ext3_mark_iloc_dirty+53/80]
Jan 16 12:37:38 frost kernel: [<c0145ac5>] [<c0145ebc>] [<c0146dbe>] [<c0153aa6>] [<c0157a74>] [<c0157a85>]
Jan 16 12:37:38 frost kernel: [ext3_mark_inode_dirty+39/64] [ext3_delete_inode+187/272] [ext3_delete_inode+0/272] [ext3_delete_inode+0/272] [iput+246/496] [d_delete+76/112]
Jan 16 12:37:38 frost kernel: [<c0157b77>] [<c0154beb>] [<c0154b30>] [<c0154b30>] [<c01428e6>] [<c0140f7c>]
Jan 16 12:37:38 frost kernel: [vfs_unlink+307/352] [sys_unlink+153/272] [system_call+51/64]
Jan 16 12:37:38 frost kernel: [<c013a703>] [<c013a7c9>] [<c0106ca3>]
Jan 16 12:37:38 frost kernel:
Jan 16 12:37:38 frost kernel: Code: 0f 0b 83 c4 14 8b 4b 08 89 d8 41 89 4b 08 eb 6a 90 6a 01 68
Hi Jan,

I've had three of these reports now within the past month.

On Wed, Jan 16, 2002 at 01:39:44PM +0100, Branko F. Gračner wrote:

> i'm using redhat 7.2 with ext3 as my primary fs on kernel 2.4.17 +
> grsecurity + acl

[ the others have been without any other patches on top of plain ext3]

> Jan 16 12:37:37 frost kernel: Assertion failure in journal_start() at
> transaction.c:225: "handle->h_transaction->t_journal == journal"

Executive summary: during a delete, ext3 believes we end up writing a dquot entry out to the wrong filesystem.

Does this ring any bells with you at all? In the mean time, I'm trying to reproduce it here.

The assert failure is invariably followed by a call trace looking like

    sys_unlink
    vfs_unlink
    d_delete
    iput
    ext3_delete_inode
    ext3_mark_inode_dirty
    ext3_mark_iloc_dirty
    ext3_free_inode
    dquot_drop
    dqput
    write_dquot
    do_get_write_access
    generic_file_write
    __mark_inode_dirty
    do_get_write_access
    ext3_dirty_inode

Translated, that means:

    we're doing an unlink;

    we start a transaction inside ext3_delete_inode;

    ext3 has already truncated the file back to zero, releasing all used data blocks;

    we have already marked the inode dirty (the ext3_mark_inode_dirty is just the remains of that call sitting on the stack);

    we have entered ext3_free_inode to release the inode itself;

    we have done the DQUOT_INIT(inode); DQUOT_FREE_INODE(inode); to update quota for the freed inode;

    we are trying to DQUOT_DROP(inode); to release the dquot struct and flush it to disk;

    dqput has called write_dquot to write the quota entry out;

    the generic_file_write into the quota file has tried to update the mtime and ctime timestamps;

    ext3_dirty_inode has been called to journal the timestamp update and has tried to start a new transaction;

    the journaling layer has BUG()ed because there is already a transaction open for the ext3_delete_inode (which is fine), but the new transaction is ON A DIFFERENT FILESYSTEM from the old one (which is really really bad news.)

Cheers,
 Stephen
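
For readers following the thread, the failing check can be illustrated with a small user-space sketch in C. This is not the kernel code: every `sketch_` name below is hypothetical, and it assumes (this is not spelled out in the messages above) that the journaling layer keeps one current handle per task and simply bumps a reference count when a nested start arrives for the same journal. Where the real journal_start() asserts, the sketch returns an error value so the failure case can be observed without crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the journal structures. */
struct sketch_journal { const char *name; };
struct sketch_transaction { struct sketch_journal *t_journal; };
struct sketch_handle {
    struct sketch_transaction *h_transaction;
    int h_ref;                      /* nesting depth of this handle */
};

/* Assumption: one "current handle" per task; a single global stands in here. */
static struct sketch_handle *sketch_current = NULL;

enum sketch_result { SKETCH_NEW, SKETCH_NESTED, SKETCH_WRONG_JOURNAL };

/* Start (or nest into) a handle on `journal`.
 * `fresh` and `txn` are caller-provided storage used only for a new handle. */
enum sketch_result sketch_journal_start(struct sketch_journal *journal,
                                        struct sketch_handle *fresh,
                                        struct sketch_transaction *txn)
{
    if (sketch_current != NULL) {
        /* The condition from the log message:
         * handle->h_transaction->t_journal == journal */
        if (sketch_current->h_transaction->t_journal != journal)
            return SKETCH_WRONG_JOURNAL;   /* the kernel BUG()s here */
        sketch_current->h_ref++;           /* same journal: nesting is fine */
        return SKETCH_NESTED;
    }
    txn->t_journal = journal;
    fresh->h_transaction = txn;
    fresh->h_ref = 1;
    sketch_current = fresh;
    return SKETCH_NEW;
}

/* Drop one nesting level; the handle goes away when the count reaches zero. */
void sketch_journal_stop(void)
{
    assert(sketch_current != NULL && sketch_current->h_ref > 0);
    if (--sketch_current->h_ref == 0)
        sketch_current = NULL;
}
```

In terms of the trace, the first start corresponds to ext3_delete_inode opening a handle on the filesystem holding the file being removed, and the second to ext3_dirty_inode being reached via the quota file write. If the second call names a different journal, the sketch returns SKETCH_WRONG_JOURNAL, which is the situation the assertion reports.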