Matt Bernstein
2005-Nov-16 08:52 UTC
(large, external) data journal BUG (Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL")
Hi,

A couple of our important servers, both running FC4 but one i386 and one x86_64, have been crashing recently. They both are running ext3 data=journal with large external journals and high commit intervals. Both machines use the gdth driver for their hardware RAID sets, if that's of any use. I think the hardware is good in both cases.

I hope someone finds this data useful enough to be able to fix the bug.

IMAP server crash (once only, thus far):

Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at "fs/jbd/checkpoint.c":626
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: loop iptable_nat ip_conntrack_amanda ipt_ULOG ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables w83627hf eeprom lm85 i2c_sensor i2c_isa md5 ipv6 video button battery ac ohci_hcd i2c_amd8111 i2c_amd756 i2c_core shpchp e100 mii tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod gdth sata_sil libata sd_mod scsi_mod
Pid: 1485, comm: kjournald Not tainted 2.6.12-1.1398_FC4smp
RIP: 0010:[<ffffffff8807d56f>] <ffffffff8807d56f>{:jbd:__journal_drop_transaction+319}
RSP: 0018:ffff8100fade9de8 EFLAGS: 00010292
RAX: 0000000000000074 RBX: ffff8100c5f0ea80 RCX: ffffffff8042d908
RDX: ffffffff8042d908 RSI: 0000000000000296 RDI: ffffffff8042d900
RBP: ffff8100f8b55000 R08: ffff81008234c040 R09: 0000000000000030
R10: 0000000000000000 R11: ffffffff8011d680 R12: ffff81003b333080
R13: ffff8100c5f0ea80 R14: ffff8100f8b55000 R15: 0000000000000000
FS: 00002aaaaadfcf00(0000) GS:ffffffff8050d780(0000) knlGS:00000000f7ff16c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaab51a0000 CR3: 00000000e2980000 CR4: 00000000000006e0
Process kjournald (pid: 1485, threadinfo ffff8100fade8000, task ffff8100fb9be880)
Stack: ffff8100020ba898 ffff81008caebce8 0000000000000000 ffffffff8807c9d2
       ffff8100f8b55024 0000000000000cf7 ffff8100f8b5515c 0000000000000000
       0000000000000000 0000000000000000
Call Trace:<ffffffff8807c9d2>{:jbd:journal_commit_transaction+4194}
       <ffffffff801439f1>{del_timer+113}
       <ffffffff8807f4d3>{:jbd:kjournald+275}
       <ffffffff8807eba0>{:jbd:commit_timeout+0}
       <ffffffff801506e0>{autoremove_wake_function+0}
       <ffffffff8010f76b>{child_rip+8}
       <ffffffff8807f3c0>{:jbd:kjournald+0}
       <ffffffff8010f763>{child_rip+0}

Code: 0f 0b fe 15 08 88 ff ff ff ff 72 02 48 83 7b 50 00 74 34 49
RIP <ffffffff8807d56f>{:jbd:__journal_drop_transaction+319} RSP <ffff8100fade9de8>
<3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace:<ffffffff8013abd5>{profile_task_exit+21}
       <ffffffff8013bff2>{do_exit+34}
       <ffffffff8022178d>{vgacon_cursor+221}
       <ffffffff8011066d>{die+77}
       <ffffffff80111203>{do_invalid_op+163}
       <ffffffff8807d56f>{:jbd:__journal_drop_transaction+319}
       <ffffffff8010f5b5>{error_exit+0}
       <ffffffff8011d680>{flat_send_IPI_mask+0}
       <ffffffff8807d56f>{:jbd:__journal_drop_transaction+319}
       <ffffffff8807d56f>{:jbd:__journal_drop_transaction+319}
       <ffffffff8807c9d2>{:jbd:journal_commit_transaction+4194}
       <ffffffff801439f1>{del_timer+113}
       <ffffffff8807f4d3>{:jbd:kjournald+275}
       <ffffffff8807eba0>{:jbd:commit_timeout+0}
       <ffffffff801506e0>{autoremove_wake_function+0}
       <ffffffff8010f76b>{child_rip+8}
       <ffffffff8807f3c0>{:jbd:kjournald+0}
       <ffffffff8010f763>{child_rip+0}

File server crash (has happened a few times now):

Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:626!
invalid operand: 0000 [#1] SMP
Modules linked in: loop nfsd exportfs lockd nfs_acl sunrpc autofs4 ipv6 ip_conntrack_amanda ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac ohci_hcd i2c_amd756 i2c_core 3c59x mii ns83820 floppy sg ext3 jbd gdth sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f88a997c>] Not tainted VLI
EFLAGS: 00010296 (2.6.13-1.1526_FC4smp)
EIP is at __journal_drop_transaction+0x117/0x2fa [jbd]
eax: 00000074 ebx: f064d2e0 ecx: c036fbf4 edx: 00000286
esi: f699a200 edi: c2f50000 ebp: e775df84 esp: c2f50ec4
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1168, threadinfo=c2f50000 task=c2e64020)
Stack: f88acfa8 f88b2e92 f88ada14 00000272 f88ada7c f064d2e0 f699a200 f88a9781
       c2f50000 d142414c e775df84 f88a8f61 e775df84 f88a9700 c2f50000 ecb98e60
       f064d2e0 000000f5 e85cc160 defc4598 f699a200 00000000 defc4560 f88a7846
Call Trace:
 [<f88a9781>] __journal_remove_checkpoint+0x56/0x75 [jbd]
 [<f88a8f61>] __try_to_free_cp_buf+0x31/0x68 [jbd]
 [<f88a9700>] __journal_clean_checkpoint_list+0x6f/0x9a [jbd]
 [<f88a7846>] journal_commit_transaction+0x147/0xff1 [jbd]
 [<c01295f7>] lock_timer_base+0x15/0x2f
 [<c0129803>] try_to_del_timer_sync+0x45/0x4d
 [<f88aa68b>] kjournald+0xc5/0x20d [jbd]
 [<f88aa5c0>] commit_timeout+0x0/0x5 [jbd]
 [<c01347c2>] autoremove_wake_function+0x0/0x37
 [<f88aa5c6>] kjournald+0x0/0x20d [jbd]
 [<c0101ca1>] kernel_thread_helper+0x5/0xb
Code: 44 24 10 7c da 8a f8 c7 44 24 0c 72 02 00 00 c7 44 24 08 14 da 8a f8 c7 44 24 04 92 2e 8b f8 c7 04 24 a8 cf 8a f8 e8 cb 7c 87 c7 <0f> 0b 72 02 14 da 8a f8 8b 4b 2c 85 c9 74 34 c7 44 24 10 c4 d0