Martin_Zielinski at McAfee.com
2011-Apr-08 09:37 UTC
assertion journal->j_running_transaction != NULL fails in commit
Hello! We are using a 2.6.32.25 kernel on a Dell R710 server (16 core Xeon CPU, 12GB RAM). The servers ran for about 4 weeks since updating from 2.6.27 and are using ext3 as filesystem. cat /proc/mounts | grep opt /dev/mapper/vg00-opt /opt ext3 rw,nosuid,nodev,relatime,errors=remount-ro 0 0 The journaling mode is writeback. Suddenly within 3 days, 4 out of 40 machines stop writing data to the /opt partition (all logfiles on this partition are empty since the incident until reboot) /var/log/messages shows that the assertion J_ASSERT(journal->j_running_transaction != NULL); fails. Seemingly a phantom bug. I could not find any report about this assertion. It would be really great, if anyone has an idea, how this can happen or what I could do to track this down. ------------[ cut here ]------------ kernel BUG at fs/jbd/commit.c:342! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.1/net/eth3/statistics/tx_bytes CPU 1 Modules linked in: bridge stp llc iptable_filter ip_tables x_tables i2c_dev i2c_core ipv6 binfmt_misc sbs sbshc pci_slot fan container battery ac parport_pc lp parport sg ses enclosure button thermal processor Pid: 2024, comm: kjournald Not tainted 2.6.32-46.r1-x86_64 #1 Appliance RIP: 0010:[<ffffffff811583db>] [<ffffffff811583db>] journal_commit_transaction+0xb2/0x10f7 RSP: 0018:ffff88031da3fda0 EFLAGS: 00010246 RAX: ffff88031f04a000 RBX: ffff88031ea41424 RCX: 00000000000004af RDX: 00000000000004af RSI: 0000000000000000 RDI: ffff88031ea41400 RBP: ffff88031da3fe60 R08: ffff88031ea41488 R09: 0000000000000009 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88031ea41400 R13: ffff88031ea41400 R14: ffff88031ea41590 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880033020000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00007f3289a6f6b0 CR3: 0000000231828000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kjournald (pid: 2024, threadinfo ffff88031da3e000, task ffff88031e6de140) Stack: 0000000000013680 000571ff7e347b40 ffff88031ff98400 ffff88031f04a000 <0> 0000000000000000 0000000000000286 ffff88031da3fe00 ffffffff810522c9 <0> 00000000ffffffff ffff88031ea41590 ffff88031ea414c8 0000000000000286 Call Trace: [<ffffffff810522c9>] ? lock_timer_base+0x26/0x4a [<ffffffff81052392>] ? try_to_del_timer_sync+0xa5/0xb2 [<ffffffff8115c6b1>] kjournald+0x147/0x377 [<ffffffff8105edd4>] ? autoremove_wake_function+0x0/0x38 [<ffffffff8115c56a>] ? kjournald+0x0/0x377 [<ffffffff8105ecbf>] kthread+0x7d/0x86 [<ffffffff8100c9da>] child_rip+0xa/0x20 [<ffffffff8105ec42>] ? kthread+0x0/0x86 [<ffffffff8100c9d0>] ? child_rip+0x0/0x20 Code: 81 ba 53 01 00 00 48 c7 c6 2e b5 8b 81 31 c0 e8 b0 fe ee ff 48 c7 c7 53 b5 8b 81 31 c0 e8 a2 fe ee ff 49 8b 75 50 48 85 f6 75 04 <0f> 0b eb fe 49 83 7d 58 00 74 04 0f 0b eb fe 83 7e 0c 00 49 89 RIP [<ffffffff811583db>] journal_commit_transaction+0xb2/0x10f7 RSP <ffff88031da3fda0> ---[ end trace 7ef4aef5b1834556 ]--- Thanks & Cheers, Martin