Getting this oops on one of our production servers; it pretty much hangs the server.
Do we have a corrupted journal? How would we rebuild it?
Any idea how to recover from it?

Assertion failure in journal_bmap() at journal.c:636: "ret != 0"
invalid operand: 0000
CPU:    0
EIP:    0010:[journal_bmap+70/96]    Not tainted
EIP:    0010:[<c016b646>]    Not tainted
EFLAGS: 00010282
eax: 00000044   ebx: 00000000   ecx: 00000002   edx: f7121f64
esi: d93e88e0   edi: f122f700   ebp: f7b58e00   esp: f7a99e50
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 14, stackpage=f7a99000)
Stack: c0352a20 c034e401 c034e3a5 0000027c c034e3f8 f7b58e00 c016b5f7 f7b58e00
       00001b0c d93e88e0 c0168c8d f7b58e00 f7b58ee4 00000000 00000fdc dc757024
       00000002 db247c60 f28df240 f122f730 00000001 00000070 00000001 e2bc9ae0
Call Trace: [journal_next_log_block+103/112] [journal_commit_transaction+1661/3856]
   [do_softirq+123/224] [do_IRQ+221/240] [schedule+1113/1296]
Call Trace: [<c016b5f7>] [<c0168c8d>] [<c011cdcb>] [<c01089bd>] [<c0115e59>]
   [kjournald+310/464] [commit_timeout+0/16] [kernel_thread+38/48] [kjournald+0/464]
   [<c016aff6>] [<c016aea0>] [<c0105616>] [<c016aec0>]

Code: 0f 0b 83 c4 14 eb 05 8d 76 00 89 c3 89 d8 5b c3 8d 76 00 8d

-- Martial Herbaut ---------------
Actually, the previous error I reported was on a 2.4.16 kernel.

Upgrading to 2.4.18 made it stop hanging the system and got us to the real
error message:

kernel: journal_bmap: journal block not found at offset 6924 on sd(8,2)
kernel: Aborting journal on device sd(8,2).
kernel: ext3_abort called.
kernel: EXT3-fs abort (device sd(8,2)): ext3_journal_start: Detected aborted journal
kernel: Remounting filesystem read-only

Any idea how we can recover from here? Is there a way to rebuild an ext3
journal?

> Getting this oops on one of our production servers
> pretty much hangs the server.
> Do we have a corrupted Journal? how would we rebuild it?
> Any idea how to recover from it?
>
> Assertion failure in journal_bmap() at journal.c:636: "ret != 0"
> invalid operand: 0000
> CPU:    0
> EIP:    0010:[journal_bmap+70/96]    Not tainted
> EIP:    0010:[<c016b646>]    Not tainted
> EFLAGS: 00010282
> eax: 00000044   ebx: 00000000   ecx: 00000002   edx: f7121f64
> esi: d93e88e0   edi: f122f700   ebp: f7b58e00   esp: f7a99e50
> ds: 0018   es: 0018   ss: 0018
> Process kjournald (pid: 14, stackpage=f7a99000)
> Stack: c0352a20 c034e401 c034e3a5 0000027c c034e3f8 f7b58e00 c016b5f7 f7b58e00
>        00001b0c d93e88e0 c0168c8d f7b58e00 f7b58ee4 00000000 00000fdc dc757024
>        00000002 db247c60 f28df240 f122f730 00000001 00000070 00000001 e2bc9ae0
> Call Trace: [journal_next_log_block+103/112] [journal_commit_transaction+1661/3856]
>    [do_softirq+123/224] [do_IRQ+221/240] [schedule+1113/1296]
> Call Trace: [<c016b5f7>] [<c0168c8d>] [<c011cdcb>] [<c01089bd>] [<c0115e59>]
>    [kjournald+310/464] [commit_timeout+0/16] [kernel_thread+38/48] [kjournald+0/464]
>    [<c016aff6>] [<c016aea0>] [<c0105616>] [<c016aec0>]
>
> Code: 0f 0b 83 c4 14 eb 05 8d 76 00 89 c3 89 d8 5b c3 8d 76 00 8d

-- Martial Herbaut ---------------
Server101 Fast and Reliable Hosting!
http://www.server101.com/
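On the rebuild question, a minimal sketch of the usual e2fsprogs sequence for discarding and recreating an ext3 journal. It assumes the filesystem is unmounted (for a root filesystem that means booting from rescue media) and that sd(8,2) corresponds to /dev/sda2. The DEV and DRY_RUN variables and the function names are illustrative only; by default the script just prints what it would run. If the journal block went missing because of a failing disk, this may well not be enough.

```shell
#!/bin/sh
# Sketch: drop and recreate the ext3 journal on an UNMOUNTED filesystem.
# DEV and DRY_RUN are hypothetical knobs; DRY_RUN=1 (the default) only prints commands.
DEV="${DEV:-/dev/sda2}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check() {
    run e2fsck -f "$1"
    rc=$?
    # e2fsck exits 0-3 when the filesystem is clean or was repaired;
    # 4 and above mean errors remain or the check itself failed.
    [ "$rc" -lt 4 ]
}

rebuild_journal() {
    # 1. Replay or clear whatever is left of the old journal.
    check "$1" &&
    # 2. Remove the journal feature, leaving a plain ext2 filesystem.
    run tune2fs -O '^has_journal' "$1" &&
    # 3. Force a full check now that no journal is involved.
    check "$1" &&
    # 4. Create a fresh journal, turning the filesystem back into ext3.
    run tune2fs -j "$1"
}

rebuild_journal "$DEV"
```

Run it once as is to review the printed commands, then with DRY_RUN=0 from the rescue environment if they look right for your system.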
Martial Herbaut wrote:
>
> Actually the previous error I reported was on a 2.4.16 kernel.
>
> upgrading to 2.4.18 made it stop hanging the system and got us to the real
> error message:
>
> kernel: journal_bmap: journal block not found at offset 6924 on sd(8,2)
> kernel: Aborting journal on device sd(8,2).
> kernel: ext3_abort called.
> kernel: EXT3-fs abort (device sd(8,2)):
> ext3_journal_start: Detected aborted journal
> kernel: Remounting filesystem read-only
>
> any idea how we can recover from here? is there a way to rebuild an ext3
> journal?

Have you tried forcing a fsck of that filesystem?
    fsck.ext3 -f /dev/sda2

Yes, many times. It recovers and boots up normally, then, a few errors later, the same error all over again. :(
-> loop: fsck, error, fsck

The worst thing is that I have since forced that partition (root) to ext2 in the fstab, but for some reason unknown to me it persists in wanting to commit to the journal anyway, even though the partition shows as mounted ext2.

What do you think?

> > error message:
> >
> > kernel: journal_bmap: journal block not found at offset 6924 on sd(8,2)
> > kernel: Aborting journal on device sd(8,2).
> > kernel: ext3_abort called.
> > kernel: EXT3-fs abort (device sd(8,2)):
> > ext3_journal_start: Detected aborted journal
> > kernel: Remounting filesystem read-only
> >
> > any idea how we can recover from here? is there a way to rebuild an ext3
> > journal?
>
> Have you tried forcing a fsck of that filesystem?
>
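On the fstab point: the kernel mounts the root filesystem itself before /etc/fstab is read, so the type that mount reports (taken from /etc/mtab) can say ext2 while the kernel is actually using ext3. A minimal sketch for checking what the kernel reports in /proc/mounts follows; the function name root_fstype is hypothetical. If it prints ext3, booting with rootfstype=ext2 on the kernel command line is the usual way to force ext2 for root, assuming the running kernel supports that parameter.

```shell
#!/bin/sh
# Print the filesystem type the kernel is actually using for "/".
# Reads /proc/mounts by default; another file can be passed for testing.
root_fstype() {
    # Skip the initial "rootfs" pseudo-entry and keep the last real mount on "/".
    awk '$2 == "/" && $3 != "rootfs" { t = $3 } END { if (t != "") print t }' \
        "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    root_fstype
fi
```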