Hello. I have installed the ext3 file system on a test system, and sometimes I have a problem: I get an assert from within jbd-kernel.c, and whatever program was writing to the disk when this happens is unable to continue.

The system is a server I built, which I named "dax". It is running Debian unstable, and I updated it to all the latest packages in Debian unstable as of today. It is running a Linux 2.4.10 kernel. On the Linux Weekly News page I saw someone offering 2.4.10 sources pre-patched for both ext3 and kernel preemption, so I built from those sources.

http://lwn.net/2001/0927/a/ext3-preempt.php3
http://lameter.com/kernel/linux-2.4.10-ext3-preempt.tar.gz

I enabled both ext3 and kernel preemption.

The server is running Linux software RAID, using RAID 1 (mirroring) on both ext3 filesystems.

I have seen the same problem and assert several times now. Whenever I see this happen, I always reboot the system, since I am not sure how serious the problem is.

The assert text is as follows:

-- cut here -- cut here -- cut here -- cut here -- cut here --
Message from syslogd@dax at Tue Oct 9 12:07:47 2001 ...
dax kernel: Assertion failure in jbd_preclean_buffer_check() at jbd-kernel.c:80: "(((bh)->b_state & (1UL << BH_Dirty)) != 0)"
-- cut here -- cut here -- cut here -- cut here -- cut here --

Here is line 80 from jbd-kernel.c:

-- cut here -- cut here -- cut here -- cut here -- cut here --
J_ASSERT_JH(jh, buffer_dirty(bh));
-- cut here -- cut here -- cut here -- cut here -- cut here --

Then on the reboot, the boot message included this:

-- cut here -- cut here -- cut here -- cut here -- cut here --
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
(recovery.c, 253): journal_recover: JBD: recovery, exit status 0, recovered transactions 20383 to 20655
(recovery.c, 255): journal_recover: JBD: Replayed 5021 and revoked 37/134 blocks
kjournald starting. Commit interval 5 seconds
EXT3-fs: md(9,0): orphan cleanup on readonly fs
ext3_orphan_cleanup: truncating inode 100113 to 894 bytes
EXT3-fs: md(9,0): 1 truncate cleaned up
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 204k freed
Unable to find swap-space signature
Adding Swap: 249440k swap-space (priority -1)
EXT3 FS 2.4-0.9.9, 5 Sep 2001 on md(9,0), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.9, 5 Sep 2001 on md(9,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
-- cut here -- cut here -- cut here -- cut here -- cut here --

The problem happened again today, as I was using aptitude(1) to update the packages on the system. aptitude downloaded all the packages successfully, but the error occurred as it began to unpack them. Unpacking many packages causes a lot of disk activity, so perhaps the problem is related to heavy disk activity. I rebooted and ran aptitude again; it managed to unpack a few more packages before the error occurred again. I rebooted, tried again, hit the problem again, rebooted, tried again, and finished the package unpacking and installation without further errors. (It made progress on unpacking the packages each time, and the last time it had only a few packages left.)

I would like to help in finding and fixing the problem, if I can. If there is some sort of extra logging or debugging that I can enable, I am very willing to do it. I have no experience with kernel or file system debugging, but I am an experienced software engineer, and I can spare some time to work on this.

-- 
Steve R. Hastings    "Vita est"
steve@hastings.org   http://www.blarg.net/~steveha
Hi,

On Tue, Oct 09, 2001 at 03:15:45PM -0700, Steve R. Hastings wrote:

> Hello. I have installed the ext3 file system on a test system, and
> sometimes I have a problem: I get an assert from within jbd-kernel.c,
> and whatever program was writing to the disk when this happens is unable
> to continue.
>
> The system is a server I built, which I named "dax". It is running
> Debian unstable, and I updated it to all the latest packages in Debian
> unstable as of today. It is running a Linux 2.4.10 kernel. On the
> Linux Weekly News page I saw someone offering 2.4.10 sources pre-patched
> for both ext3 and kernel preemption, so I built from those sources.
>
> http://lwn.net/2001/0927/a/ext3-preempt.php3
> http://lameter.com/kernel/linux-2.4.10-ext3-preempt.tar.gz

Could you please try to reproduce this with a couple of different kernels? First, if you could rebuild with CONFIG_BUFFER_DEBUG and CONFIG_JBD_DEBUG set and mail me the log if the problem recurs, that will give me a much better idea of what happened. Second, you should also try the current -ac kernel.

I've no idea whether this is a genuine ext3 bug, an interaction with the huge vm/vfs changes in 2.4.10, or a preempt kernel interaction. The tests above should help to narrow this down.

Thanks,
 Stephen
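
For readers following the thread: the assertion text in the report, "(((bh)->b_state & (1UL << BH_Dirty)) != 0)", looks like the macro expansion of the buffer_dirty(bh) call quoted from line 80. Presumably the assert macro stringifies its argument after expansion. The following is a minimal user-space sketch of that relationship, not the kernel code. The struct, the enum values, ASSERT_TEXT and check_buffer_dirty() are stand-ins assumed here for illustration.

```c
#include <stdio.h>

/* Assumed bit numbers, for illustration only; the real enum lives in the
 * kernel headers and has more members. */
enum bh_state_bits { BH_Uptodate, BH_Dirty };

/* Minimal stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    unsigned long b_state;   /* bitmap of BH_* state flags */
};

/* Written to match the expression printed in the reported assertion. */
#define buffer_dirty(bh) (((bh)->b_state & (1UL << BH_Dirty)) != 0)

/* Two-level stringification: the argument is macro-expanded before '#'
 * is applied, so the message shows the expanded expression. */
#define ASSERT_TEXT_(expr) #expr
#define ASSERT_TEXT(expr)  ASSERT_TEXT_(expr)

/* Hypothetical stand-in for the failing check: reports and returns 0
 * instead of halting the caller as the kernel assert does. */
static int check_buffer_dirty(const struct buffer_head *bh)
{
    if (!buffer_dirty(bh)) {
        fprintf(stderr, "Assertion failure: \"%s\"\n",
                ASSERT_TEXT(buffer_dirty(bh)));
        return 0;
    }
    return 1;
}
```

Calling check_buffer_dirty() on a buffer whose b_state has the BH_Dirty bit clear takes the failure branch. In other words, the reported assert fired because jbd_preclean_buffer_check() found a buffer that it expected to be dirty but that was not marked dirty.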