Brian J. Murrell
2001-Mar-20 21:53 UTC
Interesting interaction between journal recovery and slow boots
For some time now I have been puzzled as to why certain portions of my system boot were quite slow -- but only after journal recoveries. I was fearing that there was some ugly interaction between the recovery and the use of the journal shortly afterward but alas that is not the case. So just in case anybody else is seeing this problem and decides to try to hunt it down, let me save you some time. I am using an 0.0.6b ext3 kernel on a Red Hat 7.0 system. I was finding that if my system did not get shutdown nicely, upon reboot it would of course do journal recoveries. When it was starting the system services after the recoveries, the "portmapper" and "nfslock" startup scripts were taking for ever to complete -- up to a minute or two each. After much investigation (strace and "set -x" in bash scripts) and many times hitting the power switch) I finally figured it out. The portmapper and nfslock scripts are the first ones started after syslogd and klogd are started. During the journal recovery there are copious quantities of messages such as: Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Mar 20 12:27:20 brian-laptop fsck: Scanning for sequence ID 484035 at 9590/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484036 at 9591/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484036 at 9593/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484037 at 9594/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484037 at 9596/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484038 at 9597/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484038 at 9599/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484039 at 9600/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484039 at 9602/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484040 at 9603/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484040 at 9609/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484041 at 9610/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484041 at 9612/10000 Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484042 at 9613/10000 diplayed. On Red Hat systems (and others I am sure) the startup scripts use a package called initlog to capture all of the console output (including the above) and "syslog" it (to /var/log/boot.log on Red Hat 7.0 systems). Well, the copious amounts of JFS records were flooding the syslog server and causing it to back up, blocking further attemtps to write to syslog while the back up was being flushed to disk. Every startup script uses initlog to write the output of the startup to syslog, so the scripts immediatley following the start of the syslogger were blocking, presumably due to syslogd's buffer being full. b.
Andreas Dilger
2001-Mar-20 22:29 UTC
Re: Interesting interaction between journal recovery and slow boots
Brian Murrell writes:> For some time now I have been puzzled as to why certain portions > of my system boot were quite slow -- but only after journal > recoveries. > > After much investigation (strace and "set -x" in bash scripts) and > many times hitting the power switch) I finally figured it out. The > portmapper and nfslock scripts are the first ones started after > syslogd and klogd are started. During the journal recovery there > are copious quantities of messages such as: > > Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: > Mar 20 12:27:20 brian-laptop fsck: Scanning for sequence ID 484035 at 9590/10000 > Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484036 at 9591/10000 > Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484036 at 9593/10000This is purely a user-space issue, coming from e2fsck it appears. I have no such copious logs on my system. Maybe you should undefine the JFS_DEBUG in e2fsprogs/include/linux/jfs.h? I seem to recall that there was a problem with jfs.h, where it enabled JFS_DEBUG whether in kernel mode or not. There is a configure option "--enable-jfs-debug" which should set JFS_DEBUG for us, so no need to always enable it in jfs.h. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
Stephen C. Tweedie
2001-Mar-20 23:33 UTC
Re: Interesting interaction between journal recovery and slow boots
Hi, On Tue, Mar 20, 2001 at 01:53:01PM -0800, Brian J. Murrell wrote:> > I am using an 0.0.6b ext3 kernel on a Red Hat 7.0 system.Which e2fsprogs? I think my last e2fsck release should have jfs_enable_debug set to zero by default already. Cheers, Stephen
Brian J. Murrell
2001-Mar-21 01:32 UTC
Re: Interesting interaction between journal recovery and slow boots
On Tue, Mar 20, 2001 at 11:33:33PM +0000, Stephen C. Tweedie wrote:> Hi, > > On Tue, Mar 20, 2001 at 01:53:01PM -0800, Brian J. Murrell wrote: > > > > I am using an 0.0.6b ext3 kernel on a Red Hat 7.0 system. > > Which e2fsprogs?# rpm -q e2fsprogs e2fsprogs-1.20.WIP.sct-20010216 b.