Yuval Yeret
2003-Jan-08 16:34 UTC
FW: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
> Hi,
>
> I'm running a 2.4.18-14 kernel with a heavy IO profile using
> ext3 over RAID 0+1 volumes.
>
> From time to time I get a black screen stuck machine while
> trying to umount a volume during an IO workload (as part of a
> failback solution - but after killing all IO processes),
> with ping still responding, but everything else mostly dead.
>
> I tried using the forcedumount patch to solve this problem -
> to no avail. Also tried upgrading the qlogic drivers to the
> latest drivers from Qlogic.
>
> After one of the occurrences I managed to get some output
> using the sysrq keys.
>
> This seems similar to what is described in
> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but
> there is no solution yet...
>
> What I have here is what I managed to copy down (for some
> reason pgup/pgdown didn't work so not all information is
> full...) and my sorted /proc/ksyms together with a manual
> lookup of the call trace:
>
> process umount
> EIP c01190b8 (set_running_and_schedule)
> call trace:
> c01144c9 f25f9ec0 IO_APIC_get_PCI_irq_vector
> c010a8b0 f25f9ed0 enable_irq
> c014200c f25f9ef0 fsync_buffers_list
> c0155595 f25f9efc clear_inode
> c015553d f25f9f2c invalidate_inodes
> c01461d8 f25f9f78 get_super
> c014a629 f25f9f94 path_release
> c0157c58 f25f9fc0 sys_umount
> c0108cab sys_sigaltstack
>
>
> attached /proc/ksyms | sort
> <<ksyms>>
> Any idea what can cause this?
>
>
> Thanks,
> Yuval
>
>
> --
> Yuval Yeret
> Exanet
> yuval@exanet.com
> http://www.exanet.com
> Tel. 972-9-9717782
> Fax. 972-9-9717778
>
>
> <<Yuval Yeret (E-mail).vcf>>
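The manual lookup described above (matching each call-trace address against a sorted /proc/ksyms listing) can be sketched roughly as follows. This is only an illustration, assuming each ksyms line has the form "hex-address symbol-name", optionally followed by a module tag. The function names parse_ksyms and lookup and the sample symbols are hypothetical and are not taken from the report. Because /proc/ksyms lists exported symbols only, the nearest lower symbol may be a neighbour of the function that really contains the address, so the result is best treated as a hint.

```python
import bisect


def parse_ksyms(text):
    """Parse 'address symbol [module]' lines into a sorted list of (addr, name)."""
    table = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            addr = int(fields[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        table.append((addr, fields[1]))
    table.sort()
    return table


def lookup(table, addr):
    """Return (name, offset) for the nearest listed symbol at or below addr.

    Returns None when addr lies below every symbol in the table.
    """
    addrs = [a for a, _ in table]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = table[i]
    return name, addr - base


# Made-up sample data for illustration only (not from the attached ksyms).
SAMPLE_KSYMS = """\
c0100000 example_sym_a
c0100400 example_sym_b
c0100900 example_sym_c
"""

if __name__ == "__main__":
    table = parse_ksyms(SAMPLE_KSYMS)
    for addr in (0xC0100410, 0xC01008FF, 0xC0100900):
        print(hex(addr), lookup(table, addr))
```

Feeding it the real sorted ksyms text and the addresses from the first column of the trace would reproduce the by-hand lookup. A System.map matching the running kernel, where one is available, also covers non-exported functions and so tends to give more trustworthy names.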
Stephen C. Tweedie
2003-Jan-09 23:00 UTC
Re: FW: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
Hi,

On Wed, 2003-01-08 at 16:34, Yuval Yeret wrote:

> > I'm running a 2.4.18-14 kernel with a heavy IO profile using
> > ext3 over RAID 0+1 volumes.

> > After one of the occurrences I managed to get some output
> > using the sysrq keys.

Serial console is useful for getting such traces reliably.

> > This seems similar to what is described in
> > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but
> > there is no solution yet...

That particular bug was different, and should be fixed in current
kernels --- it was due to a low-latency scheduling point in
invalidate_bdev, whereas your trace seems to be in
fsync_buffers_list() --- somewhere completely different.

Does the errata kernel fix things?

--Stephen