Jason Harmening
2006-Mar-16 18:47 UTC
[6.1-PRERELEASE/amd64] Kernel panic during heavy UFS traffic
Last night I ran into a series of kernel panics that seem to be related to heavy UFS traffic. I hit two consecutive panics when trying to mount a UFS-formatted DVD-RAM as a regular user (though not when I mounted it as root). The system seemed to actually succeed in mounting the disk, as it was marked dirty after the ensuing panic. After rebooting from the second panic, I saw another two consecutive panics. These happened whenever I tried to do something fairly disk-intensive (e.g. starting the X server + KDE) while the background fsck from the previous panic was still running. Ultimately I rebooted in single-user mode, ran fsck manually, and have experienced no further panics.

I suspect these panics may be related to UFS deadlocks. In every case the application that was attempting disk access hung for several seconds before the panic, followed by a few seconds of total system hang, followed by the automatic reboot.

I'm running 6.1-PRERELEASE/amd64 from 12 March on an Athlon 64 X2 (SMP) with SCHED_ULE+PREEMPTION (a dangerous combination, I know, but it's been rock solid for months until now). If anyone is interested, I'll try to reproduce this panic with a dump/backtrace. It may be one of the UFS deadlock issues that's already under investigation for 6.1-RELEASE.

Thanks,
Jason Harmening
Kris Kennaway
2006-Mar-16 19:54 UTC
On Thu, Mar 16, 2006 at 12:45:07PM -0600, Jason Harmening wrote:
> [...]
> If anyone is interested, I'll try to reproduce this
> panic with a dump/backtrace. It may be one of the UFS deadlock issues
> that's already under investigation for 6.1-RELEASE.

Yeah, we need a trace.

Kris
Jason Harmening
2006-Apr-01 21:49 UTC
On Saturday 18 March 2006 19:39, you wrote:
> On Sat, Mar 18, 2006 at 07:29:25PM -0600, Jason Harmening wrote:
> > I finally managed to reproduce the mount panic on the console:
> >
> > CORONA% mount /dev/acd0 /home/jason/dvdram
> > g_vfs_done():acd0[READ(offset=114688, length=16384)]error = 5
> > panic: mount: lost mount
> > cpuid = 0
> > KDB: stack backtrace:
> > kdb_backtrace() at kdb_backtrace+0x37
> > panic() at panic+0x1d1
> > vfs_domount() at vfs_domount+0x9ae
> > vfs_donmount() at vfs_donmount+0x400
> > kernel_mount() at kernel_mount+0x40
> > ffs_cmount() at ffs_cmount+0x7c
> > mount() at mount+0x1e3
> > syscall() at syscall+0x3a4
> > Xfast_syscall() at Xfast_syscall+0xa8
> > --- syscall (21, FreeBSD ELF64, mount), rip = 0x80067e0dc, rsp =
> > 0x7fffffffdc88, rbp = 0x7fffffffe748 ---
> > Uptime: 1m34s
> > Dumping 1023 MB (2 chunks)
> >
> > I'm starting to worry this may be a hardware issue...
>
> Yes, it could well be (or bad media) - the drive returned an I/O error
> (error 5 = EIO) when you tried to mount the media.
>
> > If it is, would there be
> > a more elegant way for the OS to handle a failed removable drive mount
> > besides panicking?
>
> In principle, yes. I don't know if there's any hope of getting it
> fixed in time for 6.1, but please file a PR with this trace.

I filed PR 94669 for this issue and finally took some time to do some further investigation on my own. I've found the following:

1. I can invariably mount the DVD-RAM successfully if I first do some operation on the disk that doesn't require it to be mounted (namely, an fsck), or if I've previously mounted it successfully and haven't ejected the media since. I only see the panic if I try to mount immediately after inserting the media, and even then not 100% of the time. This leads me to believe there may be some confusion between the drive, the ATAPI CD/DVD driver, and the VFS subsystem as to when, exactly, the drive is ready for mounting.

2. I looked at the VFS sources for RELENG_6 and found the point at which the panic seems to occur, lines 891-892 of vfs_mount.c:

    if (VFS_ROOT(mp, LK_EXCLUSIVE, &newdp, td))
        panic("mount: lost mount");

So essentially the call to mp->mnt_op->vfs_root (in this case, I'm guessing, whatever the vfs_root function for UFS is) returns an error. Would it be safe to handle this error by returning an error code instead of panicking? Or would that have undesirable ramifications for auto-mounted filesystems on fixed disks? Could the failed vfs_root call have side effects that would leave the kernel in an unstable state?

I don't know much about the FreeBSD VFS, but I'm willing to take a crack at fixing/testing this.

Thanks,
Jason
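The question in point 2 can be made concrete with a minimal userland sketch of the proposed error path. When the root-vnode lookup fails, it undoes the half-finished mount and returns the error to the caller instead of panicking. Everything in it is hypothetical. struct vfsops_model, domount_model() and the stub_* operations are simplified stand-ins invented for illustration, not FreeBSD's struct mount, struct vfsops or vfs_domount(). The stub that returns EIO only imitates the "error = 5" in the trace above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* HYPOTHETICAL, heavily simplified stand-ins for the kernel's
 * struct vnode, struct mount and struct vfsops. These are NOT the
 * real FreeBSD definitions; they only model the control flow. */
struct vnode { int is_root; };
struct mount;

struct vfsops_model {
    int (*vfs_mount)(struct mount *mp);
    int (*vfs_root)(struct mount *mp, struct vnode **vpp);
    int (*vfs_unmount)(struct mount *mp);
};

struct mount {
    const struct vfsops_model *mnt_op;
    int on_mountlist;      /* nonzero while the mount is visible */
    struct vnode *rootvp;  /* root vnode once the mount completes */
};

/* Model of the end of the mount path: if the root vnode cannot be
 * fetched, unwind the half-finished mount and hand the error back to
 * the caller instead of calling panic("mount: lost mount"). */
static int domount_model(struct mount *mp)
{
    struct vnode *newdp = NULL;
    int error;

    assert(mp != NULL && mp->mnt_op != NULL);

    error = mp->mnt_op->vfs_mount(mp);
    if (error != 0)
        return error;
    mp->on_mountlist = 1;

    error = mp->mnt_op->vfs_root(mp, &newdp);
    if (error != 0) {
        /* A real kernel would need much more cleanup here
         * (vnode references, mount list locking, busy counts). */
        mp->on_mountlist = 0;
        (void)mp->mnt_op->vfs_unmount(mp);
        return error;
    }
    mp->rootvp = newdp;
    return 0;
}

/* Hypothetical stub filesystem operations used to exercise the model. */
static struct vnode stub_root_vnode = { 1 };
static int stub_unmount_calls = 0;

static int stub_mount(struct mount *mp) { (void)mp; return 0; }
static int stub_unmount(struct mount *mp) { (void)mp; stub_unmount_calls++; return 0; }

/* Imitates the EIO returned by acd0 when the media is not ready. */
static int stub_root_eio(struct mount *mp, struct vnode **vpp)
{
    (void)mp;
    *vpp = NULL;
    return EIO;
}

static int stub_root_ok(struct mount *mp, struct vnode **vpp)
{
    (void)mp;
    *vpp = &stub_root_vnode;
    return 0;
}

static const struct vfsops_model stub_ops_bad_media = {
    stub_mount, stub_root_eio, stub_unmount
};
static const struct vfsops_model stub_ops_good_media = {
    stub_mount, stub_root_ok, stub_unmount
};
```

With mnt_op set to &stub_ops_bad_media, domount_model() returns EIO and leaves on_mountlist at 0. With &stub_ops_good_media, it returns 0 and sets rootvp. The sketch deliberately leaves out the hard part of the question. In the real kernel, unwinding at that point would also have to release vnode references, mount-list entries and lock/busy state correctly. The model says nothing about whether that is safe.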