It was brought to my attention that on FreeBSD with a hardware watchdog in use (e.g. ichwd(4) + watchdogd(8)), once the kernel panics, it's quite possible for the watchdog to fire (rebooting the system) before panic handling has finished. This prevents a system with a hardware watchdog in place from reliably completing doadump().

There are confirmations of this problem dating all the way back to 2005:

PR kern/82219, opened in 2005:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/82219

PR bin/145183, opened in 2010 (not sure if this is the same):
http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/145183

Confirmation that the problem still exists today (first paragraph):
http://lists.freebsd.org/pipermail/freebsd-stable/2010-August/058350.html

On Linux, it appears that they've worked around this problem by using what's called a "pretimeout" (basically an early trigger that goes off some time before the watchdog actually resets the system, so the kernel gets a chance to record panic information or a core dump first):
http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt

According to watchdog(4), it looks like having the kernel set WD_PASSIVE immediately upon entering panic would solve the problem, but the BUGS section indicates WD_PASSIVE hasn't been implemented (it returns ENOSYS).

Thoughts on solving this dilemma?

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
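
As a rough illustration of the timing problem described in this message, here is a small userland model in C. It is a sketch under stated assumptions, not FreeBSD kernel code: `struct wd_model`, `dump_completes()` and `panic_extend()` are hypothetical names, time is counted in whole seconds, and nothing pats the watchdog after the panic. `panic_extend()` models one possible workaround (a single re-arm with a longer timeout on panic entry), which is not something watchdog(4) is known to do today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hardware watchdog; names are illustrative only. */
struct wd_model {
    unsigned timeout_s;  /* seconds the hardware waits after the last pat */
    unsigned last_pat_s; /* time of the most recent pat */
};

/*
 * Returns true if a dump that starts at panic_s and needs dump_s seconds
 * finishes before the watchdog would reset the machine. Assumes nobody
 * pats the watchdog once the panic has started.
 */
static bool
dump_completes(const struct wd_model *wd, unsigned panic_s, unsigned dump_s)
{
    assert(panic_s >= wd->last_pat_s);
    unsigned deadline = wd->last_pat_s + wd->timeout_s;
    return panic_s + dump_s <= deadline;
}

/* Assumed workaround: pat once at panic entry and switch to a longer timeout. */
static void
panic_extend(struct wd_model *wd, unsigned panic_s, unsigned new_timeout_s)
{
    wd->last_pat_s = panic_s;
    wd->timeout_s = new_timeout_s;
}
```

With a 16-second timeout, a last pat at t=100 and a panic at t=105, a 60-second dump cannot finish in this model; after `panic_extend(&wd, 105, 128)` it can.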
In message <20100823103412.GA21044@icarus.home.lan>, Jeremy Chadwick writes:
>It was brought to my attention that on FreeBSD with a hardware watchdog
>in use (e.g. ichwd(4) + watchdogd(8)), once the kernel panics, it's
>quite possible for the watchdog to fire (reboot the system) once the
>panic has happened. This issue basically inhibits the ability for a
>system with a hardware watchdog in place to be able to successfully
>complete doadump().

The good news is that the watchdog hopefully gets your system back on the air, even if the dumping hangs.

If it is decided to reset/disarm the watchdog before a dump, please make that a sysctl tunable.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
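
To make the tunable request concrete, here is a minimal sketch in C of what such a policy switch might decide. Everything in it is an assumption for illustration: the `wd_dump_policy` values and `wd_timeout_for_dump()` are hypothetical, no such sysctl exists in the source thread, and a real kernel change would presumably expose the policy through sysctl(9) rather than a plain function argument.

```c
#include <assert.h>

/* Hypothetical policy values for an imagined tunable; not an existing sysctl. */
enum wd_dump_policy {
    WD_DUMP_KEEP = 0,   /* leave the watchdog armed as it is */
    WD_DUMP_DISARM = 1, /* turn the watchdog off before dumping */
    WD_DUMP_EXTEND = 2  /* re-arm once with a longer timeout */
};

/*
 * Returns the timeout (in seconds) to program before the dump starts.
 * A return value of 0 means "disarmed" in this model.
 */
static unsigned
wd_timeout_for_dump(enum wd_dump_policy policy, unsigned current_s,
    unsigned extended_s)
{
    assert(current_s > 0); /* the model only covers an armed watchdog */

    switch (policy) {
    case WD_DUMP_DISARM:
        return 0;
    case WD_DUMP_EXTEND:
        /* Never shorten a timeout the administrator already made longer. */
        return extended_s > current_s ? extended_s : current_s;
    case WD_DUMP_KEEP:
    default:
        return current_s;
    }
}
```

Keeping `WD_DUMP_KEEP` as the zero value means an unset tunable preserves the behaviour praised here: the watchdog still gets the machine back on the air if the dump hangs.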
On Mon, Aug 23, 2010 at 3:34 AM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
> PR bin/145183, opened in 2010 (not sure if this is the same):
> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/145183

Speaking to this one, I think we can handle it by issuing an explicit watchdog(8) command on shutdown (for example, setting the timeout to several minutes) in the shutdown section of /etc/rc.d/watchdog. This would be trivial to implement. Additionally, I personally think init(8) should be taught about the watchdog facility.

For panics, I think we should have the disk driver "pat" the watchdog in its write success callback, rather than disabling the watchdog?

Another thing is that ddb should be able to disable the watchdog while it's waiting for keyboard input (or once it has received the first user input), I think.

Cheers,
-- 
Xin LI <delphij@delphij.net> http://www.delphij.net
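
The "pat on write success" idea can be sketched as a small userland simulation in C. This is a toy model under stated assumptions, not driver code: `dump_with_pats()` is a hypothetical name, each array entry is the assumed number of seconds one block write takes, and the pat happens only after a write completes.

```c
#include <assert.h>

/*
 * Simulate dumping nblocks blocks while patting the watchdog after every
 * successful write. write_cost_s[i] is how long block i takes to write.
 * Returns the number of blocks written before the watchdog fired, or
 * nblocks if the dump ran to completion.
 */
static unsigned
dump_with_pats(const unsigned *write_cost_s, unsigned nblocks,
    unsigned timeout_s)
{
    unsigned now = 0, last_pat = 0, done = 0;

    assert(timeout_s > 0);
    for (unsigned i = 0; i < nblocks; i++) {
        now += write_cost_s[i];
        if (now - last_pat > timeout_s)
            break;      /* this write outlived the timeout: reset */
        last_pat = now; /* write success callback pats the watchdog */
        done++;
    }
    return done;
}
```

In this model a dump whose total duration exceeds the timeout still completes as long as every individual write is fast enough, while a single hung write still lets the watchdog reset the machine, so the watchdog keeps its purpose during the dump.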