Hi.

I have a bunch of FreeBSD machines that hang (and I really want to do
something to fight this). Maybe it's zfs or maybe it's pf (I also have
a bunch of really stable ones, so it's hard to isolate and tell). Since
the 9.x machines hang more often, I suppose it's pf.

I use ichwd.ko and watchdogd to reboot a machine when it hangs. It
works pretty well. I'm also working through various WITNESS/INVARIANTS
output and trying to report it to gnats, but obviously it would be much
nicer if the system would panic and leave a debuggable core after a
hang (so far I don't have any, so I can only guess).

I've read about the software watchdog in the kernel and I don't quite
understand it: it's said that the kernel software watchdog is able to
panic when a deadlock occurs. Can this be achieved with ichwd?

Another one: as far as I understand, ichwd reboots my machine at the
hardware level, right? So am I right in saying that the software
watchdog can, in theory, also be deadlocked, which makes it a somewhat
less reliable solution?

Thanks.
Eugene.

On 2/20/13 11:36 AM, Eugene M. Zheganin wrote:
> Hi.
>
> I have a bunch of FreeBSD machines that hang (and I really want to do
> something to fight this). Maybe it's zfs or maybe it's pf (I also
> have a bunch of really stable ones, so it's hard to isolate and
> tell). Since the 9.x machines hang more often, I suppose it's pf. I
> use ichwd.ko and watchdogd to reboot a machine when it hangs. It
> works pretty well. I'm also working through various
> WITNESS/INVARIANTS output and trying to report it to gnats, but
> obviously it would be much nicer if the system would panic and leave
> a debuggable core after a hang (so far I don't have any, so I can
> only guess). I've read about the software watchdog in the kernel and
> I don't quite understand it: it's said that the kernel software
> watchdog is able to panic when a deadlock occurs. Can this be
> achieved with ichwd? Another one: as far as I understand, ichwd
> reboots my machine at the hardware level, right? So am I right in
> saying that the software watchdog can, in theory, also be deadlocked,
> which makes it a somewhat less reliable solution?
>

Yes, all your assumptions are correct.

There is an 'enhanced watchdog' branch that I am working on that offers
a "pre-watchdog timeout panic". However, since this is done in
software, you may not get your pre-timeout panic and may only get a
reboot.

Later revisions may include facilities for generating an NMI to trigger
a panic and logs, followed by a hard reset from external hardware.

Perhaps ichwd offers the ability to send an NMI? Let me check the
sources.

-Alfred

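The interplay Alfred describes between a software pre-timeout and the
hardware reset can be sketched as a toy model. This is only an
illustration in Python, not FreeBSD code: the function name `simulate`,
its parameters, and the timeout values are all invented for the
example, and `kernel_alive` is an assumed stand-in for "the kernel can
still run its own timer code after the hang".

```python
def simulate(hang_at, pre_timeout, hard_timeout, kernel_alive, horizon=60):
    """Toy model of a two-stage watchdog (all names are hypothetical).

    The watchdog is patted every second until `hang_at`, then never again.
    - After `pre_timeout` seconds of silence, a software pre-timeout
      tries to panic, which only works if the kernel can still run.
    - After `hard_timeout` seconds of silence, the hardware resets the
      machine regardless of the kernel's state.
    Returns (time_in_seconds, event).
    """
    if pre_timeout >= hard_timeout:
        raise ValueError("pre_timeout must be shorter than hard_timeout")

    for t in range(horizon + 1):
        # Seconds elapsed since the last successful pat.
        silence = max(0, t - hang_at)

        # Software stage: needs a kernel that can still execute code.
        if kernel_alive and silence >= pre_timeout:
            return (t, "panic")

        # Hardware stage: fires no matter what the software is doing.
        if silence >= hard_timeout:
            return (t, "hard reset")

    return (horizon, "running")


if __name__ == "__main__":
    # Hang at t=5s; the kernel can still run timers, so the panic wins.
    print(simulate(hang_at=5, pre_timeout=8, hard_timeout=16, kernel_alive=True))
    # Same hang, but the kernel is wedged: only the hardware reset fires.
    print(simulate(hang_at=5, pre_timeout=8, hard_timeout=16, kernel_alive=False))
```

In this model the panic, and therefore a core dump, is only reachable
when the kernel survives the hang well enough to run the pre-timeout;
otherwise the outcome is the same plain reset that ichwd gives today.
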
On 02/20/2013 11:36, Eugene M. Zheganin wrote:
> Hi.
>
> I have a bunch of FreeBSD machines that hang (and I really want to do
> something to fight this). Maybe it's zfs or maybe it's pf (I also
> have a bunch of really stable ones, so it's hard to isolate and
> tell). Since the 9.x machines hang more often, I suppose it's pf. I
> use ichwd.ko and watchdogd to reboot a machine when it hangs. It
> works pretty well. I'm also working through various
> WITNESS/INVARIANTS output and trying to report it to gnats, but
> obviously it would be much nicer if the system would panic and leave
> a debuggable core after a hang (so far I don't have any, so I can
> only guess). I've read about the software watchdog in the kernel and
> I don't quite understand it: it's said that the kernel software
> watchdog is able to panic when a deadlock occurs. Can this be
> achieved with ichwd? Another one: as far as I understand, ichwd
> reboots my machine at the hardware level, right? So am I right in
> saying that the software watchdog can, in theory, also be deadlocked,
> which makes it a somewhat less reliable solution?

I just want to /metoo that I have a 32-bit/i386 box running zfs, pf and
-current that is hardlocking randomly (it usually has an uptime of a
few days to a couple of weeks). SW_WATCHDOG won't fire when it locks,
so it must be locking pretty fast.

I just noticed that ichwd will load on this box, so I'll try that
instead, but now I'm wondering whether the SW_WATCHDOG kernel will
interfere, or rather whether watchdogd is smart enough to handle both?

This box used to occasionally hit the ZFS stack panic, so I made the
KSTACK_PAGES=4 change to the kernel, and now it just hardlocks. I'm not
saying they are related.