Daniel Dvořák
2006-Sep-01  23:02 UTC
watchdogd_flags followed by panic watchdog timeout, after reboot my rc.conf disappear
Hi all,
 
First of all, I'm sorry for my possibly bad English.
 
We have 2 routers which I maintain in our mesh wireless community network.
 
Router 1 (R1) has 2 Atheros adapters, ath0 = Wistron CM9 and ath1 = Wistron CM10,
and of course some sisX, fxpX and so on.
Router 2 (R2) has 1 Atheros adapter, ath0 = Wistron CM10.
 
My R1 panics and, what is more, it freezes very often. The panics and the
freezes may or may not have the same cause.
 
That is not important right now; this story is about R2.
 
I recently started to use "options SW_WATCHDOG" in my custom kernels on both
R1 and R2, in the hope that it would be a workaround at least for the
freezing, if not for the panicking.
 
There is no watchdogd_flags="" option in /etc/defaults/rc.conf, but
I tried adding it to my /etc/rc.conf in this way:
 
watchdogd_enable="YES"
watchdogd_flags="-e ping 10.40.0.72 -s 2 -t 1"
 
I saved my rc.conf without any doubt.
 
I did so because I wanted to instruct watchdogd to execute my own command,
simply pinging some IP address. I was not satisfied with just the trivial file
system check.
 
After saving the rc.conf file, I restarted the watchdogd daemon at once.
 
... and ... 2 seconds ... my ssh client was disconnected ... unexpected end
of ssh session. :)
 
Okay, maybe something went wrong; maybe I made a mistake and it panicked.

I waited for 3 minutes, but R2 did not react at all.

So I went to R2 and powered it off and on again ... but it was still the
same.
 
After I attached a monitor and keyboard, I saw that ifconfig had not configured
any interfaces. Why?

Answer: because rc.conf was 0 bytes!
 
-rw-r--r--  1 root  wheel      6174 Sep  1 XX:XX rc.conf
(I do not remember the file's last modification time.)
 
So the content of rc.conf was completely gone!

Is that possible at all?

Now I am scared that any modification of rc.conf will mean losing its content.
 
I have kernel dump and backtrace of panic.
 
It is in the attachment.
 
 
If I can help with this, I will.

And could somebody please explain to me how I lost the content of the rc.conf file? :-O
Thank you.
 
Daniel
 
P.S.: I am not currently subscribed to the freebsd-stable mailing list, so
please use my e-mail address. I am subscribed to the freebsd-current mailing list.
-------------- next part --------------
# cd /usr/obj/usr/src/sys/mykernel/
# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
Unread portion of the kernel message buffer:
interrupt                   total
irq14: ata0                       325735
irq16: fxp1                            5
irq17: ath0                     50298459
irq18: wi0                       3904083
irq19: sis0 fxp0                20167051
cpu0: timer                    604044908
Total                   678740241
panic: watchdog timeout
Uptime: 3d11h53m45s
Dumping 223 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 223MB (57072 pages) 207 191 175 159 143 127 111 95 79 63 47 31 15
#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc059c4ee in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:402
#2  0xc059c7a6 in panic (fmt=0xc081050d "watchdog timeout")
    at /usr/src/sys/kern/kern_shutdown.c:558
#3  0xc0571642 in watchdog_fire () at /usr/src/sys/kern/kern_clock.c:583
#4  0xc0571130 in hardclock (frame=0xc1f44780)
    at /usr/src/sys/kern/kern_clock.c:279
#5  0xc07a4631 in lapic_handle_timer (frame=
      {cf_vec = 0, cf_fs = 8, cf_es = 40, cf_ds = 40, cf_edi = -1040320488,
      cf_esi = -1040320512, cf_ebp = -890192676, cf_ebx = 0, cf_edx = 0,
      cf_ecx = -1041016416, cf_eax = 1000, cf_eip = -1063283195, cf_cs = 32,
      cf_eflags = 524818, cf_esp = -890192644, cf_ss = -1063305969})
    at /usr/src/sys/i386/i386/local_apic.c:623
#6  0xc079eb30 in Xtimerint () at apic_vector.s:137
#7  0xc09f9605 in ?? ()
#8  0xcaf0bd04 in ?? ()
#9  0xc07a609f in cpu_idle () at /usr/src/sys/i386/i386/machdep.c:1134
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit
Stefan Bethke
2006-Sep-04  09:16 UTC
watchdogd_flags followed by panic watchdog timeout, after reboot my rc.conf disappear
[ Please do not crosspost. ]

On 02.09.2006 at 01:01, Daniel Dvořák wrote:
> In the /etc/defaults/rc.conf there is no "watchdogd_flags=""" option, but
> I tried to write it to my /etc/rc.conf in this way:
>
> watchdogd_enable="YES"
> watchdogd_flags="-e ping 10.40.0.72 -s 2 -t 1"

You probably wanted "-e 'ping 10.40.0.72 -s2 -t1'". Without the single
quotes, the command is just ping, which exits with 64 (EX_USAGE), so the
command never completes successfully and the kernel watchdog timer is never
reset. Hence the watchdog timeout. It's a bug in watchdogd that it does not
complain about the extra arguments.

> I saved my rc.conf without any doubt.
>
> I did so, because I wanted to instruct watchdogd to execute my command,
> common pinging some IP address. I was not satisfied with a trivial file
> system check instead.
>
> After saving the rc.conf file, I restarted watchdogd daemon at once.
>
> ... and ... 2 seconds ... my ssh client was disconnected ... unexpected end
> of ssh session. :)

Most likely, the rc.conf changes had not been committed to disk when the
watchdog timeout occurred, so they got lost.

The watchdog facility is meant to recover the machine from serious problems
(like deadlocks, livelocks, or similar). As such, it will not do a proper
shutdown, since the machine is probably in a state where the shutdown would
also hang. It's a last-ditch effort to get the machine to be responsive
again, even if there might be damage due to the sudden panic/reboot.

If you want to reboot your router when network connectivity is problematic,
I'd instead set up a cron job that runs ping and invokes shutdown -r if it
fails.

Stefan

-- 
Stefan Bethke <stb@lassitu.de>   Fon +49 170 346 0140
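
The cron-based approach suggested above could look roughly like the following. This is only a sketch, not a tested recipe: the names TARGET, PING_CMD, REBOOT_CMD and check_link are made up for illustration, the ping options assume FreeBSD's ping(8), and rebooting after a single failed check is probably too aggressive for a real router.

```shell
#!/bin/sh
# Hypothetical cron helper: reboot cleanly if a peer stops answering pings.
# TARGET, PING_CMD and REBOOT_CMD are illustrative names, not FreeBSD knobs.

TARGET="${TARGET:-10.40.0.72}"
# FreeBSD ping(8): -c 3 sends three probes, -t 10 gives up after 10 seconds.
PING_CMD="${PING_CMD:-ping -c 3 -t 10}"
# shutdown -r does an orderly reboot, unlike a watchdog panic.
REBOOT_CMD="${REBOOT_CMD:-shutdown -r now}"

check_link() {
    # PING_CMD is left unquoted on purpose so its options split into words.
    if $PING_CMD "$TARGET" >/dev/null 2>&1; then
        echo "link ok"
    else
        echo "link down, rebooting"
        $REBOOT_CMD
    fi
}

# Only act when called as "script run", so sourcing the file has no side effects.
if [ "${1:-}" = "run" ]; then
    check_link
fi
```

Saved under a path of your choice (for example /usr/local/sbin/check_link.sh, a hypothetical location), it could be called from root's crontab every few minutes with the argument "run". Because shutdown -r goes through the normal shutdown path, pending writes such as a freshly edited rc.conf are flushed before the reboot.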
Dmitry Pryanishnikov
2006-Sep-22  04:15 UTC
watchdogd_flags followed by panic watchdog timeout, after reboot my rc.conf disappear
Hello!

On Sat, 2 Sep 2006, Daniel Dvořák wrote:
> I saved my rc.conf without any doubt.

I believe you, really ;)

> Answer: Because rc.conf had 0 Bytes !!!
>
> -rw-r--r--  1 root  wheel      6174 Sep  1 XX:XX rc.conf , I do not remember
> time of last modification of file.
>
> So the content of rc.conf was completely gone !!!

Yes, because by default "/" is mounted in the following fashion:

noasync  Metadata I/O should be done synchronously, while data I/O
         should be done asynchronously.  This is the default.
-----------------------------------------^^^^^^^^^^^^^^^^^^^^

So yes, /etc/rc.conf will become empty if you have just edited it and then,
e.g., the power disappears. It's a dangerous situation, because the box
becomes unreachable via the network. To guard against it, you can just mount
"/" in synchronous mode:

sync     All I/O to the file system should be done synchronously.

I've just modified my test machine's configuration in this way:

/dev/ad0s3a     /       ufs     rw,sync         1       1

and repeated the "edit /etc/rc.conf" -> "power off/on" sequence several times
(there is no RESET key on the box). The rc.conf stays intact (while without
"sync" it became empty after my second attempt). Note that this will further
decrease FS performance for "/" (I also always follow the good old RELENG_4
advice NOT to turn softupdates on for "/"). That's why /tmp and /var are
separate partitions (or just symlinks to SU-enabled /usr) in my typical setup.

> And please explain me somebody, how I lost the content of rc.conf file. :-O

I hope I've just managed to do that ;)

> P.S.: I am not currently subscribed in the freebsd-stable mailing list, so
> use my e-mail address. I am ok with freebsd-current mailing list.

I think my recipe would be more useful on the -stable list (which IMHO is
"a must" reading for admins of production machines), that's why I'm sending
it to -stable as well.

Sincerely, Dmitry

-- 
Atlantis ISP, System Administrator
e-mail: dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
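
If mounting "/" with sync is too slow, a possible complement is to never edit the live file directly: edit a copy, keep a backup, and flush before and after renaming the copy into place. The sketch below illustrates that idea only. The function name safe_install and the file names are hypothetical, and it assumes that sync(8) has pushed the data out by the time the rename happens, which the utility does not strictly guarantee.

```shell
#!/bin/sh
# Hypothetical helper: install a new version of a config file while keeping
# a backup of the old one. The function name is illustrative only.

safe_install() {
    new="$1"       # freshly edited copy, e.g. /etc/rc.conf.new
    target="$2"    # live file, e.g. /etc/rc.conf

    # Refuse to install a missing or empty file.
    [ -s "$new" ] || { echo "refusing: $new is empty" >&2; return 1; }

    # Keep the previous version so it can be restored from the console.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi

    # Ask the kernel to flush the new data before renaming over the target.
    sync

    # Within one file system mv is a rename, so the target is replaced
    # in one step instead of being truncated and rewritten in place.
    mv "$new" "$target" || return 1

    # Flush again so the rename itself is written out soon.
    sync
}
```

A typical call would be safe_install /etc/rc.conf.new /etc/rc.conf after editing the .new copy. Even if a panic or power loss hits right afterwards, /etc/rc.conf.bak should still hold the last known good configuration.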