thr3ads.net - freebsd stable - panic in callout_reset: bad link in callwheel [Jan 2009]


Andriy Gapon

2009-Jan-24 03:00 UTC

panic in callout_reset: bad link in callwheel

System: FreeBSD 7.1-STABLE i386 (revision 187025)

Panic message:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xd2006ad0
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc05623aa
stack pointer           = 0x28:0xdd4f6c34
frame pointer           = 0x28:0xdd4f6c40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 13 (swi4: clock)
trap number             = 12
panic: page fault
KDB: stack backtrace:
db_trace_self_wrapper(c074bb2f,dd4f6b14,c05514af,c0749d10,c07b85e0,...)
at 0xc0478466 = db_trace_self_wrapper+0x26
kdb_backtrace(c0749d10,c07b85e0,c073b02b,dd4f6b20,dd4f6b20,...) at
0xc057a639 = kdb_backtrace+0x29
panic(c073b02b,c0761cb4,c36104dc,1,1,...) at 0xc05514af = panic+0xaf
trap_fatal(c0761bb6,c,c3a89460,c3a8965c,c,...) at 0xc0705723 = trap_fatal+0x353
trap(dd4f6bf4) at 0xc07060ca = trap+0x10a
calltrap() at 0xc06f463b = calltrap+0x6
--- trap 0xc, eip = 0xc05623aa, esp = 0xdd4f6c34, ebp = 0xdd4f6c40 ---
callout_reset(c3a8552c,13,c0561940,c3a852b8,c3612690,...) at 0xc05623aa
= callout_reset+0x14a
realitexpire(c3a852b8,2d6100,c3612690,1,dd4f6cbc,...) at 0xc0561ab6
= realitexpire+0x176
softclock(0,0,c0747617,4a1,0,...) at 0xc0562c25 = softclock+0x235
ithread_loop(c35e5a20,dd4f6d38,0,0,0,...) at 0xc053268b = ithread_loop+0x1cb
fork_exit(c05324c0,c35e5a20,dd4f6d38) at 0xc052eda1 = fork_exit+0xa1
fork_trampoline() at 0xc06f46b0 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xdd4f6d70, ebp = 0 ---

Some debugging:
(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc05512b3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc05514ff in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0705723 in trap_fatal (frame=0xdd4f6bf4, eva=3523242704) at
/usr/src/sys/i386/i386/trap.c:939
#4  0xc07060ca in trap (frame=0xdd4f6bf4) at
/usr/src/sys/i386/i386/trap.c:320
#5  0xc06f463b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#6  0xc05623aa in callout_reset (c=0xc3a8552c, to_ticks=19,
ftn=0xc0561940 <realitexpire>, arg=0xc3a852b8) at
/usr/src/sys/kern/kern_timeout.c:471
#7  0xc0561ab6 in realitexpire (arg=0xc3a852b8) at
/usr/src/sys/kern/kern_time.c:684
#8  0xc0562c25 in softclock (dummy=0x0) at
/usr/src/sys/kern/kern_timeout.c:274
#9  0xc053268b in ithread_loop (arg=0xc35e5a20) at
/usr/src/sys/kern/kern_intr.c:1088
#10 0xc052eda1 in fork_exit (callout=0xc05324c0 <ithread_loop>,
arg=0xc35e5a20, frame=0xdd4f6d38) at /usr/src/sys/kern/kern_fork.c:804
#11 0xc06f46b0 in fork_trampoline () at
/usr/src/sys/i386/i386/exception.s:264
(kgdb) fr 6
#6  0xc05623aa in callout_reset (c=0xc3a8552c, to_ticks=19,
ftn=0xc0561940 <realitexpire>, arg=0xc3a852b8) at
/usr/src/sys/kern/kern_timeout.c:471
471     /usr/src/sys/kern/kern_timeout.c: No such file or directory.
        in /usr/src/sys/kern/kern_timeout.c
(kgdb) p *c
$1 = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev
= 0xd2006ad0}}, c_time = 2974104, c_arg = 0xc3a852b8, c_func = 0xc0561940
<realitexpire>, c_mtx = 0x0, c_flags = 22}
(kgdb) p c->c_links.tqe.tqe_prev
$2 = (struct callout **) 0xd2006ad0
(kgdb) p *c->c_links.tqe.tqe_prev
Cannot access memory at address 0xd2006ad0
(kgdb) p callwheel[c->c_time & callwheelmask]
$4 = {tqh_first = 0x0, tqh_last = 0xd2006ad0}

The code:
467         c->c_arg = arg;
468         c->c_flags |= (CALLOUT_ACTIVE | CALLOUT_PENDING);
469         c->c_func = ftn;
470         c->c_time = ticks + to_ticks;
471         TAILQ_INSERT_TAIL(&callwheel[c->c_time & callwheelmask],
472                           c, c_links.tqe);
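
Editorial sketch (not part of the original post): the fault address 0xd2006ad0 is the same value kgdb shows for both c->c_links.tqe.tqe_prev and the bucket's tqh_last. That fits how TAILQ_INSERT_TAIL works: it copies the head's tqh_last into the new element's tqe_prev and then stores the element through *tqh_last, so a bad tail pointer in the head turns into a faulting write. An empty bucket should also have tqh_last pointing back at its own tqh_first, which is not what the dump shows. The struct and function names below (item, bucket, empty_bucket_is_sane, demo_insert_tail) are hypothetical stand-ins for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct callout: only the list linkage matters. */
struct item {
	int id;
	TAILQ_ENTRY(item) link;
};

/* Hypothetical stand-in for one callwheel bucket. */
TAILQ_HEAD(bucket, item);

/*
 * An empty TAILQ must have tqh_last pointing back at its own tqh_first,
 * because TAILQ_INSERT_TAIL stores the new element through *tqh_last.
 */
int
empty_bucket_is_sane(struct bucket *b)
{
	return (b->tqh_first == NULL && b->tqh_last == &b->tqh_first);
}

/* Walks through a healthy insert and checks each pointer it touches. */
int
demo_insert_tail(void)
{
	struct bucket b;
	struct item a = { .id = 1 };

	TAILQ_INIT(&b);
	assert(empty_bucket_is_sane(&b));

	TAILQ_INSERT_TAIL(&b, &a, link);
	/* The element remembers the slot that now points at it. */
	assert(a.link.tqe_prev == &b.tqh_first);
	/* That slot was written through the old tqh_last. */
	assert(b.tqh_first == &a);
	/* The tail pointer now refers to the element's own next slot. */
	assert(b.tqh_last == &a.link.tqe_next);

	/* Returns 1: the bucket is no longer empty. */
	return (!empty_bucket_is_sane(&b));
}
```

In the dump above, the bucket has tqh_first = 0x0 but tqh_last = 0xd2006ad0, an address kgdb cannot read, so a check like empty_bucket_is_sane() would fail for it before the insert is even attempted.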

Additional info:
I recently added some new memory to this system.
The memory survived several passes of memtest86 before booting to
FreeBSD. It also survived one pass after the incident.
Still I wouldn't exclude a possibility of it being bad.

Small analysis:
If this is not because of bad memory, then it probably means that a
struct callout was deallocated earlier somewhere (possibly as part of
a bigger object), but not unregistered/removed from the callout mechanism.
I guess it is quite hard to backtrack that now.
All I can say is that there was nothing "funny" happening on the machine
from the point of view of attaching/detaching any HW or loading/unloading
modules or anything like that. Just "normal" work. So it could be something
that is always "on", like the network stack or the ata subsystem, etc.

-- 
Andriy Gapon

Andriy Gapon

2009-Jan-28 04:27 UTC

problem with "cold" hardware? [Was: panic in callout_reset: bad link in callwheel]

on 24/01/2009 13:00 Andriy Gapon said the following:
[snip]
> Additional info:
> I recently added some new memory to this system.
> The memory survived several passes of memtest86 before booting to
> FreeBSD. It also survived one pass after the incident.
> Still I wouldn't exclude a possibility of it being bad.

I think I have established that the crash was caused by a hardware issue.
I had another panic at a different place but with similar diagnostics:
a bad pointer passed to a call. Fortunately, the second time the pointer
was to a well-known long-lived object, so I was able to compare the bad
pointer to the actual address. It turned out that a single bit was flipped.
Then I realized that in both cases I saw the panics after "very cold" boots,
i.e. the system had been powered down for more than 1 hour before the boot.
So I performed a memtest86 run again, this time also after a long
power-off. And it reported lots of errors.
I restarted memtest86 10 minutes later and then it could not find any
errors in any tests.
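
Editorial sketch (not part of the original post): the comparison described above, a bad pointer against the known good address of a long-lived object, can be reduced to an XOR followed by a power-of-two test. The function names and the example addresses mentioned afterwards are hypothetical; the post does not give the actual values from the second panic.

```c
#include <stdint.h>

/* Returns 1 if a and b differ in exactly one bit, 0 otherwise. */
int
differs_by_one_bit(uintptr_t a, uintptr_t b)
{
	uintptr_t diff = a ^ b;

	/* diff has exactly one bit set iff it is a nonzero power of two. */
	return (diff != 0 && (diff & (diff - 1)) == 0);
}

/*
 * Returns the index (0 = least significant) of the single flipped bit,
 * or -1 if the values are equal or differ in more than one bit.
 */
int
flipped_bit_index(uintptr_t a, uintptr_t b)
{
	uintptr_t diff = a ^ b;
	int i = 0;

	if (!differs_by_one_bit(a, b))
		return (-1);
	while ((diff & 1) == 0) {
		diff >>= 1;
		i++;
	}
	return (i);
}
```

For instance, with the made-up pair 0xc3a852b8 (good) and 0xc3a850b8 (bad), the XOR is 0x200, so flipped_bit_index() reports bit 9. A result of -1 would instead point away from a simple single-bit memory error.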

Previously I heard about problems with hardware running hot, but not
with it being "cold". I put the word in quotes, because the system is in
a room with normal room temperature.

Any guesses what hardware part might be acting up like this?


-- 
Andriy Gapon

Andrew Snow

2009-Jan-28 11:22 UTC

problem with "cold" hardware? [Was: panic in callout_reset: bad link in callwheel]

Andriy Gapon wrote:
> Previously I heard about problems with hardware running hot, but not
> with it being "cold". I put the word in quotes, because the system is in
> a room with normal room temperature.
> 
> Any guesses what hardware part might be acting up like this?
Power supply.  Give all the capacitors a visual check.  Or you may be 
drawing too much power from your rated supply.


- Andrew

Ulf Zimmermann

2009-Jan-28 12:55 UTC

problem with "cold" hardware? [Was: panic in callout_reset: bad link in callwheel]

On Thu, Jan 29, 2009 at 06:22:26AM +1100, Andrew Snow wrote:
> Andriy Gapon wrote:
> >Previously I heard about problems with hardware running hot, but not
> >with it being "cold". I put the word in quotes, because the system is in
> >a room with normal room temperature.
> >
> >Any guesses what hardware part might be acting up like this?
> 
> Power supply.  Give all the capacitors a visual check.  Or you may be 
> drawing too much power from your rated supply.

Another thing could be bad soldering of the memory slot. It might not make
full contact at room temperature, but as it heats up by 10-20C inside
the case it might expand and make full contact. This could apply to
copper runs on the board, contact points from the board to the memory
slot, or contact from the slot to the memory.


-- 
Regards, Ulf.

---------------------------------------------------------------------
Ulf Zimmermann, 1525 Pacific Ave., Alameda, CA-94501, #: 510-865-0204
You can find my resume at: http://www.Alameda.net/~ulf/resume.html

Andriy Gapon

2009-Feb-02 03:36 UTC

problem with "cold" hardware? [Was: panic in callout_reset: bad link in callwheel]

on 28/01/2009 21:22 Andrew Snow said the following:
> Andriy Gapon wrote:
>> Previously I heard about problems with hardware running hot, but not
>> with it being "cold". I put the word in quotes, because the system is in
>> a room with normal room temperature.
>>
>> Any guesses what hardware part might be acting up like this?
> 
> Power supply.  Give all the capacitors a visual check.  Or you may be
> drawing too much power from your rated supply.

Right on target. I opened the PSU after replacing it; visually it
looks OK (to me), but I have nevertheless verified that the fault was in it.

Thank you and everybody who helped!



-- 
Andriy Gapon
