Hi list
A 2.5" USB hard disk I recently got has been giving me a lot of trouble.
When using the disk, I routinely get panics or random data corruption.
This happens on two separate machines, both running 6-STABLE. I found
that reading one particular file on the disk always makes the kernel
panic.
While I know this smells of a hardware error (bad sector, reads
failing), the disk repeatedly passed badblocks' tests, both read-only
and read-write, with no errors. I am therefore thinking that this may
have something to do with FreeBSD's USB stack.
A kernel with no debugger simply reboots when it encounters the error,
without producing a crash dump. When KDB and DDB are compiled in, I end
up at the debugger prompt, where "trace" points to a routine apparently
handling USB interrupts. Unfortunately, I have to run "call doadump" to
get a crash dump, after which kgdb seems to show the backtrace of the
doadump call, not of the original error.
I would really appreciate any help in debugging this problem. I have
debug kernels on both machines, have a working test case and am happy to
run any debugger commands required. The output of a kgdb backtrace is
attached, although I fear it's not of much use.
As a final note, the disk is 160 GB in size, has a single UFS partition
and is GELI-encrypted.
panic: vm_fault: fault on nofault entry, addr: db4f9000
KDB: enter: panic
panic: from debugger
Uptime: 30s
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xdb4f9000
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc06d7580
stack pointer = 0x28:0xde342464
frame pointer = 0x28:0xde342498
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 21 (irq11: cbb0 bfe0+*)
Dumping 767 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 767MB (196270 pages) 751 735 719 703 687 671 655 639 623 607
591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319
303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:165
#1 0xc044d196 in db_fncall (dummy1=0, dummy2=0, dummy3=1999,
dummy4=0xde342294 "") at /usr/src/sys/ddb/db_command.c:492
#2 0xc044cf12 in db_command (last_cmdp=0xc07761a4, cmd_table=0x0,
aux_cmd_tablep=0xc07367e0, aux_cmd_tablep_end=0xc07367e4) at
/usr/src/sys/ddb/db_command.c:350
#3 0xc044d025 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4 0xc044f265 in db_trap (type=12, code=0) at
/usr/src/sys/ddb/db_main.c:222
#5 0xc0575f07 in kdb_trap (type=0, code=0, tf=0xde342424) at
/usr/src/sys/kern/subr_kdb.c:473
#6 0xc06d98db in trap_fatal (frame=0xde342424, eva=0) at
/usr/src/sys/i386/i386/trap.c:829
#7 0xc06d8ef4 in trap (frame {tf_fs = -567017464, tf_es = -1066532824,
tf_ds = -567017432,
tf_edi = -615542784, tf_esi = -402886656, tf_ebp = -567008104, tf_isp =
-567008176, tf_ebx = -1001486592, tf_edx = 0, tf_ecx = 1024, tf_eax =
-615563264, tf_trapno = 12, tf_err = 2, tf_eip = -1066568320, tf_cs =
32, tf_eflags = 589830, tf_esp = -1001501696, tf_ss = 0})
at /usr/src/sys/i386/i386/trap.c:270
#8 0xc06c376a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#9 0xc06d7580 in memcpy () at /usr/src/sys/i386/i386/support.s:681
Previous frame inner to this frame (corrupt stack?)
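
In case it helps anyone reading the trace: kgdb prints the trap frame
registers in frame #7 as signed 32-bit integers, which makes them hard to
compare with the addresses in the panic message. Below is a minimal
sketch (plain Python, arithmetic only) that converts a few of them to
unsigned hex. The helper name to_u32_hex is made up for illustration; the
values are copied from frame #7 above.

```python
def to_u32_hex(value):
    """Reinterpret a signed 32-bit integer as an unsigned hex address."""
    return "0x%08x" % (value & 0xFFFFFFFF)

# Register values copied from the trap frame in frame #7 of the backtrace.
frame = {
    "tf_eip": -1066568320,
    "tf_edi": -615542784,
    "tf_esi": -402886656,
    "tf_ecx": 1024,
}

for name, value in frame.items():
    print(name, "=", to_u32_hex(value))
```

With these values, tf_eip comes out as 0xc06d7580 (the instruction
pointer reported in the panic, inside memcpy) and tf_edi as 0xdb4f9000,
which is the fault virtual address. If I read this correctly, the faulting
write is to the memcpy destination, but I may be misinterpreting it.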