Hi all,
I run the following server:
- Supermicro 6047R-E1R36L
- 96 GB RAM
- 1x INTEL CPU E5-2640 v2 @ 2.00GHz
- FreeBSD 10.3-RELEASE-p11
Drive for OS:
- HW RAID1: 2x KINGSTON SV300S37A120G
zpool1:
- 9x WD RED 4TB ; 9x HGST HUS726040ALA610 @ raidz2
- log: mirrored Intel 730 SSD
- cache: single Intel 730 SSD
zpool2:
- 6x HGST HUH721010ALN604 @raidz2
- 6x HGST HUH721010ALN604 @raidz2
The server works as an NFS filer for VMware ESXi servers. It had been
working flawlessly for several years.
In mid-December 2019 I upgraded it from 10.3-RELEASE to
12.1-RELEASE-p1. After 60 days of operation, the system rebooted with
a core dump:
Unread portion of the kernel message buffer:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bda098
stack pointer           = 0x28:0xfffffe0102085e00
frame pointer           = 0x28:0xfffffe0102085eb0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1067 (nfsd: service)
trap number             = 12
panic: page fault
cpuid = 2
time = 1581660610
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff810a7dcc at trap_fatal+0x39c
#4 0xffffffff810a7e19 at trap_pfault+0x49
#5 0xffffffff810a740f at trap+0x29f
#6 0xffffffff81081a0c at calltrap+0x8
#7 0xffffffff828f79fb at zio_change_priority+0x12b
#8 0xffffffff82841dc5 at arc_read+0xf5
#9 0xffffffff8284f5c8 at dbuf_read+0x728
#10 0xffffffff8285ac23 at dmu_buf_hold_array_by_dnode+0x1f3
#11 0xffffffff8285c757 at dmu_read_uio_dnode+0x37
#12 0xffffffff8285c88b at dmu_read_uio_dbuf+0x3b
#13 0xffffffff82926772 at zfs_freebsd_read+0x2c2
#14 0xffffffff8122a1b6 at VOP_READ_APV+0x76
#15 0xffffffff80b014e3 at nfsvno_read+0x373
#16 0xffffffff80af6366 at nfsrvd_read+0x5c6
#17 0xffffffff80adcee8 at nfsrvd_dorpc+0x658
Uptime: 60d11h16m56s
Dumping 11451 out of 98232 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
0xffffffff80bd039b in doadump (textdump=<optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:365
365             if (dumping)
(kgdb) list *0xffffffff80bda098
0xffffffff80bda098 is in _sx_xlock_hard (/usr/src/sys/sys/systm.h:262).
257             struct thread_lite *td;
258
259             td = (struct thread_lite *)curthread;
260             KASSERT(td->td_critnest != 0,
261                 ("critical_exit: td_critnest == 0"));
262             __compiler_membar();
263             td->td_critnest--;
264             __compiler_membar();
265             if (__predict_false(td->td_owepreempt))
266                     critical_exit_preempt();
(kgdb) backtrace
#0  0xffffffff80bd039b in doadump (textdump=<optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:365
#1  0xffffffff80bd03c5 in doadump (textdump=<optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bd01c8 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:450
#3  0xffffffff80bd0629 in vpanic (fmt=<optimized out>, ap=<optimized
out>) at /usr/src/sys/kern/kern_shutdown.c:873
#4  0xffffffff80bd0423 in panic (fmt=<unavailable>) at
/usr/src/sys/kern/kern_shutdown.c:803
#5  0xffffffff810a7dcc in trap_fatal (frame=0xfffffe0102085d40,
eva=1040) at /usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff810a7e19 in trap_pfault (frame=0xfffffe0102085d40,
usermode=0) at /usr/src/sys/amd64/amd64/trap.c:767
#7  0xffffffff810a740f in trap (frame=0xfffffe0102085d40) at
/usr/src/sys/amd64/amd64/trap.c:443
#8  0xffffffff81081a0c in alltraps_pushregs_no_rax () at
/usr/src/sys/amd64/amd64/exception.S:273
#9  0x0000000000000000 in ?? ()
The same panic occurred again two weeks later.
I have the core dump from 14 February saved.
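As a sanity check on the timeline, the "time =" and "Uptime:" values
from the panic output can be converted to calendar dates. The snippet
below is only a sketch for that arithmetic; the variable names are
illustrative and not part of the crash output. It also shows that the
fault address 0x410 is the same value kgdb prints as eva=1040.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the panic message above.
PANIC_EPOCH = 1581660610  # "time = 1581660610"
UPTIME = timedelta(days=60, hours=11, minutes=16, seconds=56)  # "Uptime: 60d11h16m56s"
FAULT_ADDRESS = 0x410  # "fault virtual address = 0x410"

# Convert the epoch timestamp to UTC and subtract the uptime to get the boot time.
panic_at = datetime.fromtimestamp(PANIC_EPOCH, tz=timezone.utc)
booted_at = panic_at - UPTIME

print("panic (UTC):", panic_at.isoformat())
print("boot  (UTC):", booted_at.isoformat())
print("fault address in decimal:", FAULT_ADDRESS)
```

The panic time falls on 14 February 2020 and the computed boot time on
15 December 2019 (both UTC), which matches the upgrade in mid-December.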
I have temporarily taken the server out of production and am testing
the RAM with Memtest 4.8.7; after 12-13 hours of testing, no RAM issues
have been reported so far.
Is there anything more I can try to debug with kgdb, and how can I find
the root cause?
Best regards,
Marek
--
Marek Salwerowicz
MISAL-SYSTEM