Eugene Grosbein
2017-Jan-13 19:21 UTC
stable/11 debugging kernel unable to produce crashdump
Hi!

I'm struggling to debug a panic in 11.0-STABLE/i386 that successfully produces a crashdump,
but I want more information. So I've rebuilt my custom kernel to include
options INVARIANTS, WITNESS and DEADLKRES. Now any panic results in a quick unclean reboot
without crashdump generation. Serial console shows:

Script started on Sat Jan 14 02:03:16 2017
Command: cu -l cuau0 -s 115200
Connected

root@gw:~ # sysctl debug.kdb.panic=1
debug.kdb.panic:panic: kdb_sysctl_panic
KDB: stack backtrace:
db_trace_self_wrapper(e8bc2ae8,0,e8bc2ae8,e8bc2a48,c06b4af4,...) at 0xc04c457b = db_trace_self_wrapper+0x2b/frame 0xe8bc2a20
vpanic(c0a4916e,e8bc2a54,e8bc2a54,e8bc2a5c,c06e881f,...) at 0xc06b4a7f = vpanic+0x6f/frame 0xe8bc2a34
panic(c0a4916e,1,c0ae7e48,e8bc2a88,c06bfdd3,...) at 0xc06b4af4 = panic+0x14/frame 0xe8bc2a48
kdb_sysctl_panic(c0ae7e48,0,0,0,e8bc2ae8) at 0xc06e881f = kdb_sysctl_panic+0x4f/frame 0xe8bc2a5c
sysctl_root_handler_locked(0,0,e8bc2ae8,e8bc2aa8) at 0xc06bfdd3 = sysctl_root_handler_locked+0x83/frame 0xe8bc2a88
sysctl_root(0,e8bc2ae8) at 0xc06bf744 = sysctl_root+0x144/frame 0xe8bc2ad8
userland_sysctl(c759f9c0,e8bc2b60,3,0,0,0,bfbfdc5c,4,e8bc2bc0,0) at 0xc06bfb9d = userland_sysctl+0x12d/frame 0xe8bc2b30
sys___sysctl(c759f9c0,e8bc2c00) at 0xc06bfa32 = sys___sysctl+0x52/frame 0xe8bc2bd0
syscall(e8bc2ce8) at 0xc0980801 = syscall+0x2a1/frame 0xe8bc2cdc
Xint0x80_syscall() at 0xc096e45e = Xint0x80_syscall+0x2e/frame 0xe8bc2cdc
--- syscall (202, FreeBSD ELF32, sys___sysctl), eip = 0x2818541b, esp = 0xbfbfdbc8, ebp = 0xbfbfdbf0 ---
Uptime: 4m36s
panic: malloc: called with spinlock or critical section held
Uptime: 4m36s
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382
Uptime: 4m36s
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382
Uptime: 4m36s
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382
Uptime: 4m36s
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382
Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382 Uptime: 4m36s 
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382
Uptime: 4m36s
panic: /boot/config: -D
FreeBSD/x86 boot
Default: 0:ad(0,a)/boot/loader
boot:

Also, I get a Lock Order Reversal at boot time:

lock order reversal:
 1st 0xc6c2aa30 ufs (ufs) @ /home/src/sys/kern/vfs_subr.c:2523
 2nd 0xd99a3b80 bufwait (bufwait) @ /home/src/sys/ufs/ffs/ffs_vnops.c:277
 3rd 0xc6c53a30 ufs (ufs) @ /home/src/sys/kern/vfs_subr.c:2523
stack backtrace:
#0 0xc0700442 at witness_debugger+0x62
#1 0xc0700386 at witness_checkorder+0xb56
#2 0xc069757c at __lockmgr_args+0x5bc
#3 0xc0908db2 at ffs_lock+0x62
#4 0xc099f63f at VOP_LOCK1_APV+0xbf
#5 0xc0762b8e at _vn_lock+0x8e
#6 0xc0755477 at vget+0x57
#7 0xc0749783 at vfs_hash_get+0xb3
#8 0xc0904da7 at ffs_vgetf+0x27
#9 0xc08fc47d at softdep_sync_buf+0xa1d
#10 0xc09099b1 at ffs_syncvnode+0x281
#11 0xc090758c at ffs_sync+0x19c
#12 0xc074eb93 at dounmount+0x583
#13 0xc074e529 at sys_unmount+0x229
#14 0xc0980801 at syscall+0x2a1
#15 0xc096e45e at Xint0x80_syscall+0x2e
Done

This is r311924 with a custom kernel using SCHED_4BSD instead of the default SCHED_ULE:

# GW kernel config
# CPU Geode LX 800
options INCLUDE_CONFIG_FILE
machine i386
cpu I586_CPU
cpu I686_CPU
options CPU_GEODE
options CPU_SOEKRIS
ident GW
maxusers 0
options SCHED_4BSD
options INET #InterNETworking
options SCTP
options FFS #Berkeley Fast Filesystem
options SOFTUPDATES #Enable FFS soft updates support
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KTRACE #ktrace(1) support
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options GEOM_JOURNAL
options GEOM_LABEL
options GEOM_CACHE
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options UDF
options LIBICONV
options CD9660_ICONV
options MSDOSFS_ICONV
options UDF_ICONV
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options P1003_1B_SEMAPHORES # POSIX-style semaphores
options PRINTF_BUFR_SIZE=512 # Prevent printf output being interspersed.
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options NFSCL # Network Filesystem Client
options NFSD # Network Filesystem Server
options NFSLOCKD # Network Lock Manager
options NFS_ROOT # NFS usable as /, requires NFSCLIENT
options COMPAT_LINUX
options PROCFS # Process filesystem (requires PSEUDOFS)
options LINPROCFS
options PSEUDOFS # Pseudo-filesystem framework
device eisa
device pci
device atkbdc
device atkbd
options ATKBD_DFLT_KEYMAP # specify the built-in keymap
makeoptions ATKBD_DFLT_KEYMAP=ru.koi8-r
device vga
device sc
options SC_HISTORY_SIZE=4000
options SC_DFLT_FONT # compile font in
makeoptions SC_DFLT_FONT=cp866
device uart
device speaker
device miibus # MII bus support
device vr
device loop # Network loopback
device random
device ether # Ethernet support
device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
device md
device gif # IPv6 and IPv4 tunneling
device vlan
device firewire # FireWire bus code
device sbp # SCSI over FireWire (Requires scbus and da)
device fwe # Ethernet over FireWire (non-standard!)
device fwip # IP over FireWire (RFC 2734,3146)
device dcons # Dumb console driver
device dcons_crom # Configuration ROM for dcons
device bpf #Berkeley packet filter
device ata
device ohci
device ehci
device usb # USB Bus (required)
device umass # Disks/Mass storage - Requires scbus and da
device ulpt
options USB_VERBOSE
device scbus # SCSI bus (required for SCSI)
device da # Direct Access (disks)
device pass # Passthrough device (direct SCSI access)
options LIBALIAS
options IPFIREWALL
options IPDIVERT
options IPFIREWALL_NAT
options DUMMYNET
makeoptions DEBUG=-g
options NETGRAPH
options NETGRAPH_BPF
options NETGRAPH_ECHO
options NETGRAPH_ETHER
options NETGRAPH_IFACE
options NETGRAPH_EIFACE
options NETGRAPH_IPFW
options NETGRAPH_SOCKET
options NETGRAPH_KSOCKET
options NETGRAPH_L2TP
options NETGRAPH_TEE
options NETGRAPH_NAT
options NETGRAPH_MPPC_ENCRYPTION
options NETGRAPH_TCPMSS
options NETGRAPH_PPTPGRE
options NETGRAPH_PPP
options NETGRAPH_PPPOE
options NETGRAPH_VJC
device crypto
device glxsb
device enc
options IPSEC
#option IPSEC_SUPPORT
options IPSEC_FILTERTUNNEL
device cpuctl
options ROUTETABLES=16

# Debugging kernel
options KDB # Enable kernel debugger support.
options KDB_UNATTENDED # Enable kernel debugger support.
options KDB_TRACE # Enable kernel debugger support.
options KDTRACE_HOOKS
options DDB # Support DDB.
options DDB_NUMSYM
options GDB # Support remote GDB.
options ALT_BREAK_TO_DEBUGGER
options INVARIANTS # Enable calls of extra sanity checking
options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
options DEADLKRES
options WITNESS
options WITNESS_SKIPSPIN

device wlan # 802.11 support
device wlan_wep # 802.11 WEP support
device wlan_ccmp # 802.11 CCMP support
device wlan_tkip # 802.11 TKIP support
device wlan_amrr # AMRR transmit rate control algorithm
device ath # Atheros pci/cardbus NIC's
device ath_hal # pci/cardbus chip support
device ath_pci
options AH_SUPPORT_AR5416 # enable AR5416 tx/rx descriptors
device ath_rate_sample # SampleRate tx rate control for ath

Can anyone verify whether a debugging kernel fails to generate a crashdump in stable/11?

Eugene Grosbein
Mark Johnston
2017-Jan-13 19:37 UTC
stable/11 debugging kernel unable to produce crashdump
On Sat, Jan 14, 2017 at 02:21:23AM +0700, Eugene Grosbein wrote:
> Hi!
>
> I'm struggling to debug a panic in 11.0-STABLE/i386 that successfully produces crashdump
> but I want more information. So I've rebuilt my custom kernel to include
> options INVARIANTS, WITNESS and DEADLKRES. Now any panic results in quick unclean reboot
> without crashdump generation. Serial console shows:
>
> Script started on Sat Jan 14 02:03:16 2017
> Command: cu -l cuau0 -s 115200
> Connected
>
> root@gw:~ # sysctl debug.kdb.panic=1
> debug.kdb.panic:panic: kdb_sysctl_panic
> KDB: stack backtrace:
> db_trace_self_wrapper(e8bc2ae8,0,e8bc2ae8,e8bc2a48,c06b4af4,...) at 0xc04c457b = db_trace_self_wrapper+0x2b/frame 0xe8bc2a20
> vpanic(c0a4916e,e8bc2a54,e8bc2a54,e8bc2a5c,c06e881f,...) at 0xc06b4a7f = vpanic+0x6f/frame 0xe8bc2a34
> panic(c0a4916e,1,c0ae7e48,e8bc2a88,c06bfdd3,...) at 0xc06b4af4 = panic+0x14/frame 0xe8bc2a48
> kdb_sysctl_panic(c0ae7e48,0,0,0,e8bc2ae8) at 0xc06e881f = kdb_sysctl_panic+0x4f/frame 0xe8bc2a5c
> sysctl_root_handler_locked(0,0,e8bc2ae8,e8bc2aa8) at 0xc06bfdd3 = sysctl_root_handler_locked+0x83/frame 0xe8bc2a88
> sysctl_root(0,e8bc2ae8) at 0xc06bf744 = sysctl_root+0x144/frame 0xe8bc2ad8
> userland_sysctl(c759f9c0,e8bc2b60,3,0,0,0,bfbfdc5c,4,e8bc2bc0,0) at 0xc06bfb9d = userland_sysctl+0x12d/frame 0xe8bc2b30
> sys___sysctl(c759f9c0,e8bc2c00) at 0xc06bfa32 = sys___sysctl+0x52/frame 0xe8bc2bd0
> syscall(e8bc2ce8) at 0xc0980801 = syscall+0x2a1/frame 0xe8bc2cdc
> Xint0x80_syscall() at 0xc096e45e = Xint0x80_syscall+0x2e/frame 0xe8bc2cdc
> --- syscall (202, FreeBSD ELF32, sys___sysctl), eip = 0x2818541b, esp = 0xbfbfdbc8, ebp = 0xbfbfdbf0 ---
> Uptime: 4m36s
> panic: malloc: called with spinlock or critical section held
> Uptime: 4m36s
> panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /home/src/sys/cam/ata/ata_da.c:3382

I suspect that this is because we only stop the scheduler upon a panic if SMP is configured.
Can you retest with the patch below applied?

Index: sys/kern/kern_shutdown.c
===================================================================
--- sys/kern/kern_shutdown.c	(revision 312082)
+++ sys/kern/kern_shutdown.c	(working copy)
@@ -713,6 +713,7 @@
 		CPU_CLR(PCPU_GET(cpuid), &other_cpus);
 		stop_cpus_hard(other_cpus);
 	}
+#endif
 
 	/*
 	 * Ensure that the scheduler is stopped while panicking, even if panic
@@ -719,7 +720,6 @@
 	 * has been entered from kdb.
 	 */
 	td->td_stopsched = 1;
-#endif
 
 	bootopt = RB_AUTOBOOT;
 	newpanic = 0;
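
For readers following along, the effect of moving the `#endif` can be shown with a small userland sketch. This is not kernel code: `struct sketch_thread`, the `SKETCH_SMP` macro and the `sketch_*` functions are made-up names for illustration, and only the `td_stopsched` field from the patch is modelled. It assumes a kernel built without the SMP option, which matches the posted config as far as it lists no `options SMP`.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's struct thread; only the field
 * the patch touches is modelled here. */
struct sketch_thread {
	int td_stopsched;
};

/* SKETCH_SMP is an illustrative macro standing in for the kernel's SMP
 * option. It is deliberately left undefined to model a uniprocessor build. */

/* Layout before the patch: the assignment sits inside the SMP-only block. */
static void
sketch_panic_before(struct sketch_thread *td)
{
#ifdef SKETCH_SMP
	/* (stopping the other CPUs would happen here) */
	td->td_stopsched = 1;
#else
	(void)td;	/* nothing from the block is compiled in */
#endif
}

/* Layout after the patch: #endif moved up, so the assignment is unconditional. */
static void
sketch_panic_after(struct sketch_thread *td)
{
#ifdef SKETCH_SMP
	/* (stopping the other CPUs would happen here) */
#endif
	td->td_stopsched = 1;
}

/* Runs both layouts and checks the flag each one leaves behind. */
static void
sketch_compare(void)
{
	struct sketch_thread before = { 0 }, after = { 0 };

	sketch_panic_before(&before);
	sketch_panic_after(&after);
#ifndef SKETCH_SMP
	assert(before.td_stopsched == 0);	/* flag never set without SMP */
#endif
	assert(after.td_stopsched == 1);	/* flag set in every build */
}
```

Calling `sketch_compare()` from a test harness exercises both layouts. With `SKETCH_SMP` undefined, the old layout never sets the flag, which is consistent with the suspicion above that the scheduler is not stopped during a panic on a non-SMP kernel.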