On Fri, Oct 19, 2018 at 01:10:15PM +0200, Sebastian Wojtczak wrote:
> Hi,
>
> I would like to report a kernel crash while running dd on an SSD drive.
>
> My PC has crashed several times while running the command below:
> dd if=/dev/ada2 of=file_name bs=10m
>
> I was trying to make an image of my SSD drive. Once the dump file reaches
> 41G or 52G, the kernel crashes and the system reboots.
>
> Oct 18 12:30:11 username syslogd: kernel boot file is /boot/kernel/kernel
> Oct 18 12:30:11 username kernel:
> Oct 18 12:30:11 username kernel:
> Oct 18 12:30:11 username kernel: Fatal trap 12: page fault while in kernel mode
> Oct 18 12:30:11 username kernel: cpuid = 1; apic id = 01
> Oct 18 12:30:11 username kernel: fault virtual address = 0x5a
> Oct 18 12:30:11 username kernel: fault code = supervisor read data, page not present
> Oct 18 12:30:11 username kernel: instruction pointer = 0x20:0xffffffff80e67f6d
> Oct 18 12:30:11 username kernel: stack pointer = 0x28:0xfffffe084b408f40
> Oct 18 12:30:11 username kernel: frame pointer = 0x28:0xfffffe084b408f80
> Oct 18 12:30:11 username kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> Oct 18 12:30:11 username kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> Oct 18 12:30:11 username kernel: processor eflags = interrupt enabled, resume, IOPL = 0
> Oct 18 12:30:11 username kernel: current process = 0 (zio_write_issue_8)
> Oct 18 12:30:11 username kernel: trap number = 12
> Oct 18 12:30:11 username kernel: panic: page fault
> Oct 18 12:30:11 username kernel: cpuid = 1
> Oct 18 12:30:11 username kernel: KDB: stack backtrace:
> Oct 18 12:30:11 username kernel: #0 0xffffffff80b50087 at kdb_backtrace+0x67
> Oct 18 12:30:11 username kernel: #1 0xffffffff80b099f7 at vpanic+0x177
> Oct 18 12:30:11 username kernel: #2 0xffffffff80b09873 at panic+0x43
> Oct 18 12:30:11 username kernel: #3 0xffffffff80fe105f at trap_fatal+0x35f
> Oct 18 12:30:11 username kernel: #4 0xffffffff80fe10b9 at trap_pfault+0x49
> Oct 18 12:30:11 username kernel: #5 0xffffffff80fe0887 at trap+0x2c7
> Oct 18 12:30:11 username kernel: #6 0xffffffff80fc04cc at calltrap+0x8
> Oct 18 12:30:11 username kernel: #7 0xffffffff80e56df2 at kmem_back+0xf2
> Oct 18 12:30:11 username kernel: #8 0xffffffff80e56cd0 at kmem_malloc+0x60
> Oct 18 12:30:11 username kernel: #9 0xffffffff80e4e752 at keg_alloc_slab+0xe2
> Oct 18 12:30:11 username kernel: #10 0xffffffff80e5118e at keg_fetch_slab+0x14e
> Oct 18 12:30:11 username kernel: #11 0xffffffff80e509a4 at zone_fetch_slab+0x64
> Oct 18 12:30:11 username kernel: #12 0xffffffff80e50a7f at zone_import+0x3f
> Oct 18 12:30:11 username kernel: #13 0xffffffff80e4d199 at uma_zalloc_arg+0x3d9
> Oct 18 12:30:11 username kernel: #14 0xffffffff832d2ab2 at zio_write_compress+0x1e2
> Oct 18 12:30:11 username kernel: #15 0xffffffff832d174c at zio_execute+0xac
> Oct 18 12:30:11 username kernel: #16 0xffffffff80b617e4 at taskqueue_run_locked+0x154
> Oct 18 12:30:11 username kernel: #17 0xffffffff80b62918 at taskqueue_thread_loop+0x98
> Oct 18 12:30:11 username kernel: Uptime: 5m50s
>
> One virtual machine is started with bhyve at startup, but the same crash
> happens even if I shut it down. Disabling vmm does not help; it only
> extends the time until the crash during the SSD dump.
>
> The current zfs setup is zraid on 3 (500GB) HDD drives with compress=on.
> Drive ada0 is not part of the zraid and is not attached or mounted.
>
> Any help on how to investigate this is appreciated.

The stack suggests a bug in the kmem_* KPI, but I'm having trouble
seeing the problem. In particular, the fault address suggests that we
crashed while testing (m->flags & PG_ZERO) == 0, but it shouldn't be
possible for m to be NULL there. My attempts to reproduce this on
12-CURRENT haven't yielded anything yet. Would you (or anyone else
seeing the problem) be willing to share a kernel dump? I'd need the
vmcore and the contents of /boot/kernel and /usr/lib/debug/boot/kernel.