thr3ads.net - freebsd stable - Trap 12 in vm_page_alloc

Garrett Wollman

2018-Nov-26 04:35 UTC

Trap 12 in vm_page_alloc_after()

<<On Mon, 19 Nov 2018 07:09:44 +0200, Konstantin Belousov <kostikbel at gmail.com> said:

> On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
>> Has anyone seen this before?  It's on a busy NFS server, but hasn't
>> been observed on any of our other NFS servers.
>>
>> ------------------------------------------------------------------------
>> Fatal trap 12: page fault while in kernel mode
>> --- trap 0xc, rip = 0xffffffff809a903d, rsp = 0xfffffe17eb8d0710, rbp = 0xfffffe17eb8d0750 ---
>> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfffffe17eb8d0750

> What is the line number for vm_page_alloc_after+0x15d ?
> Do you have NUMA enabled on 11 ?

If gdb is to be believed, the trap is at line 1687:

        /*
         *  At this point we had better have found a good page.
         */
        KASSERT(m != NULL, ("missing page"));
        free_count = vm_phys_freecnt_adj(m, -1);
>>>>>>  if ((m->flags & PG_ZERO) != 0)
                vm_page_zero_count--;
        mtx_unlock(&vm_page_queue_free_mtx);
        vm_page_alloc_check(m);

The faulting instruction is:

0xffffffff809a903d <vm_page_alloc_after+349>:   testb  $0x8,0x5a(%r14)
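
As a rough illustration of why that instruction lines up with the marked source line, here is a minimal C sketch. It uses a hypothetical `struct mock_page`, not the real `struct vm_page`; the 0x5a offset and the 0x8 mask are taken only from the disassembly above, and `MOCK_PG_ZERO` is an assumed stand-in for `PG_ZERO`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for PG_ZERO; the value matches the $0x8 immediate. */
#define MOCK_PG_ZERO 0x0008

/*
 * Hypothetical stand-in for struct vm_page. The padding exists only to
 * place "flags" at byte offset 0x5a, as the disassembly suggests.
 */
struct mock_page {
	uint8_t  pad[0x5a];
	uint16_t flags;
};

static_assert(offsetof(struct mock_page, flags) == 0x5a,
    "flags must sit at offset 0x5a for the illustration to hold");

/*
 * Same shape as the marked source line: load flags through the page
 * pointer and test one bit. On amd64 this kind of test can be emitted
 * as a single "testb $0x8,0x5a(%reg)".
 */
static inline int
mock_page_is_zeroed(const struct mock_page *m)
{
	return ((m->flags & MOCK_PG_ZERO) != 0);
}
```

The only memory access in that test is the load through the page pointer, so a page fault on this instruction suggests that the pointer in %r14 (presumably `m`) did not point at mapped memory.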

There are no options matching /numa/i in the configuration.  (This is
a non-debugging configuration so the KASSERT is inoperative, I
assume.)  I have about a dozen other servers with the same kernel and
they're not crashing, but obviously they all have different loads and
sets of active clients.

-GAWollman

Mark Johnston

2018-Nov-29 00:19 UTC

Trap 12 in vm_page_alloc_after()

On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> <<On Mon, 19 Nov 2018 07:09:44 +0200, Konstantin Belousov <kostikbel at gmail.com> said:
> 
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >>
> >> ------------------------------------------------------------------------
> >> Fatal trap 12: page fault while in kernel mode
> >> --- trap 0xc, rip = 0xffffffff809a903d, rsp = 0xfffffe17eb8d0710, rbp = 0xfffffe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfffffe17eb8d0750
> 
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
> 
> If gdb is to be believed, the trap is at line 1687:
> 
>         /*
>          *  At this point we had better have found a good page.
>          */
>         KASSERT(m != NULL, ("missing page"));
>         free_count = vm_phys_freecnt_adj(m, -1);
> >>>>>>  if ((m->flags & PG_ZERO) != 0)
>                 vm_page_zero_count--;
>         mtx_unlock(&vm_page_queue_free_mtx);
>         vm_page_alloc_check(m);
> 
> The faulting instruction is:
> 
> 0xffffffff809a903d <vm_page_alloc_after+349>:   testb  $0x8,0x5a(%r14)
> 
> There are no options matching /numa/i in the configuration.  (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.)  I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
any code that relies on Intel TSX.  (I don't think there's anything in
the base system that does.)  There are some details in
https://reviews.freebsd.org/D18374
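
For reference, a loader tunable like this is normally set in /boot/loader.conf and takes effect at the next boot. The fragment below is a sketch only; it assumes the running kernel already contains the change from the review above, since otherwise the setting would have no effect.

```
# /boot/loader.conf
# Assumption: the kernel recognizes hw.skz63_enable (see D18374).
hw.skz63_enable="0"
```

After a reboot, `kenv hw.skz63_enable` should echo the value the loader passed to the kernel.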
