Up-front disclaimer: I may very well be wrong on this.
This looks like a pmap bug I ran into recently on a pre-release of 8.0. It's
still present in stock 8.0, but I have not looked at 8.1 (see disclaimer).
The bug is caused by a combination of a shortcut in pmap_unuse_pt(), which
doesn't wipe PTE entries when unmapping VAs from kernel space, and the VM
entering multi-page allocations one page at a time.
Littering of the page table page by pmap_unuse_pt() does not cause problems by
itself, because the pages are effectively locked in place by the reservation.
It may, however, fool the promotion check at the end of pmap_enter() when the
kernel maps a multi-page allocation that 1) causes the backing reservation to
become fully populated, and 2) whose backing pages have been used (mapped and
unmapped) earlier.
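For reference, the shortcut looks roughly like this (paraphrased from memory
of the 8.0-era amd64 pmap.c, so take the exact names and details with a grain
of salt); the point is the early return for kernel VAs, which leaves the page
table page's contents alone:

    static int
    pmap_unuse_pt(pmap_t pmap, vm_offset_t va, pd_entry_t ptepde, vm_page_t *free)
    {
            vm_page_t mpte;

            /* Kernel VA: bail out early, the page table page is left as-is. */
            if (va >= VM_MAXUSER_ADDRESS)
                    return (0);
            mpte = PHYS_TO_VM_PAGE(ptepde & PG_FRAME);
            return (pmap_unwire_pte_hold(pmap, va, mpte, free));
    }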
From pmap.c:
    /*
     * If both the page table page and the reservation are fully
     * populated, then attempt promotion.
     */
    if ((mpte == NULL || mpte->wire_count == NPTEPG) &&
        pg_ps_enabled && vm_reserv_level_iffullpop(m) == 0)
            pmap_promote_pde(pmap, pde, va);
When pmap_enter() is called for the first page in the allocation, the following
holds true:
- mpte is NULL for kernel-space VAs
- pg_ps_enabled is set (otherwise there would be no 2M mappings)
- vm_reserv_level_iffullpop() will return 0
so pmap_promote_pde() will be called. It may succeed in creating a 2MB mapping
because of the litter pmap_unuse_pt() left behind (helped by the fact that the
other PTE attributes are likely the same, since this memory is only used by
the kernel).
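To make the one-page-at-a-time aspect concrete, here is a hypothetical
simplification of how a multi-page kernel allocation gets entered (not the
actual kmem_malloc() code; the page array and loop details are assumed). The
backing pages typically all exist before the first pmap_enter() runs, so
vm_reserv_level_iffullpop() can already report the reservation as full:

    /*
     * Hypothetical sketch: the pages backing the allocation are
     * allocated first, then entered into the kernel pmap one page
     * at a time.  The promotion check at the end of pmap_enter()
     * therefore runs after every single page, including the first.
     */
    for (i = 0; i < size; i += PAGE_SIZE) {
            m = pages[atop(i)];             /* assumed page array */
            pmap_enter(kernel_pmap, addr + i, VM_PROT_ALL, m,
                VM_PROT_ALL, TRUE);
    }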
When pmap_enter() is called for the second page in the large allocation, the
reported assertion triggers. This fits the given trace:
db> bt
Tracing pid 0 tid 0 td 0xffffffff80c67140
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
pmap_enter() at pmap_enter+0x641
kmem_malloc() at kmem_malloc+0x1b5
uma_large_malloc() at uma_large_malloc+0x4a
malloc() at malloc+0xd7
acpi_alloc_wakeup_handler() at acpi_alloc_wakeup_handler+0x82
mi_startup() at mi_startup+0x59
btext() at btext+0x2c
db>
My work-around was to add a check in pmap_enter() on whether the page being
entered was already correctly represented in the page table and, if it was, to
leave things as they were. It had the nice side effect of limiting TLB
shootdowns from page faults on the same page from multiple CPUs in a threaded
application.
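The check was roughly the following (a sketch of the idea rather than the
exact patch; origpte and newpte are assumed to be pmap_enter()'s usual
locals holding the current and the about-to-be-installed PTE values):

    /*
     * Work-around sketch: if the PTE already describes exactly the
     * mapping we are about to install, leave the page table alone --
     * no write, no TLB shootdown, and no promotion attempt.
     */
    if (origpte == newpte)
            return;         /* unlocking/cleanup elided in this sketch */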
I considered changing pmap_unuse_pt() to clear the PTE entries for kernel-space
VAs, but figured it would require TLB shootdowns too, which wasn't exactly what
I wanted.
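The primitives for that already exist; a sketch of what it would involve for a
kernel VA (where exactly the pte pointer would come from inside
pmap_unuse_pt() is glossed over here):

    /*
     * Rejected alternative, sketched: actually clear the entry for
     * kernel VAs and pay for a per-page TLB shootdown.
     */
    if (va >= VM_MAXUSER_ADDRESS) {
            pte_clear(pte);                  /* wipe the stale entry */
            pmap_invalidate_page(pmap, va);  /* the shootdown I wanted to avoid */
            return (0);
    }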
Another idea was to make a pmap_enter_multiple() function to handle this case
correctly by not trying to promote until all pages have been entered.
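Roughly like this (pmap_enter_multiple() and the helpers inside it are purely
hypothetical names, just to show the shape of the idea):

    /*
     * Hypothetical pmap_enter_multiple(): map every page of the run
     * first, then make a single promotion attempt, so a partially
     * entered allocation can never look promotable.
     */
    void
    pmap_enter_multiple(pmap_t pmap, vm_offset_t va, vm_page_t *ma,
        int count, vm_prot_t prot, boolean_t wired)
    {
            int i;

            for (i = 0; i < count; i++)
                    /* hypothetical variant of pmap_enter() with promotion off */
                    pmap_enter_nopromote(pmap, va + ptoa(i), prot, ma[i],
                        prot, wired);
            /* hypothetical single promotion attempt for the whole run */
            pmap_promote_range(pmap, va, count);
    }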
Of course, turning off auto-promotion by setting vm.pmap.pg_ps_enabled to 0
also makes the problem go away, at the cost of less TLB coverage.
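For completeness, that is a boot-time tunable (the sysctl is read-only at
runtime, if I remember correctly), so it goes in loader.conf:

    # /boot/loader.conf
    vm.pmap.pg_ps_enabled=0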
Hope this was useful (helped more than it confused).
Kurt A