Valentin Vidic
2018-Feb-27 19:22 UTC
[Pkg-xen-devel] Bug#880554: Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Tue, Feb 27, 2018 at 05:05:06PM +0100, Hans van Kranenburg wrote:
> ad 1. Christian, Valentin, can you give more specific info that can help
> someone else to set up a test environment to trigger > 32 values.

I can't touch the original VM that had this issue and tried to reproduce
on another host with recent stretch kernels but without success. The
maximum number I can get now is nr_frames=11.

Another info that I forgot to mention before is that my VMs were using
DRBD disks. Since DRBD acts like a slow disk it could cause IO requests
to pile up and hit the limit faster.

Since I can't reproduce it easily anymore I suspect something was fixed
in the meanwhile. My original report was for 4.9.30-2+deb9u2 and since
then there seems to be a number of fixes that could be related to this:

linux (4.9.65-3) stretch; urgency=medium

  * xen/time: do not decrease steal time after live migration on xen

linux (4.9.65-1) stretch; urgency=medium

    - swiotlb-xen: implement xen_swiotlb_dma_mmap callback
    - xen-netback: Use GFP_ATOMIC to allocate hash
    - xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
    - xen/manage: correct return value check on xenbus_scanf()
    - xen: don't print error message in case of missing Xenstore entry
    - xen/netback: set default upper limit of tx/rx queues to 8

linux (4.9.47-1) stretch; urgency=medium

    - nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
    - nvme: avoid to use blk_mq_abort_requeue_list()
    - efi: Don't issue error message when booted under Xen
    - xen/privcmd: Support correctly 64KB page granularity when mapping memory
    - xen/blkback: fix disconnect while I/Os in flight
    - xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
    - xen/blkback: don't free be structure too early
    - xen-netback: fix memory leaks on XenBus disconnect
    - xen-netback: protect resource cleaning on XenBus disconnect
    - swiotlb-xen: update dev_addr after swapping pages
    - xen-netfront: Fix Rx stall during network stress and OOM
    - [x86] mm: Fix flush_tlb_page() on Xen
    - xen-netfront: Rework the fix for Rx stall during OOM and network stress
    - xen/scsiback: Fix a TMR related use-after-free
    - [x86] xen: allow userspace access during hypercalls
    - [armhf] Xen: Zero reserved fields of xatp before making hypervisor call
    - xen-netback: correctly schedule rate-limited queues
    - nbd: blk_mq_init_queue returns an error code on failure, not NULL
    - xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)
    - blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
    - xen-blkfront: use a right index when checking requests

linux (4.9.30-2+deb9u4) stretch-security; urgency=high

  * xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)

linux (4.9.30-2+deb9u3) stretch-security; urgency=high

  * xen-blkback: don't leak stack data via response ring
    (CVE-2017-10911)
  * mqueue: fix a use-after-free in sys_mq_notify() (CVE-2017-11176)

In fact the original big VM with this problem runs happily with:

  domid=1: nr_frames=11, max_nr_frames=256

so it is quite possible raising the limit is not needed anymore with
the latest stretch kernels.

If no-one else can reproduce this anymore I suggest you close the issue
but include the xen-diag tool in the updated package. That way if
someone reports the problem again it should be easy to detect.

--
Valentin