thr3ads.net - Pkg xen devel - [Pkg-xen-devel] Bug#880554: Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64 [Feb 2018]

Valentin Vidic

2018-Feb-27 19:22 UTC

[Pkg-xen-devel] Bug#880554: Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

On Tue, Feb 27, 2018 at 05:05:06PM +0100, Hans van Kranenburg wrote:
> ad 1. Christian, Valentin, can you give more specific info that can help
> someone else to set up a test environment to trigger > 32 values.

I can't touch the original VM that had this issue, so I tried to
reproduce it on another host with recent stretch kernels, but without
success.  The highest value I can get now is nr_frames=11.

Another detail I forgot to mention before is that my VMs were
using DRBD disks. Since DRBD acts like a slow disk, it could cause
I/O requests to pile up and hit the limit faster.

Since I can't reproduce it easily anymore, I suspect something was
fixed in the meantime.  My original report was for 4.9.30-2+deb9u2,
and since then there seem to have been a number of fixes that could
be related to this:

linux (4.9.65-3) stretch; urgency=medium
  * xen/time: do not decrease steal time after live migration on xen
linux (4.9.65-1) stretch; urgency=medium
    - swiotlb-xen: implement xen_swiotlb_dma_mmap callback
    - xen-netback: Use GFP_ATOMIC to allocate hash
    - xen/gntdev: avoid out of bounds access in case of partial
      gntdev_mmap()
    - xen/manage: correct return value check on xenbus_scanf()
    - xen: don't print error message in case of missing Xenstore entry
    - xen/netback: set default upper limit of tx/rx queues to 8
linux (4.9.47-1) stretch; urgency=medium
    - nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
    - nvme: avoid to use blk_mq_abort_requeue_list()
    - efi: Don't issue error message when booted under Xen
    - xen/privcmd: Support correctly 64KB page granularity when mapping
      memory
    - xen/blkback: fix disconnect while I/Os in flight
    - xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
    - xen/blkback: don't free be structure too early
    - xen-netback: fix memory leaks on XenBus disconnect
    - xen-netback: protect resource cleaning on XenBus disconnect
    - swiotlb-xen: update dev_addr after swapping pages
    - xen-netfront: Fix Rx stall during network stress and OOM
    - [x86] mm: Fix flush_tlb_page() on Xen
    - xen-netfront: Rework the fix for Rx stall during OOM and network
      stress
    - xen/scsiback: Fix a TMR related use-after-free
    - [x86] xen: allow userspace access during hypercalls
    - [armhf] Xen: Zero reserved fields of xatp before making hypervisor
      call
    - xen-netback: correctly schedule rate-limited queues
    - nbd: blk_mq_init_queue returns an error code on failure, not NULL
    - xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)
    - blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
    - xen-blkfront: use a right index when checking requests
linux (4.9.30-2+deb9u4) stretch-security; urgency=high
  * xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)
linux (4.9.30-2+deb9u3) stretch-security; urgency=high
  * xen-blkback: don't leak stack data via response ring
    (CVE-2017-10911)
  * mqueue: fix a use-after-free in sys_mq_notify() (CVE-2017-11176)

In fact the original big VM with this problem runs happily with:

  domid=1: nr_frames=11, max_nr_frames=256

so it is quite possible that raising the limit is no longer needed
with the latest stretch kernels.

If no-one else can reproduce this anymore I suggest you close the
issue but include the xen-diag tool in the updated package.  That
way if someone reports the problem again it should be easy to detect.
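
For anyone who wants to automate that check, here is a minimal sketch.
It assumes the tool prints one line per domain in the same format as the
"domid=1: nr_frames=11, max_nr_frames=256" line quoted above.  The
function names and the 90% warning threshold are hypothetical and only
serve as illustration; they are not part of xen-diag.

```python
import re

# Matches a line in the format quoted in this message, e.g.
# "domid=1: nr_frames=11, max_nr_frames=256".
LINE_RE = re.compile(r"domid=(\d+):\s*nr_frames=(\d+),\s*max_nr_frames=(\d+)")


def parse_gnttab_line(line):
    """Return (domid, nr_frames, max_nr_frames), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def near_limit(nr_frames, max_nr_frames, threshold=0.9):
    """True when grant frame usage is at or above threshold * max_nr_frames.

    The default threshold of 0.9 is an arbitrary choice for this sketch.
    """
    return nr_frames >= threshold * max_nr_frames


if __name__ == "__main__":
    sample = "  domid=1: nr_frames=11, max_nr_frames=256"
    parsed = parse_gnttab_line(sample)
    if parsed is not None:
        domid, used, limit = parsed
        state = "near limit" if near_limit(used, limit) else "ok"
        print("domid=%d uses %d of %d grant frames: %s" % (domid, used, limit, state))
```

With the values from my VM (11 of 256) this reports the domain as ok; a
domain sitting at its maximum would be flagged.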

-- 
Valentin
