Samuel Thibault
2019-Feb-08 23:16 UTC
[Pkg-xen-devel] [admin] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds.
Hello,

Hans van Kranenburg, on Fri, 08 Feb 2019 20:18:44 +0100, wrote:
> Upstream mailing list is at:
> xen-devel at lists.xenproject.org

Apparently it didn't get the mails, I guess because it's subscriber-only
posting? I have now forwarded the mails.

> Apparently,
> xen at packages.debian.org
> results in a message to
> pkg-xen-devel at lists.alioth.debian.org

Yes, since that's the maintainer of the package.

> On 2/8/19 6:13 PM, Samuel Thibault wrote:
> >
> > Sacha, on Fri, 08 Feb 2019 18:00:22 +0100, wrote:
> >> On Debian GNU/Linux 9.7 (stretch) amd64, we have a bug on the last Xen
> >> Hypervisor version:
> >>
> >> xen-hypervisor-4.8-amd64 4.8.5+shim4.10.2+xsa282
> >
> > (Read: 4.8.5+shim4.10.2+xsa282-1+deb9u11)
> >
> >> The rollback on the previous package version corrected the problem:
> >>
> >> xen-hypervisor-4.8-amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
>
> Since this is the first message arriving about this in my inbox, can you
> explain what "the problem" is?

I have forwarded the original mail: all VM I/O gets stuck, and thus the
VM becomes unusable.

> > (Only the hypervisor needed to be downgraded to fix the issue)
> >
> >> The errors on the domU are a frozen file system until a kernel panic.
>
> Do you have a reproducible case that shows success with the previous Xen
> hypervisor package and failure with the new one, while keeping all other
> things the same?

We have a production system which gets to hang within about a day. We
don't know what exactly triggers the issue.

> This seems like an upstream thing, because for 4.8, the Debian package
> updates are almost exclusively shipping upstream stable updates.

Ok.

Samuel
Hans van Kranenburg
2019-Feb-09 16:01 UTC
[Pkg-xen-devel] [Xen-devel] [admin] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds.
Hi,

On 2/9/19 12:16 AM, Samuel Thibault wrote:
>
> Hans van Kranenburg, on Fri, 08 Feb 2019 20:18:44 +0100, wrote:
>> [...]
>>
>> On 2/8/19 6:13 PM, Samuel Thibault wrote:
>>>
>>> Sacha, on Fri, 08 Feb 2019 18:00:22 +0100, wrote:
>>>> On Debian GNU/Linux 9.7 (stretch) amd64, we have a bug on the last Xen
>>>> Hypervisor version:
>>>>
>>>> xen-hypervisor-4.8-amd64 4.8.5+shim4.10.2+xsa282
>>>
>>> (Read: 4.8.5+shim4.10.2+xsa282-1+deb9u11)
>>>
>>>> The rollback on the previous package version corrected the problem:
>>>>
>>>> xen-hypervisor-4.8-amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
>>
>> Since this is the first message arriving about this in my inbox, can you
>> explain what "the problem" is?
>
> I have forwarded the original mail: all VM I/O gets stuck, and thus the
> VM becomes unusable.

These are in many cases the symptoms of running out of "grant frames". So
let's verify first if this is the case or not.

Your xen-utils-4.8 package contains a program at
/usr/lib/xen-4.8/bin/xen-diag that you can use in the dom0 to gather
information, e.g.:

-# ./xen-diag gnttab_query_size 5
domid=5: nr_frames=11, max_nr_frames=32

If this nr_frames hits the max allowed, then randomly things will stall.
This does not have to happen directly after domU boot, but it likely
happens later, when disks/cpus are actually used. There is no useful
message/hint at all in the domU kernel (yet) about this when it happens.

Can you verify if this is happening?

With Xen 4.8, you can add gnttab_max_frames=64 (or another number, but
higher than the default 32) to the xen hypervisor command line and reboot.

For Xen 4.11, which will be in Buster, the default is 64 and the way to
configure higher values/limits for dom0 and domU has changed. There will
be some text about this recurring problem in the README.Debian under
known issues.

>>> (Only the hypervisor needed to be downgraded to fix the issue)
>>>
>>>> The errors on the domU are a frozen file system until a kernel panic.
>>
>> Do you have a reproducible case that shows success with the previous Xen
>> hypervisor package and failure with the new one, while keeping all other
>> things the same?
>
> We have a production system which gets to hang within about a day. We
> don't know what exactly triggers the issue.
>
>> This seems like an upstream thing, because for 4.8, the Debian package
>> updates are almost exclusively shipping upstream stable updates.
>
> Ok.

Related: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554

Hans
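As a rough illustration of the check described in the message above, the following Python sketch parses a line in the format shown there (domid=5: nr_frames=11, max_nr_frames=32) and classifies the usage. The function names and the 90% warning threshold are assumptions made for this example and are not part of xen-diag; the output format is taken only from the single sample line quoted above, so it may need adjusting on a real system.

```python
import re
import subprocess

# Path as given in the message above; other Xen versions will differ.
XEN_DIAG = "/usr/lib/xen-4.8/bin/xen-diag"

# Matches lines such as "domid=5: nr_frames=11, max_nr_frames=32".
LINE_RE = re.compile(r"domid=(\d+): nr_frames=(\d+), max_nr_frames=(\d+)")


def parse_gnttab_line(line):
    """Return (domid, nr_frames, max_nr_frames) parsed from one output line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised xen-diag output: {line!r}")
    domid, used, limit = (int(group) for group in match.groups())
    return domid, used, limit


def grant_frame_status(used, limit, warn_ratio=0.9):
    """Classify usage; warn_ratio is an arbitrary, illustrative threshold."""
    if used >= limit:
        return "exhausted"
    if used >= warn_ratio * limit:
        return "near limit"
    return "ok"


def query_domain(domid):
    """Run xen-diag in the dom0 (requires root) and parse what it prints."""
    result = subprocess.run(
        [XEN_DIAG, "gnttab_query_size", str(domid)],
        capture_output=True, text=True, check=True,
    )
    return parse_gnttab_line(result.stdout)
```

For the sample output quoted above, parse_gnttab_line returns (5, 11, 32) and grant_frame_status(11, 32) returns "ok"; a domain at 31 of 32 frames would be reported as "near limit".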
Samuel Thibault
2019-Feb-09 16:35 UTC
[Pkg-xen-devel] [admin] [Xen-devel] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds.
Hello,

Hans van Kranenburg, on Sat, 09 Feb 2019 17:01:55 +0100, wrote:
> > I have forwarded the original mail: all VM I/O gets stuck, and thus the
> > VM becomes unusable.
>
> These are in many cases the symptoms of running out of "grant frames".

Oh! That could be it indeed. I'm wondering what could be monopolizing
them, though, and why +deb9u11 is affected while +deb9u10 is not. I'm
afraid increasing the gnttab max size beyond 32 might just defer filling
it up.

> -# ./xen-diag gnttab_query_size 5
> domid=5: nr_frames=11, max_nr_frames=32

The current value is indeed 31 out of a maximum of 32.

> With Xen 4.8, you can add gnttab_max_frames=64 (or another number, but
> higher than the default 32) to the xen hypervisor command line and reboot.

admin@: I made the modification in the grub config. We can probably try
to reboot with the newer hypervisor, and monitor that value.

Samuel
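One way to judge whether a higher limit only defers the problem is to sample nr_frames over time and extrapolate. The Python sketch below is a hypothetical helper, not something used in this thread: it assumes samples are collected by hand or by a cron job as (hours since start, nr_frames) pairs, and it uses a plain linear estimate between the first and last sample, which is only a crude indicator.

```python
def hours_until_exhaustion(samples, max_nr_frames):
    """Estimate the hours left before nr_frames reaches max_nr_frames.

    samples: list of (hours_since_start, nr_frames) pairs in time order.
    Returns None when there are too few samples or usage is not growing.
    """
    if len(samples) < 2:
        return None
    (t_first, n_first), (t_last, n_last) = samples[0], samples[-1]
    if t_last <= t_first or n_last <= n_first:
        return None
    # Average growth in frames per hour over the observed window.
    rate = (n_last - n_first) / (t_last - t_first)
    return max(0.0, (max_nr_frames - n_last) / rate)
```

A result of None after several days of samples would suggest the usage has levelled off below the new limit, while a small positive number would suggest the limit is merely being reached later.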