thr3ads.net - Pkg xen devel - [Pkg-xen-devel] [Xen-devel] [admin] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds. [Feb 2019]

Samuel Thibault

2019-Feb-08 23:16 UTC

[Pkg-xen-devel] [admin] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds.

Hello,

Hans van Kranenburg wrote on Fri, 08 Feb 2019 20:18:44 +0100:
> Upstream mailing list is at:
>   xen-devel at lists.xenproject.org
Apparently it didn't get the mails, I guess because it's subscriber-only
posting? I have now forwarded the mails.
> Apparently,
>   xen at packages.debian.org
> results in a message to
>   pkg-xen-devel at lists.alioth.debian.org
Yes, since that's the maintainer of the package.
> On 2/8/19 6:13 PM, Samuel Thibault wrote:
> > 
> > Sacha wrote on Fri, 08 Feb 2019 18:00:22 +0100:
> >> On Debian GNU/Linux 9.7 (stretch) amd64, we have a bug on the last Xen
> >> Hypervisor version:
> >>
> >>     xen-hypervisor-4.8-amd64 4.8.5+shim4.10.2+xsa282
> > 
> > (Read: 4.8.5+shim4.10.2+xsa282-1+deb9u11)
> > 
> >> The rollback to the previous package version corrected the problem:
> >>
> >>     xen-hypervisor-4.8-amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> 
> Since this is the first message arriving about this in my inbox, can you
> explain what "the problem" is?
I have forwarded the original mail: all VM I/O gets stuck, and thus the
VM becomes unusable.
> > (Only the hypervisor needed to be downgraded to fix the issue)
> > 
> >> The errors on the domU are a frozen file system, followed by a kernel panic.
> 
> Do you have a reproducible case that shows success with the previous Xen
> hypervisor package and failure with the new one, while keeping all other
> things the same?
We have a production system which hangs within about a day. We
don't know what exactly triggers the issue.
> This seems like an upstream thing, because for 4.8, the Debian package
> updates are almost exclusively shipping upstream stable updates.
Ok.

Samuel

Hans van Kranenburg

2019-Feb-09 16:01 UTC


[Pkg-xen-devel] [Xen-devel] [admin] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds.

Hi,

On 2/9/19 12:16 AM, Samuel Thibault wrote:
> 
> Hans van Kranenburg wrote on Fri, 08 Feb 2019 20:18:44 +0100:
>> [...]
>>
>> On 2/8/19 6:13 PM, Samuel Thibault wrote:
>>>
>>> Sacha wrote on Fri, 08 Feb 2019 18:00:22 +0100:
>>>> On Debian GNU/Linux 9.7 (stretch) amd64, we have a bug on the last Xen
>>>> Hypervisor version:
>>>>
>>>>     xen-hypervisor-4.8-amd64 4.8.5+shim4.10.2+xsa282
>>>
>>> (Read: 4.8.5+shim4.10.2+xsa282-1+deb9u11)
>>>
>>>> The rollback to the previous package version corrected the problem:
>>>>
>>>>     xen-hypervisor-4.8-amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
>>
>> Since this is the first message arriving about this in my inbox, can you
>> explain what "the problem" is?
> 
> I have forwarded the original mail: all VM I/O gets stuck, and thus the
> VM becomes unusable.
These are in many cases the symptoms of running out of "grant frames".
So let's verify first if this is the case or not.

Your xen-utils-4.8 package contains a program at
/usr/lib/xen-4.8/bin/xen-diag that you can use in the dom0 to gather
information.

e.g.

  -# ./xen-diag  gnttab_query_size 5
  domid=5: nr_frames=11, max_nr_frames=32

If nr_frames hits the maximum allowed, things will stall at random.
This does not have to happen directly after domU boot; it more likely
happens later, when disks/cpus are actually used. There is no useful
message/hint at all in the domU kernel (yet) about this when it happens.

Can you verify if this is happening?
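
A minimal sketch of how that check could be scripted in the dom0, assuming the output format shown above. The helper names parse_gnttab_line and gnttab_headroom are hypothetical, not part of xen-diag; only the xen-diag path and the gnttab_query_size subcommand come from this message.

```shell
#!/bin/sh
# Path taken from the message above; override XEN_DIAG if yours differs.
XEN_DIAG=${XEN_DIAG:-/usr/lib/xen-4.8/bin/xen-diag}

# Hypothetical helper: parse one line of gnttab_query_size output, e.g.
#   domid=5: nr_frames=11, max_nr_frames=32
# and print "<domid> <nr_frames> <max_nr_frames>" (nothing if it does not match).
parse_gnttab_line() {
    printf '%s\n' "$1" |
        sed -n 's/^domid=\([0-9]*\): nr_frames=\([0-9]*\), max_nr_frames=\([0-9]*\).*$/\1 \2 \3/p'
}

# Hypothetical helper: print how many grant frames are still free
# (max_nr_frames minus nr_frames); fail if the line cannot be parsed.
gnttab_headroom() {
    gh_parsed=$(parse_gnttab_line "$1")
    [ -n "$gh_parsed" ] || return 1
    set -- $gh_parsed
    echo $(( $3 - $2 ))
}
```

For domid 5 this could be called as gnttab_headroom "$("$XEN_DIAG" gnttab_query_size 5)"; a result at or near 0 would match the symptom described above.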

With Xen 4.8, you can add gnttab_max_frames=64 (or another number, but
higher than the default 32) to the xen hypervisor command line and reboot.
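
On Debian, one common place to set hypervisor command-line options is the GRUB_CMDLINE_XEN_DEFAULT variable read by GRUB's Xen menu generator. The file location below is an assumption about the local setup, not something stated in this thread; treat it as a sketch.

```shell
# Assumed location: /etc/default/grub.d/xen.cfg (or /etc/default/grub).
# If the variable is already set there, append the option to the existing
# value instead of replacing it.
GRUB_CMDLINE_XEN_DEFAULT="gnttab_max_frames=64"

# Afterwards, regenerate the GRUB configuration and reboot:
#   update-grub
```

After the reboot, max_nr_frames in the gnttab_query_size output should show the new limit.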

For Xen 4.11, which will be in Buster, the default is 64, and the way to
configure higher values/limits for dom0 and domU has changed. There
will be some text about this recurring problem in the README.Debian
under known issues.
>>> (Only the hypervisor needed to be downgraded to fix the issue)
>>>
>>>> The errors on the domU are a frozen file system, followed by a kernel panic.
>>
>> Do you have a reproducible case that shows success with the previous Xen
>> hypervisor package and failure with the new one, while keeping all other
>> things the same?
> 
> We have a production system which hangs within about a day. We
> don't know what exactly triggers the issue.
> 
>> This seems like an upstream thing, because for 4.8, the Debian package
>> updates are almost exclusively shipping upstream stable updates.
> 
> Ok.
Related:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554

Hans

Samuel Thibault

2019-Feb-09 16:35 UTC


[Pkg-xen-devel] [admin] [Xen-devel] [BUG] task jbd2/xvda4-8:174 blocked for more than 120 seconds.

Hello,

Hans van Kranenburg wrote on Sat, 09 Feb 2019 17:01:55 +0100:
> > I have forwarded the original mail: all VM I/O gets stuck, and thus the
> > VM becomes unusable.
> 
> These are in many cases the symptoms of running out of "grant frames".
Oh!  That could be it indeed.  I'm wondering what could be monopolizing
them, though, and why +deb9u11 is affected while +deb9u10 is not.  I'm
afraid increasing the gnttab max size to 64 might just defer filling it
up.
>   -# ./xen-diag  gnttab_query_size 5
>   domid=5: nr_frames=11, max_nr_frames=32
The current value is indeed 31 out of a maximum of 32.
> With Xen 4.8, you can add gnttab_max_frames=64 (or another number, but
> higher than the default 32) to the xen hypervisor command line and reboot.
admin@: I made the modification in the grub config. We can probably try
to reboot with the newer hypervisor, and monitor that value.
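
A rough sketch of what such monitoring could look like, assuming the xen-diag path given earlier in the thread. The helper name log_gnttab_once, the log file path, and the 60-second interval are hypothetical choices for illustration.

```shell
#!/bin/sh
# Path taken from earlier in the thread; override XEN_DIAG if yours differs.
XEN_DIAG=${XEN_DIAG:-/usr/lib/xen-4.8/bin/xen-diag}

# Hypothetical helper: print one UTC-timestamped gnttab_query_size line
# for each domid passed as an argument.
log_gnttab_once() {
    for lg_domid in "$@"; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            "$("$XEN_DIAG" gnttab_query_size "$lg_domid")"
    done
}

# Example loop, left commented out so that sourcing this file has no side effects:
# while :; do log_gnttab_once 5 >> /var/log/gnttab.log; sleep 60; done
```

Watching nr_frames over a day of production load would show whether it keeps climbing toward the new limit or levels off.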

Samuel
