Hans van Kranenburg
2018-Jan-12 00:34 UTC
[Pkg-xen-devel] Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
Hi,

On 08/01/2018 13:38, Valentin Vidic wrote:
> On Sun, Jan 07, 2018 at 07:36:40PM +0100, Hans van Kranenburg wrote:
>> Recently a tool was added to "dump guest grant table info". You could
>> see if it compiles on the 4.8 source and see if it works? Would be
>> interesting to get some idea about how high or low these numbers are in
>> different scenarios. I mean, I'm using 128, you 256, and we even don't
>> know if the actual value is maybe just above 32? :]
>>
>> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=df36d82e3fc91bee2ff1681fd438c815fa324b6a
>
> The diag tool does not build inside xen-4.8:
>
> xen-diag.c: In function ‘gnttab_query_size_func’:
> xen-diag.c:50:10: error: implicit declaration of function ‘xc_gnttab_query_size’ [-Werror=implicit-function-declaration]
>      rc = xc_gnttab_query_size(xch, &query);
>           ^~~~~~~~~~~~~~~~~~~~

Too bad. :|

> but I think the same info is available in the thread on xen-devel:
>
> https://www.mail-archive.com/xen-devel at lists.xen.org/msg116910.html

Ah, great, didn't see that one yet.

> When the domU hangs crash reports nr_grant_frames=32. After increasing
> the gnttab_max_frames=256 the domU reports using nr_grant_frames=59.
>
> So the new default of gnttab_max_frames=64 might be a bit close to 59,
> but I suppose 128 would be just as safe as 256 I currently use (if
> you prefer 128).

Is the 59 your lots-o-vcpu-monster?

I just finished with the initial preparation of a Xen 4.10 package for
unstable and have it running in my test environment. So, yay, I have
xen-diag now.

-# /usr/lib/xen-4.10/bin/xen-diag
xen-diag: xen diagnostic utility
Usage: xen-diag command [args]
Commands:
  help                       display this help
  gnttab_query_size <domid>  dump the current and max grant frames for <domid>

-# /usr/lib/xen-4.10/bin/xen-diag gnttab_query_size 0
domid=0: nr_frames=1, max_nr_frames=64

That's a 10vcpu PVHv2 domU with two disks attached, running 4.14 guest
kernel, which has only been booted up and is idling now.
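For anyone who wants to collect these numbers across several guests, here is a
minimal sketch of parsing the one-line gnttab_query_size output shown above and
flagging a guest that is getting close to its limit. It assumes the output
format is exactly the "domid=..: nr_frames=.., max_nr_frames=.." line quoted in
this mail. The function names and the 0.9 threshold are made up for
illustration and are not part of xen-diag.

```python
import re

# Assumed format, copied from the xen-diag output quoted in this mail:
#   domid=0: nr_frames=1, max_nr_frames=64
LINE_RE = re.compile(r"domid=(\d+): nr_frames=(\d+), max_nr_frames=(\d+)")


def parse_gnttab_query_size(line):
    """Parse one gnttab_query_size output line into a dict of integers."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("unrecognised gnttab_query_size output: %r" % line)
    domid, nr_frames, max_nr_frames = (int(g) for g in match.groups())
    return {"domid": domid, "nr_frames": nr_frames, "max_nr_frames": max_nr_frames}


def is_close_to_limit(nr_frames, max_nr_frames, threshold=0.9):
    """Return True if usage is at or above `threshold` of the limit.

    The 0.9 default is an arbitrary illustrative choice, not a Xen
    recommendation.
    """
    return nr_frames >= threshold * max_nr_frames


if __name__ == "__main__":
    info = parse_gnttab_query_size("domid=0: nr_frames=1, max_nr_frames=64")
    print(info, is_close_to_limit(info["nr_frames"], info["max_nr_frames"]))
```

With the numbers from this thread, 59 frames in use against a limit of 64 is
flagged by this check, while 59 against 128 is not.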
So, at least, nice to have some extra tooling available to help.

>> If this is something users are going to run into while not doing more
>> unusual things like having dozens of vcpus or network interfaces, then
>> changing the default could prevent hours of frustration and debugging
>> for them.
>
> Yes, the failure case is quite nasty, as the domU just hangs without
> even suggesting grant frames might be the problem. Not sure if domU
> can detect this situation at all?

I can't comment on that, since I don't know. Anyone who does, please
chime in.

> Anyway, if the value cannot be increased, the situation should at least
> be mentioned in the NEWS.Debian of the xen package.

Since this has been reported multiple times already, and upstream has
bumped it to 64, my verdict would be:

* Bump default to 64 already like upstream did in a later version.
* Properly document this issue in NEWS.Debian and also mention the
  option with documentation in the template grub config file, so there's
  a bigger chance users who run unusually big numbers of
  disks/nics/cpus/etc will find it.

...so we also better accommodate users who are using newer kernels in
the domU with blk-mq, and prevent them from wasting too much time and
getting frustrated for no reason.

I wouldn't be comfortable with bumping it above the current latest
greatest upstream default, since it would mean we would need to keep a
patch in later versions.

I'll prepare a patch to bump the default to 64 in 4.8, taking changes
from the upstream patch. I probably have to ask upstream (Juergen Gross)
why the commit that was referenced earlier bumps the default without
mentioning it in the commit message.

Since I just joined the Debian Xen team, I'll run anything I can come up
with through the team to get it approved. We'll target the next Stretch
stable update to get it in.

Thanks,
Hans