Ben Hutchings
2014-Nov-05 20:40 UTC
[Pkg-xen-devel] kernel crashes after soft lockups in xen domU
On Wed, 2014-11-05 at 17:56 +0100, Jonas Meurer wrote:
[...]
> So the question is: why does the VM run stable on xen1 while it
> crashes all the time on xen2? If I compare xen1 and xen2, the only
> real difference is mainboard (Supermicro X8 on xen1; Supermicro
> X9 on xen2) and CPU (Xeon L5639 on xen1; E5-2609 on xen2)
>
> As a next step I'll put the harddisks into another X8/Xeon L5639
> server system and try to reproduce the crashes there. My bet is
> that this system will not crash anymore. In other words, I guess
> that this very bug is only triggered with the X9 + E5-2609
> combination.
>
> > Can I do anything additional to help debugging the bug? Shall I
> > report it to Xen upstream or send it to lkml?
>
> Still the same question. Shall I send the bugreport to upstream?
> Unfortunately nobody from Debian Linux kernel and/or Xen team seems
> to care :-/
[...]

Sorry you haven't had a response from us so far. This seems to be
fairly clearly a Linux/Xen interaction and I don't know enough about
Xen to suggest how to debug it.

As it involves a relatively old kernel version, I don't think Linux
upstream developers will want to hear about this unless you can also
reproduce it with a more recent version. Linux 3.16 is available (in
testing and wheezy-backports) if you would like to try that.

I don't know whether the Xen upstream developers will accept a bug
report against this version.

Ben.

-- 
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.
Jonas Meurer
2014-Nov-12 15:28 UTC
[Pkg-xen-devel] kernel crashes after soft lockups in xen domU
reassign 758622 linux-image-3.16.0-4-amd64
thanks

Hi Ben,

thanks for your response.

On 2014-11-05 21:40, Ben Hutchings wrote:
> On Wed, 2014-11-05 at 17:56 +0100, Jonas Meurer wrote:
> [...]
>> So the question is: why does the VM run stable on xen1 while it
>> crashes all the time on xen2? If I compare xen1 and xen2, the only
>> real difference is mainboard (Supermicro X8 on xen1; Supermicro
>> X9 on xen2) and CPU (Xeon L5639 on xen1; E5-2609 on xen2)
>>
>> As a next step I'll put the harddisks into another X8/Xeon L5639
>> server system and try to reproduce the crashes there. My bet is
>> that this system will not crash anymore. In other words, I guess
>> that this very bug is only triggered with the X9 + E5-2609
>> combination.
>>
>> > Can I do anything additional to help debugging the bug? Shall I
>> > report it to Xen upstream or send it to lkml?
>>
>> Still the same question. Shall I send the bugreport to upstream?
>> Unfortunately nobody from Debian Linux kernel and/or Xen team seems
>> to care :-/
> [...]
>
> Sorry you haven't had a response from us so far. This seems to be
> fairly clearly a Linux/Xen interaction and I don't know enough about
> Xen to suggest how to debug it.
>
> As it involves a relatively old kernel version, I don't think Linux
> upstream developers will want to hear about this unless you can also
> reproduce it with a more recent version. Linux 3.16 is available (in
> testing and wheezy-backports) if you would like to try that.

In the meantime I tried linux-image-3.16 from wheezy-backports as the VM
kernel. Sorry to report that the bug is still reproducible with this
kernel. I'm reassigning it to the jessie kernel for that reason.

The kernel backtrace was slightly different, but the behaviour was the
same: after putting the webserver on the test VM under heavy load with
siege and pylot, the load exploded until the machine crashed.
Now I replaced the hardware again with a Supermicro X8 board and an
Intel Xeon L5639 CPU - and you know what: the bug disappeared.

I'll have to put the system back into production mode now, so further
debugging will be complicated.

To sum up the situation:

-> a setup with Debian Wheezy Dom0 and Debian Wheezy or Jessie VM
-> the VM runs an apache webserver with mysql backend, nothing more
-> the VM crashes under load if Dom0 CPU is Intel Xeon E5-2609
-> the VM doesn't crash under load if Dom0 CPU is Intel Xeon L5639
-> tested on four completely different hardware setups, all components
   except harddisks replaced several times

Kind regards,
 jonas
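
When comparing runs across the two hardware setups, it may help to count the soft lockup reports that precede each crash. The following is only a sketch: it assumes the guest kernel log contains the usual watchdog message of the form "BUG: soft lockup - CPU#N stuck for Ns! [process:pid]", and the function name summarize_soft_lockups is made up for illustration. No log lines from this thread are reproduced here.

```python
import re
from collections import Counter

# Assumed message format of the kernel's soft lockup watchdog, e.g.
# "BUG: soft lockup - CPU#1 stuck for 22s! [apache2:1234]".
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)


def summarize_soft_lockups(log_text):
    """Return (reports per CPU, longest reported stall in seconds).

    Hypothetical helper: log_text is the full text of a kernel log
    captured from the domU, e.g. saved console or dmesg output.
    """
    per_cpu = Counter()
    longest = 0
    for match in SOFT_LOCKUP.finditer(log_text):
        per_cpu[int(match.group(1))] += 1
        longest = max(longest, int(match.group(2)))
    return dict(per_cpu), longest
```

Running this over the console logs saved from a crashing run and from a stable run would show whether the stable hardware produces no lockup reports at all or merely fewer of them.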
Ian Campbell
2014-Nov-12 17:04 UTC
[Pkg-xen-devel] kernel crashes after soft lockups in xen domU
On Wed, 2014-11-12 at 16:28 +0100, Jonas Meurer wrote:
> -> the VM crashes under load if Dom0 CPU is Intel Xeon E5-2609
> -> the VM doesn't crash under load if Dom0 CPU is Intel Xeon 5639

Someone just suggested to me (by their own admission on a hunch) that a
microcode update might help with this sort of issue.

That probably means updating to the latest BIOS as a first step (which
will probably include a microcode update). If that doesn't help, try the
runtime microcode update tools, since there will undoubtedly be
something newer than what the BIOS contains.

Ian.
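
To tell whether a BIOS or runtime update actually changed anything, the microcode revision can be recorded before and after. This is a minimal sketch: it assumes an x86 Linux system whose /proc/cpuinfo exposes "model name" and "microcode" fields (whether a Xen Dom0 reports the real revision there is not confirmed in this thread), and both function names are hypothetical.

```python
import re
from collections import defaultdict


def microcode_by_model(cpuinfo_text):
    """Map each CPU model name to the set of microcode revisions seen.

    Assumes the /proc/cpuinfo layout: one block of "key : value" lines
    per logical CPU, with blocks separated by blank lines.
    """
    revisions = defaultdict(set)
    for block in re.split(r"\n\s*\n", cpuinfo_text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "microcode" in fields:
            model = fields.get("model name", "unknown")
            revisions[model].add(fields["microcode"])
    return dict(revisions)


def read_host_microcode(path="/proc/cpuinfo"):
    """Hypothetical convenience wrapper reading the local cpuinfo file."""
    with open(path) as handle:
        return microcode_by_model(handle.read())
```

Calling read_host_microcode() on the Dom0 before and after the update and comparing the two results shows whether the revision moved. More than one revision for the same model would suggest the update was not applied to every core.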