Roalt Zijlstra
2018-Nov-05 11:37 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Package: src:xen
Version: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
Severity: important

Updating Xen to the latest 4.8 version from the security repo makes servers unstable. The servers randomly reset without any logs.

We have several Debian Stretch servers running Xen 4.8, and only the ones updated to version 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10 crash, at intervals ranging from 'twice a day' to 'once every two weeks'. We have already ruled out hardware as the cause, since we have 4 individual servers with different hardware setups that were bought at different times. These servers ran stable with the previous version, 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9. They all behave exactly the same: everything works as it should, and then at some point the machine crashes and resets without any logs.

It looks like it could have something to do with DomUs running older (3.16) Linux kernels. As a test we installed 4.9 kernels on all Jessie DomU servers, and so far it has run for 13 days (but this server did crash twice in one day). We have seen this behaviour with Xen on CentOS 6 and 7 too, but the trouble seems to be fixed after some more updates.

As said, I cannot provide logs since it simply resets without notice.

-- System Information:
Debian Release: 9.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

xen-hypervisor-4.8-amd64 depends on no packages.

Versions of packages xen-hypervisor-4.8-amd64 recommends:
ii  xen-utils-4.8  4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10

xen-hypervisor-4.8-amd64 suggests no packages.

-- Configuration Files:
/etc/default/grub.d/xen.cfg changed:
echo "Including Xen overrides from /etc/default/grub.d/xen.cfg"
GRUB_CMDLINE_XEN="dom0_mem=3072M,max:4096M loglvl=all guest_loglvl=all dom0_max_vcpus=4 dom0_vcpus_pin noreboot"
if [ "$XEN_OVERRIDE_GRUB_DEFAULT" = "" ]; then
    echo "WARNING: GRUB_DEFAULT changed to boot into Xen by default!"
    echo " Edit /etc/default/grub.d/xen.cfg to avoid this warning."
    XEN_OVERRIDE_GRUB_DEFAULT=1
fi
if [ "$XEN_OVERRIDE_GRUB_DEFAULT" = "1" ]; then
    GRUB_DEFAULT="Debian GNU/Linux, with Xen hypervisor"
fi

-- no debconf information
Hans van Kranenburg
2018-Nov-06 17:54 UTC
[Pkg-xen-devel] Bug#912975: Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hi,

On 11/5/18 12:37 PM, Roalt Zijlstra wrote:
> Package: src:xen
> Version: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> Severity: important
>
> Updating Xen to the latest 4.8 version from the security repo makes servers unstable.

Can you confirm that this is the only change that you made between the before/after scenario? I mean, if you downgrade the packages, or you drop the old hypervisor xen-x.y-amd64.gz in /boot again, it's stable again?

> The servers randomly reset without any logs.

Do you have the noreboot option set on the Xen hypervisor command line?

Are you able to configure and capture output from serial console?

First interesting thing to know is if it's the Dom0 that crashes, or if it's the hypervisor itself, and the logging will tell you that.

> We have serveral Debian Stretch servers running Xen 4.8 and only the ones updated to the 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> version tend to crash ranging from 'twice a day' to 'once every two weeks'. We have already ruled out if hardware was an
> issue, since we have 4 individual servers which are different in hardware setup and also were bought at different times.
> And these servers ran stable with the previsous version 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9.
> These servers are acting exactly the same. Every thing works as it should, but without any logs it crashes and resets at
> a certain point.
>
> It looks like it could have something to do with DomUs running older (3.16) Linux kernels. As a test we applied 4.9 kernels to
> all Jessie DomU servers and so far it runs for 13 days (but this server did crash twice on a day).
> We have seen this behaviour with Xen on CentOS6 and 7 too, but the trouble seems to be fixed after some more updates.

It can be frustrating that there's not much response on the mailing lists. But, these kinds of problems can be really hard to debug and solve. Unless there's a clear reproduction scenario and debug output, there's often no one who can help you remotely.

> As said.. I cannot provide logs since it simply resets without notice.

It's still the best starting point...

Thanks,
Hans
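For readers who want to follow the serial console suggestion, the fragment below is a minimal sketch of how the Xen and dom0 consoles are commonly pointed at a serial port through the same /etc/default/grub.d/xen.cfg file shown in the report. It assumes the physical or BMC-emulated port appears as COM1 at 115200 baud, 8n1, which is not stated anywhere in this thread; the options from the reporter's existing line (dom0_mem and so on) would need to be merged back in.

```shell
# Fragment for /etc/default/grub.d/xen.cfg; run update-grub and reboot afterwards.
# Assumption: the serial port (or the BMC's virtual serial port) is COM1, 115200 baud, 8n1.
# On some boards the emulated port is COM2; adjust com1=/console= accordingly.
GRUB_CMDLINE_XEN="loglvl=all guest_loglvl=all noreboot com1=115200,8n1 console=com1,vga"

# Route the dom0 kernel console through Xen (hvc0) so kernel messages reach the same serial line.
# Note: this replaces GRUB_CMDLINE_LINUX_DEFAULT for the Xen boot entries only.
GRUB_CMDLINE_LINUX_XEN_REPLACE_DEFAULT="console=tty0 console=hvc0"
```

With noreboot set, the hypervisor stops after a panic instead of resetting, so the last lines stay visible on whatever is attached to the serial port.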
Roalt Zijlstra | webpower
2018-Nov-07 11:48 UTC
[Pkg-xen-devel] Bug#912975: Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hi Hans,

On Tue, 6 Nov 2018 at 18:54, Hans van Kranenburg <hans at knorrie.org> wrote:
> Hi,
>
> On 11/5/18 12:37 PM, Roalt Zijlstra wrote:
> > Package: src:xen
> > Version: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> > Severity: important
> >
> > Updating Xen to the latest 4.8 version from the security repo makes servers unstable.
>
> Can you confirm that this is the only change that you made between the before/after scenario? I mean, if you downgrade the packages, or you drop the old hypervisor xen-x.y-amd64.gz in /boot again, it's stable again?

We have several servers running the previous versions and those are still stable. The servers that we upgraded using 'apt-get update; apt-get upgrade' were rock solid before the upgrade.
I did prepare a downgrade script in case we need it, but at the moment the interval between crashes seems to be longer than before. We did have servers crashing every 2 days, and even one crashing twice a day.

> > The servers randomly reset without any logs.
>
> Do you have the noreboot option set on the Xen hypervisor command line?

For now, one busy server runs an older 4.9.0-4-amd64 kernel with a 3.16 kernel DomU running a MySQL server. The second busy server runs all DomUs with 4.9 (backport) kernels on the latest 4.9.0-8-amd64 kernel for the Dom0. Currently we are waiting for the next crash.
The last-mentioned server was rebooted with the noreboot option, so we can check the console for errors once it crashes.
The remaining two servers are our fall-back servers and are not that busy. We have seen them crash too, but we noticed that the less busy servers did not crash as often. Once they were busy, though, they crashed as quickly as the master servers.

> Are you able to configure and capture output from serial console?

Oh wow.. Using old technology for debugging :-) I will need to see how that configuration is done. We could connect up physical serial cables between different machines.

> First interesting thing to know is if it's the Dom0 that crashes, or if it's the hypervisor itself, and the logging will tell you that.
>
> > We have serveral Debian Stretch servers running Xen 4.8 and only the ones updated to the 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10 version tend to crash ranging from 'twice a day' to 'once every two weeks'. We have already ruled out if hardware was an issue, since we have 4 individual servers which are different in hardware setup and also were bought at different times.
> > And these servers ran stable with the previsous version 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9.
> > These servers are acting exactly the same. Every thing works as it should, but without any logs it crashes and resets at a certain point.
> >
> > It looks like it could have something to do with DomUs running older (3.16) Linux kernels. As a test we applied 4.9 kernels to all Jessie DomU servers and so far it runs for 13 days (but this server did crash twice on a day).
> > We have seen this behaviour with Xen on CentOS6 and 7 too, but the trouble seems to be fixed after some more updates.
>
> It can be frustrating that there's not much response on the mailing lists. But, these kinds of problems can be really hard to debug and solve. Unless there's a clear reproduction scenario and debug output, there's often noone who can help you remotely.

Well, we have been having these issues since February this year, with unstable Xen servers crashing about once a month. The first issues were on fresh CentOS 7 servers, but then we also got them on updated CentOS 6 servers. We then decided to use Debian Stretch, and the first tests were pretty stable. We did install a new R740 with it (Xen 4.8.4-pre) and that ran for 110 days pretty well.

> > As said.. I cannot provide logs since it simply resets without notice.
>
> It's still the best starting point...

Well, hopefully the server booted with 'noreboot' crashes soon so that we get some logs. I will check if we can do any serial console tricks.

Roalt
Hans van Kranenburg
2018-Nov-07 13:30 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hi,

On 11/7/18 12:48 PM, Roalt Zijlstra | webpower wrote:
>
> On Tue, 6 Nov 2018 at 18:54, Hans van Kranenburg <hans at knorrie.org <mailto:hans at knorrie.org>> wrote:
>
> > Hi,
> >
> > On 11/5/18 12:37 PM, Roalt Zijlstra wrote:
> > > Package: src:xen
> > > Version: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> > > Severity: important
> > >
> > > Updating Xen to the latest 4.8 version from the security repo makes servers unstable.
> >
> > Can you confirm that this is the only change that you made between the before/after scenario? I mean, if you downgrade the packages, or you drop the old hypervisor xen-x.y-amd64.gz in /boot again, it's stable again?
>
> We have several servers running the previous versions and those are still stable. The servers that we upgraded using 'apt-get update; apt-get upgrade' were rock solid before the upgrade.

Yes, that's why I was asking. Did that apt-get upgrade also upgrade your dom0 kernel? You can look back in /var/log/dpkg.log* to see what happened. This is very relevant information.

> I did prepare a downgrade script if needed, but atm. the crash interval in days seems to be higher then before. We did have servers crashing every 2 days or even one crashing twice a day.
>
> > > The servers randomly reset without any logs.
> >
> > Do you have the noreboot option set on the Xen hypervisor command line?
>
> For now one busy servers runs an older 4.9.0-4-amd64 kernel with a 3.16 kernel DomU with MySQL server on it. The second busy server runs all domUs with 4.9 (backport) kernels on the lastest 4.9.0-8-amd64 kernel for the Dom0. Currently we are awaiting any crash.

In Debian, 4.9.0-8-amd64 is in the name of the package, but the real kernel version is in the version of that package.

So, if you have linux-image-4.9.0-8-amd64, you should always also mention the real version, which is now e.g. 4.9.110-3+deb9u6. This means it's based on 4.9.110 upstream.

The kernel team follows the 4.9 LTS releases, but they only bump the number in the package name when a change breaks the ABI (so custom modules have to be rebuilt), to trigger that process.

> The last mentioned server was rebooted with the noreboot option, so we could eventually check the console for errors once it crashes.
> The remain two servers are our fall-back servers and are not that busy. We have seen them crash too, but we noticed that the less busy servers did not crash that often. But once they were busy they crashed as quickly as the master servers.

Ok, that's interesting extra data.

> > Are you able to configure and capture output from serial console?
>
> Oh wow.. Using old technology for debugging :-) I will need to see how that configuration is done. We could connect up physical serial cables between different machines.

Well... old... It's the best way to capture text after everything crashes. On a vga display it scrolls away and you can't copy paste.

If you're using recent Dell hardware, then I guess your drac provides an extra emulated serial console. I use HP hardware, there it's the ilo virtual serial port.

> > First interesting thing to know is if it's the Dom0 that crashes, or if it's the hypervisor itself, and the logging will tell you that.
> >
> > > We have serveral Debian Stretch servers running Xen 4.8 and only the ones updated to the 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10 version tend to crash ranging from 'twice a day' to 'once every two weeks'. We have already ruled out if hardware was an issue, since we have 4 individual servers which are different in hardware setup and also were bought at different times.
> > > And these servers ran stable with the previsous version 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9.
> > > These servers are acting exactly the same. Every thing works as it should, but without any logs it crashes and resets at a certain point.
> > >
> > > It looks like it could have something to do with DomUs running older (3.16) Linux kernels. As a test we applied 4.9 kernels to all Jessie DomU servers and so far it runs for 13 days (but this server did crash twice on a day).
> > > We have seen this behaviour with Xen on CentOS6 and 7 too, but the trouble seems to be fixed after some more updates.
> >
> > It can be frustrating that there's not much response on the mailing lists. But, these kinds of problems can be really hard to debug and solve. Unless there's a clear reproduction scenario and debug output, there's often noone who can help you remotely.
>
> Well we have been having the issues since february this year with unstable Xen servers crashing once in a months or so. The first issues were on fresh Cent OS 7 servers, but then we also got them with updated Cent OS 6 servers. We then decided to use Debian Stretch and the first tests were pretty stable. We did install a new R740 with it (Xen 4.8.4-pre) and that ran for 110 days pretty well.

I know this feeling. I've been debugging similar kinds of issues this year that appeared "every few weeks".

> > > As said.. I cannot provide logs since it simply resets without notice.
> >
> > It's still the best starting point...
>
> Well hopefully the 'noreboot' provided server crashes soon for some logs. I will check if we can do any serial console tricks.

Yes.

Hans
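The two checks described in this message (what the upgrade actually changed, and which real kernel version is running) can be sketched as a small shell helper. The function name pkg_history is made up for illustration; it only assumes the usual dpkg.log field layout (date, time, action, package, old version, new version) and that the kernel package is named linux-image-$(uname -r), as it is on Stretch.

```shell
#!/bin/sh
# pkg_history PATTERN  (hypothetical helper name)
# Reads dpkg.log lines on stdin (fields: date time action package old-version new-version)
# and prints install/upgrade events for packages whose name matches the awk regex PATTERN.
pkg_history() {
    awk -v pat="$1" '
        ($3 == "install" || $3 == "upgrade") && $4 ~ pat {
            print $1, $2, $3, $4, $5, "->", $6
        }'
}

# Typical use on the dom0 (zcat -f also passes uncompressed files through unchanged):
#   zcat -f /var/log/dpkg.log* | pkg_history '^(xen-|linux-image-)' | sort
#
# Full version of the kernel package that matches the running kernel:
#   dpkg-query -W -f='${Version}\n' "linux-image-$(uname -r)"
```

Sorting the combined output by timestamp makes it easy to see whether a Xen upgrade and a dom0 kernel upgrade happened in the same apt run.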
Roalt Zijlstra | webpower
2018-Nov-07 14:43 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
On Wed, 7 Nov 2018 at 14:30, Hans van Kranenburg <hans at knorrie.org> wrote:
> Hi,
>
> On 11/7/18 12:48 PM, Roalt Zijlstra | webpower wrote:
> >
> > On Tue, 6 Nov 2018 at 18:54, Hans van Kranenburg <hans at knorrie.org <mailto:hans at knorrie.org>> wrote:
> >
> > > Hi,
> > >
> > > On 11/5/18 12:37 PM, Roalt Zijlstra wrote:
> > > > Package: src:xen
> > > > Version: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> > > > Severity: important
> > > >
> > > > Updating Xen to the latest 4.8 version from the security repo makes servers unstable.
> > >
> > > Can you confirm that this is the only change that you made between the before/after scenario? I mean, if you downgrade the packages, or you drop the old hypervisor xen-x.y-amd64.gz in /boot again, it's stable again?
> >
> > We have several servers running the previous versions and those are still stable. The servers that we upgraded using 'apt-get update; apt-get upgrade' were rock solid before the upgrade.
>
> Yes, that's why I was asking. Did that apt-get upgrade also upgrade your dom0 kernel? You can look back in /var/log/dpkg.log* about what happened. This is very relevant information.

Two servers were installed on 2018-04-24 and then upgraded:

2018-10-08 19:40:57 upgrade xen-hypervisor-4.8-amd64:amd64 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
2018-10-08 19:41:14 status installed xen-hypervisor-4.8-amd64:amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
2018-07-31 18:50:01 upgrade xen-hypervisor-4.8-amd64:amd64 4.8.3+comet2+shim4.10.0+comet3-1+deb9u5 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9
2018-07-31 18:50:45 status installed xen-hypervisor-4.8-amd64:amd64 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9
2018-04-24 16:22:56 install xen-hypervisor-4.8-amd64:amd64 <none> 4.8.3+comet2+shim4.10.0+comet3-1+deb9u5
2018-04-24 16:23:05 status installed xen-hypervisor-4.8-amd64:amd64 4.8.3+comet2+shim4.10.0+comet3-1+deb9u5

The two other servers ran CentOS first and were converted to Debian for other reasons, so they are fresh installs:

2018-09-26 22:01:34 install xen-hypervisor-4.8-amd64:amd64 <none> 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
2018-09-26 22:01:57 status installed xen-hypervisor-4.8-amd64:amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10

> > I did prepare a downgrade script if needed, but atm. the crash interval in days seems to be higher then before. We did have servers crashing every 2 days or even one crashing twice a day.
> >
> > > > The servers randomly reset without any logs.
> > >
> > > Do you have the noreboot option set on the Xen hypervisor command line?
> >
> > For now one busy servers runs an older 4.9.0-4-amd64 kernel with a 3.16 kernel DomU with MySQL server on it. The second busy server runs all domUs with 4.9 (backport) kernels on the lastest 4.9.0-8-amd64 kernel for the Dom0. Currently we are awaiting any crash.
>
> In Debian, 4.9.0-8-amd64 is in the name of the package, but the real kernel version is in the version of that package.
>
> So, if you have linux-image-4.9.0-8-amd64, you should always also mention the real version, which is now e.g. 4.9.110-3+deb9u6. This means it's based on 4.9.110 upstream.
>
> The kernel team follows the 4.9 LTS releases, but only if the changes have to break the ABI (so custom modules have to be rebuilt), they up the number in the package name to trigger that process.

Right, I completely missed that detail.

The two heavily used servers run these Dom0 kernels:
- 4.9.65-3+deb9u1, with one Jessie DomU on kernel 3.16.57-2
- 4.9.110-3+deb9u6, with a few Jessie DomUs on kernel 4.9.110-3+deb9u5~deb8u1

The two less used servers run:
- 4.9.110-3+deb9u5, with one Jessie DomU on kernel 3.16.59-1
- 4.9.110-3+deb9u5, with a few Jessie DomUs on mixed kernels: 3.16.59-1 and 3.16.57-2

> > The last mentioned server was rebooted with the noreboot option, so we could eventually check the console for errors once it crashes.
> > The remain two servers are our fall-back servers and are not that busy. We have seen them crash too, but we noticed that the less busy servers did not crash that often. But once they were busy they crashed as quickly as the master servers.
>
> Ok, that's interesting extra data.
>
> > > Are you able to configure and capture output from serial console?
> >
> > Oh wow.. Using old technology for debugging :-) I will need to see how that configuration is done. We could connect up physical serial cables between different machines.
>
> Well... old... It's the best way to capture text after everything crashes. On a vga display it scrolls away and you can't copy paste.
>
> If you're using recent Dell hardware, then I guess your drac provides an extra emulated serial console. I use HP hardware, there it's the ilo virtual serial port.

I will look into this. I have never used it before, since most crashes so far logged errors before things stopped working.

> > > First interesting thing to know is if it's the Dom0 that crashes, or if it's the hypervisor itself, and the logging will tell you that.
> > >
> > > > We have serveral Debian Stretch servers running Xen 4.8 and only the ones updated to the 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10 version tend to crash ranging from 'twice a day' to 'once every two weeks'. We have already ruled out if hardware was an issue, since we have 4 individual servers which are different in hardware setup and also were bought at different times.
> > > > And these servers ran stable with the previsous version 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9.
> > > > These servers are acting exactly the same. Every thing works as it should, but without any logs it crashes and resets at a certain point.
> > > >
> > > > It looks like it could have something to do with DomUs running older (3.16) Linux kernels. As a test we applied 4.9 kernels to all Jessie DomU servers and so far it runs for 13 days (but this server did crash twice on a day).
> > > > We have seen this behaviour with Xen on CentOS6 and 7 too, but the trouble seems to be fixed after some more updates.
> > >
> > > It can be frustrating that there's not much response on the mailing lists. But, these kinds of problems can be really hard to debug and solve. Unless there's a clear reproduction scenario and debug output, there's often noone who can help you remotely.
> >
> > Well we have been having the issues since february this year with unstable Xen servers crashing once in a months or so. The first issues were on fresh Cent OS 7 servers, but then we also got them with updated Cent OS 6 servers. We then decided to use Debian Stretch and the first tests were pretty stable. We did install a new R740 with it (Xen 4.8.4-pre) and that ran for 110 days pretty well.
>
> I know this feeling. I've been debugging similar kinds of issues this year that appeared "every few weeks".
>
> > > > As said.. I cannot provide logs since it simply resets without notice.
> > >
> > > It's still the best starting point...
> >
> > Well hopefully the 'noreboot' provided server crashes soon for some logs. I will check if we can do any serial console tricks.
>
> Yes.

Oh, and before I forget.. Thanks for all the feedback/help!

Roalt
Patrick Beckmann
2019-Jan-03 10:02 UTC
[Pkg-xen-devel] Bug#912975: Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hi,

this bug description sounds a lot like a problem we have with two Xen Dom0s, so I am replying here.

One of our machines had been running stable on Debian 8 and was recently upgraded to Debian 9; the other one is new hardware with a fresh installation. With the most recent Debian 9 they crash at a rate between once every 3 days and 3 times a day, which we suspect depends on load. Versions are
- Xen hypervisor: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
- Linux Kernel: 4.9.130-2

On Tue, 6 Nov 2018 18:54:53 +0100 Hans van Kranenburg <hans at knorrie.org> wrote:
> Are you able to configure and capture output from serial console?

We have been able to capture the output of our new machine crashing. Please find it attached to this e-mail. Unfortunately it lacks the lines during boot time. If you need them or any other information, please let me know.

> Can you confirm that this is the only change that you made between the
> before/after scenario? I mean, if you downgrade the packages, or you
> drop the old hypervisor xen-x.y-amd64.gz in /boot again, it's stable again?

We will try this next with Xen version 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9.

Best Regards,
Patrick Beckmann

-------------- next part --------------
[SOL Session operational. Use ~?
for help]
[   99.992731] xen-blkback: backend/vbd/19/51712: prepare for reconnect
[  101.634684] xen-blkback: backend/vbd/20/51712: prepare for reconnect
[  103.653671] xen-blkback: backend/vbd/19/51712: using 4 queues, protocol 1 (x86_64-abi) persistent grants
[  103.827314] vif vif-19-0 vif19.0: Guest Rx ready
[  103.827427] IPv6: ADDRCONF(NETDEV_CHANGE): vif19.0: link becomes ready
[  103.827534] br02: port 15(vif19.0) entered blocking state
[  103.827541] br02: port 15(vif19.0) entered forwarding state
[  104.476998] xen-blkback: backend/vbd/20/51712: using 4 queues, protocol 1 (x86_64-abi) persistent grants
[  104.660889] vif vif-20-0 vif20.0: Guest Rx ready
[  104.661018] IPv6: ADDRCONF(NETDEV_CHANGE): vif20.0: link becomes ready
[  104.661168] br026: port 2(vif20.0) entered blocking state
[  104.661184] br026: port 2(vif20.0) entered forwarding state
(XEN) d8 L1TF-vulnerable L1e 0000000001a23320 - Shadowing
(XEN) d8 L1TF-vulnerable L1e 0000000001a23320 - Shadowing
(XEN) d8 L1TF-vulnerable L1e 0000000001a23320 - Shadowing
(XEN) d11 L1TF-vulnerable L1e 00000000020c3320 - Shadowing
(XEN) d13 L1TF-vulnerable L1e 0000000001a3b320 - Shadowing
(XEN) d15 L1TF-vulnerable L1e 0000000001a23320 - Shadowing

Debian GNU/Linux 9 caribou hvc0

caribou login:
Debian GNU/Linux 9 caribou hvc0

caribou login: [ 4676.600094] br02: port 14(vif17.0) entered disabled state
[ 4676.744064] br02: port 14(vif17.0) entered disabled state
[ 4676.745573] device vif17.0 left promiscuous mode
[ 4676.745618] br02: port 14(vif17.0) entered disabled state
[ 4683.146619] br02: port 14(vif21.0) entered blocking state
[ 4683.146678] br02: port 14(vif21.0) entered disabled state
[ 4683.146921] device vif21.0 entered promiscuous mode
[ 4683.153997] IPv6: ADDRCONF(NETDEV_UP): vif21.0: link is not ready
[ 4683.639331] xen-blkback: backend/vbd/21/51712: using 1 queues, protocol 1 (x86_64-abi)
[ 4684.544484] xen-blkback: backend/vbd/21/51712: prepare for reconnect
[ 4684.938636] xen-blkback: backend/vbd/21/51712: using 1 queues, protocol 1 (x86_64-abi)
[ 4692.235692] xen-blkback: backend/vbd/21/51712: prepare for reconnect
[ 4694.917436] vif vif-21-0 vif21.0: Guest Rx ready
[ 4694.917800] IPv6: ADDRCONF(NETDEV_CHANGE): vif21.0: link becomes ready
[ 4694.917918] br02: port 14(vif21.0) entered blocking state
[ 4694.917926] br02: port 14(vif21.0) entered forwarding state
[ 4694.921344] xen-blkback: backend/vbd/21/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants

Debian GNU/Linux 9 caribou hvc0

caribou login: (XEN) ----[ Xen-4.8.5-pre  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    32
(XEN) RIP:    e008:[<ffff82d08023116d>] guest_4.o#sh_page_fault__guest_4+0x75d/0x1e30
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d8v0)
(XEN) rax: 00007fb5797e6580   rbx: ffff8310f4372000   rcx: ffff81c0e0600000
(XEN) rdx: 0000000000000000   rsi: ffff8310f4372000   rdi: 000000000001fed5
(XEN) rbp: ffff8310f4372000   rsp: ffff8340250e7c78   r8:  000000000001fed5
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: ffff81c0e06ff6a8   r13: 000000000407fad6   r14: ffff830078da7000
(XEN) r15: ffff8340250e7ef8   cr0: 0000000080050033   cr4: 0000000000372660
(XEN) cr3: 000000407ec02001   cr2: ffff81c0e06ff6a8
(XEN) fsb: 00007fb58fc26700   gsb: 0000000000000000   gss: ffff8801fea00000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08023116d> (guest_4.o#sh_page_fault__guest_4+0x75d/0x1e30):
(XEN)  ff ff 03 00 4e 8d 24 c1 <49> 8b 0c 24 f6 c1 01 0f 84 b6 06 00 00 48 c1 e1
(XEN) Xen stack trace from rsp=ffff8340250e7c78:
(XEN)    00007fb5797e6580 00000000027372df ffff82d080323600 ffff8310f4372648
(XEN)    ffff8310f43726a8 00000000027372df ffff8340250e7d50 ffff8340250e7d98
(XEN)    00000007fb5797e6 0000000000000090 ffff82d080323618 00000000000007f8
(XEN)    00000000000006a8 0000000000000e58 0000000000000f30 ffff82d000000000
(XEN)    000000000000000d 0000005100000002 00000000000001e6 ffff8340250e7d20
(XEN)    00000000000000e0 0000000000000000 000000000277f512 ffff830078da7000
(XEN)    0000000000000001 ffff830078da7bc0 00000000020dd93d 00007fb5797e6580
(XEN)    0000002700075067 000000280ae61067 000000280ca6f067 00000027372df967
(XEN)    000000000267c9a0 0000000002700075 000000000280ae61 000000000280ca6f
(XEN)    000000407faf7067 ffff830078da7000 ffff8310f4372000 ffff8340250e7ef8
(XEN)    ffff82d08023a910 0000000000000000 000000005c2d2f4a ffff82d08023a780
(XEN)    ffff8310f4372000 ffff8340250e7fff ffff830078da7000 ffff82d08023aa0f
(XEN)    ffff82d08023f913 ffff82d08023f907 ffff82d08023f913 ffff82d08023f907
(XEN)    ffff82d08023f913 ffff82d08023f907 ffff82d08023f913 ffff82d08023f907
(XEN)    ffff82d08023f913 ffff82d08023f907 ffff82d08023f913 ffff82d08023f907
(XEN)    ffff82d08023f913 ffff82d08023f907 ffff82d08023f913 ffff8340250e7ef8
(XEN)    00007fb5797e6580 ffff830078da7000 0000000000000014 ffff8310f4372000
(XEN)    0000000000000000 ffff82d08019f5a2 ffff82d08023f913 ffff82d08023f907
(XEN)    ffff82d08023f913 ffff830078da7000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff8340250e7fff 0000000000000000 ffff82d08023f9d9
(XEN) Xen call trace:
(XEN)    [<ffff82d08023116d>] guest_4.o#sh_page_fault__guest_4+0x75d/0x1e30
(XEN)    [<ffff82d08023a910>] do_iret+0/0x1c0
(XEN)    [<ffff82d08023a780>] toggle_guest_pt+0x30/0x160
(XEN)    [<ffff82d08023aa0f>] do_iret+0xff/0x1c0
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08019f5a2>] do_page_fault+0x1f2/0x4c0
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d08023f913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d08023f9d9>] entry.o#handle_exception_saved+0x68/0x94
(XEN)
(XEN) Pagetable walk from ffff81c0e06ff6a8:
(XEN)  L4[0x103] = 000000407ec02063 ffffffffffffffff
(XEN)  L3[0x103] = 000000407ec02063 ffffffffffffffff
(XEN)  L2[0x103] = 000000407ec02063 ffffffffffffffff
(XEN)  L1[0x0ff] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 32:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff81c0e06ff6a8
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
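When passing a capture like this on to a mailing list, it can help to cut the file down to the hypervisor crash report. The helper below is only a sketch: the function name xen_panic_report and the file names in the usage line are made up, and it assumes the capture is a plain text file in which the crash banner contains the string "(XEN) ----[ Xen-", as it does in the log above.

```shell
#!/bin/sh
# xen_panic_report  (hypothetical helper name)
# Reads a captured console log on stdin and prints everything from the first
# Xen crash banner ("(XEN) ----[ Xen-...") to the end of the input.
# The pattern is not anchored to the start of the line, because the banner can
# follow a login prompt on the same line, as in the capture above.
xen_panic_report() {
    sed -n '/(XEN) ----\[ Xen-/,$p'
}

# Example use (file names are placeholders):
#   xen_panic_report < sol-capture.log > xen-panic.txt
```

The dom0 kernel lines before the banner are still worth keeping in the full file, since they show which guests were being started or reconnected shortly before the panic.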
Hans van Kranenburg
2019-Jan-03 22:15 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
On 1/3/19 11:02 AM, Patrick Beckmann wrote:
>
> this bug description sounds a lot like a problem we have with two Xen
> Dom0s, so I am replying here.

The original reporter of this bug hasn't shown any errors yet to compare against, but at least it's a "something crashes" scenario. ;]

> One of our machines has been running stable on Debian 8 and was newly
> upgraded to Debian 9, another one is new hardware with a fresh
> installation. With the most recent Debian 9 they crash at a rate from
> every 3 days to 3 times a day, suspected to be depending on load.
> Versions are
> - Xen hypervisor: 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> - Linux Kernel: 4.9.130-2
>
> On Tue, 6 Nov 2018 18:54:53 +0100 Hans van Kranenburg <hans at knorrie.org>
> wrote:
>> Are you able to configure and capture output from serial console?
>
> We have been able to capture the output of our new machine crashing.
> Please find it attached to this e-mail. Unfortunately it lacks the lines
> during boot time. If you need them or any other information, please let
> me know.

That's the same error as discussed in this thread, and it looks like it's not narrowed down to something reproducible yet.

https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00938.html

I don't think the Debian packaging people can be of great help here by sitting in between upstream and you. This is an upstream bug, and I've never encountered it myself, nor do I know how to trigger it to help with debugging.

Maybe you can join in on that discussion on the Xen mailing list to provide more info about your situation?

Hans
Hans van Kranenburg
2019-Feb-22 18:24 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
tags 912975 + moreinfo
thanks

Hi,

On 11/7/18 3:43 PM, Roalt Zijlstra | webpower wrote:
>
> On Wed, 7 Nov 2018 at 14:30, Hans van Kranenburg <hans at knorrie.org
> <mailto:hans at knorrie.org>> wrote:
>
> > Well hopefully the 'noreboot' provided server crashes soon for some
> > logs. I will check if we can do any serial console tricks.
>
> Oh and before I forget.. Thanks for all the feedback/help!

Do you have any update here? Any debug logging?

The current state of this bug does not really allow anyone other than
yourself to cause it to progress.

Hans
Debian Bug Tracking System
2019-Feb-22 18:27 UTC
[Pkg-xen-devel] Processed: Re: Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Processing commands for control at bugs.debian.org:

> tags 912975 + moreinfo
Bug #912975 [src:xen] xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Added tag(s) moreinfo.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
912975: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912975
Debian Bug Tracking System
Contact owner at bugs.debian.org with problems
Alexander Dahl
2019-Feb-22 20:41 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hei hei,

On Fri, Feb 22, 2019 at 07:24:11PM +0100, Hans van Kranenburg wrote:
> The current state of this bug does not really allow anyone other than
> yourself to cause it to progress.

FWIW, I also have problems with Xen and stretch on amd64. Since
upgrading from jessie I get random crashes, which means the system
hangs and I only can do a hard powercycle. I'm currently reorganizing
the partitions to get enough space for a debug kernel on the rootfs,
otherwise the stacktraces are probably not of big help?

(I would have upgraded to buster already, but pygrub is broken there,
so maybe we get stretch fixed until then.)

Greets
Alex

-- 
/"\ ASCII RIBBON | »With the first link, the chain is forged. The first
\ / CAMPAIGN     | speech censured, the first thought forbidden, the
 X  AGAINST      | first freedom denied, chains us all irrevocably.«
/ \ HTML MAIL    | (Jean-Luc Picard, quoting Judge Aaron Satie)
Hans van Kranenburg
2019-Feb-22 21:06 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hi Alexander,

On 2/22/19 9:41 PM, Alexander Dahl wrote:
>
> On Fri, Feb 22, 2019 at 07:24:11PM +0100, Hans van Kranenburg wrote:
>> The current state of this bug does not really allow anyone other than
>> yourself to cause it to progress.
>
> FWIW, I also have problems with Xen and stretch on amd64. Since
> upgrading from jessie I get random crashes, which means the system
> hangs and I only can do a hard powercycle.

Ok. I'm gonna sound a bit strict/rigorous/stern here (I don't know which
of those is the right one, not a native speaker).

Please don't use an existing bug for an "I'm also having similar
problems" report. It might seem helpful to group similar symptoms
together, but often there turn out to be different causes in the end,
and people from the package maintainer team will get confused about what
the real issue was and if it's fixed now or not, and you might end up
with a closed bug report while your issue was not dealt with, or the
original reporter might end up with a closed bug because your me-too
problem was fixed.

From your problem description, it seems you're experiencing dom0 or even
Xen hypervisor problems, not domU (virtual machine) problems. Is that
right?

> I'm currently reorganizing
> the partitions to get enough space for a debug kernel on the rootfs,
> otherwise the stacktraces are probably not of big help?

As long as you don't share any stack trace at all, they won't be of any
help, no. :) You might be experiencing a known problem, or a problem that
we know already has been fixed upstream in later Xen or Linux, or a new
problem.

> (I would have upgraded to buster already, but pygrub is broken there,
> so maybe we get stretch fixed until then.)

The pygrub fixes are in the upload to unstable that was done today.

https://tracker.debian.org/news/1031793/accepted-xen-411126-g87f51bf366-2-all-amd64-source-into-unstable/

Hans

Off-the-record: 2018 was not a good year for the Linux kernel in
general, also thanks to all the spectre/meltdown things happening.

My own experience is that the Linux 4.9 LTS kernel is quite unusable (as
dom0 and domU) with Xen, and I jumped over it, towards Xen 4.11 and
Linux 4.19, which is a great success so far.

So, with my personal (and $dayjob) hat on, I can recommend leaving the
current situation behind and at least please run the Linux 4.19 kernel
from stretch-backports instead.

With my Debian Xen package maintainer hat on... Yes, I'd like to help
you, but please create an additional bug after you have been able to
collect more logs and stacktraces and explosions happening etc. Thanks.
Roalt Zijlstra | webpower
2019-Feb-25 09:03 UTC
[Pkg-xen-devel] Bug#912975: xen-hypervisor-4.8-amd64: Dom0 crashes randomly without logs on Debian Stretch with Xen 4.8.4
Hi Hans,

We did have some crashes, but now only about once every 1 or 2 months.
We did have the noreboot option set, but so far this did not show
anything. We were looking into doing syslog reporting, but I am not sure
if we set that up on all Xen servers.

So maybe we should close the bug report, since with the latest kernel
updates things are a bit better.

There is one thing to mention, which might help others on this matter.
We are now running pretty stable only with:

- the Dom0 running with kernel Debian 4.9.110-3+deb9u6 (2018-10-08) or
  Debian 4.9.130-2 (2018-10-27) with Xen 4.8.5-pre, and all DomU servers
  running a 4.9 kernel.
- the Dom0 running with kernel Debian 4.9.65-3+deb9u1 (2017-12-23) with
  Xen 4.8.5-pre. This setup runs fine with mixed 3.16 and 4.9 kernels,
  but without spectre/meltdown fixes.

Other combinations, like running DomUs with 3.16 kernels on the
4.9.110-3+deb9u6 (2018-10-08) Dom0 kernel, are bound to crash quickly on
heavily used servers, be it Nginx SSL offloading or an active MySQL
database.

We have not tested the latest Debian kernels with 3.16 kernels again.

Greetings,
Roalt