Hi,

I'm still seeing a very strange issue here. First, let's clarify that the issue has never occurred with the good old xen 3.x and the good old 2.6.18 kernel.

The issue is that with xen 4.x (including 4.1.1) and pretty much any kernel (including the kernel from [1] and vanilla 3.0.0; I didn't test 2.6.18), the machine freezes during a reboot. The machine won't come up again; not even the BIOS screen will show.

It doesn't happen when running the kernel on bare metal. Also, the fact that it doesn't happen with xen 3.x + 2.6.18 might mean that this is a regression of some sort.

This issue has prevented my move from xen 3.x to xen 4.x for many years now. I already asked about this issue, and nobody replied. So I hoped that the kernel from kernel.org would solve this once pvops dom0-enabled kernels were available. Well, it didn't. I'm still stuck with this issue. Every time I want to reboot my machine, I have to call my hosting company to reboot the server.

It's an MSI X58 Pro-E (MS-7522) motherboard, equipped with an Intel Core i7 920 and an nvidia graphics card.

Did anybody ever experience a similar issue?
Does anybody have any suggestions on how to continue?

This seems to be a very weird issue, and even pushing the computer's reset button doesn't seem to help. (It's a remote machine, and I can remotely push the reset button.)

I have already updated the BIOS and disabled virtualization (I only run paravirt domUs). However, no improvement.

Kind Regards,
Sven

[1] http://code.google.com/p/gentoo-xen-kernel/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 09/09/2011 00:15, Sven Köhler wrote:
> <snip>
>
> This seems to be a very weird issue, and even pushing the computer's reset button doesn't seem to help. (It's a remote machine, and I can remotely push the reset button.)

Does your "remote" method involve actually pushing the reset button, and does this method actually work under normal circumstances?

As for the problem itself, do you have C states enabled in the BIOS? This sounds similar to several errata published for the i7 series.

~Andrew
On 09.09.2011 01:17, Andrew Cooper wrote:
> Does your "remote" method involve actually pushing the reset button, and does this method actually work under normal circumstances?

I think there is a device connected to the connector on the motherboard to which the reset button would normally be attached.

> As for the problem itself, do you have C states enabled in the BIOS? This sounds similar to several errata published for the i7 series.

I'm not sure how to tell whether C states are disabled or enabled. What would those BIOS options typically be called?

Also, should I enable or disable them in order to work around the errata that you mention?

Would those errors have occurred with xen 3.x as well, if they were a result of the errata you mention?

Regards,
Sven
On 09/09/11 00:40, Sven Köhler wrote:
> On 09.09.2011 01:17, Andrew Cooper wrote:
>> Does your "remote" method involve actually pushing the reset button, and does this method actually work under normal circumstances?
> I think there is a device connected to the connector on the motherboard to which the reset button would normally be attached.

OK, in which case your computer is getting into a very broken state, if the reset button is not working.

>> As for the problem itself, do you have C states enabled in the BIOS? This sounds similar to several errata published for the i7 series.
> I'm not sure how to tell whether C states are disabled or enabled. What would those BIOS options typically be called?

That is too BIOS-dependent to say for sure, but typically "C states" or "deep sleep", with some Intel ones going for "C1e".

> Also, should I enable or disable them in order to work around the errata that you mention?

They should be disabled. The errata state that there are several situations when moving in and out of deep C states which cause processors to lock up irreparably.

> Would those errors have occurred with xen 3.x as well, if they were a result of the errata you mention?

The power management code has changed quite a lot between 3.x and 4.x, so it is quite possible that xen 3.x just managed to miss these errata.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
On 09.09.2011 14:01, Andrew Cooper wrote:
> <snip>
>
>> Also, should I enable or disable them in order to work around the errata that you mention?
>
> They should be disabled. The errata state that there are several situations when moving in and out of deep C states which cause processors to lock up irreparably.

Thanks for your help so far. I will try to disable the C-states as soon as I have the time.

One more thing: are you aware of any way to tell from inside dom0 whether these C-states are enabled or disabled? Is the kernel or the xen hypervisor able to tell whether they are active? Also, is there any xen hypervisor command line option to disable the use of them?

Regards,
Sven
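Editor's note on checking C-states from a running system: on bare-metal Linux, the cpuidle framework exposes each idle state's name under /sys/devices/system/cpu/cpuN/cpuidle/stateM/name. The sketch below, assuming that layout, simply lists those names; the function name is made up for illustration. One assumption matters a lot here: under Xen the hypervisor, not dom0, drives the idle states, so this directory may be absent or uninformative inside dom0. There, Xen's own xenpm tool (for example "xenpm get-cpuidle-states") and the hypervisor options max_cstate= and cpuidle= are the places to look, and their exact behaviour should be checked against the documentation for the installed Xen version.

```python
from pathlib import Path


def list_cpuidle_states(cpuidle_dir):
    """Return idle-state names found under a cpuidle sysfs directory.

    Hypothetical helper: expects subdirectories named state0, state1, ...
    each containing a 'name' file, as on bare-metal Linux.
    Returns an empty list if the directory does not exist.
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []

    # Keep only entries of the form state<digits>, sorted numerically
    # so that state10 comes after state2.
    states = [p for p in base.glob("state*") if p.name[5:].isdigit()]
    states.sort(key=lambda p: int(p.name[5:]))

    names = []
    for state in states:
        name_file = state / "name"
        if name_file.is_file():
            names.append(name_file.read_text().strip())
    return names
```

Calling list_cpuidle_states("/sys/devices/system/cpu/cpu0/cpuidle") on a bare-metal boot and seeing only a polling state and C1 would suggest the deeper states are not being offered to the OS; an empty list says nothing either way.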
On 09.09.2011 14:55, Sven Köhler wrote:
> <snip>

Without knowing much of the previous discussion: is this related to Hetzner servers? (From the mainboard type, I can only guess that it's a machine from the "new" Hetzner series.)

If that's the case, use "acpi=off" on the Dom0 kernel command line (I use a Gentoo-adapted xen-sources-2.6.38 [rebased SuSE Dom0 kernel]); that should solve the reboot problems. IIRC they mention this somewhere on the Hetzner site, too. No reboot hangs or problems for me after that.

--
--- Heiko.
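Editor's note: when testing a parameter such as acpi=off it helps to confirm that the running kernel actually received it. Linux exposes the boot command line in /proc/cmdline. The sketch below parses such a string; both function names are made up for illustration, and the parser is deliberately simple (it splits on whitespace and does not handle quoted values).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Parameters without '=' (e.g. 'ro') map to None.
    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def acpi_disabled(cmdline):
    """True if the command line contains exactly 'acpi=off'."""
    return parse_cmdline(cmdline).get("acpi") == "off"
```

In dom0 one could pass the contents of /proc/cmdline to acpi_disabled(). This only covers the dom0 kernel's command line; the hypervisor's own command line is configured separately in the boot loader entry.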
On 09.09.2011 15:31, Heiko Wundram wrote:
> Without knowing much of the previous discussion: is this related to Hetzner servers? (From the mainboard type, I can only guess that it's a machine from the "new" Hetzner series.)

Yes, it is a Hetzner machine!

> If that's the case, use "acpi=off" on the Dom0 kernel command line (I use a Gentoo-adapted xen-sources-2.6.38 [rebased SuSE Dom0 kernel]); that should solve the reboot problems. IIRC they mention this somewhere on the Hetzner site, too. No reboot hangs or problems for me after that.

I will try acpi=off as soon as possible.

I wonder what the disadvantages are. The hypervisor will still regulate CPU frequency, will it not?

Also, is the dom0 kernel doing something that it shouldn't do? (Maybe something that collides with the ACPI-related activities of the hypervisor, if there are any?)

Regards,
Sven
On 09.09.2011 15:45, Sven Köhler wrote:
> I wonder what the disadvantages are.
> The hypervisor will still regulate CPU frequency, will it not?

No, it will not.

> Also, is the dom0 kernel doing something that it shouldn't do?
> (Maybe something that collides with the ACPI-related activities of the hypervisor, if there are any?)

I guess the BIOS is simply reporting broken ACPI tables to the operating system (the board is a "consumer" board, so you can guess that the manufacturer only tests the ACPI tables for compatibility with Windows).

The ACPI tables (AFAIK, someone correct me) also contain a method for rebooting the system, which simply doesn't work or is broken when Xen is involved. Forcing acpi=off means that the normal triple-fault or kbd-controller reset machinery is always used, as ACPI isn't even initialized.

What struck me as odd, though: you can configure Linux to use "some other" form of hard reset through a kernel parameter, but setting that to explicitly use triple faults didn't work either (same hangs), so possibly it's some form of additional interaction between Xen, the board and the hypervisor. Anyway, the Hetzner "recommended" fix is just what I sent you, and I can confirm that it works.

--
--- Heiko.
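Editor's note: the kernel parameter for choosing "some other" form of hard reset is presumably the x86 reboot= option. As far as I understand the kernel documentation, it takes comma-separated words of which only the first letter matters, and the last reset method given wins. The sketch below maps those letters to readable names. The table and the function name are illustrative, the descriptions are my own paraphrase, and which letters are accepted varies by kernel version and architecture. Modifier words such as warm, cold, smp and force are ignored here.

```python
# First letter of each reboot= word -> reset method (x86).
# Assumption: taken from my reading of the kernel parameter docs.
REBOOT_METHODS = {
    "a": "ACPI reset",
    "b": "BIOS (real-mode reset vector)",
    "e": "EFI runtime service",
    "k": "keyboard controller",
    "p": "PCI reset register (port 0xCF9)",
    "t": "triple fault",
}


def chosen_reboot_method(value):
    """Return the reset method selected by a reboot= value, or None.

    Only the first character of each comma-separated word is examined,
    and a later method overrides an earlier one.
    """
    chosen = None
    for word in value.split(","):
        word = word.strip()
        if word and word[0] in REBOOT_METHODS:
            chosen = REBOOT_METHODS[word[0]]
    return chosen
```

For instance, chosen_reboot_method("kbd,force") selects the keyboard controller and chosen_reboot_method("t") selects the triple fault that Heiko reports did not help on this board.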
On 09.09.2011 15:52, Heiko Wundram wrote:
> On 09.09.2011 15:45, Sven Köhler wrote:
>> I wonder what the disadvantages are.
>> The hypervisor will still regulate CPU frequency, will it not?
>
> No, it will not.

In xen 3.x, the hypervisor did the cpufreq-like CPU frequency switching. Has this changed in xen 4.x, so that the dom0 kernel is now responsible?

> <snip>
>
> The ACPI tables (AFAIK, someone correct me) also contain a method for rebooting the system, which simply doesn't work or is broken when Xen is involved. Forcing acpi=off means that the normal triple-fault or kbd-controller reset machinery is always used, as ACPI isn't even initialized.

Thanks for the explanation. Here's another thing: why does rebooting work if xen is not involved, i.e. if the same kernel runs without xen? (I'm pretty sure this was true the last time I tried.) I would assume that broken ACPI tables would result in no reboot no matter what.

Also, does the dom0 kernel do the reboot, or the hypervisor? In the past, there were some reboot/poweroff-related patches for the xen part of the kernel. I assumed that the dom0 kernel does not use the "normal" reboot/poweroff code and instead instructs the hypervisor to reboot or power off the machine.

On the other hand, all the patches that went into linux 3.0 which were aimed at making poweroff/reboot as similar to Windows as possible sounded promising, but didn't help in the Hetzner case :-(

Regards,
Sven
On 09.09.2011 15:31, Heiko Wundram wrote:
> <snip>
>
> If that's the case, use "acpi=off" on the Dom0 kernel command line (I use a Gentoo-adapted xen-sources-2.6.38 [rebased SuSE Dom0 kernel]); that should solve the reboot problems.

In the wiki, I found acpi=off used on the xen command line.

Well, I have tried acpi=off on the xen command line and/or the dom0 kernel command line. The kernel would refuse to boot (see the other thread I started).

So to anyone who's using upstream kernels: don't even bother trying acpi=off.

Regards,
Sven
On 09.09.2011 14:01, Andrew Cooper wrote:
>>> As for the problem itself, do you have C states enabled in the BIOS? This sounds similar to several errata published for the i7 series.
>> I'm not sure how to tell whether C states are disabled or enabled. What would those BIOS options typically be called?
>
> That is too BIOS-dependent to say for sure, but typically "C states" or "deep sleep", with some Intel ones going for "C1e".

I found two BIOS options, for C1E and for C2, C3, etc. I disabled both options, so C-states should have been disabled.

However, the issue recurred. So it's not C-state related.

Regards,
Sven