Hi guys,
I have been liaising with Ian Campbell on the xen-user list, and Ian suggested I should take this to xen-devel.

The issue I am currently facing is as follows: I am unable to power down my system under the XEN hypervisor by issuing the command
    shutdown -h now
from my gentoo-hardened 3.9.5 dom0 machine. Instead of turning off power, the system goes through a BIOS power-on sequence and reboots. No domU is running at the time of the attempted powerdown.

If I do the same using *exactly the same dom0 kernel* without XEN involved (i.e. booting my gentoo-hardened 3.9.5 kernel directly), powerdown reliably works as expected and 'shutdown -h now' turns off the system's power.

I have tested both 4.2.2 (the gentoo ebuild) and 4.3 (downloaded directly from xenbits). Both versions behave the same way: the system reboots instead of powering down. On advice from Ian I have also experimented with the various reboot= options on the command line, also without success.

After adding a serial console (also thanks to Ian's input) I was able to examine the messages during shutdown. I have attached those from 4.2.2 for your reference.

For the latest test using Xen 4.2.2 I made a few changes to the source file xen/arch/x86/acpi/power.c to see where the actual problem may be hidden. I'm far from being a kernel or XEN programmer, but I am able to read and basically understand and (to an extent) modify C code.
Having identified the relevant messages on the serial console, I decided to add a few printk statements after the last message that was displayed, to see where the system probably crashes or where the problems start.

After my change, the relevant code snippet looks as follows (NOTE: the printk messages starting with "After" or "Before" are mine; the first one and the one within the if-construct are both unchanged; the initial one was originally always displayed on the serial console as the final line):

    printk("Entering ACPI S%d state.\n", state);

    local_irq_save(flags);
    printk("After local_irq_save\n");
    spin_debug_disable();
    printk("After spin_debug_disable\n");

    if ( (error = device_power_down()) )
    {
        printk(XENLOG_ERR "Some devices failed to power down.");
        system_state = SYS_STATE_resume;
        goto done;
    }

    printk("Before ACPI_FLUSH_CPU_CACHE\n");
    ACPI_FLUSH_CPU_CACHE();
    printk("After ACPI_FLUSH_CPU_CACHE\n");

The final few messages of the *new* output *after my amateur mods* on the serial console now read as follows:
(XEN) Entering ACPI S5 state.
(XEN) After local_irq_save
(XEN) After spin_debug_disable

There is neither a message reading
(XEN) Some devices failed to power down.
(NOTE: this printk statement has a XENLOG_ERR before the text, so I am not sure whether it should actually appear on the serial console at all)

nor one reading
(XEN) Before ACPI_FLUSH_CPU_CACHE

To me this seems to indicate that the problematic code is somewhere within the following lines:

    if ( (error = device_power_down()) )
    {
        printk(XENLOG_ERR "Some devices failed to power down.");
        system_state = SYS_STATE_resume;
        goto done;
    }

I hope that provides you with some insight which, with your help, could be used to make a step forward. On the other hand, I might be completely on the wrong track, as I have no clue where the requested power-down (or, as it stands, reboot) actually happens.
That was not obvious for me from the code in power.c without further knowledge ... Many thanks in advance for suggestions. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 14.08.13 at 10:48, Atom2 <ariel.atom2@web2web.at> wrote:
>     spin_debug_disable();
>     printk("After spin_debug_disable\n");
>
>     if ( (error = device_power_down()) )
>     {
>         printk(XENLOG_ERR "Some devices failed to power down.");
>         system_state = SYS_STATE_resume;
>         goto done;
>     }
>
>     printk("Before ACPI_FLUSH_CPU_CACHE\n");
>     ACPI_FLUSH_CPU_CACHE();
>     printk("After ACPI_FLUSH_CPU_CACHE\n");
>
> The final few messages of the *new* output *after my amateur mods* on
> the serial console now read as follows:
> (XEN) Entering ACPI S5 state.
> (XEN) After local_irq_save
> (XEN) After spin_debug_disable
>
> There is neither a message reading
> (XEN) Some devices failed to power down.
>
> nor one reading
> (XEN) Before ACPI_FLUSH_CPU_CACHE

Right, possibly because the first thing device_power_down() does is console_suspend(). Comment that out via your debug patch, and you might see more.

It would be particularly interesting to know whether perhaps some of the ACPI registers live in memory space on that system - I already have a patch queued up (but not submitted yet) that fixes problems in that case.

Jan
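
Editorial illustration of Jan's point: the following is a minimal toy model, not Xen code. Every name prefixed with toy_ is hypothetical, and the real device_power_down() does far more than this. It only shows why trace messages printed after the console has been suspended never reach the serial log, so a missing "Before ACPI_FLUSH_CPU_CACHE" line does not by itself prove that execution stopped before that point.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model only: none of these names exist in Xen. */
static bool toy_console_suspended = false;
static char toy_console_log[256];

/* Append a message to the fake console unless it has been suspended. */
static void toy_printk(const char *msg)
{
    if (toy_console_suspended)
        return; /* message is silently dropped */
    strncat(toy_console_log, msg,
            sizeof(toy_console_log) - strlen(toy_console_log) - 1);
}

static void toy_console_suspend(void)
{
    toy_console_suspended = true;
}

/* Stand-in for device_power_down(): the console goes away first. */
static int toy_device_power_down(void)
{
    toy_console_suspend();
    toy_printk("After time_suspend\n"); /* never visible */
    return 0;                           /* success */
}

/* Mirrors the shape of the instrumented snippet from the report. */
static const char *toy_enter_state(void)
{
    toy_console_suspended = false;
    toy_console_log[0] = '\0';

    toy_printk("Entering ACPI S5 state.\n");
    toy_printk("After spin_debug_disable\n");

    if (toy_device_power_down() != 0) {
        toy_printk("Some devices failed to power down.\n");
        return toy_console_log;
    }

    toy_printk("Before ACPI_FLUSH_CPU_CACHE\n"); /* executed, but not visible */
    return toy_console_log;
}
```

In this model the log ends at "After spin_debug_disable" even though the code runs to completion, which matches the symptom in the report. Skipping the suspend call (as Jan suggests doing in the debug patch) is what makes the later messages observable.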
Hi Jan,
thanks for your reply. You are obviously right that the first thing device_power_down() does is console_suspend(). I don't know why that escaped my eyes when I originally searched the file ...

Anyway, I have now disabled console_suspend() and also added a few more printk statements to the code, indicating up to which point the system had gone (without errors). With hindsight I guess the new printk() statements might not have been required because now, with the console still active, a panic message pops up at the end and the system reboots:

(XEN) Disabling non-boot CPUs ...
(XEN) Entering ACPI S5 state.
(XEN) After local_irq_save
(XEN) After spin_debug_disable
(XEN) After time_suspend
(XEN) After li8259_suspend
(XEN) After ioapic_suspend
(XEN) DMAR_IQA_REG = 80d87c002
(XEN) DMAR_IQH_REG = 120
(XEN) DMAR_IQT_REG = 140
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) queue invalidate wait descriptor was not executed
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

NOTE: All the messages starting with "(XEN) After" are from my changes to the code; the rest is as is, except for me commenting out console_suspend() in power.c. I hope that helps in resolving the issue and that the panic is not just a knock-on effect of commenting out console_suspend() earlier.

On 14.08.13 12:30, Jan Beulich wrote:
[...]
> It would be particularly interesting to know whether perhaps
> some of the ACPI registers live in memory space on that
> system - I already have a patch queued up (but not submitted
> yet) that fixes problems in that case.

I don't know how I can find out whether ACPI registers live in memory, but if you can tell me what I need to do to provide you with that information, I am more than happy to help on that issue.

Many thanks
On 14/08/13 14:52, Atom2 wrote:
> Hi Jan,
> thanks for reply. You are obviously right that the first thing
> device_power_down does, is console_suspend(). I don't know why that
> escaped my eyes when I originally searched the file ...
>
> Anyways, I have now disabled console_suspend() and also added a few
> more lines to the code with printk statements indicating up to which
> point the system had gone (without errors). With hindsight I guess the
> new printk() statements might not have been required as now, with the
> console still active, a panic message pops up at the end, resulting in
> rebooting the system:
>
> (XEN) Disabling non-boot CPUs ...
> (XEN) Entering ACPI S5 state.
> (XEN) After local_irq_save
> (XEN) After spin_debug_disable
> (XEN) After time_suspend
> (XEN) After li8259_suspend
> (XEN) After ioapic_suspend
> (XEN) DMAR_IQA_REG = 80d87c002
> (XEN) DMAR_IQH_REG = 120
> (XEN) DMAR_IQT_REG = 140
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) queue invalidate wait descriptor was not executed
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>
> NOTE: All the messages starting with "(XEN) After" are from my changes
> to the code; the rest is as is - except me commenting out
> console_suspend() in power.c. I hope that helps in resolving the issue
> and the panic is not just the result of a knock-on effect from
> commenting out console_suspend() earlier.

Huh - I thought I had fixed this issue already.

Can you confirm exactly which version of Xen you are using (including changeset), and perhaps compile in this patch:

diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index 6a410d8..d023b26 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -220,6 +220,7 @@ static int queue_invalidate_wait(struct iommu *iommu,
             if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
             {
                 print_qi_regs(iommu);
+                WARN();
                 panic("queue invalidate wait descriptor was not executed\n");
             }
             cpu_relax();

~Andrew
Hi Andrew,
please see my inline comments further below.

And many thanks to all of you for your support so far!

On 14.08.13 16:00, Andrew Cooper wrote:
> On 14/08/13 14:52, Atom2 wrote:
>> Hi Jan,
>> thanks for reply. You are obviously right that the first thing
>> device_power_down does, is console_suspend(). I don't know why that
>> escaped my eyes when I originally searched the file ...
>>
>> Anyways, I have now disabled console_suspend() and also added a few
>> more lines to the code with printk statements indicating up to which
>> point the system had gone (without errors). With hindsight I guess the
>> new printk() statements might not have been required as now, with the
>> console still active, a panic message pops up at the end, resulting in
>> rebooting the system:
>>
>> (XEN) Disabling non-boot CPUs ...
>> (XEN) Entering ACPI S5 state.
>> (XEN) After local_irq_save
>> (XEN) After spin_debug_disable
>> (XEN) After time_suspend
>> (XEN) After li8259_suspend
>> (XEN) After ioapic_suspend
>> (XEN) DMAR_IQA_REG = 80d87c002
>> (XEN) DMAR_IQH_REG = 120
>> (XEN) DMAR_IQT_REG = 140
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) queue invalidate wait descriptor was not executed
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>
>> NOTE: All the messages starting with "(XEN) After" are from my changes
>> to the code; the rest is as is - except me commenting out
>> console_suspend() in power.c. I hope that helps in resolving the issue
>> and the panic is not just the result of a knock-on effect from
>> commenting out console_suspend() earlier.
>
> Huh - I thought I had fixed this issue already.
>
> Can you confirm exactly which version of Xen you are using (including
> changeset), and perhaps compile in this patch:

The version I am using is 4.2.2 from the gentoo ebuild. Interestingly enough, the log on the console states
(XEN) Latest ChangeSet: unavailable
I don't know what to make of this - or is there any other way to figure out what you are after?

The alternative would be to apply the WARN() to the 4.3.0 version I downloaded yesterday from xenbits at
http://www.xenproject.org/downloads/xen-archives/supported-xen-43-series/xen-430/287-xen-430-2/file.html
(FYI: the reboot also happened there). If that helps, I'll rerun it on the 4.3.0 version. So far I have used the gentoo version as this allows me to stay within the portage system.

> diff --git a/xen/drivers/passthrough/vtd/qinval.c
> b/xen/drivers/passthrough/vtd/qinval.c
> index 6a410d8..d023b26 100644
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -220,6 +220,7 @@ static int queue_invalidate_wait(struct iommu *iommu,
>              if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
>              {
>                  print_qi_regs(iommu);
> +                WARN();
>                  panic("queue invalidate wait descriptor was not
> executed\n");
>              }
>              cpu_relax();

I have manually applied the patch - which was just an added
    WARN();
in between, if I read the patch correctly; the rest was already there in 4.2.2 (and also in 4.3.0 - I checked its source as well). I have attached the serial log from my 4.2.2 run to prevent line-wrap. I hope that helps.

> ~Andrew
On 14/08/13 18:00, Atom2 wrote:
> [...]
>>
>> Can you confirm exactly which version of Xen you are using (including
>> changeset), and perhaps compile in this patch:
> The version I am using is 4.2.2 from the gentoo ebuild. Interesting
> enough, the log in the consoles states
> (XEN) Latest ChangeSet: unavailable
> I don't know what to make out of this - or is there any other way to
> figure out, what you are after?

Ah - that good old gem. The old logic to detect changesets was very bad if the build wasn't happening in a mercurial tree. It is better in 4.3.

> The alternative would be to apply the WARN() to the 4.3.0 version I
> have downloaded yesterday from xenbits at
> http://www.xenproject.org/downloads/xen-archives/supported-xen-43-series/xen-430/287-xen-430-2/file.html
> (FYI: the reboot also happened there). If that helps, I'll rerun it on
> the 4.3.0 version. So far I have used the gentoo version as this
> allows me to stay within the portage system.

That's fine - the bug is likely more damage from XSA-36

>> diff --git a/xen/drivers/passthrough/vtd/qinval.c
>> b/xen/drivers/passthrough/vtd/qinval.c
>> index 6a410d8..d023b26 100644
>> --- a/xen/drivers/passthrough/vtd/qinval.c
>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>> @@ -220,6 +220,7 @@ static int queue_invalidate_wait(struct iommu
>> *iommu,
>>              if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
>>              {
>>                  print_qi_regs(iommu);
>> +                WARN();
>>                  panic("queue invalidate wait descriptor was not
>> executed\n");
>>              }
>>              cpu_relax();
> I have manually applied the patch - which was just an added
> WARN();
> inbetween if I read the patch correctly; the rest was already there in
> 4.2.2 (and also 4.3.0 - I checked its source as well). I have attached
> the serial log from my 4.2.2 run to prevent line-wrap. I hope that helps.

Do you have the boot time dmesg output?

The problem here is that a queued_invalidate wait descriptor has been issued, and has not been completed within 1 second. In all previous cases I have debugged, this is actually because we already turned off the IOMMU, then tried to turn it off again.

Could you perhaps try with this patch as well?

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 071a91b..45fff48 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -791,6 +791,9 @@ static void iommu_disable_translation(struct iommu *iommu)
     u32 sts;
     unsigned long flags;
 
+    printk("**Debug: Disabling translation for iommu %"PRId32"\n", iommu->index);
+    WARN();
+
     /* apply platform specific errata workarounds */
     vtd_ops_preamble_quirk(iommu);

~Andrew
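
Editorial illustration of the failure mode Andrew describes: the sketch below is a toy model and not the code in qinval.c. The toy_ names, the fake clock and the 1 ms polling step are all assumptions made for the example; only the 1 second budget comes from the message above. It shows a wait loop that polls for completion and gives up after a deadline, and why a unit that no longer processes its queue (for example because it has already been switched off) always ends in the timeout branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: not the Xen VT-d driver. */
typedef struct {
    bool     processing;   /* does the unit still consume its queue? */
    uint64_t completes_at; /* fake time (ns) at which it would signal done */
} toy_iommu;

#define TOY_TIMEOUT_NS 1000000000ULL /* 1 second, as stated in the thread */
#define TOY_POLL_NS    1000000ULL    /* assumed 1 ms polling step */

static uint64_t toy_now; /* fake monotonic clock, starts at 0 */

/* The wait descriptor only completes if the unit is still processing. */
static bool toy_wait_done(const toy_iommu *u)
{
    return u->processing && toy_now >= u->completes_at;
}

/* Returns 0 when the wait completed, -1 when the deadline passed. */
static int toy_invalidate_wait(const toy_iommu *u)
{
    uint64_t start = toy_now;

    while (!toy_wait_done(u)) {
        if (toy_now > start + TOY_TIMEOUT_NS)
            return -1;          /* the real code panics at this point */
        toy_now += TOY_POLL_NS; /* stand-in for cpu_relax() plus elapsed time */
    }
    return 0;
}
```

A unit that is still processing returns 0 well inside the budget, while one that has stopped processing can never satisfy the completion check and returns -1 once the budget is exceeded. In the real hypervisor that second case is where the panic message from the log is printed.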
Hi Andrew

On 14.08.13 19:30, Andrew Cooper wrote:
[...]
> Do you have the boot time dmesg output?

Sure, please see the attached log file. Please note that this dmesg data comes from a system that I booted with the standard XEN 4.2.2 hypervisor (i.e. w/o the WARN() statement and the various printk statements I have added; furthermore this was w/o the serial console being active).

> The problem here is that a queued_invalidate wait descriptor has been
> issued, and has not been completed within 1 second. In all previous
> cases I have debugged, this is actually because we already turned off
> the IOMMU, then tried to turn it off again.
>
> Could you perhaps try with this patch as well?
>
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 071a91b..45fff48 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -791,6 +791,9 @@ static void iommu_disable_translation(struct iommu
> *iommu)
>      u32 sts;
>      unsigned long flags;
>
> +    printk("**Debug: Disabling translation for iommu %"PRId32"\n",
> iommu->index);
> +    WARN();
> +
>      /* apply platform specific errata workarounds */
>      vtd_ops_preamble_quirk(iommu);

I'll get back to you in due course with the log output after I have applied the patch, restarted, and then shut down XEN again.

> ~Andrew
Hi again Andrew,
interestingly enough, the new WARN() did not trigger.

The output on the serial console lists broadly the same information from the WARN() in qinval.c:223 that we have already seen before.

In any case, please find the (this time: complete) new log attached to this mail.

BTW I have double-checked that I copied the correct file to /boot by grepping for your "**Debug: ..." message; it was found in the (uncompressed) xen-test.gz file which is started from grub.

Many thanks again.

On 14.08.13 20:40, Atom2 wrote:
> Hi Andrew
>
> On 14.08.13 19:30, Andrew Cooper wrote:
> [...]
>> Do you have the boot time dmesg output?
> Sure, please see the attached log file. Please note that this dmesg data
> comes from a system that I have booted from the standard XEN 4.2.2
> kernel (i.e. w/o the WARN() statement and the various printk statements
> I have added; furthermore this was w/o the serial console being active)
>
>> The problem here is that a queued_invalidate wait descriptor has been
>> issued, and has not been completed within 1 second. In all previous
>> cases I have debugged, this is actually because we already turned off
>> the IOMMU, then tried to turn it off again.
>>
>> Could you perhaps try with this patch as well?
>>
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c
>> index 071a91b..45fff48 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -791,6 +791,9 @@ static void iommu_disable_translation(struct iommu
>> *iommu)
>>      u32 sts;
>>      unsigned long flags;
>>
>> +    printk("**Debug: Disabling translation for iommu %"PRId32"\n",
>> iommu->index);
>> +    WARN();
>> +
>>      /* apply platform specific errata workarounds */
>>      vtd_ops_preamble_quirk(iommu);
> I'll get back to you in due course with the log output after I have
> applied the patch and restarted & then shutdown XEN again.
On 14/08/13 20:10, Atom2 wrote:
> Hi again Andrew,
> interestingly enough, the new WARN() did not trigger.
>
> The output on the serial console only grossly lists the same
> information from the WARN() in qinval.c:223 we have already seen before.
>
> In any case, please find the (this time: complete) new log attached to
> this mail.
>
> BTW I have double-checked, whether I have copied the correct file to
> /boot by grepping for your "**Debug: ..." message and it was found in
> the (uncompressed) xen-test.gz file which is started from grub.
>
> Many thanks again.

Hmm - curious. That would indicate that the timeout occurs during the very first attempt to shut down the iommu.

Can you grab the full log with "iommu=1,debug,verbose" ?

Can you also please attach "lspci -vv" and "lspci -tv" from dom0

~Andrew
Hi Andrew,
all requested information is attached, please see notes below.

My ongoing & continued thanks anyway ...

On 14.08.13 21:18, Andrew Cooper wrote:
[...]
> Hmm - curious. That would indicate that the timeout occurs during the
> very first attempt to shut down the iommu
>
> Can you grab the full log with "iommu=1,debug,verbose" ?

Sure, please see attachment "XEN console (debug, verbose)"

> Can you also please attach "lspci -vv" and "lspci -tv" from dom0

No problem - please see attachments "lspci-vv" and "lspci-tv"

> ~Andrew
On 14/08/13 20:39, Atom2 wrote:
> Hi Andrew,
> all requested information is attached, please see notes below.
>
> My ongoing & continued thanks anyway ...
>
> On 14.08.13 21:18, Andrew Cooper wrote:
> [...]
>> Hmm - curious. That would indicate that the timeout occurs during the
>> very first attempt to shut down the iommu
>>
>> Can you grab the full log with "iommu=1,debug,verbose" ?
> Sure, please see attachment "XEN console (debug, verbose)"
>>
>> Can you also please attach "lspci -vv" and "lspci -tv" from dom0
> No problem - please see attachements "lspci-vv" and "lspci-tv"
>>
>> ~Andrew

Ok thanks.

Do you mind confirming whether S5 works with "iommu=off" on the Xen command line?

~Andrew
On 14.08.13 22:18, Andrew Cooper wrote:
[...]
> Ok thanks.
>
> Do you mind confirming whether S5 works with "iommu=off" on the Xen
> command line?
>
> ~Andrew

Yes, that works: The system powers off after issuing
    shutdown -h now
from the dom0.
And just to provide full information, attached please find the serial log file when iommu=off is set on the XEN command line via grub. This obviously (and expectedly) turns off I/O virtualization.

On 14.08.13 22:24, Atom2 wrote:
> On 14.08.13 22:18, Andrew Cooper wrote:
> [...]
>> Ok thanks.
>>
>> Do you mind confirming whether S5 works with "iommu=off" on the Xen
>> command line?
>>
>> ~Andrew
>
> Yes, that works: The system powers off after issuing
> shutdown -h now
> from the dom0.
This hang looks exactly like this one that I posted about in June:
http://www.gossamer-threads.com/lists/xen/devel/284943?do=post_view_threaded#284943

This ended up being an interaction with the i915 driver in the kernel, and pvops use of PAT.

If you revert the following linux commit:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a

and then apply:
https://lkml.org/lkml/2012/2/10/229

You may get good results.
It helped me with a similar problem, at least.

Ben

On Wed, Aug 14, 2013 at 4:24 PM, Atom2 <ariel.atom2@web2web.at> wrote:
> On 14.08.13 22:18, Andrew Cooper wrote:
> [...]
>> Ok thanks.
>>
>> Do you mind confirming whether S5 works with "iommu=off" on the Xen
>> command line?
>>
>> ~Andrew
>
> Yes, that works: The system powers off after issuing
> shutdown -h now
> from the dom0.
On Wed, Aug 14, 2013 at 04:34:21PM -0400, Ben Guthro wrote:
> This hang looks exactly like this one that I posted about in June:
> http://www.gossamer-threads.com/lists/xen/devel/284943?do=post_view_threaded#284943
>
> This ended up being an interaction with the i915 driver in the kernel, and
> pvops use of PAT.
>
> If you revert the following linux commit:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a

And probably also need to revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1 ?

> and then apply:
> https://lkml.org/lkml/2012/2/10/229
>
> You may get good results.
> It helped me with a similar problem, at least.
>
> Ben
>
> On Wed, Aug 14, 2013 at 4:24 PM, Atom2 <ariel.atom2@web2web.at> wrote:
> > On 14.08.13 22:18, Andrew Cooper wrote:
> > [...]
> >> Ok thanks.
> >>
> >> Do you mind confirming whether S5 works with "iommu=off" on the Xen
> >> command line?
> >>
> >> ~Andrew
> >
> > Yes, that works: The system powers off after issuing
> > shutdown -h now
> > from the dom0.
On 14/08/13 21:24, Atom2 wrote:
> On 14.08.13 22:18, Andrew Cooper wrote:
> [...]
>> Ok thanks.
>>
>> Do you mind confirming whether S5 works with "iommu=off" on the Xen
>> command line?
>>
>> ~Andrew
>
> Yes, that works: The system powers off after issuing
> shutdown -h now
> from the dom0.

So it is certainly an iommu interaction issue, which was sadly suspected given that we have seen similar problems in the past.

Curiously, there are two IOMMUs on the system. I am not familiar enough with Cougar Point chipsets to know how they are laid out. Perhaps the ACPI tables might have more information.

ACPI interaction with Xen is tricky at the best of times. Xen has no AML interpreter, so it relies on Linux in dom0 to do most of the ACPI legwork.

My best guess at the moment is that something in the ACPI code for S5 is turning off enough of the PCH that Xen can no longer talk to one of the IOMMUs.

~Andrew
Andrew,
that does not sound too promising at the moment. Is there anything else I could provide to come to a resolution, given that I/O virtualization is what the system is supposed to do?

You mention ACPI tables - I have no clue how to provide those, but I am more than happy to do what I can.

Ian (Campbell) originally directed me to Jan (Beulich), who seemed to have an idea before you thankfully jumped in. Jan's idea also seemed to revolve around ACPI - although not tables, but registers:

quote from Jan Beulich:
> It would be particularly interesting to know whether perhaps
> some of the ACPI registers live in memory space on that
> system - I already have a patch queued up (but not submitted
> yet) that fixes problems in that case.

@Jan: would it make sense to go down that route?

Thanks again to everybody for their help so far.

On 14.08.13 22:38, Andrew Cooper wrote:
> On 14/08/13 21:24, Atom2 wrote:
>> On 14.08.13 22:18, Andrew Cooper wrote:
>> [...]
>>> Ok thanks.
>>>
>>> Do you mind confirming whether S5 works with "iommu=off" on the Xen
>>> command line?
>>>
>>> ~Andrew
>>
>> Yes, that works: The system powers off after issuing
>> shutdown -h now
>> from the dom0.
>
> So it is certainly an iommu interaction issue, which was sadly suspected
> given that we have seen similar problems in the past.
>
> Curiously, there are two IOMMUs on the system. I am not familiar enough
> with Cougar Point chipsets to know how they are layed out. Perhaps the
> ACPI tables might have more information.

I'm just curious, but where did you see two IOMMUs?

> ACPI interaction with Xen is tricky at the best of times. Xen as no AML
> interpreter, so relies on Linux in dom0 to most of the ACPI legwork.
>
> My best guess at the moment is that something in the ACPI code for S5 is
> turning off enough of the PCH that Xen can no longer talk to the one of
> the IOMMUs.
>
> ~Andrew
On 14/08/13 21:54, Atom2 wrote:
> Andrew,
> that does not sound too promising at the moment. Is there anything
> else I could provide to come to a resolution given that I/O
> virtualization is what the system is supposed to do.
>
> You mention ACPI tables - I have no clue how to provide those, but I
> am more than happy to do what I can?
>
> Ian (Campbell) originally diverted me to Jan (Beulich) who seemed to
> have an idea before you thankfully jumped in. Jan's idea seemed to
> also revolve around ACPI - although not tables, but registers:
>
> quote from Jan Beulich:
> > It would be particularly interesting to know whether perhaps
> > some of the ACPI registers live in memory space on that
> > system - I already have a patch queued up (but not submitted
> > yet) that fixes problems in that case.
>
> @Jan: would it make sense to go down that route?
>
> Thanks again to everybody for their help so far.

I would first start with the suggestion from Ben & Konrad, especially as this smells the same in terms of breakage.

If that fails, you can get at the acpi tables using `acpidump` which looks to live in the pmtools package in gentoo.

~Andrew

> On 14.08.13 22:38, Andrew Cooper wrote:
>> On 14/08/13 21:24, Atom2 wrote:
>>> On 14.08.13 22:18, Andrew Cooper wrote:
>>> [...]
>>>> Ok thanks.
>>>>
>>>> Do you mind confirming whether S5 works with "iommu=off" on the Xen
>>>> command line?
>>>>
>>>> ~Andrew
>>>
>>> Yes, that works: The system powers off after issuing
>>> shutdown -h now
>>> from the dom0.
>>
>> So it is certainly an iommu interaction issue, which was sadly suspected
>> given that we have seen similar problems in the past.
>>
>> Curiously, there are two IOMMUs on the system. I am not familiar enough
>> with Cougar Point chipsets to know how they are layed out. Perhaps the
>> ACPI tables might have more information.
> I'm just curios, but where did you see two IOMMUs?

(XEN) PCI: Using MCFG for segment 0000 bus 00-3f
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed

I added the iommu index into that loop because we found multi-socket servers with multiple iommus, but I was not expecting to see two iommus on a single socket workstation chipset.

>> ACPI interaction with Xen is tricky at the best of times. Xen as no AML
>> interpreter, so relies on Linux in dom0 to most of the ACPI legwork.
>>
>> My best guess at the moment is that something in the ACPI code for S5 is
>> turning off enough of the PCH that Xen can no longer talk to the one of
>> the IOMMUs.
>>
>> ~Andrew
Hi Ben & Konrad,
thanks for your suggestions. I have to admit I don't fully understand the commit system yet and don't know how to apply those changes to my gentoo sources. But I read through the thread that Ben posted and, given that Ben had linked the issue to the Intel i915 driver, I tried something that crossed my mind:

lspci lists the Intel IGD as follows:
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 Processor Family Integrated Graphics Controller (rev 09)

My thinking was that if the IGD is to blame, I could simply exclude it from the dom0 by adding it to the xen-pciback.hide list on the module line of grub. This line now reads as follows (NOTE: all devices except the first one (00:02.0) were already there before my latest try):

module /boot/gentoo-3.9.5-hardened-xen rootfstype=ext4 root=/dev/sda6 \
xen-pciback.hide=(00:02.0)(00:1b.0)(02:00.0)(03:00.0)(04:00.0)(05:00.0)(06:00.0)(09:00.0)(09:02.0)(0a:08.0)(0a:09.0)(0a:0a.0)(0a:0b.0)

Clearly this meant that after the dom0 (gentoo hardened) was started, there was no more console output available, so I had to use ssh to log in. But once I started a (remote) 'shutdown -h now' from the login bash ... the system actually powered off.

I guess this strongly suggests that the (in-kernel) i915 driver is to blame for the powerdown issue: with the IGD hidden, the driver could not have been active in the dom0.

I am not sure whether that brings us any further in resolving the issue, but it might make sense to involve someone responsible for the Intel IGD driver. Also, I could not find much documentation for the i915.modeset kernel parameter - perhaps there's something in there that could help?

What are your thoughts on this? And as always, thanks for your input so far and also thanks in advance for your continued help and ideas.
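
Editorial illustration: a minimal sketch for sanity-checking a hide list before it goes into grub, assuming the list is simply a run of parenthesised bus:device.function entries as in the module line above. hide_list_contains is a hypothetical helper written for this note; it is not part of Xen or the Linux kernel and does not reproduce how xen-pciback itself parses the option.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if "(bdf)" occurs verbatim in a
 * xen-pciback.hide style list such as "(00:02.0)(00:1b.0)".
 */
static bool hide_list_contains(const char *list, const char *bdf)
{
    char needle[32];
    int n = snprintf(needle, sizeof(needle), "(%s)", bdf);

    /* Reject formatting errors and entries too long for the buffer. */
    if (n < 0 || (size_t)n >= sizeof(needle))
        return false;

    return strstr(list, needle) != NULL;
}
```

Called with the list from the module line and "00:02.0", the helper reports whether the IGD entry is present. The match is an exact substring comparison, so it is only as strict as that.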
On 14.08.13 22:37, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 14, 2013 at 04:34:21PM -0400, Ben Guthro wrote:
>> This hang looks exactly like this one that I posted about in June:
>> http://www.gossamer-threads.com/lists/xen/devel/284943?do=post_view_threaded#284943
>>
>> This ended up being an interaction with the i915 driver in the kernel, and
>> pvops use of PAT.
>>
>> If you revert the following linux commit:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a
>
> And probably also need to revert
> 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> ?
>>
>> and then apply:
>> https://lkml.org/lkml/2012/2/10/229
>>
>> You may get good results.
>> It helped me with a similar problem, at least.
>>
>> Ben
>>
>> On Wed, Aug 14, 2013 at 4:24 PM, Atom2 <ariel.atom2@web2web.at> wrote:
>>
>>> On 14.08.13 22:18, Andrew Cooper wrote:
>>> [...]
>>>> Ok thanks.
>>>>
>>>> Do you mind confirming whether S5 works with "iommu=off" on the Xen
>>>> command line?
>>>>
>>>> ~Andrew
>>>
>>> Yes, that works: The system powers off after issuing
>>> shutdown -h now
>>> from the dom0.
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
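For anyone rebuilding such a hide list by hand: the xen-pciback.hide value on the module line above is simply each PCI address wrapped in parentheses and concatenated. A minimal shell sketch of that, assuming a POSIX sh; pciback_hide_arg is a hypothetical helper name and not something shipped with Xen or the kernel:

```shell
#!/bin/sh
# pciback_hide_arg BDF...
# Hypothetical helper (not part of Xen or Linux): prints a
# xen-pciback.hide= argument for the given bus:device.function addresses.
pciback_hide_arg() {
    arg='xen-pciback.hide='
    for bdf in "$@"; do
        # Each device is wrapped in its own pair of parentheses.
        arg="${arg}(${bdf})"
    done
    printf '%s\n' "$arg"
}
```

Calling it as pciback_hide_arg 00:02.0 00:1b.0 gives the first two entries of the list above; the result still has to be put on the grub module line by hand.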
Hi,

I admit, I don't know how the gentoo build system works, but the general idea here is that you want to revert those 2 commits, and apply the third.

If you don't have a git tree, you can download the two commits from these two links:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=8eaffa67b43e99ae581622c5133e20b0f48bcef1

You'll want to apply them in reverse:

patch -p1 -R < c79c498.patch
patch -p1 -R < 8eaffa67.patch

Then apply the patch from
https://lkml.org/lkml/2012/2/10/229

I hope this helps.

Ben

On Wed, Aug 14, 2013 at 5:56 PM, Atom2 <ariel.atom2@web2web.at> wrote:
> Hi Ben & Konrad,
> thanks for your suggestions.
> [...]
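The revert/apply sequence above can be wrapped so that nothing in the tree is touched unless a dry run succeeds first. This is only a sketch, assuming GNU patch; apply_checked is a hypothetical helper name, and the -f and -b flags are additions that the commands above do not use:

```shell
#!/bin/sh
# apply_checked TREE PATCHFILE [-R]
# Hypothetical helper: dry-run a patch inside TREE and apply it for real
# only if the dry run succeeded. PATCHFILE must be an absolute path,
# because the helper changes into TREE before calling patch.
apply_checked() {
    tree=$1
    patchfile=$2
    reverse=${3:-}      # pass -R to revert, leave empty to apply

    # -f: never prompt and never flip the patch direction on its own;
    # -s: stay quiet. $reverse is unquoted on purpose so that an empty
    # value disappears from the command line.
    if ! (cd "$tree" && patch --dry-run -p1 -f -s $reverse < "$patchfile") >/dev/null 2>&1; then
        echo "dry run failed for $patchfile, tree left untouched" >&2
        return 1
    fi

    # -b keeps a .orig backup of every file that gets modified.
    (cd "$tree" && patch -p1 -f -s -b $reverse < "$patchfile")
}

# Intended use from inside a kernel tree (file names as in the mail above):
#   apply_checked "$PWD" "$PWD/c79c498.patch" -R &&
#   apply_checked "$PWD" "$PWD/8eaffa67.patch" -R &&
#   apply_checked "$PWD" "$PWD/229.patch"
```

Chaining the calls with && stops at the first patch that does not fit, which is probably what one wants when two reverts and one forward patch have to go in together.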
>>> On 14.08.13 at 22:54, Atom2 <ariel.atom2@web2web.at> wrote:
> Ian (Campbell) originally diverted me to Jan (Beulich) who seemed to
> have an idea before you thankfully jumped in. Jan's idea seemed to also
> revolve around ACPI - although not tables, but registers:
>
> quote from Jan Beulich:
> > It would be particularly interesting to know whether perhaps
> > some of the ACPI registers live in memory space on that
> > system - I already have a patch queued up (but not submitted
> > yet) that fixes problems in that case.
>
> @Jan: would it make sense to go down that route?

No, that's clearly not an issue here, given the more complete
logs you sent meanwhile.

Jan
Jan,
many thanks for that clarification.

On 15.08.13 10:12, Jan Beulich wrote:
>>>> On 14.08.13 at 22:54, Atom2 <ariel.atom2@web2web.at> wrote:
>> [...]
>> @Jan: would it make sense to go down that route?
>
> No, that's clearly not an issue here, given the more complete
> logs you sent meanwhile.
>
> Jan
On Wed, Aug 14, 2013 at 11:56:10PM +0200, Atom2 wrote:
> Hi Ben & Konrad,
> thanks for your suggestions.
> [...]
> Clearly this meant that after the dom0 (gentoo hardened) was
> started, there was no more console output available. So I had to use
> ssh to login. But once I started a (remote) 'shutdown -h now' from
> the login bash and ... the system actually powered off.
>
> I guess this really proves that the (in-kernel) i915-driver which,
> given that the IGD is no longer available, could not have been
> active in the dom0, is to blame for the powerdown issue.

Perhaps. If this theory is correct, then if you boot baremetal (so no Xen) with 'pat=0' on the command line you should see a similar issue. That is the theory.

But it would be really, really fantastic if we could confirm that having those three patches (well, two reverts and one patch) does indeed fix the issue, without you resorting to disabling it altogether.

> I am not sure whether that brings us any further in resolving the
> issue, but it might make sense to involve someone responsible for
> the Intel IGD driver? Also I could not find much documentation for
> the i915.modeset kernel parameter- probably there's something in
> there that could help?

If you have it set to zero it will make X not work. In effect it should have the same behavior as if you used the xen-pciback.hide=..

> What are your thoughts on this.

Let's make sure that the patches work for you. If they do, then you don't have to use the work-around (either disabling the iommu or not using the IGD).

> And as always thanks for your input so far and also thanks in
> advance for your continued help and ideas.
>
> [...]
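To be sure that a test parameter such as the 'pat=0' mentioned above really reached the booted kernel, the kernel command line can be inspected after boot. A minimal sketch, assuming a Linux /proc/cmdline; cmdline_has is a hypothetical helper name, and this only covers the kernel's own parameters, not those given to Xen:

```shell
#!/bin/sh
# cmdline_has PARAM [FILE]
# Hypothetical helper: succeeds if PARAM appears as a complete word in the
# kernel command line. FILE defaults to /proc/cmdline and exists as a
# parameter only to make the helper testable.
cmdline_has() {
    wanted=$1
    file=${2:-/proc/cmdline}
    # One word per line, then an exact, fixed-string, whole-line match.
    tr -s ' ' '\n' < "$file" | grep -qxF -- "$wanted"
}
```

Something like cmdline_has pat=0 && echo "parameter is active" would then confirm the boot entry was edited correctly before drawing conclusions from the shutdown test.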
Hi guys,
thanks for your further input.

Following through Ben's mail below and Konrad's later mail suggesting the same, I tried to get these patches in. I'd however require your help before I feel I can safely proceed.

Please see below:

On 15.08.13 03:58, Ben Guthro wrote:
[...]
> I admit, I don't know how the gentoo build system works, but the general
> idea here is that you want to revert those 2 commits, and apply the third.
>
> If you don't have a git tree, you can download the two commits from
> these two links
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=8eaffa67b43e99ae581622c5133e20b0f48bcef1
>
> You'll want to apply them in reverse

After consulting the manual I decided to give it a dry run first and check with you guys. First of all, I assume I'm right that this is a patch to the *linux kernel* and not the xen-sources, as I could not find the referenced files in the xen tree.

> patch -p1 -R < c79c498.patch

vm-host # patch --dry-run -p1 -R < c79c498.patch
patching file arch/x86/xen/enlighten.c
Hunk #2 succeeded at 1431 (offset 14 lines).

I am slightly worried about the last message, not so much about the offset, but rather that only "Hunk #2" reports success. Why is there no "Hunk #1" when there's a "Hunk #2"?

> patch -p1 -R < 8eaffa67.patch

vm-host # patch --dry-run -p1 -R < 8eaffa67.patch
patching file arch/x86/xen/enlighten.c
Hunk #1 succeeded at 1367 (offset 226 lines).
patching file arch/x86/xen/mmu.c
Hunk #1 succeeded at 434 (offset 19 lines).
Hunk #2 succeeded at 482 (offset 19 lines).
Hunk #3 succeeded at 495 (offset 19 lines).

That seems to be o.k. from my understanding?

> Then apply the patch from
> https://lkml.org/lkml/2012/2/10/229

For this patch I copied the complete text from the https address above into a file named 229.patch. Then I issued the following command:

vm-host # patch --dry-run -p1 -R < 229.patch
patching file arch/x86/include/asm/pgtable.h
Unreversed patch detected! Ignore -R? [n]

I am not sure what to make of this? Could you please provide some input.

Thanks and sorry for those probably dumb questions. I'm new to this (automated) patching thing, and with a little help, the first time usually works out well.
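The "Unreversed patch detected" prompt above is patch noticing that the hunks fit the tree only in the forward direction although -R was given. A small sketch, assuming GNU patch, that asks the same question up front using dry runs only; patch_direction is a hypothetical helper name:

```shell
#!/bin/sh
# patch_direction TREE PATCHFILE
# Hypothetical helper: prints "forward" if the patch would apply as-is,
# "reverse" if it would only apply with -R (i.e. it is already in the
# tree), and "neither" otherwise. Nothing on disk is modified.
# PATCHFILE must be an absolute path, because the helper changes into TREE.
patch_direction() {
    tree=$1
    patchfile=$2
    # -f: never prompt and never flip the direction automatically; -s: quiet.
    if (cd "$tree" && patch --dry-run -p1 -f -s < "$patchfile") >/dev/null 2>&1; then
        echo forward
    elif (cd "$tree" && patch --dry-run -p1 -f -s -R < "$patchfile") >/dev/null 2>&1; then
        echo reverse
    else
        echo neither
    fi
}
```

A commit that is to be reverted should report "reverse", while a patch that still has to go in should report "forward". As for the lone "Hunk #2" line: as far as I know GNU patch, unless run with --verbose, only mentions hunks that needed an offset or fuzz, so a silent hunk #1 most likely applied at exactly the expected line.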
On Thu, Aug 15, 2013 at 09:28:24PM +0200, Atom2 wrote:
> [...]
> After consultation with the manual I decided to give it a dry-run
> before and check with you guys first. First of all, I assume I'm
> right that this is a patch to the *linux kernel* and not the
> xen-sources as I could not find the referenced files in the xen
> tree.

Right. You also need to compile the kernel. Usually I pluck the /boot/config-my-existing-kernel-version and put it in the linux directory as .config.

> [...]
> vm-host # patch --dry-run -p1 -R < 229.patch
> patching file arch/x86/include/asm/pgtable.h
> Unreversed patch detected! Ignore -R? [n]

Note that you had been using --dry-run, which means that the changes did NOT go into effect.

> I am not sure what to make out of this? Could you please provide some input.

Attaching the full part thanks to Martin Cerveny <martin@c-home.cz> doing it in another thread (about the Nvidia and CUDA).

You basically want those changes that the diff file has.

After the patching, if you run git diff you should see a similar result to what the attached patch had.

Then just do 'make -j3141567891238901948248092840932480932; sudo make modules_install; sudo make install; sudo grub2-mkconfig -o /boot/grub2/grub.cfg' and reboot into the new kernel.

> Thanks and sorry for those probably dumb questions. I'm new to this
> (automated) patching thing, and with a little help, the first time
> usually works out well.

P.S.
No need to do the -j31415.. It should be just the number of CPUs you have.
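Regarding the P.S.: rather than typing a number, the job count can be derived from the machine. A minimal sketch, assuming either nproc (GNU coreutils) or getconf is available; make_jobs is a hypothetical helper name:

```shell
#!/bin/sh
# make_jobs
# Hypothetical helper: prints the number of online CPUs for use with
# make -j, falling back to 1 if it cannot be determined.
make_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}
```

The build step would then read make -j"$(make_jobs)"; whether to add one or two extra jobs on top of that is a matter of taste.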
Hi guys,
the good news is that the latest patched kernel now powers down my machine as expected. Thanks for all your input and help.

The console is still working and there's no need to hide the Intel IGD from dom0 to get proper powerdown.

I am however still unclear what this may mean further down the line? Are those patches something I have to manually apply for every new kernel release that I'm going to install?

Or would those patches be something that makes it into the upstream kernel sources to then again drip downstream, allowing me to use my distribution's sources (gentoo in my case) without change in the future?

Are there any negative knock-on effects / reduced functionalities to be expected from the patches I have applied?

Being kernel patches, I assume they at least should have no effect on upgrading from XEN 4.2.2 to newer versions in the future.

Also, just for reference in case someone else faces the same problem and stumbles across this thread, please find below a few comments / clarifications on the questions I had raised and on Konrad's subsequent answers.

On 15.08.13 22:26, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 15, 2013 at 09:28:24PM +0200, Atom2 wrote:
>> [...]
>> After consultation with the manual I decided to give it a dry-run
>> before and check with you guys first. First of all, I assume I'm
>> right that this is a patch to the *linux kernel* and not the
>> xen-sources as I could not find the referenced files in the xen
>> tree.
>
> Right. You also need to compile the kernel. Usually I pluck the
> /boot/config-my-existing-kernel-version and put it in the linux
> directory as .config.

Extracting .config from a kernel image requires the kernel configuration option CONFIG_IKCONFIG. One can then either extract the .config through scripts/extract-ikconfig (located under the linux directory) or, if CONFIG_IKCONFIG_PROC is additionally configured, by accessing /proc/config.gz.

In my case (and most likely for all gentoo users) it was even easier, as I had originally built the running kernel myself and .config was readily available in the right directory anyway.

>>> patch -p1 -R < c79c498.patch
>> vm-host # patch --dry-run -p1 -R < c79c498.patch
>> patching file arch/x86/xen/enlighten.c
>> Hunk #2 succeeded at 1431 (offset 14 lines).
>>
>> I am slightly worried about the last message, not so much about the
>> offset, but rather only the "Hunk #2" success. Why is there no "Hunk
>> #1" when there's a "Hunk #2"?

That "Hunk #2" message seems to be harmless, as a check of my patched sources against Konrad's diff attachment suggests. I still don't know where it comes from though.

>>> patch -p1 -R < 8eaffa67.patch
>> vm-host # patch --dry-run -p1 -R < 8eaffa67.patch
>> patching file arch/x86/xen/enlighten.c
>> Hunk #1 succeeded at 1367 (offset 226 lines).
>> patching file arch/x86/xen/mmu.c
>> Hunk #1 succeeded at 434 (offset 19 lines).
>> Hunk #2 succeeded at 482 (offset 19 lines).
>> Hunk #3 succeeded at 495 (offset 19 lines).
>>
>> That seems to be o.k. from my understanding?

A check against Konrad's diff attachment after running the final patch command again confirmed everything o.k.

>>> Then apply the patch from
>>> https://lkml.org/lkml/2012/2/10/229
>> For this patch I copied the complete text from the https address
>> above and copied it to a file named 229.patch. Then I issued the
>> following command:
>> vm-host # patch --dry-run -p1 -R < 229.patch
>> patching file arch/x86/include/asm/pgtable.h
>> Unreversed patch detected! Ignore -R? [n]
>
> Note that you had been using --dry-run which means that the changes
> did NOT go in effect.
>>
>> I am not sure what to make out of this? Could you please provide some input.

The issue was not the --dry-run (which I was aware of), but rather the -R option. This patch does not need to be *reversed* (the -R), but rather *applied* (as Ben had already suggested in his e-mail). And that was what the message actually meant ...

I have also added a -b option to all patch commands (and of course removed the --dry-run option for all patches) to create a backup copy just in case ...

> Attaching the full part thanks to Martin Cerveny <martin@c-home.cz>
> doing it in another thread (about the Nvidia and CUDA).
>
> You basically want those changes that the diff file has.
>
> After the patching, if you run git diff you should see a similar
> result to what the attached patch had.
>
> Then just do 'make -j3141567891238901948248092840932480932; sudo make modules_install; sudo make
> install;sudo grub2-mkconfig -o /boot/grub2/grub.cfg' and reboot the new
> kernel.

I had to do this slightly differently, not only because I use grub instead of grub2 - but that's something Konrad could not possibly have been aware of.

>> Thanks and sorry for those probably dumb questions. I'm new to this
>> (automated) patching thing, and with a little help, the first time
>> usually works out well.
>
> P.S.
> No need to do the -j31415.. It should be just the amount of CPUs
> you have.

Yeah, in my case it was just a -j9 using a 4-core CPU with hyperthreading.
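The places mentioned above where a kernel configuration may be found can be probed in one go. A minimal sketch, assuming a POSIX sh; kernel_config_source is a hypothetical helper name, the optional ROOT prefix exists only for testing, and /usr/src/linux/.config is my assumption about a typical source location rather than something stated in this thread:

```shell
#!/bin/sh
# kernel_config_source [ROOT]
# Hypothetical helper: prints the first readable kernel config candidate
# and returns 0, or returns 1 if none is found. ROOT is an optional path
# prefix (empty by default, i.e. the real filesystem).
kernel_config_source() {
    root=${1:-}
    release=$(uname -r)
    for candidate in \
        "$root/proc/config.gz" \
        "$root/boot/config-$release" \
        "$root/usr/src/linux/.config"
    do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

If the result is /proc/config.gz, keep in mind that it is gzip-compressed, so something like zcat /proc/config.gz > .config is needed rather than a plain copy.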
On Thu, Aug 15, 2013 at 11:39:14PM +0200, Atom2 wrote:
> Hi guys,
> the good news is that the latest patched kernel now powers down my
> machine as expected. Thanks for all your input and help.
>
> The console is still working and there's no need to hide the Intel
> IGD from dom0 to get proper powerdown.
>
> I am however still unclear what this may mean further down the line?
> Are those patches something I have to manually apply for every new
> kernel release that I'm going to install?

For right now, yes.

> Or would those patches be something that makes it into the upstream
> kernel sources to then again drip downstream allowing me to use my
> distribution's sources (gentoo in my case) without change in the
> future?

That is the goal. The way that the x86 maintainer wanted this to be done is to have some sort of bit lookup "box", where the generic code does some form of pte_wc(pte) and we look up in the PAT the index bit for WC, and vice versa for pte_wb(pte). The complication is that there is also the PSE bit to think of. It is not as simple as it sounds.

CC-ing Stefan Bader, who took a stab at it. Maybe he had a chance to look into this a bit more.

(Sorry if this is confusing: WC stands for Write Combine, while WB is WriteBack. WC is heavily used in graphics.)

> Are there any negative knock-on effects / reduced functionalities to
> be expected from the patches I have applied?

No. The opposite actually.

> Being kernel patches I assume they at least should have no effect on
> upgrading from XEN 4.2.2 to newer versions in the future.

Correct.

> Also just for reference in case someone else faces the same problem
> and stumbles across this thread please find a few comments /
> clarifications below to the questions I had raised and to Konrad's
> subsequent answers.

Thanks!

> [...]
On Thu, Aug 15, 2013 at 11:39:14PM +0200, Atom2 wrote:> Hi guys, > the good news is that the latest patched kernel now powers down my > machine as expected. Thanks for all your input and help. > > The console is still working and there''s no need to hide the Intel > IGD from dom0 to get proper powerdown. > > I am however still unclear what this may mean further down the line? > Are those patches someting I have to manually apply for every new > kernel release that I''m going to install? > > Or would those patches be something that makes it into the upstream > kernel sources to then again drip downstream allowing me to use my > distribution''s sources (gentoo in my case) without change in the > future? > > Are there any negative knock-on effects / reduced functionalities to > be expected from the patches I have applied?Could you refresh my memory of which ones? Can you just do a git diff please on your Linux tree?> > Being kernel patches I assume they at least should have no effect on > upgrading from XEN 4.2.2 to newer versions in the future. > > > Also just for reference in case someone else faces the same problem > and stumbles across this thread please find a few comments / > clarifications below to the questions I had raised and to Konrad''s > subsequent answers. > > Am 15.08.13 22:26, schrieb Konrad Rzeszutek Wilk: > >On Thu, Aug 15, 2013 at 09:28:24PM +0200, Atom2 wrote: > >>Hi guys, > >>thanks for your further input. > >> > >>Following through Ben''s mail below and Konrad''s later mail > >>suggesting the same, I tried to get these patches in. I''d however > >>require your help before I feel I can safely proceed. > >> > >>Please see below: > >> > >>Am 15.08.13 03:58, schrieb Ben Guthro: > >>[...] > >>>I admit, I don''t know how the gentoo build system works, but the general > >>>idea here is that you want to revert those 2 commits, and apply the third. 
> >>>
> >>>If you don't have a git tree, you can download the two commits from
> >>>these two links
> >>>http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a
> >>>http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=8eaffa67b43e99ae581622c5133e20b0f48bcef1
> >>>
> >>>You'll want to apply them in reverse
> >>After consultation with the manual I decided to give it a dry-run
> >>before and check with you guys first. First of all, I assume I'm
> >>right that this is a patch to the *linux kernel* and not the
> >>xen-sources as I could not find the referenced files in the xen
> >>tree.
> >
> >Right. You also need to compile the kernel. Usually I pluck the
> >/boot/config-my-existing-kernel-version and put it in the linux
> >directory as .config.
> Extracting .config from a kernel image requires the kernel
> configuration option CONFIG_IKCONFIG. One can then either extract
> the .config through scripts/extract-ikconfig (located under the
> linux directory) or if additionally CONFIG_IKCONFIG_PROC is
> configured, by accessing /proc/config.gz.
>
> In my case (and most likely for all gentoo users) it was even easier
> as I had originally built the running kernel myself and .config was
> readily available in the right directory anyways.
>
>
> >>
> >>>patch -p1 -R < c79c498.patch
> >>vm-host # patch --dry-run -p1 -R < c79c498.patch
> >>patching file arch/x86/xen/enlighten.c
> >>Hunk #2 succeeded at 1431 (offset 14 lines).
> >>
> >>I am slightly worried about the last message, not so much about the
> >>offset, but rather only the "Hunk #2" success. Why is there no "Hunk
> >>#1" when there's a "Hunk #2"?
> That "Hunk #2" message seems to be harmless as a check of my patched
> sources against Konrad's diff attachment suggests. Still don't know
> where it comes from though.
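On the open question of the missing "Hunk #1": by default GNU patch stays quiet about hunks that apply at exactly the position recorded in the patch and only reports hunks it had to relocate (offset) or fuzz, so a lone "Hunk #2" line suggests that hunk #1 simply applied in place. A minimal sketch that reproduces this on a throwaway file follows. The file names, the 40-line test file and the function name demo_offset_hunk are made up for illustration, and it assumes GNU patch, diff, seq, sed and awk are installed.

```shell
#!/bin/sh
# Hypothetical demo, unrelated to the kernel tree: shows that patch only
# reports the hunks it had to move.
demo_offset_hunk() (
    set -eu
    work=$(mktemp -d)
    cd "$work"
    mkdir a b tree

    # Original file: 40 numbered lines.
    seq 1 40 > a/f.txt
    # Changed file: one edit near the top, one near the bottom -> two hunks.
    sed -e '3s/.*/three/' -e '35s/.*/thirty-five/' a/f.txt > b/f.txt
    # diff exits with status 1 when the files differ, hence "|| true".
    diff -u a/f.txt b/f.txt > two-hunks.patch || true

    # Target tree: same as the original, but with 5 extra lines after
    # line 20, i.e. between the two hunks.
    awk '{ print } NR == 20 { for (i = 0; i < 5; i++) print "extra" }' \
        a/f.txt > tree/f.txt

    cd tree
    # Hunk #1 still matches at its recorded position and is not mentioned;
    # hunk #2 is found 5 lines further down and is reported with an offset.
    patch --dry-run -p1 < ../two-hunks.patch

    # Clean up the scratch directory.
    cd /
    rm -rf "$work"
)

demo_offset_hunk
```

The same reporting rule applies when -R is given, which matches the output quoted for c79c498.patch.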
>
> >>
> >>>patch -p1 -R < 8eaffa67.patch
> >>vm-host # patch --dry-run -p1 -R < 8eaffa67.patch
> >>patching file arch/x86/xen/enlighten.c
> >>Hunk #1 succeeded at 1367 (offset 226 lines).
> >>patching file arch/x86/xen/mmu.c
> >>Hunk #1 succeeded at 434 (offset 19 lines).
> >>Hunk #2 succeeded at 482 (offset 19 lines).
> >>Hunk #3 succeeded at 495 (offset 19 lines).
> >>
> >>That seems to be o.k. from my understanding?
> A check against Konrad's diff attachment after running the final
> patch command again confirmed everything o.k.
>
> >>>
> >>>Then apply the patch from
> >>>https://lkml.org/lkml/2012/2/10/229
> >>For this patch I copied the complete text from the https address
> >>above and copied it to a file named 229.patch. Then I issued the
> >>following command:
> >>vm-host # patch --dry-run -p1 -R < 229.patch
> >>patching file arch/x86/include/asm/pgtable.h
> >>Unreversed patch detected! Ignore -R? [n]
> >
> >Note that you had been using --dry-run which means that the changes
> >did NOT go in effect.
> >>
> >>I am not sure what to make out of this? Could you please provide some input.
> The issue was not the --dry-run (which I was aware of), but rather
> the -R option. This patch does not need to be *reversed* (the -R),
> but rather *applied* (as Ben had already suggested in his e-Mail).
> And that was what the message actually meant ...
>
> I have also added a -b option to all patch commands (and clearly
> removed the --dry-run option for all patches) to create a backup
> copy just in case ...
>
>
> >
> >Attaching the full part thanks to Martin Cerveny <martin@c-home.cz>
> >doing it in another thread (about the Nvidia and CUDA).
> >
> >You basically want those changes that the diff file has.
> >
> >After the patching, if you run git diff you should see a similar
> >result to what the attached patch had.
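The patch handling discussed above (dry run first, -R only for the two commits that are to be reverted, -b to keep backups) can be sketched as a small wrapper. This is only an illustration under assumptions: try_patch is a made-up helper name, the /path/to/ locations in the usage comments are placeholders, and it relies on GNU patch's -f (--force) option to suppress interactive questions such as "Unreversed patch detected! Ignore -R?", so that a wrong direction shows up as a failed dry run instead of a prompt.

```shell
#!/bin/sh
# try_patch DIR PATCHFILE [extra patch options, e.g. -R]
# Hypothetical helper: applies PATCHFILE inside DIR only if a dry run
# succeeds, and keeps backups of the files it touches.
try_patch() {
    tp_dir=$1
    tp_patch=$2        # must be an absolute path, because we cd into tp_dir
    shift 2            # anything left (e.g. -R) is passed on to patch

    # --dry-run changes nothing on disk; -f keeps patch from asking questions.
    if (cd "$tp_dir" && patch --dry-run -f -p1 "$@" < "$tp_patch") > /dev/null 2>&1
    then
        # -b saves each original file next to the patched one as <name>.orig
        (cd "$tp_dir" && patch -b -f -p1 "$@" < "$tp_patch")
    else
        echo "dry run failed for $tp_patch (options: $*), nothing changed" >&2
        return 1
    fi
}

# Possible use for the sequence described in this thread, run against the
# top of the kernel source tree (paths are placeholders):
#   try_patch "$PWD" /path/to/c79c498.patch -R
#   try_patch "$PWD" /path/to/8eaffa67.patch -R
#   try_patch "$PWD" /path/to/229.patch          # applied, not reversed
```

If the dry run fails with -R and succeeds without it, the patch needs to be applied forward, which is the situation described for 229.patch.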
> >
> >Then just do 'make -j3141567891238901948248092840932480932; sudo make modules_install; sudo make
> >install;sudo grub2-mkconfig -o /boot/grub2/grub.cfg' and reboot the new
> >kernel.
> I had to do this slightly differently, not only because I use grub
> instead of grub2 - but that's something Konrad could not possibly
> have been aware of.
>
>
> >
> >>Thanks and sorry for those probably dumb questions. I'm new to this
> >>(automated) patching thing, and with a little help, the first time
> >>usually works out well.
> >
> >P.S.
> >No need to do the -j31415.. It should be just the amount of CPUs
> >you have.
> Yeah, in my case it was just a -j9 using a 4-core CPU with hyperthreading
> >>
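The job-count advice from the P.S. can be turned into a small helper instead of typing a number by hand. This is a sketch, not something used in the thread: cpu_count and jobs_flag are made-up names, and it assumes either nproc (GNU coreutils) or getconf _NPROCESSORS_ONLN is available. A 4-core CPU with hyperthreading exposes 8 logical CPUs, so the -j9 mentioned above corresponds to "CPUs + 1".

```shell
#!/bin/sh
# cpu_count: number of logical CPUs available (hypothetical helper).
cpu_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# jobs_flag [EXTRA]: print a -jN option for make, with N = CPUs + EXTRA
# (EXTRA defaults to 0).
jobs_flag() {
    echo "-j$(( $(cpu_count) + ${1:-0} ))"
}

jobs_flag      # one job per logical CPU, as suggested in the P.S.
jobs_flag 1    # one spare job, the "-j9 on 8 logical CPUs" variant

# Possible use in the kernel tree:
#   make "$(jobs_flag)"
```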