Christopher S. Aker
2011-Oct-12 19:34 UTC
[Xen-devel] Xen 4 occasionally hangs during boot
Since I started playing with Xen 4 (vs 3.x), machines often hang during reboot at exactly the same place:

(XEN) HVM: Hardware Assisted Paging detected.
(

... and then nothing. I have to RPC bounce them. On some occasions it takes four or five attempts to get beyond this point. A normal boot looks like this:

(XEN) HVM: Hardware Assisted Paging detected.
(XEN) Brought up 16 CPUs

4.1.2-rc @ 23159. All of the Xen 4.x I've tried have done this, but I'd need to dig up which ones those are.

-Chris
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
>>> On 12.10.11 at 21:34, "Christopher S. Aker" <caker@theshore.net> wrote:
> Since I started playing with Xen 4 (vs 3.x), machines often hang during
> reboot at exactly the same place:

Do those machines have something in common hardware-wise? As you would certainly assume, this isn't a problem generally, and hence telling us on what hardware you observe this might help guessing...

Also, any chance you could try recent -unstable?

Jan

> (XEN) HVM: Hardware Assisted Paging detected.
> (
>
> ... and then nothing. I have to RPC bounce them. On some occasions it
> takes four or five attempts to get beyond this point. A normal boot
> looks like this:
>
> (XEN) HVM: Hardware Assisted Paging detected.
> (XEN) Brought up 16 CPUs
>
> 4.1.2-rc @ 23159. All of the Xen 4.x I've tried have done this, but I'd
> need to dig up which ones those are.
At 15:34 -0400 on 12 Oct (1318433686), Christopher S. Aker wrote:
> Since I started playing with Xen 4 (vs 3.x), machines often hang during
> reboot at exactly the same place:
>
> (XEN) HVM: Hardware Assisted Paging detected.
> (
>
> ... and then nothing. I have to RPC bounce them. On some occasions it
> takes four or five attempts to get beyond this point. A normal boot
> looks like this:
>
> (XEN) HVM: Hardware Assisted Paging detected.
> (XEN) Brought up 16 CPUs

:( If you add "cpuinfo" to the Xen command-line arguments, does it print anything more useful?

Tim.
On 10/13/11 6:46 AM, Tim Deegan wrote:
> At 15:34 -0400 on 12 Oct (1318433686), Christopher S. Aker wrote:
>> Since I started playing with Xen 4 (vs 3.x), machines often hang during
>> reboot at exactly the same place:

We're still seeing this occasionally, even with 'cpuinfo' added to Xen args. Serial console stops responding during Xen booting - every time in the exact same place:

(XEN) CPU9: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz stepping 05
(XEN) CPU 10

The machine continues to boot and becomes available via network, however nothing I do from then on can get serial to start working again. Control-AAA, sending massive amounts to /dev/console, etc. If I issue the reboot command via dom0 something will tickle Xen and a page or two of buffered OLD data will flush out the serial before the machine reboots, which is interesting.

In this state hvc_console receives no interrupts. When not in this state hvc_console seems to get interrupts occasionally. Not sure of its significance.

I still have a box in this state if anyone has ideas to try.

-Chris
On 20/07/12 18:48, Christopher S. Aker wrote:
> On 10/13/11 6:46 AM, Tim Deegan wrote:
>> At 15:34 -0400 on 12 Oct (1318433686), Christopher S. Aker wrote:
>>> Since I started playing with Xen 4 (vs 3.x), machines often hang during
>>> reboot at exactly the same place:
> We're still seeing this occasionally, even with 'cpuinfo' added to Xen
> args. Serial console stops responding during Xen booting - every time
> in the exact same place:
>
> (XEN) CPU9: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz stepping 05
> (XEN) CPU 10
>
> The machine continues to boot and becomes available via network, however
> nothing I do from then on can get serial to start working again.
> Control-AAA, sending massive amounts to /dev/console, etc. If I issue
> the reboot command via dom0 something will tickle Xen and a page or two
> of buffered OLD data will flush out the serial before the machine
> reboots, which is interesting.
>
> In this state hvc_console receives no interrupts. When not in this
> state hvc_console seems to get interrupts occasionally. Not sure of its
> significance.
>
> I still have a box in this state if anyone has ideas to try.

Is this an HP box by any chance, and are you accessing serial over iLO?

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
On 7/20/12 1:49 PM, Andrew Cooper wrote:
> Is this an HP box by any chance, and are you accessing serial over iLO?

It is not. It's a SM motherboard with an onboard 16550 UART, which we're booting with Xen args "com1=115200,8n1 console=com1".

-Chris
On 20/07/12 18:58, Christopher S. Aker wrote:
> On 7/20/12 1:49 PM, Andrew Cooper wrote:
>> Is this an HP box by any chance, and are you accessing serial over iLO?
> It is not. It's a SM motherboard with an on board UART 16550, which
> we're booting with xen args "com1=115200,8n1 console=com1".

Oh interesting. We periodically see this with HP kit, but nothing else, which is why we assumed it was iLO specific. Perhaps it is not after all.

As for suggestions, try manually prodding port 3f8 to see whether the UART is actually working? Alternatively, use `xl debug-keys` and `xl dmesg` to see whether you can provoke it back to life.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
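[Editorial sketch, not part of the original thread.] One way to "prod port 3f8" from Linux userspace is to read the 16550 Line Status Register (base + 5) through /dev/port. The sketch below assumes root access, a kernel that exposes /dev/port, and the legacy COM1 base 0x3f8 mentioned above; the helper names (`read_port`, `describe_lsr`, `probe`) are made up for illustration. Under Xen, dom0 may not be allowed to touch the port the hypervisor is using as its console, so this is probably most useful for the native-Linux comparison.

```python
import io

COM1_BASE = 0x3F8        # legacy COM1 I/O base, as used in the thread
LSR_OFFSET = 5           # 16550 Line Status Register sits at base + 5
LSR_DATA_READY = 0x01    # a received byte is waiting
LSR_THR_EMPTY = 0x20     # transmit holding register can accept a byte
LSR_TX_IDLE = 0x40       # transmitter (holding + shift register) is idle


def read_port(dev, port):
    """Read one byte from an I/O port via a /dev/port-style file object."""
    dev.seek(port)
    return dev.read(1)[0]


def describe_lsr(value):
    """Decode the LSR bits relevant to an 'is the UART alive?' check."""
    if value == 0xFF:
        return "0xff: nothing answering at this address (or access denied)"
    flags = []
    if value & LSR_DATA_READY:
        flags.append("rx data ready")
    if value & LSR_THR_EMPTY:
        flags.append("tx holding register empty")
    if value & LSR_TX_IDLE:
        flags.append("transmitter idle")
    return f"0x{value:02x}: " + (", ".join(flags) or "busy")


def probe(dev, base=COM1_BASE):
    """Return a human-readable summary of the UART's line status."""
    return describe_lsr(read_port(dev, base + LSR_OFFSET))


# Real use (as root):
#   with open("/dev/port", "rb", buffering=0) as dev:
#       print(probe(dev))
```

Taking the device as a file object keeps the decoding logic testable without hardware access; an `io.BytesIO` with a byte placed at offset 0x3fd stands in for the real port.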
On 7/20/12 2:05 PM, Andrew Cooper wrote:
> Alternatively, use `xl debug-keys`

Interesting. 'xl debug-keys h' got me ~1448 bytes onto the serial console, except it is old buffered output from exactly where it left off during boot. 1448 bytes is about the size of 'h' output from another working box. The same thing happened with 'm'. Xen is flushing an equal amount of characters from the buffer as generated by debug-key command output.

I can continue to poke the buffer until I see output from things I've issued, however it still refuses to respond to serial input (control-aaa) nor can I get dom0 to echo chars. It's currently running a * dump which has been going now for over an hour, currently on vcpu 435628. I won't be doing that again.

Any lightbulbs going off?

-Chris
On 20/07/12 20:10, Christopher S. Aker wrote:
> On 7/20/12 2:05 PM, Andrew Cooper wrote:
>> Alternatively, use `xl debug-keys`
> Interesting. 'xl debug-keys h' got me ~1448 bytes onto the serial
> console, except it is old buffered output from exactly where it left off
> during boot. 1448 bytes is about the size of 'h' output from another
> working box. The same thing happened with 'm'. Xen is flushing an
> equal amount of characters from the buffer as generated by debug-key
> command output.
>
> I can continue to poke the buffer until I see output from things I've
> issued, however it still refuses to respond to serial input
> (control-aaa) nor can I get dom0 to echo chars. It's currently running
> a * dump which has been going now for over an hour, currently on vcpu
> 435628. I won't be doing that again.
>
> Any lightbulbs going off?

Not especially. It sounds like the serial ring buffer filled up and never got drained. How easy is this to reproduce for you?

In the past, I have had success debugging Xen like this with an outb(0x3f8, <ascii char>) in certain locations. Perhaps the serial_rx interrupt handler, or failing that, do_irq checking for vector 0xf0. That should allow you to see whether Xen is actually receiving interrupts when you try to send characters.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
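[Editorial sketch, not part of the original thread.] The "filled up and never got drained" hypothesis can be illustrated with a toy model: a fixed-size transmit ring whose interrupt-driven drain never runs, and where a writer that finds the ring full pushes the oldest byte onto the wire to make room. This is only a simplified model written for illustration (the class `StalledTxRing` and its 16-byte size are invented), not Xen's serial code, and the thread does not establish that this is what happens on the affected machine. It does reproduce the reported symptom that each burst of new output releases an equal amount of old output.

```python
from collections import deque


class StalledTxRing:
    """Toy transmit ring whose interrupt-driven drain never runs."""

    def __init__(self, size):
        self.size = size
        self.ring = deque()
        self.wire = []  # bytes that actually reached the serial line

    def write(self, data):
        for byte in data:
            if len(self.ring) == self.size:
                # Ring is full: emit the oldest byte to make room.
                self.wire.append(self.ring.popleft())
            self.ring.append(byte)


ring = StalledTxRing(size=16)
ring.write(b"boot log .......")  # 16 bytes of boot output fill the ring
ring.write(b"debug")             # 5 bytes of new debug-key output
print(bytes(ring.wire))          # the 5 oldest boot bytes, not b"debug"
```

In this model the new output only becomes visible after a full ring's worth of further writes, which matches "I can continue to poke the buffer until I see output from things I've issued".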
On 20/07/2012 20:10, "Christopher S. Aker" <caker@theshore.net> wrote:
> On 7/20/12 2:05 PM, Andrew Cooper wrote:
>> Alternatively, use `xl debug-keys`
>
> Interesting. 'xl debug-keys h' got me ~1448 bytes onto the serial
> console, except it is old buffered output from exactly where it left off
> during boot. 1448 bytes is about the size of 'h' output from another
> working box. The same thing happened with 'm'. Xen is flushing an
> equal amount of characters from the buffer as generated by debug-key
> command output.
>
> I can continue to poke the buffer until I see output from things I've
> issued, however it still refuses to respond to serial input
> (control-aaa) nor can I get dom0 to echo chars. It's currently running
> a * dump which has been going now for over an hour, currently on vcpu
> 435628. I won't be doing that again.
>
> Any lightbulbs going off?

Somehow dom0 disabled the serial-line interrupt during boot. Possibly it appeared as a PnP device in some BIOS table and dom0 decided to disable it because it doesn't think it is being used. Xen would usually stop this happening via programming of the IO-APIC/XT-PIC but perhaps there is some other method of disabling it on this mainboard, which Xen doesn't catch.

-- Keir
On 7/20/12 3:31 PM, Keir Fraser wrote:
> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
> appeared as a PnP device in some BIOS table and dom0 decided to disable it
> because it doesn't think it is being used. Xen would usually stop this
> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
> other method of disabling it on this mainboard, which Xen doesn't catch.

Hmm -- except dom0 hasn't even booted yet at the time the serial stops working. Xen is 30-60 seconds away from booting dom0 given the RAM scrub still has to happen.

-Chris
On 20/07/2012 20:44, "Christopher S. Aker" <caker@theshore.net> wrote:
> On 7/20/12 3:31 PM, Keir Fraser wrote:
>> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
>> appeared as a PnP device in some BIOS table and dom0 decided to disable it
>> because it doesn't think it is being used. Xen would usually stop this
>> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
>> other method of disabling it on this mainboard, which Xen doesn't catch.
>
> Hmm -- except dom0 hasn't even booted yet at the time the serial stops
> working. Xen is 30-60 seconds away from booting dom0 given the RAM
> scrub still has to happen.

Then it is Xen doing something to kill the serial interrupt. ;-) I haven't seen anything like this reported before. Not sure what to suggest really... Gather debug output from interrupt-related debug keys (via the xl debug-keys interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen and dom0 boot logs... something might become apparent.

-- Keir
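[Editorial sketch, not part of the original thread.] The collection step suggested above can be scripted so that the same data is captured every time a box gets into this state. The sketch assumes it runs in dom0 with the `xl` toolstack on the PATH; the function name `collect_debug_output` and the injectable `run` parameter are illustrative choices, and the default keys are simply the 'i' and 'z' keys named in the message above.

```python
import subprocess


def collect_debug_output(keys=("i", "z"), run=subprocess.run):
    """Trigger each Xen debug key, then return the hypervisor log as text."""
    for key in keys:
        # Ask Xen to dump the state associated with this debug key.
        run(["xl", "debug-keys", key], check=True)
    # The dumps land in the hypervisor console ring, readable via xl dmesg.
    result = run(["xl", "dmesg"], check=True, capture_output=True, text=True)
    return result.stdout


# Real use (in dom0, as root):
#   with open("xen-debug.txt", "w") as out:
#       out.write(collect_debug_output())
```

Reading the result back through `xl dmesg` avoids relying on the serial line, which is the thing that is stuck.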
On Fri, Jul 20, 2012 at 08:59:29PM +0100, Keir Fraser wrote:
> On 20/07/2012 20:44, "Christopher S. Aker" <caker@theshore.net> wrote:
>
> > On 7/20/12 3:31 PM, Keir Fraser wrote:
> >> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
> >> appeared as a PnP device in some BIOS table and dom0 decided to disable it
> >> because it doesn't think it is being used. Xen would usually stop this
> >> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
> >> other method of disabling it on this mainboard, which Xen doesn't catch.
> >
> > Hmm -- except dom0 hasn't even booted yet at the time the serial stops
> > working. Xen is 30-60 seconds away from booting dom0 given the RAM
> > scrub still has to happen.
>
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

What about using the serial line without the interrupt? Meaning

com1=115200,8n1,0x3f8,0

That ought to make the code go into polling and ignore the interrupt line, right?
On 23/07/2012 15:13, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> On Fri, Jul 20, 2012 at 08:59:29PM +0100, Keir Fraser wrote:
>> On 20/07/2012 20:44, "Christopher S. Aker" <caker@theshore.net> wrote:
>>
>>> On 7/20/12 3:31 PM, Keir Fraser wrote:
>>>> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
>>>> appeared as a PnP device in some BIOS table and dom0 decided to disable it
>>>> because it doesn't think it is being used. Xen would usually stop this
>>>> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
>>>> other method of disabling it on this mainboard, which Xen doesn't catch.
>>>
>>> Hmm -- except dom0 hasn't even booted yet at the time the serial stops
>>> working. Xen is 30-60 seconds away from booting dom0 given the RAM
>>> scrub still has to happen.
>>
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
>
> What about using the serial line without the interrupt?
> Meaning com1=115200,8n1,0x3f8,0
>
> That ought to make the code go into polling and ignore the interrupt line
> right?

Yes, that should work. It does waste some CPU time running the poll handler continually, even when the serial line is idle. And of course serial debug key inputs will still not work.

-- Keir
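[Editorial sketch, not part of the original thread.] The option strings discussed in this thread share one field order: baud rate, framing, I/O base, IRQ. The helper below assembles such a string so the interrupt-driven and polled variants can be compared side by side. The function `com_option` is hypothetical and covers only the fields used in this thread; that an IRQ of 0 selects polling is taken from the two messages above.

```python
def com_option(port=1, baud=115200, framing="8n1", io_base=None, irq=None):
    """Build a Xen 'comN=' boot option from the fields used in this thread."""
    if irq is not None and io_base is None:
        # Keep the sketch simple: positional fields may not be skipped.
        raise ValueError("irq requires an explicit io_base in this sketch")
    fields = [str(baud), framing]
    if io_base is not None:
        fields.append(f"0x{io_base:x}")
    if irq is not None:
        fields.append(str(irq))
    return f"com{port}=" + ",".join(fields)


print(com_option())                      # com1=115200,8n1
print(com_option(io_base=0x3F8, irq=0))  # com1=115200,8n1,0x3f8,0
```

The first line is the configuration the affected machines boot with; the second is the polled variant proposed as a workaround.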
On 7/20/12 3:59 PM, Keir Fraser wrote:
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

We hit this again today, and I grabbed boot and debug-keys output:

http://theshore.net/~caker/xen/BUGS/serial/log.txt

Thanks,
-Chris
Try enabling x2apic mode in the BIOS if there's an option for it, or remove any CPU masking that may be applied.

Sorry about top posting (remote login forces using Outlook).

Malcolm

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Christopher S. Aker
Sent: 23 July 2012 21:54
To: Keir (Xen.org)
Cc: Andrew Cooper; xen devel
Subject: Re: [Xen-devel] Xen 4 serial hangs during boot

On 7/20/12 3:59 PM, Keir Fraser wrote:
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

We hit this again today, and I grabbed boot and debug-keys output:

http://theshore.net/~caker/xen/BUGS/serial/log.txt

Thanks,
-Chris
Sorry for the top post again.

You can also try adding "apic=bigsmp" to the Xen command line.

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Christopher S. Aker
Sent: 23 July 2012 21:54
To: Keir (Xen.org)
Cc: Andrew Cooper; xen devel
Subject: Re: [Xen-devel] Xen 4 serial hangs during boot

On 7/20/12 3:59 PM, Keir Fraser wrote:
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

We hit this again today, and I grabbed boot and debug-keys output:

http://theshore.net/~caker/xen/BUGS/serial/log.txt

Thanks,
-Chris
On 23/07/12 21:53, Christopher S. Aker wrote:
> On 7/20/12 3:59 PM, Keir Fraser wrote:
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
> We hit this again today, and I grabbed boot and debug-keys output:
>
> http://theshore.net/~caker/xen/BUGS/serial/log.txt
>
> Thanks,
> -Chris

The serial interrupt will be IO-APIC #9 pin 4, which is set with its vector as 0xf1. I can't immediately see any other issue with that log, unfortunately.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
>>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@theshore.net> wrote:
> On 7/20/12 3:59 PM, Keir Fraser wrote:
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
>
> We hit this again today, and I grabbed boot and debug-keys output:
>
> http://theshore.net/~caker/xen/BUGS/serial/log.txt

This isn't even 8k that made it over, whereas the transmit buffer is 16k, and dropping of characters would only start when it first got full.

The part of the data that didn't make it out isn't big enough to overflow the buffer - to check whether that would actually happen, could you increase the log level of both hypervisor and Dom0 kernel? To me this all (particularly the fact that you can make the data appear, combined with the amount of data not being big enough to fill the buffer) looks as if there was some buffering happening outside of the control of Xen. Did you check whether this is possibly a problem with the remote end?

Does this also happen with "sync_console"? Did you check whether disabling the use of the associated IRQ makes any difference, as suggested by Konrad (I think)?

Does the port work flawlessly on native Linux?

Jan
>>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@theshore.net> wrote:
> On 7/20/12 3:59 PM, Keir Fraser wrote:
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
>
> We hit this again today, and I grabbed boot and debug-keys output:
>
> http://theshore.net/~caker/xen/BUGS/serial/log.txt

One more thing - having seen various interesting (mis)behavior (or part of the chip set) with the IOMMU turned on, could you also check whether with it disabled you also get the problem?

Jan
On Tue, Jul 24, 2012 at 11:32:19AM +0100, Jan Beulich wrote:
> >>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@theshore.net> wrote:
> > On 7/20/12 3:59 PM, Keir Fraser wrote:
> >> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> >> seen anything like this reported before. Not sure what to suggest really...
> >> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> >> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> >> and dom0 boot logs... something might become apparent.
> >
> > We hit this again today, and I grabbed boot and debug-keys output:
> >
> > http://theshore.net/~caker/xen/BUGS/serial/log.txt
>
> This isn't even 8k that make it over, whereas the transmit buffer
> is 16k, and dropping of characters would only start when it first
> got full.
>
> The part of the data that didn't make it out isn't big enough to
> overflow the buffer - to check whether that would actually
> happen, could you increase the log level of both hypervisor and
> Dom0 kernel? To me this all (particularly the fact that you can
> make the data appear combined with the amount of data not
> being big enough to fill the buffer) looks as if there was some
> buffering happening outside of the control of Xen. Did you check
> whether this is possibly a problem with the remote end?

This got me thinking - I've one particular AMD machine (prototype) that seems to hang often - but if I use 'sync_console' it works fine. This issue started, oooh, I can't remember when, but I do have some logs that could shed some light on the approximate date.

I guess I was too quick to blame the prototype for being at fault here :-(

Then recently (yesterday?) the upstream kernel started doing something wonky on this card:

01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)

Under Xen, when it boots it hits right here:

[ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002

and then stops [note: I hadn't really done any investigation to see if the machine is dead or if it continues on, but with the serial port just wedged hard].

On baremetal it can actually read the IO bars:

[ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
[ 1.247075] pci 0000:01:05.0: reg 10: [io 0xe050-0xe057]
[ 1.252734] pci 0000:01:05.0: reg 14: [io 0xe040-0xe047]
[ 1.258394] pci 0000:01:05.0: reg 18: [io 0xe030-0xe037]
[ 1.264054] pci 0000:01:05.0: reg 1c: [io 0xe020-0xe027]
[ 1.269713] pci 0000:01:05.0: reg 20: [io 0xe010-0xe017]
[ 1.275372] pci 0000:01:05.0: reg 24: [io 0xe000-0xe00f]

so I am wondering if the back-ports in Xen 4.1 for dealing with PCI have something to do with this?

> Does this also happen with "sync_console"? Did you check
> whether disabling the use of the associated IRQ makes any
> difference, as suggested by Konrad (I think)?
>
> Does the port work flawlessly on native Linux?
>>> On 26.07.12 at 15:50, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Then recently (yesterday?) the upstream kernel started doing something
> wonky on this card:
>
> 01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller
> (rev 01)
> Under Xen, when it boots it hits right here:
> [ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> and then stops [note: I hadn't really done any investigation to see
> if the machine is dead or if it continues on, but with the serial port just
> wedged hard].

The machine state here, if accessible at all, would of course be very interesting.

> On baremetal it can actually read the IO bars:
> [ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> [ 1.247075] pci 0000:01:05.0: reg 10: [io 0xe050-0xe057]
> [ 1.252734] pci 0000:01:05.0: reg 14: [io 0xe040-0xe047]
> [ 1.258394] pci 0000:01:05.0: reg 18: [io 0xe030-0xe037]
> [ 1.264054] pci 0000:01:05.0: reg 1c: [io 0xe020-0xe027]
> [ 1.269713] pci 0000:01:05.0: reg 20: [io 0xe010-0xe017]
> [ 1.275372] pci 0000:01:05.0: reg 24: [io 0xe000-0xe00f]
>
> so I am wondering if the back-ports in Xen 4.1 for dealing with
> PCI have something to do with this?

What backports are you thinking of? I just went through the titles of everything since 4.1.2, and nothing that has "PCI" in it looks in any way dangerous.

Jan
On Thu, Jul 26, 2012 at 03:10:25PM +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 15:50, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > Then recently (yesterday?) the upstream kernel started doing something
> > wonky on this card:
> >
> > 01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller
> > (rev 01)
> > Under Xen, when it boots it hits right here:
> > [ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> > and then stops [note: I hadn't really done any investigation to see
> > if the machine is dead or if it continues on, but with the serial port just
> > wedged hard].
>
> The machine state here, if accessible at all, would of course be
> very interesting.

<nods> Hope to get to that today.

> > On baremetal it can actually read the IO bars:
> > [ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> > [ 1.247075] pci 0000:01:05.0: reg 10: [io 0xe050-0xe057]
> > [ 1.252734] pci 0000:01:05.0: reg 14: [io 0xe040-0xe047]
> > [ 1.258394] pci 0000:01:05.0: reg 18: [io 0xe030-0xe037]
> > [ 1.264054] pci 0000:01:05.0: reg 1c: [io 0xe020-0xe027]
> > [ 1.269713] pci 0000:01:05.0: reg 20: [io 0xe010-0xe017]
> > [ 1.275372] pci 0000:01:05.0: reg 24: [io 0xe000-0xe00f]
> >
> > so I am wondering if the back-ports in Xen 4.1 for dealing with
> > PCI have something to do with this?
>
> What backports are you thinking of? I just went through the titles
> of everything since 4.1.2, and nothing that has "PCI" in it looks
> in any way dangerous.

I know :-( That is why I am thinking it might be the kernel, but when I did a git bisection I got an innocuous Documentation patch. But then the recent kernel (say 3.5) has been doing some weird stuff in the PCI space (like it seems to have MSIs and BARs disabled - at least when using them with xen-pciback to hide them).