Sadly, we have discovered another line-level interrupt race condition in Xen-4.1. The result was that an outstanding un-EOI'd interrupt at the IO-APIC caused the mptsas controller to offline the root filesystem.

This makes two separate IO-APIC bugs found recently:

1) Cisco C210 M2 server - EOI Broadcast Suppression, ioapic_ack=old
2) Dell R710 - No EOI Broadcast Suppression, ioapic_ack=new

Both servers use IO-APIC version 0x20 and have an mptsas controller for their disks, using legacy PCI line-level interrupts. The workload on both servers appears to have more active vcpus.

Case 1 is now considered stable by the customer after I provided a private fix which causes Xen to never consider turning on EOI Broadcast Suppression. I have re-attached a patch which allows this problem to be "fixed" by specifying "ioapic_ack=new" on the command line, rather than requiring a patch and recompile of Xen.

Case 2 has only been seen once (this morning), so we currently have no idea as to its reproducibility. However, given that this hardware is fairly common in our test infrastructure, I would say that it is fairly rare.

With the new patch, Case 1 becomes the same as Case 2 with respect to IO-APIC setup, presumably meaning that the Case 2 bug still exists in Case 1.

I will start working on cleaning up the IO-APIC code as soon as I can, as reducing the unnecessary complexity should make race conditions like this easier to find.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel