Hi,

while trying to reproduce Andre's cpupool problem I ran into another issue:

Dom0 seems to lose hardware interrupts when it has more vcpus than pcpus available. At first I thought this could be due to my cpupool patches, but the problem can easily be reproduced by pinning all Dom0 vcpus to a few physical cpus and then doing a parallel build.

I used xen-unstable and kernel 2.6.32.24 from SLES11 SP1 on a 12-core Intel Nehalem machine. I pinned all 12 Dom0 vcpus to pcpus 1-2 and started a parallel build. After about 2 minutes the first missing interrupts were reported, and a little later the next ones. No Xen messages are printed:

[230644.814834] ata1: lost interrupt (Status 0x50)
[230682.814399] ata1: lost interrupt (Status 0x50)
[230690.814467] ata1: lost interrupt (Status 0x58)
...
[230856.718437] sd 4:2:0:0: [sda] megasas: RESET -843713 cmd=2a retries=0
[230856.739457] megaraid_sas: HBA reset handler invoked without an internal reset condition.
[230856.766435] megasas: [ 0]waiting for 16 commands to complete

Has anyone observed similar behavior?

Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

On 14.02.2011 07:59, Juergen Gross wrote:
> Hi,
>
> while trying to reproduce Andre's cpupool problem I ran into another issue:
>
> Dom0 seems to lose hardware interrupts when it has more vcpus than pcpus
> available. First I thought this could be due to my cpupool patches, but the
> problem can be easily reproduced by pinning all Dom0 vcpus to a few physical
> cpus and doing a parallel build then.
>
> I used xen-unstable, kernel 2.6.32.24 from SLES11 SP1 on a 12 core INTEL
> nehalem machine. I pinned all 12 Dom0 vcpus to pcpu 1-2 and started a parallel
> build. After about 2 minutes the first missing interrupts were reported, a
> little bit later the next one, no xen messages are printed:
>
> [230644.814834] ata1: lost interrupt (Status 0x50)
> [230682.814399] ata1: lost interrupt (Status 0x50)
> [230690.814467] ata1: lost interrupt (Status 0x58)
> ...
> [230856.718437] sd 4:2:0:0: [sda] megasas: RESET -843713 cmd=2a retries=0
> [230856.739457] megaraid_sas: HBA reset handler invoked without an internal
> reset condition.
> [230856.766435] megasas: [ 0]waiting for 16 commands to complete
>
> Has anyone observed a similar behavior?

Yes, me again :-)

On the rare occasions where I couldn't trigger the bug (like when using a restricted Dom0) I observed interrupt problems, which mostly killed the network connection:

(XEN) do_IRQ: 0.89 No irq handler for vector (irq -1)

I could work around this temporarily by bringing the network interface down and up again, but the box became unstable later.

Hypervisor and tools: c/s 22858. Dom0: latest tip of PVOPS xen/stable-2.6.32.x (2.6.32.27).

Regards,
Andre.

>>> On 14.02.11 at 07:59, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> I used xen-unstable, kernel 2.6.32.24 from SLES11 SP1 on a 12 core INTEL
> nehalem machine. I pinned all 12 Dom0 vcpus to pcpu 1-2 and started a
> parallel
> build. After about 2 minutes the first missing interrupts were reported, a
> little bit later the next one, no xen messages are printed:

That's certainly not too surprising, somewhat depending on the maximally tolerated latencies. It seems unlikely to me for a 6-fold CPU over-commit to promise stable operation, yet certain adjustments could probably be done to make it work better (like temporarily boosting the priority of a hardware interrupt's target vCPU).

Jan
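
To put numbers on the "6-fold over-commit" and the latency concern above, here is a minimal sketch. It assumes an idealised strict round-robin scheduler in which every vcpu is always runnable. The 30 ms timeslice is an illustrative assumption, not a figure taken from this thread, and the function names are hypothetical.

```python
import math


def overcommit_ratio(vcpus: int, pcpus: int) -> float:
    """Number of vcpus per physical cpu they are allowed to run on."""
    if pcpus <= 0:
        raise ValueError("pcpus must be positive")
    return vcpus / pcpus


def worst_case_wait_ms(vcpus: int, pcpus: int, timeslice_ms: float) -> float:
    """Longest time a vcpu waits before running again.

    Idealised model: strict round-robin, every vcpu always runnable,
    vcpus spread evenly over the pcpus. Real schedulers differ.
    """
    if pcpus <= 0:
        raise ValueError("pcpus must be positive")
    per_pcpu = math.ceil(vcpus / pcpus)
    # A vcpu waits for every other vcpu sharing its pcpu to use a full slice.
    return max(per_pcpu - 1, 0) * timeslice_ms


# The setup from the report: 12 Dom0 vcpus pinned to 2 pcpus.
ratio = overcommit_ratio(12, 2)
wait = worst_case_wait_ms(12, 2, timeslice_ms=30.0)  # assumed timeslice
print(f"over-commit {ratio:.0f}x, worst-case wait {wait:.0f} ms")
```

Under these assumptions a vcpu that is the target of a hardware interrupt may sit behind five other vcpus before it gets to run its handler. Whether that delay matters depends on the latency the driver tolerates.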

On 02/14/11 10:26, Jan Beulich wrote:
>>>> On 14.02.11 at 07:59, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>> I used xen-unstable, kernel 2.6.32.24 from SLES11 SP1 on a 12 core INTEL
>> nehalem machine. I pinned all 12 Dom0 vcpus to pcpu 1-2 and started a
>> parallel
>> build. After about 2 minutes the first missing interrupts were reported, a
>> little bit later the next one, no xen messages are printed:
>
> That's certainly not too surprising, somewhat depending on the
> maximally tolerated latencies. It seems unlikely to me for a 6-fold
> CPU over-commit to promise stable operation, yet certain
> adjustments could probably be done to make it work better (like
> temporarily boosting the priority of a hardware interrupt's target
> vCPU).

I would understand timeouts. But shouldn't the interrupt come in sooner or later? At least the megasas driver seems not to be able to recover from this problem; as a result my root filesystem is set to read-only...

This would mean there is a problem in the megasas driver, correct? And Andre reports stability problems of his machine in similar cases, but in his case the network driver seems to be the reason.

Are you planning to prepare a patch for boosting the priority of vcpus being the target for a hardware interrupt? I think I would have to search some time to find the correct places to change...

Juergen

>>> On 14.02.11 at 10:38, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On 02/14/11 10:26, Jan Beulich wrote:
>>>>> On 14.02.11 at 07:59, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>> I used xen-unstable, kernel 2.6.32.24 from SLES11 SP1 on a 12 core INTEL
>>> nehalem machine. I pinned all 12 Dom0 vcpus to pcpu 1-2 and started a
>>> parallel
>>> build. After about 2 minutes the first missing interrupts were reported, a
>>> little bit later the next one, no xen messages are printed:
>>
>> That's certainly not too surprising, somewhat depending on the
>> maximally tolerated latencies. It seems unlikely to me for a 6-fold
>> CPU over-commit to promise stable operation, yet certain
>> adjustments could probably be done to make it work better (like
>> temporarily boosting the priority of a hardware interrupt's target
>> vCPU).
>
> I would understand timeouts. But shouldn't the interrupt come in sooner or
> later? At least the megasas driver seems not to be able to recover from this
> problem, as a result my root filesystem is set to read-only...

I'm sure these interrupts arrive eventually, but the driver not seeing them within an expected time window may still make it report them as "lost".

> This would mean there is a problem in the megasas driver, correct?
> And Andre reports stability problems of his machine in similar cases, but
> in his case the network driver seems to be the reason.

Yes, this certainly depends on how the driver is implemented.

> Are you planning to prepare a patch for boosting the priority of vcpus being
> the target for a hardware interrupt? I think I would have to search some time
> to find the correct places to change...

So far I had no plan to do so, and I too would have to do some looking around. Nor am I convinced everyone would appreciate such fiddling with priorities - I was merely suggesting that might be one route to go. George?

Jan
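
The distinction drawn above, between an interrupt that never arrives and one that arrives after the driver has stopped waiting, can be sketched as follows. This is only an illustration: the timeout value and the sample latencies are made up, and real drivers use their own timeouts and recovery paths, which are not stated in this thread.

```python
def classify_completions(latencies_ms, timeout_ms):
    """Label each interrupt by whether the driver saw it within its window.

    An interrupt whose handler ran later than timeout_ms is reported as
    "lost" even though it was eventually delivered.
    """
    return ["lost" if latency > timeout_ms else "ok" for latency in latencies_ms]


# Hypothetical handler latencies under heavy vcpu over-commit.
labels = classify_completions([5.0, 40.0, 150.0], timeout_ms=100.0)
print(labels)  # ['ok', 'ok', 'lost']
```

In this model nothing is dropped by the hypervisor. The third interrupt is simply handled too late for the driver's window, which matches the "reported as lost" reading.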

My sense is that:

* Pinning N vcpus to N-M pcpus (where M is a significant fraction of N) is just a really bad idea; it would be better just not to do that. It would be ideal if somehow when dom0's cpu pool shrinks, it automatically offlines an appropriate number of vcpus; but it shouldn't be difficult for an administrator to do that themselves.

* On average, a vcpu shouldn't have to wait more than 60ms or so for an interrupt. It seems like there's a non-negligible possibility that there's some kind of bug in the interrupt delivery and handling, either on the Xen side or the Linux side (or as Jan pointed out, a bug in the driver). In that case, doing something in the scheduler isn't actually fixing the problem, it's just making it less likely to happen. (NB that we've had intermittent failures in the xen.org testing infrastructure with what looks like might be missed interrupts as well -- and those weren't on heavily loaded boxes.)

* Even if it is ultimately a scheduler bug, understanding exactly what the scheduler is doing and why is key to making a proper fix. It's possible that there's just a simple quirk in the algorithm, such that a general fix will make everything work better without needing to introduce a special case for hardware interrupts.

* I'm not opposed in principle to a mechanism which will prioritize vcpus awaiting hardware interrupts. But I am wary of guessing what the problem is and then introducing a patch without proper root-cause analysis. Even if it seems to fix the immediate problem, it may simply be masking the real problem, and may also cause problems of its own. Behavior of the scheduler is hard enough to understand already, and every special case makes it even harder.

So to conclude: I think the first answer to someone with this problem should be, "Make sure that V<=P", where P is the number of physical cpus a VM can be scheduled on and V is the number of virtual cpus.

If there are still problems, then we need to find out how it is that interrupts come to be missing before attempting a fix.

 -George
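
The "V<=P" rule above can be expressed as a small check. This is a sketch with a hypothetical helper name. It only computes how many vcpus would have to go offline and does not perform the offlining, which is left to the administrator or the toolstack.

```python
def vcpus_to_offline(vcpus: int, pcpus: int) -> int:
    """How many vcpus must go offline so that V <= P holds.

    vcpus (V): virtual cpus currently online in the domain.
    pcpus (P): physical cpus the domain can be scheduled on.
    """
    if vcpus < 0 or pcpus < 0:
        raise ValueError("counts must be non-negative")
    return max(vcpus - pcpus, 0)


# The reported setup: 12 Dom0 vcpus restricted to 2 pcpus.
print(vcpus_to_offline(12, 2))  # 10
```

A result of zero means the configuration already satisfies the rule. If interrupts still go missing in that case, the cause has to be sought elsewhere, as the message above argues.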

On 02/14/11 12:21, George Dunlap wrote:
> My sense is that:
> * Pinning N vcpus to N-M pcpus (where M is a significant fraction of
> N) is just a really bad idea; it would be better just not to do that.

I just wanted to make sure the interrupts are not lost due to the cpupool operation itself. So I tried with an extreme configuration and was proved right :-)

> It would be ideal if somehow when dom0's cpu pool shrinks, it
> automatically offlines an appropriate number of vcpus; but it
> shouldn't be difficult for an administrator to do that themselves.

I've sent a patch for the cpupool-numa-split case, which will always remove a significant number of physical cpus from dom0.

> * On average, a vcpu shouldn't have to wait more than 60ms or so for
> an interrupt. It seems like there's a non-negligible possibility that
> there's some kind of bug in the interrupt delivery and handling,
> either on the Xen side or the Linux side (or as Jan pointed out, a bug
> in the driver). In that case, doing something in the scheduler isn't
> actually fixing the problem, it's just making it less likely to
> happen. (NB that we've had intermittent failures in the xen.org
> testing infrastructure with what looks like might be missed interrupts
> as well -- and those weren't on heavily loaded boxes.)

Any idea what I could do to help? Our larger test machines are not just idling, but I could use one from time to time without much trouble. It's rather easy for me to reproduce the problem; OTOH it should be easy for others with a reasonably large machine, too.

> * Even if it is ultimately a scheduler bug, understanding exactly what
> the scheduler is doing and why is key to making a proper fix. It's
> possible that there's just a simple quirk in the algorithm, such that
> a general fix will make everything work better without needing to
> introduce a special case for hardware interrupts.
> * I'm not opposed in principle to a mechanism which will prioritize
> vcpus awaiting hardware interrupts. But I am wary of guessing what
> the problem is and then introducing a patch without proper root-cause
> analysis. Even if it seems to fix the immediate problem, it may
> simply be masking the real problem, and may also cause problems of its
> own. Behavior of the scheduler is hard enough to understand already,
> and every special case makes it even harder.

I absolutely agree!

Juergen