Garrett D'Amore
2007-May-29 19:19 UTC
[crossbow-discuss] interrupt loading, intrd, and CMT
One of the things that I've been asked to look at lately is interrupt load spreading, and intrd, to improve the network performance of CMT systems.

From my read of the code, it looks like intrd only knows about virtual processors. This has major breakage in that all the virtual processors on a single core in a Niagara system share the common interrupt logic. I.e. when the interrupt handler runs, it interrupts _everything_ on that core.

What I'm thinking of is to teach intrd about cores, so that instead of considering cpus (virtual), it instead considers load on *cores*. I think this will help it rebalance optimally.

My biggest concern with this is whether my understanding of interrupt logic/cores/virtual cpus is universally correct. Do the current multicore processors from Intel and AMD also share a common set of interrupt logic amongst cores? (From what I can tell, my Intel Core 2 Duo system seems to be reported as a single physical processor, only one core, with 2 virtual CPUs. I want to make sure that on systems like this I won't wind up turning off interrupts that could be handled simultaneously.) I'm assuming that this is the way the kstats report, but again, I want to be sure that this is the design, and not just a coincidence of the current implementations.

Here's the output from the x86 machines I could find:

psrinfo -vp on a single socket Intel Core 2 Duo system:

The physical processor has 2 virtual processors (0 1)
  x86 (GenuineIntel 6F2 family 6 model 15 step 2 clock 1800 MHz)
        Intel(r) Core(tm)2 CPU 4300 @ 1.80GHz

psrinfo -vp on an x4200:

The physical processor has 2 virtual processors (0 1)
  x86 (AuthenticAMD 20F12 family 15 model 33 step 2 clock 2393 MHz)
        Dual Core AMD Opteron(tm) Processor 280
The physical processor has 2 virtual processors (2 3)
  x86 (AuthenticAMD 20F12 family 15 model 33 step 2 clock 2393 MHz)
        Dual Core AMD Opteron(tm) Processor 280

Here's psrinfo -vp on a T2000 (Niagara) system:

The physical processor has 8 cores and 32 virtual processors (0-31)
  The core has 4 virtual processors (0-3)
  The core has 4 virtual processors (4-7)
  The core has 4 virtual processors (8-11)
  The core has 4 virtual processors (12-15)
  The core has 4 virtual processors (16-19)
  The core has 4 virtual processors (20-23)
  The core has 4 virtual processors (24-27)
  The core has 4 virtual processors (28-31)
    UltraSPARC-T1 (clock 1000 MHz)

Also, it seems to me that if this is the case, then on typical single socket systems (even those with dual core Opterons or Intel Core 2 Duos), which only have a single "core" (as reported via psrinfo -vp, but multiple virtual processors), there is no need or benefit in rebalancing the interrupts.

I'm also thinking that on these CMT systems, we should just stick _all_ of the interrupts on a single virtual processor within a "core". I.e. since only one interrupt can run at a time, what is the point of having support for multiple interrupts on different virtual processors within the core?

There is also the question of interrupt impact on normal threads. Is it better to spread interrupts as evenly as possible, even if that means disrupting non-interrupt threads? Consider the case for IP forwarding, where you need lots of interrupt support for receive, but you also want to have lots of CPU available for doing the tx handling. If your system is saturated (bombarded!) with rx packets, then if you spread multiple rx interrupts (perhaps for different rx rings) across cpus, a certain DoS becomes possible where a remote attacker can flood your system with tiny packets, starving not just _one_ cpu, but perhaps a number of them. (Admittedly, crossbow and "polling" is supposed to address this issue to a certain extent, IIUC.)

I'm still figuring all this out, so I'd really love to hear from other folks with different ideas. If my understanding is horribly broken, then _please_ let me know so I don't spend a bunch of time going down the wrong path.

-- Garrett
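A minimal sketch of the core-aware accounting Garrett describes: the program below walks the cpu_info kstats and maps each virtual CPU id to a core id. It assumes the cpu_info kstat exports a "core_id" named statistic (a long) on the platforms in question; if that statistic is absent, nothing is printed. The program is illustration only and is not part of intrd.

/*
 * Sketch only: map each virtual CPU to its core using libkstat.
 * Assumes the cpu_info kstat exports a "core_id" named statistic.
 * Build: cc -o coremap coremap.c -lkstat
 */
#include <stdio.h>
#include <string.h>
#include <kstat.h>

int
main(void)
{
    kstat_ctl_t *kc = kstat_open();
    kstat_t *ksp;

    if (kc == NULL) {
        perror("kstat_open");
        return (1);
    }

    for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
        kstat_named_t *kn;

        /* one cpu_info kstat per virtual CPU; ks_instance is the CPU id */
        if (strcmp(ksp->ks_module, "cpu_info") != 0)
            continue;
        if (kstat_read(kc, ksp, NULL) == -1)
            continue;

        kn = kstat_data_lookup(ksp, "core_id");
        if (kn == NULL)
            continue;

        /*
         * A core-aware intrd would sum per-CPU interrupt load into
         * buckets keyed by this core id instead of by CPU id.
         */
        (void) printf("cpu %d -> core %ld\n", ksp->ks_instance,
            kn->value.l);
    }

    (void) kstat_close(kc);
    return (0);
}

With a mapping like this in hand, a core-aware intrd could bucket its existing per-CPU interrupt load numbers by core before deciding which interrupts to move.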
Eric Saxe
2007-May-29 20:20 UTC
[crossbow-discuss] Re: [osol-code] interrupt loading, intrd, and CMT
Garrett D'Amore wrote:

> One of the things that I've been asked to look at lately is interrupt
> load spreading, and intrd, to improve the network performance of CMT
> systems.
>
> From my read of the code, it looks like intrd only knows about virtual
> processors. This has major breakage in that all the virtual
> processors on a single core in a Niagara system share the common
> interrupt logic. I.e. when the interrupt handler runs, it interrupts
> _everything_ on that core.
> What I'm thinking of is to teach intrd about cores, so that instead of
> considering cpus (virtual), it instead considers load on *cores*.
> I think this will help it rebalance optimally.

Right, so you would want to make intrd aware of the physical sharing relationships between CPUs, and get it to load balance across them... similar to what the dispatcher does (see disp/cmt.c). At what sharing level it makes sense to balance interrupts may be platform specific.

> My biggest concern with this is whether my understanding of interrupt
> logic/core/virtual cpus is universally correct.

The problem is that what comprises a "core" is very processor implementation specific.

> Do the current multicore processors from Intel and AMD also share a
> common set of interrupt logic amongst cores? (From what I can tell,
> my Intel Core 2 Duo system seems to be reported as a single physical
> processor, only one core, with 2 virtual CPUs. I want to make sure
> that on systems like this I won't wind up turning off interrupts that
> could be handled simultaneously.) I'm assuming that this is the way
> the kstats report, but again, I want to be sure that this is the
> design, and not just a coincidence of the current implementations.

Perhaps some platform (or even processor) specific code should decide across what level interrupts should be balanced. That information could then be exported to intrd, which would implement the policy.

> Here's the output from the x86 machines I could find:
>
> psrinfo -vp on a single socket Intel Core 2 Duo system:
>
> The physical processor has 2 virtual processors (0 1)
>   x86 (GenuineIntel 6F2 family 6 model 15 step 2 clock 1800 MHz)
>         Intel(r) Core(tm)2 CPU 4300 @ 1.80GHz
>
> psrinfo -vp on an x4200:
>
> The physical processor has 2 virtual processors (0 1)
>   x86 (AuthenticAMD 20F12 family 15 model 33 step 2 clock 2393 MHz)
>         Dual Core AMD Opteron(tm) Processor 280
> The physical processor has 2 virtual processors (2 3)
>   x86 (AuthenticAMD 20F12 family 15 model 33 step 2 clock 2393 MHz)
>         Dual Core AMD Opteron(tm) Processor 280
>
> Here's psrinfo -vp on a T2000 (Niagara) system:
>
> The physical processor has 8 cores and 32 virtual processors (0-31)
>   The core has 4 virtual processors (0-3)
>   The core has 4 virtual processors (4-7)
>   The core has 4 virtual processors (8-11)
>   The core has 4 virtual processors (12-15)
>   The core has 4 virtual processors (16-19)
>   The core has 4 virtual processors (20-23)
>   The core has 4 virtual processors (24-27)
>   The core has 4 virtual processors (28-31)
>     UltraSPARC-T1 (clock 1000 MHz)
>
> Also, it seems to me that if this is the case, then on typical single
> socket systems (even those with dual core Opterons or Intel Core 2
> Duos), which only have a single "core" (as reported via psrinfo -vp,
> but multiple virtual processors), there is no need or benefit in
> rebalancing the interrupts.

psrinfo -vp doesn't report grouping levels with just one CPU. So for a dual core Opteron chip, for example, psrinfo reports the physical processor sharing... but doesn't report the core level because there is just 1 "virtual" CPU per core.

Thanks,
-Eric
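Eric's suggestion reads like a small contract between platform code and the balancer. The sketch below is hypothetical throughout: intr_balance_level_t, platform_intr_balance_level() and balance_key() exist nowhere in OpenSolaris and only illustrate the split between the platform naming the sharing level and intrd implementing the policy.

/*
 * Hypothetical sketch: platform- or processor-specific code names the
 * sharing level at which interrupts should be balanced; the balancer
 * only implements policy.  All identifiers here are made up.
 */
typedef enum intr_balance_level {
    IBL_CPU,    /* balance across individual virtual CPUs */
    IBL_CORE,   /* balance across cores (e.g. Niagara) */
    IBL_CHIP    /* balance across physical processors */
} intr_balance_level_t;

/* Supplied by platform/processor specific code (hypothetical). */
extern intr_balance_level_t platform_intr_balance_level(void);

/*
 * The balancer keys its load buckets on whichever id matches the
 * exported level, instead of hard-coding the virtual CPU id.
 */
static int
balance_key(int cpu_id, int core_id, int chip_id)
{
    switch (platform_intr_balance_level()) {
    case IBL_CORE:
        return (core_id);
    case IBL_CHIP:
        return (chip_id);
    case IBL_CPU:
    default:
        return (cpu_id);
    }
}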
hi Garrett, nice subject.

Garrett D'Amore writes:

> One of the things that I've been asked to look at lately is interrupt
> load spreading, and intrd, to improve the network performance of CMT
> systems.

My take is that networking has to stop abusing the interrupt for processing. The interrupt was a convenient way to be notified that a packet was present in the NIC. In the old times, it made sense to go ahead and process it, and it still makes sense if there are only one or a few packets in the NIC. But once the interrupts start to place any noticeable load on a CPU, I think it's time to defer the work to a kernel thread; stop the thread pinning and bring the scheduler back into play.

> From my read of the code, it looks like intrd only knows about virtual
> processors. This has major breakage in that all the virtual processors
> on a single core in a Niagara system share the common interrupt logic.
> I.e. when the interrupt handler runs, it interrupts _everything_ on that
> core.

This is not my understanding. Let's say that while a NIC interrupt service routine is running: the other CPUs are still active and running code. Is there a blackout time during which another interrupt cannot be signaled to the same core? This I don't know. But the CPUs are mostly active during the ISR.

> What I'm thinking of is to teach intrd about cores, so that instead of
> considering cpus (virtual), it instead considers load on *cores*.
>
> I think this will help it rebalance optimally.

Well, at times things work better when threads are scheduled on the same core (data sharing bound) and at times it's better to schedule them on separate cores (cpu cycles bound).

> My biggest concern with this is whether my understanding of interrupt
> logic/core/virtual cpus is universally correct.
>
> Do the current multicore processors from Intel and AMD also share a
> common set of interrupt logic amongst cores? (From what I can tell, my
> Intel Core 2 Duo system seems to be reported as a single physical
> processor, only one core, with 2 virtual CPUs. I want to make sure that
> on systems like this I won't wind up turning off interrupts that could
> be handled simultaneously.) I'm assuming that this is the way the
> kstats report, but again, I want to be sure that this is the design, and
> not just a coincidence of the current implementations.
>
> Here's the output from the x86 machines I could find:
>
> psrinfo -vp on a single socket Intel Core 2 Duo system:
>
> The physical processor has 2 virtual processors (0 1)
>   x86 (GenuineIntel 6F2 family 6 model 15 step 2 clock 1800 MHz)
>         Intel(r) Core(tm)2 CPU 4300 @ 1.80GHz
>
> psrinfo -vp on an x4200:
>
> The physical processor has 2 virtual processors (0 1)
>   x86 (AuthenticAMD 20F12 family 15 model 33 step 2 clock 2393 MHz)
>         Dual Core AMD Opteron(tm) Processor 280
> The physical processor has 2 virtual processors (2 3)
>   x86 (AuthenticAMD 20F12 family 15 model 33 step 2 clock 2393 MHz)
>         Dual Core AMD Opteron(tm) Processor 280
>
> Here's psrinfo -vp on a T2000 (Niagara) system:
>
> The physical processor has 8 cores and 32 virtual processors (0-31)
>   The core has 4 virtual processors (0-3)
>   The core has 4 virtual processors (4-7)
>   The core has 4 virtual processors (8-11)
>   The core has 4 virtual processors (12-15)
>   The core has 4 virtual processors (16-19)
>   The core has 4 virtual processors (20-23)
>   The core has 4 virtual processors (24-27)
>   The core has 4 virtual processors (28-31)
>     UltraSPARC-T1 (clock 1000 MHz)
>
> Also, it seems to me that if this is the case, then on typical single
> socket systems (even those with dual core Opterons or Intel Core 2 Duos),
> which only have a single "core" (as reported via psrinfo -vp, but
> multiple virtual processors), there is no need or benefit in rebalancing
> the interrupts.
>
> I'm also thinking that on these CMT systems, we should just stick _all_
> of the interrupts on a single virtual processor within a "core". I.e.
> since only one interrupt can run at a time, what is the point of having
> support for multiple interrupts on different virtual processors within
> the core?

These questions below are still valid, since the processing (interrupt or not) still needs to be done:

> There is also the question of interrupt impact on normal threads. Is it
> better to spread interrupts as evenly as possible, even if that means
> disrupting non-interrupt threads? Consider the case for IP forwarding,
> where you need lots of interrupt support for receive, but you also
> want to have lots of CPU available for doing the tx handling. If your
> system is saturated (bombarded!) with rx packets, then if you spread
> multiple rx interrupts (perhaps for different rx rings) across cpus,
> a certain DoS becomes possible where a remote attacker can flood
> your system with tiny packets, starving not just _one_ cpu, but perhaps
> a number of them. (Admittedly, crossbow and "polling" is supposed to
> address this issue to a certain extent, IIUC.)

Yep. QoS needed.

> I'm still figuring all this out, so I'd really love to hear from other
> folks with different ideas. If my understanding is horribly broken,
> then _please_ let me know so I don't spend a bunch of time going down
> the wrong path.

Let's first figure out if interrupts do indeed freeze all strands.

-r

> -- Garrett
>
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
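Roch's suggestion to defer packet processing out of the ISR roughly corresponds to the pattern sketched below using the DDI taskq routines. The softc layout, my_rx_drain() and my_intr() are hypothetical placeholders; ddi_taskq_create(), ddi_taskq_dispatch(), DDI_NOSLEEP and DDI_INTR_CLAIMED are the real DDI interfaces being leaned on. This is a sketch of the idea, not how any particular NIC driver is actually written.

/*
 * Sketch: defer rx processing from the ISR to a kernel thread using
 * the Solaris DDI taskq routines.  Driver structures and helpers are
 * hypothetical.
 */
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

typedef struct my_softc {
    dev_info_t  *sc_dip;
    ddi_taskq_t *sc_rx_tq;  /* kernel thread(s) for deferred rx work */
} my_softc_t;

/* Heavy rx processing runs here, in a schedulable kernel thread. */
static void
my_rx_drain(void *arg)
{
    my_softc_t *sc = arg;

    /* drain the rx ring and pass packets up the stack (omitted) */
    (void) sc;
}

/* ISR: acknowledge the hardware and get off the CPU quickly. */
static uint_t
my_intr(caddr_t arg1, caddr_t arg2)
{
    my_softc_t *sc = (my_softc_t *)arg1;

    /* (hypothetical) read and ack the interrupt cause register here */

    /*
     * Defer the real work; DDI_NOSLEEP because we cannot block in
     * interrupt context.  The scheduler now decides where it runs.
     */
    (void) ddi_taskq_dispatch(sc->sc_rx_tq, my_rx_drain, sc, DDI_NOSLEEP);

    return (DDI_INTR_CLAIMED);
}

/* Created once at attach time; one thread is enough for the sketch. */
static int
my_rx_tq_init(my_softc_t *sc)
{
    sc->sc_rx_tq = ddi_taskq_create(sc->sc_dip, "my_rx_tq", 1,
        TASKQ_DEFAULTPRI, 0);
    return (sc->sc_rx_tq == NULL ? DDI_FAILURE : DDI_SUCCESS);
}

With the heavy work in a taskq thread, the dispatcher rather than the interrupt binding decides which CPU or core the processing lands on, which is the scheduler involvement Roch is asking for.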