Li, Xin B
2005-Dec-09 07:03 UTC
[Xen-devel] event delay issue on SMP machine when xen0 is SMP enabled
Hi Ian/Keir,

We found an event delay issue on SMP machines when xen0 is SMP enabled. In the worst case the event even seems to be lost. The phenomenon is: when we start a VMX domain on 64-bit SMP xen0 on an SMP system with 16 processors, in most cases the QEMU device model window pops up with a black screen and stops there, and "xm list" shows the VMX domain in the blocked state. If we use "info vmxiopage" on the QEMU command line, it reports that the IO request state is 1, i.e. STATE_IOREQ_READY. I added a printf just after select() in the QEMU DM main loop and found that the evtchn fd never becomes readable; that is to say, QEMU DM is never notified of the IO request from the VMX domain.

The root cause is:

1) QEMU DM does an evtchn interdomain bind, and on SMP xen0 on a big SMP machine a port number greater than 63 is allocated -- here it's 65. In the xen HV, this will notify vcpu0 of xen0:

    /*
     * We may have lost notifications on the remote unbound port. Fix that up
     * here by conservatively always setting a notification on the local port.
     */
    evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport);

Then in evtchn_set_pending:

    /* These four operations must happen in strict order. */
    if ( !test_and_set_bit(port, &s->evtchn_pending[0]) &&
         !test_bit        (port, &s->evtchn_mask[0])    &&
         !test_and_set_bit(port / BITS_PER_LONG,
                           &v->vcpu_info->evtchn_pending_sel) &&
         !test_and_set_bit(0, &v->vcpu_info->evtchn_upcall_pending) )
    {
        evtchn_notify(v);
    }

If the port is not masked, bit 1 in evtchn_pending_sel of vcpu0 will be set; this is the typical case. But while the interdomain evtchn binding is being set up, this port is masked, so bit 1 in evtchn_pending_sel of vcpu0 is not set.

2) Just after returning from the xen HV, this port is unmasked in the xen0 kernel:

    static inline void unmask_evtchn(int port)
    {
        shared_info_t *s = HYPERVISOR_shared_info;
        vcpu_info_t *vcpu_info = &s->vcpu_info[smp_processor_id()];

        synch_clear_bit(port, &s->evtchn_mask[0]);

        /*
         * The following is basically the equivalent of 'hw_resend_irq'. Just
         * like a real IO-APIC we 'lose the interrupt edge' if the channel is
         * masked.
         */
        if (synch_test_bit(port, &s->evtchn_pending[0]) &&
            !synch_test_and_set_bit(port / BITS_PER_LONG,
                                    &vcpu_info->evtchn_pending_sel)) {
            vcpu_info->evtchn_upcall_pending = 1;
            if (!vcpu_info->evtchn_upcall_mask)
                force_evtchn_callback();
        }
    }

But this is done on the current vcpu, which in most cases is not vcpu0. So the event is notified on the current vcpu, not vcpu0, and bit 1 in evtchn_pending_sel of the current vcpu is set.

3) However, this event won't be handled on the current vcpu:

    asmlinkage void evtchn_do_upcall(struct pt_regs *regs)
    {
        unsigned long  l1, l2;
        unsigned int   l1i, l2i, port;
        int            irq, cpu = smp_processor_id();
        shared_info_t *s = HYPERVISOR_shared_info;
        vcpu_info_t   *vcpu_info = &s->vcpu_info[cpu];

        vcpu_info->evtchn_upcall_pending = 0;

        /* NB. No need for a barrier here -- XCHG is a barrier on x86. */
        l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
        while (l1 != 0) {
            l1i = __ffs(l1);
            l1 &= ~(1UL << l1i);

            while ((l2 = active_evtchns(cpu, s, l1i)) != 0) {
                l2i = __ffs(l2);
                l2 &= ~(1UL << l2i);

                port = (l1i * BITS_PER_LONG) + l2i;
                if ((irq = evtchn_to_irq[port]) != -1)
                    do_IRQ(irq, regs);
                else
                    evtchn_device_upcall(port);
            }
        }
    }

This is because on an SMP kernel active_evtchns is defined as:

    #define active_evtchns(cpu,sh,idx)          \
        ((sh)->evtchn_pending[idx] &            \
         cpu_evtchn_mask[cpu][idx] &            \
         ~(sh)->evtchn_mask[idx])

while cpu_evtchn_mask is initialized as:

    static void init_evtchn_cpu_bindings(void)
    {
        /* By default all event channels notify CPU#0. */
        memset(cpu_evtchn, 0, sizeof(cpu_evtchn));
        memset(cpu_evtchn_mask[0], ~0, sizeof(cpu_evtchn_mask[0]));
    }

So a vcpu other than vcpu0 won't handle the event, even though it sees the event pending, and it won't be delivered to the evtchn device.
Only in some cases, when bit 1 in evtchn_pending_sel of vcpu0 happens to be set because some other port in the same selector word is notified, will this event be delivered. If we are unlucky and no other port notification sets bit 1 in evtchn_pending_sel of vcpu0 by chance, we get stuck there and the event seems to be lost, though it may still be delivered to the evtchn device at some unknown time :-(

The reason we didn't meet this problem before is that on most machines we got an event port smaller than 64, and bit 0 in evtchn_pending_sel of vcpu0 is very likely to be set anyway, since that is a hot event port area.

We do need to fix this issue because it is also a common issue in complex environments. It is actually a performance issue as well, even when the event channel port is < 64, since delivery depends on other events to get notified.

BTW, why won't a vcpu other than vcpu0 handle events by default?

thanks
-Xin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2005-Dec-09 08:28 UTC
[Xen-devel] Re: event delay issue on SMP machine when xen0 is SMP enabled
On 9 Dec 2005, at 07:03, Li, Xin B wrote:

> We do need fix this issue because this is also a common issue in
> complex environment. Actually that will also be a performance issue,
> even when event channel port is <64 since it depends on other events
> to get notified.

Thanks for working out what's going on. A patch should be quite easy.

> BTW, why vcpu other than vcpu0 won't handle event by default?

We could allow any vcpu to process any pending notifications. It would
mean vcpus outside an irq's affinity set could end up processing an
interrupt and I'm not sure if that's a good thing. It would actually
slightly simplify evtchn.c though (no need to apply a per-cpu mask to
the event-channel port array).

 -- Keir
Keir Fraser
2005-Dec-09 08:31 UTC
Re: [Xen-devel] Re: event delay issue on SMP machine when xen0 is SMP enabled
On 9 Dec 2005, at 08:28, Keir Fraser wrote:

>> BTW, why vcpu other than vcpu0 won't handle event by default?
>
> We could allow any vcpu to process any pending notifications. It would
> mean vcpus outside an irq's affinity set could end up processing an
> interrupt and I'm not sure if that's a good thing. It would actually
> slightly simplify evtchn.c though (no need to apply a per-cpu mask to
> the event-channel port array).

You always remember a good reason not to do something after sending the
email. :-)

IPIs and VIRQs must be processed by their home vcpu. Hence we do need
to limit evtchn processing to the current bound vcpu, and that leads
(currently) to the problem you see.

So I think the right fix is just to fix unmask_evtchn(). Maybe you guys
want to send a patch to move it into evtchn.c and fix it to send to
cpu_from_evtchn(evtchn)?

 -- Keir
Li, Xin B
2005-Dec-09 08:49 UTC
[Xen-devel] RE: event delay issue on SMP machine when xen0 is SMP enabled
>> BTW, why vcpu other than vcpu0 won't handle event by default?
>
> We could allow any vcpu to process any pending notifications. It would
> mean vcpus outside an irq's affinity set could end up processing an
> interrupt and I'm not sure if that's a good thing. It would actually
> slightly simplify evtchn.c though (no need to apply a per-cpu mask to
> the event-channel port array).

I have a question on the following function: why is l1 updated only
once, while l2 is re-read in each loop iteration? I think they should be
handled in the same way. In any case, in the current code,
"l2 &= ~(1UL << l2i);" is not needed.

Thanks
-Xin

    /* NB. Interrupts are disabled on entry. */
    asmlinkage void evtchn_do_upcall(struct pt_regs *regs)
    {
        unsigned long  l1, l2;
        unsigned int   l1i, l2i, port;
        int            irq, cpu = smp_processor_id();
        shared_info_t *s = HYPERVISOR_shared_info;
        vcpu_info_t   *vcpu_info = &s->vcpu_info[cpu];

        vcpu_info->evtchn_upcall_pending = 0;

        /* NB. No need for a barrier here -- XCHG is a barrier on x86. */
        l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
        while (l1 != 0) {
            l1i = __ffs(l1);
            l1 &= ~(1UL << l1i);

            while ((l2 = active_evtchns(cpu, s, l1i)) != 0) {
                l2i = __ffs(l2);
                l2 &= ~(1UL << l2i);

                port = (l1i * BITS_PER_LONG) + l2i;
                if ((irq = evtchn_to_irq[port]) != -1)
                    do_IRQ(irq, regs);
                else
                    evtchn_device_upcall(port);
            }
        }
    }
Li, Xin B
2005-Dec-09 09:08 UTC
RE: [Xen-devel] Re: event delay issue on SMP machine when xen0 is SMP enabled
>>> BTW, why vcpu other than vcpu0 won''t handle event by default?
>>
>> We could allow any vcpu to process any pending notifications. It
>> would mean vcpus outside an irq's affinity set could end up
>> processing an interrupt and I'm not sure if that's a good thing. It
>> would actually slightly simplify evtchn.c though (no need to apply a
>> per-cpu mask to the event-channel port array).
>
> You always remember a good reason not to do something after sending
> the email. :-)
>
> IPIs and VIRQs must be processed by their home vcpu. Hence we do need
> to limit evtchn processing to the current bound vcpu, and that leads
> (currently) to the problem you see.
>
> So I think the right fix is just to fix unmask_evtchn(). Maybe you
> guys want to send a patch to move it into evtchn.c and fix it to send
> to cpu_from_evtchn(evtchn)?

Since the evtchn code is critical to xen, we are being very careful
about any possible fixes :-)

Why not turn on cpu_evtchn_mask for all cpus by default, and turn it
off on the other cpus when bind_evtchn_to_cpu is called?

Thanks
-Xin