Woller, Thomas
2006-Jan-25 15:28 UTC
[Xen-devel] [PATCH][1/3][RESEND] evtchn race condition
Previous post had some corruption in the patch, so I am reposting. This patch fixes a guest hang condition seen when running unmodified guests on an SVM partition. Two test patches follow to facilitate reproduction of the problem. Note that the patch below applies to the xen-unstable-hvm.hg repo at changeset 8667, although this problem has been apparent on the xen-unstable.hg tree previously. I do not know whether this problem occurs on VT partitions, but I don't see why it shouldn't also be an issue for xen-unstable.hg/testing-3.0.

Testing has shown a very infrequent guest hang/block when testing unmodified Linux guests on multicore SVM partitions. We have found that this problem is related to the local/remote domain event communication channel (event_channel.c and hvm/io.c). The frequency of the problem is very low, and we notice it during multicore testing only. The last instruction in the blocked guest is always an I/O operation.

The conditions under which this problem occurs:
1) unmodified SVM guest performing I/O (the hypervisor does not handle the I/O itself, so the request is forwarded to qemu)
2) guest running on core > 0 (hypervisor is SMP)
3) Dom0 is not SMP (i.e. CONFIG_SMP not defined)

Problem: The common/event_channel.c evtchn_send() function calls evtchn_set_pending() in the ECS_INTERDOMAIN case without protecting access to the evtchn_upcall_pending/evtchn_pending_sel bits. The evtchn_set_pending() code executes on core 0 concurrently with code on core 1 that tests and/or sets those same bits, and the lost update leaves the guest in a perpetually blocked state. The matching code in hvm/io.c contains two locations that must be protected by the evtchn spinlock (see the patch below).

Reproduction of problem: Attached are a couple of test patches. One is for hvmloader and performs continuous I/O (out to port 0x3fb) to help reproduce the problem (hvmloader_io_test.patch); a rough sketch of such a loop appears at the end of this description, just before the patch. Putting a udelay(10000) in hvm/io.c between the accesses to evtchn_upcall_pending/evtchn_pending_sel greatly facilitates reproduction (attached patch hvm_io_udelay_test.patch). With the delay patch and the hvmloader I/O test patch, the hang occurs within the first few I/O test instructions in hvmloader. Pinning the guest to a core other than 0 (e.g. cpus='1' in the guest config) is necessary to reproduce the hang; pinning the guest onto core 0 shows that the guest must be running on core > 0 to hang.

Patch: The patch combines the local domain spinlock and the remote domain spinlock in a manner similar to the evtchn_bind_interdomain() logic. It removes the initial ld spin_lock() from around the port_is_valid() and evtchn_from_port() calls, which allows using the "ld < rd" spinlock ordering technique (restated as a standalone helper sketch after the patch). I am not sure whether this is really OK, or even necessary; opinions are welcome here. Another solution might be to take the remote domain (rd) spinlock exclusively around the ECS_INTERDOMAIN case and the ld spinlock exclusively around the ECS_IPI case, without any nested spinlocks.

I did notice that the send_guest_virq() and VLAPIC code also call evtchn_set_pending() without spinlock protection, but I am not sure whether that is a problem; I wanted to bring it to the list's attention for someone more familiar with the virq/vlapic and evtchn code. It seems like there would be the potential for losing delivery of a virtual IRQ(?). Also, my testing to date has not covered the ECS_IPI code path with the patch below.
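For reference, the attached hvmloader_io_test.patch is not reproduced in this mail. A guest-side loop of roughly the following shape is what the description above refers to; this is an illustration only (the function name and loop bound are made up here, only the port 0x3fb and the continuous-I/O behaviour come from the description above):

    /* Illustration only -- not the attached hvmloader_io_test.patch.
     * Port 0x3fb is not handled by the hypervisor, so every OUT traps
     * and is forwarded to qemu as an I/O request, exercising the event
     * channel notification path on every iteration. */
    static inline void outb(unsigned short port, unsigned char val)
    {
        __asm__ __volatile__ ("outb %b0, %w1" : : "a"(val), "Nd"(port));
    }

    static void io_stress_loop(void)            /* hypothetical name */
    {
        unsigned long i;
        for ( i = 0; i < 1000000; i++ )
            outb(0x3fb, (unsigned char)i);      /* continuous I/O to 0x3fb */
    }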
regards,
Tom Woller
AMD

# HG changeset patch
# User twoller@xen-trw1.amd.com
# Node ID d8bb56042ef14283d0f3342a114162329985c983
# Parent 41d3adec102cfc24687c171e1c893740e3d596f6
add evtchn remote domain spinlock protection in evtchn_send().

Signed-off-by: Tom Woller <thomas.woller@amd.com>

diff -r 41d3adec102c -r d8bb56042ef1 xen/common/event_channel.c
--- a/xen/common/event_channel.c Tue Jan 24 20:26:57 2006
+++ b/xen/common/event_channel.c Tue Jan 24 20:26:57 2006
@@ -405,19 +405,34 @@
     struct domain *ld = current->domain, *rd;
     int rport, ret = 0;
 
-    spin_lock(&ld->evtchn_lock);
-
     if ( unlikely(!port_is_valid(ld, lport)) )
     {
-        spin_unlock(&ld->evtchn_lock);
         return -EINVAL;
     }
 
     lchn = evtchn_from_port(ld, lport);
+
+    if ( lchn->state != ECS_INTERDOMAIN )
+        rd = ld;
+    else
+        rd = lchn->u.interdomain.remote_dom;
+
+    /* Avoid deadlock by first acquiring lock of domain with smaller id. */
+    if ( ld < rd )
+    {
+        spin_lock(&ld->evtchn_lock);
+        spin_lock(&rd->evtchn_lock);
+    }
+    else
+    {
+        if ( ld != rd )
+            spin_lock(&rd->evtchn_lock);
+        spin_lock(&ld->evtchn_lock);
+    }
+
     switch ( lchn->state )
     {
     case ECS_INTERDOMAIN:
-        rd = lchn->u.interdomain.remote_dom;
         rport = lchn->u.interdomain.remote_port;
         rchn = evtchn_from_port(rd, rport);
         evtchn_set_pending(rd->vcpu[rchn->notify_vcpu_id], rport);
@@ -433,6 +448,8 @@
     }
 
     spin_unlock(&ld->evtchn_lock);
+    if ( ld != rd )
+        spin_unlock(&rd->evtchn_lock);
 
     return ret;
 }

diff -r 44b96aafa499 -r 41d3adec102c xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c Tue Jan 24 18:36:58 2006
+++ b/xen/arch/x86/hvm/io.c Tue Jan 24 20:26:57 2006
@@ -693,6 +693,7 @@
     struct domain *d = v->domain;
     int port = iopacket_port(d);
 
+    spin_lock(&d->evtchn_lock);
     /* evtchn_pending_sel bit is shared by other event channels. */
     if (!d->shared_info->evtchn_pending[port/BITS_PER_LONG])
         clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
@@ -700,6 +701,7 @@
     /* Note: HVM domains may need upcalls as well. */
     if (!v->vcpu_info->evtchn_pending_sel)
         clear_bit(0, &v->vcpu_info->evtchn_upcall_pending);
+    spin_unlock(&d->evtchn_lock);
 
     /* Clear the pending bit for port. */
     return test_and_clear_bit(port, &d->shared_info->evtchn_pending[0]);
@@ -729,6 +731,7 @@
 void hvm_wait_io(void)
 {
     extern void do_block();
+    struct domain *d = current->domain;
     int port = iopacket_port(current->domain);
 
     do {
@@ -742,8 +745,10 @@
          * Events other than IOPACKET_PORT might have woken us up.
          * In that case, safely go back to sleep.
          */
+        spin_lock(&d->evtchn_lock);
         clear_bit(port/BITS_PER_LONG, &current->vcpu_info->evtchn_pending_sel);
         clear_bit(0, &current->vcpu_info->evtchn_upcall_pending);
+        spin_unlock(&d->evtchn_lock);
     } while(1);
 }
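For anyone reviewing the locking change: the deadlock-avoidance rule that the evtchn_send() hunk open-codes is the same "lock the domain with the smaller address first" ordering that evtchn_bind_interdomain() already uses. Restated as a standalone helper it would look like the sketch below. This is not part of the patch and the helper names are hypothetical; it assumes the Xen-internal struct domain, spin_lock() and spin_unlock(), and is shown only to make the ordering rule explicit:

    /* Hypothetical helpers (illustration only), mirroring the ordering above. */
    static inline void evtchn_double_lock(struct domain *ld, struct domain *rd)
    {
        /* Always acquire the lock of the domain with the smaller address
         * first; if ld == rd, take the single lock only once. */
        if ( ld < rd )
        {
            spin_lock(&ld->evtchn_lock);
            spin_lock(&rd->evtchn_lock);
        }
        else
        {
            if ( ld != rd )
                spin_lock(&rd->evtchn_lock);
            spin_lock(&ld->evtchn_lock);
        }
    }

    static inline void evtchn_double_unlock(struct domain *ld, struct domain *rd)
    {
        spin_unlock(&ld->evtchn_lock);
        if ( ld != rd )
            spin_unlock(&rd->evtchn_lock);
    }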