Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
(about 2 hours for 1 GB of memory). But if I start xend and run xm
migrate, the domain migrates to the destination successfully, and this
takes about 3-6 seconds (I'm using InfiniBand).

Why is this happening?

And a second question: why can't I migrate from 4.1.3 (xend) to
4.3.0-rc2 (xend)?

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
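A quick sanity check on the numbers reported above shows how large the gap between the two toolstacks is. This assumes "1 GB" means 1024 MiB and "2 hours" means 7200 s; the helper names are illustrative only.

```c
#include <assert.h>

/* Average rate in MiB/s for moving `mib` mebibytes in `seconds` seconds. */
static double throughput_mib_per_s(double mib, double seconds)
{
    assert(seconds > 0.0);
    return mib / seconds;
}

/* Ratio of the slow transfer time to the fast one (how many times faster). */
static double speedup(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

Under those assumptions, throughput_mib_per_s(1024, 7200) is about 0.14 MiB/s, while 1024 MiB in 3-6 s is roughly 170-340 MiB/s, so speedup(7200, 6) and speedup(7200, 3) put the xm path roughly 1200 to 2400 times faster.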
On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
> (about 2 hours for 1 GB of memory). But if I start xend and run xm
> migrate, the domain migrates to the destination successfully, and this
> takes about 3-6 seconds (I'm using InfiniBand).
>
> Why is this happening?

Hmm -- I think this has been mentioned a couple of times, but I don't
think anyone has looked into it. I'll see if I can track it down.

>
> And a second question: why can't I migrate from 4.1.3 (xend) to
> 4.3.0-rc2 (xend)?

I think migration is only supported across one major release -- does it
work from 4.1 to 4.2, and then from 4.2 to 4.3?

 -George
2013/5/24 George Dunlap <dunlapg@umich.edu>:
> Hmm -- I think this has been mentioned a couple of times, but I don't
> think anyone has looked into it. I'll see if I can track it down.
>

Thanks!

>>
>> And a second question: why can't I migrate from 4.1.3 (xend) to
>> 4.3.0-rc2 (xend)?
>
> I think migration is only supported across one major release -- does it
> work from 4.1 to 4.2, and then from 4.2 to 4.3?

No, because migration on Xen 4.2.2 behaves the same as on 4.3.0: very,
very slow =(

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
On Fri, May 24, 2013 at 01:46:11PM +0100, George Dunlap wrote:
> On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> > Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
> > (about 2 hours for 1 GB of memory). But if I start xend and run xm
> > migrate, the domain migrates to the destination successfully, and this
> > takes about 3-6 seconds (I'm using InfiniBand).
> >
> > Why is this happening?
>
> Hmm -- I think this has been mentioned a couple of times, but I don't
> think anyone has looked into it. I'll see if I can track it down.

I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
locally migrate any 32/64-bit PV/PVHVM guest (so four variations), it is
incredibly slow.

(So xm save <..> to an iSCSI disk && xm restore ...)

'perf report' shows that dom0 spends most of its time in xen_version
(which is the yield-type call).

If the same operation is done with a 64-bit dom0, it is quick.

And I think this is the issue that Ian Jackson's nightly test system is
running into: the migration is so slow that it times out. (This is with
real Linux.)

Now the oddity is that I saw this with Xen 4.1, but Vasiliy says he
didn't see it with Xen 4.1, so perhaps the issue I am seeing is
different.

Hm, I should re-run this test once more with Xen 4.3 just to confirm.

> >
> > And a second question: why can't I migrate from 4.1.3 (xend) to
> > 4.3.0-rc2 (xend)?
>
> I think migration is only supported across one major release -- does it
> work from 4.1 to 4.2, and then from 4.2 to 4.3?
>
>  -George
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
On Fri, May 24, 2013 at 3:11 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, May 24, 2013 at 01:46:11PM +0100, George Dunlap wrote:
>> On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
>> > Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
>> > (about 2 hours for 1 GB of memory). But if I start xend and run xm
>> > migrate, the domain migrates to the destination successfully, and this
>> > takes about 3-6 seconds (I'm using InfiniBand).
>> >
>> > Why is this happening?
>>
>> Hmm -- I think this has been mentioned a couple of times, but I don't
>> think anyone has looked into it. I'll see if I can track it down.
>
> I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
> locally migrate any 32/64-bit PV/PVHVM guest (so four variations), it is
> incredibly slow.
>
> (So xm save <..> to an iSCSI disk && xm restore ...)

Did you mean xm save or xl save?

 -George
2013/5/24 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
> locally migrate any 32/64-bit PV/PVHVM guest (so four variations), it is
> incredibly slow.
>
> (So xm save <..> to an iSCSI disk && xm restore ...)
>
> 'perf report' shows that dom0 spends most of its time in xen_version
> (which is the yield-type call).
>
> If the same operation is done with a 64-bit dom0, it is quick.
>
> And I think this is the issue that Ian Jackson's nightly test system is
> running into: the migration is so slow that it times out. (This is with
> real Linux.)
>
> Now the oddity is that I saw this with Xen 4.1, but Vasiliy says he
> didn't see it with Xen 4.1, so perhaps the issue I am seeing is
> different.
>
> Hm, I should re-run this test once more with Xen 4.3 just to confirm.

I didn't see this slowdown, maybe because I'm using 4.1.3 from openSUSE,
which includes some backported patches and other patches forward-ported
from SUSE...

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
2013/5/24 George Dunlap <George.Dunlap@eu.citrix.com>:
>
> Did you mean xm save or xl save?


In my case, xl save crashes the domU with messages like the following.
It crashes domUs running CentOS 2.6.18 and 2.6.32 (xenlinux) kernels,
as well as the newer 3.8.6 and 3.4 kernels...

[ 1826.587110] PM: late freeze of devices complete after 0.048 msecs
[ 1826.591220] ------------[ cut here ]------------
[ 1826.591220] kernel BUG at /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/drivers/xen/events.c:1489!
[ 1826.591220] invalid opcode: 0000 [#1] SMP
[ 1826.591220] CPU 0
[ 1826.591220] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev ghash_clmulni_intel joydev pcspkr cryptd ext3 mbcache jbd xen_netfront xen_blkfront
[ 1826.591220]
[ 1826.591220] Pid: 6, comm: migration/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.41-2
[ 1826.591220] RIP: e030:[<ffffffff8121c4e2>] [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[ 1826.591220] RSP: e02b:ffff88001ae99d20 EFLAGS: 00010082
[ 1826.591220] RAX: ffffffffffffffef RBX: 0000000000000000 RCX: 0000000000000001
[ 1826.591220] RDX: 0000000000000000 RSI: 00000000deadbeef RDI: 00000000deadbeef
[ 1826.591220] RBP: 0000000000000000 R08: ffff88001f026e00 R09: ffff88001ae99d48
[ 1826.591220] R10: 0000000000013780 R11: 0000000000013780 R12: 0000000000000010
[ 1826.591220] R13: 0000000000010dd0 R14: 0000000000010d70 R15: 0000000000000000
[ 1826.591220] FS: 00007f8d6f4907a0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 1826.591220] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1826.591220] CR2: 00007f51210a1e60 CR3: 00000000032af000 CR4: 0000000000002660
[ 1826.591220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1826.591220] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1826.591220] Process migration/0 (pid: 6, threadinfo ffff88001ae98000, task ffff88001ae8e0c0)
[ 1826.591220] Stack:
[ 1826.591220] 0000000000013780 0000000000000000 ffff880000000000 0000000000010d70
[ 1826.591220] 0000160000000000 0000000000000000 ffff88001affbddc ffffffff810050a2
[ 1826.591220] 0000000000013780 ffffea000009f0b8 ffffffff810043e3 ffff88001affbe40
[ 1826.591220] Call Trace:
[ 1826.591220] [<ffffffff810050a2>] ? xen_mc_issue+0x3e/0x50
[ 1826.591220] [<ffffffff810043e3>] ? arch_local_irq_restore+0x7/0x8
[ 1826.591220] [<ffffffff8121ca3b>] ? xen_suspend+0x73/0x8b
[ 1826.591220] [<ffffffff81087d89>] ? stop_machine_cpu_stop+0x89/0xc3
[ 1826.591220] [<ffffffff81087d00>] ? queue_stop_cpus_work+0xa5/0xa5
[ 1826.591220] [<ffffffff81087b5a>] ? cpu_stopper_thread+0xea/0x177
[ 1826.591220] [<ffffffff810359d7>] ? arch_local_irq_enable+0x7/0x8
[ 1826.591220] [<ffffffff81039854>] ? finish_task_switch+0x88/0xb9
[ 1826.591220] [<ffffffff8134c634>] ? __schedule+0x5ac/0x5c3
[ 1826.591220] [<ffffffff81087a70>] ? cpu_stop_signal_done+0x2a/0x2a
[ 1826.591220] [<ffffffff8105f321>] ? kthread+0x76/0x7e
[ 1826.591220] [<ffffffff81354ab4>] ? kernel_thread_helper+0x4/0x10
[ 1826.591220] [<ffffffff81352b73>] ? int_ret_from_sys_call+0x7/0x1b
[ 1826.591220] [<ffffffff8134dcbc>] ? retint_restore_args+0x5/0x6
[ 1826.591220] [<ffffffff81354ab0>] ? gs_change+0x13/0x13
[ 1826.591220] Code: 74 79 44 89 e7 e8 77 ee ff ff 39 e8 74 02 0f 0b 48 8d 74 24 28 bf 01 00 00 00 89 6c 24 28 89 5c 24 2c e8 19 ec ff ff 85 c0 74 02 <0f> 0b 8b 44 24 30 44 89 e7 89 44 24 14 e8 58 e9 ff ff 0f b7 4c
[ 1826.591220] RIP [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[ 1826.591220] RSP <ffff88001ae99d20>
[ 1826.591220] ---[ end trace 60605833d257c851 ]---
[ 1826.591220] ------------[ cut here ]------------
[ 1826.591220] WARNING: at /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x86()
[ 1826.591220] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev ghash_clmulni_intel joydev pcspkr cryptd ext3 mbcache jbd xen_netfront xen_blkfront
[ 1826.591220] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-4-amd64 #1 Debian 3.2.41-2
[ 1826.591220] Call Trace:
[ 1826.591220] [<ffffffff81046a55>] ? warn_slowpath_common+0x78/0x8c
[ 1826.591220] [<ffffffff81066447>] ? ktime_get+0x1e/0x86
[ 1826.591220] [<ffffffff8106c21b>] ? tick_nohz_stop_sched_tick+0x61/0x327
[ 1826.591220] [<ffffffff8100d210>] ? cpu_idle+0x72/0xf2
[ 1826.591220] [<ffffffff816abb36>] ? start_kernel+0x3b8/0x3c3
[ 1826.591220] [<ffffffff816ad4d9>] ? xen_start_kernel+0x412/0x418
[ 1826.591220] ---[ end trace 60605833d257c852 ]---

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
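One detail in this oops is worth decoding. The bytes just before the faulting instruction in the Code: line (`85 c0 74 02 <0f> 0b`) disassemble to `test %eax,%eax; je +2; ud2`, which suggests RAX holds the return value of the preceding call that BUG() is checking. Read as a signed 64-bit value, RAX ffffffffffffffef is -17, which on Linux is -EEXIST. The sketch below shows that conversion; the function names are illustrative and not taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw 64-bit register value from an oops as a signed return
 * code. memcpy avoids the implementation-defined unsigned-to-signed
 * conversion; int64_t is guaranteed to be two's complement. */
static int64_t reg_as_signed(uint64_t reg)
{
    int64_t rc;
    memcpy(&rc, &reg, sizeof rc);
    return rc;
}

/* Negative return codes carry -errno; return the positive errno, or 0. */
static int errno_from_rc(int64_t rc)
{
    assert(rc >= -4095); /* Linux kernel error returns lie in -4095..-1 */
    return rc < 0 ? (int)-rc : 0;
}
```

errno_from_rc(reg_as_signed(0xffffffffffffffef)) yields EEXIST, the same -17 that Xen later reports from evtchn_bind_virq in this thread, which would point at the "already bound" branch rather than a bad VIRQ number or a missing vCPU.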
On Fri, May 24, 2013 at 03:38:46PM +0100, George Dunlap wrote:
> On Fri, May 24, 2013 at 3:11 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Fri, May 24, 2013 at 01:46:11PM +0100, George Dunlap wrote:
> >> On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> >> > Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
> >> > (about 2 hours for 1 GB of memory). But if I start xend and run xm
> >> > migrate, the domain migrates to the destination successfully, and this
> >> > takes about 3-6 seconds (I'm using InfiniBand).
> >> >
> >> > Why is this happening?
> >>
> >> Hmm -- I think this has been mentioned a couple of times, but I don't
> >> think anyone has looked into it. I'll see if I can track it down.
> >
> > I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
> > locally migrate any 32/64-bit PV/PVHVM guest (so four variations), it is
> > incredibly slow.
> >
> > (So xm save <..> to an iSCSI disk && xm restore ...)
>
> Did you mean xm save or xl save?

'xl' for Xen 4.3 and 'xm' for Xen 4.1.

>
>  -George
On Sat, May 25, 2013 at 12:15:44AM +0400, Vasiliy Tolstov wrote:
> 2013/5/24 George Dunlap <George.Dunlap@eu.citrix.com>:
> >
> > Did you mean xm save or xl save?
>
>
> In my case, xl save crashes the domU with messages like the following.
> It crashes domUs running CentOS 2.6.18 and 2.6.32 (xenlinux) kernels,
> as well as the newer 3.8.6 and 3.4 kernels...

Is the 3.8.6 crashing at the same point?

>
> [ 1826.587110] PM: late freeze of devices complete after 0.048 msecs
> [ 1826.591220] ------------[ cut here ]------------
> [ 1826.591220] kernel BUG at
> /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/drivers/xen/events.c:1489!

That looks to be this (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)

        if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
                                        &bind_virq) != 0)
                BUG();

which is odd. Would you be able to instrument evtchn_bind_virq (this is
in Xen) with some printks, like this (I haven't compile-tested it):

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 2d7afc9..c109cee 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -270,24 +270,34 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
     int port, virq = bind->virq, vcpu = bind->vcpu;
     long rc = 0;
 
-    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
+    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq:%d, rc:%d\n", d->domain_id,
+         vcpu, __func__, __LINE__, virq, -EINVAL);
         return -EINVAL;
-
-    if ( virq_is_global(virq) && (vcpu != 0) )
+    }
+    if ( virq_is_global(virq) && (vcpu != 0) ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq_is_global:%d, rc:%d\n", d->domain_id,
+         vcpu, __func__, __LINE__, virq_is_global(virq), -EINVAL);
         return -EINVAL;
-
+    }
     if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         ((v = d->vcpu[vcpu]) == NULL) )
+         ((v = d->vcpu[vcpu]) == NULL) ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, max_vcpus:%d, rc:%d\n", d->domain_id,
+         vcpu, __func__, __LINE__, v, d->max_vcpus, -ENOENT);
         return -ENOENT;
-
+    }
     spin_lock(&d->event_lock);
 
-    if ( v->virq_to_evtchn[virq] != 0 )
+    if ( v->virq_to_evtchn[virq] != 0 ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, evtchn:%d, rc:%d\n", d->domain_id,
+         vcpu, __func__, __LINE__, v, v->virq_to_evtchn[virq], -EEXIST);
         ERROR_EXIT(-EEXIST);
-
-    if ( (port = get_free_port(d)) < 0 )
+    }
+    if ( (port = get_free_port(d)) < 0 ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%d\n", d->domain_id,
+         vcpu, __func__, __LINE__, port, port);
         ERROR_EXIT(port);
-
+    }
     chn = evtchn_from_port(d, port);
     chn->state = ECS_VIRQ;
     chn->notify_vcpu_id = vcpu;
2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> Is the 3.8.6 crashing at the same point?

Yes, but right now I can't get a trace; maybe after compiling a new Xen
with the debug printks that you provided.

> That looks to be this (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)
>
>         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
>                                         &bind_virq) != 0)
>                 BUG();
>
> which is odd. Would you be able to instrument evtchn_bind_virq (this is
> in Xen) with some printks, like this (I haven't compile-tested it):

I'll try it.

Another bug happened while I was changing the memory inside a domU
(Ubuntu LTS Precise 10.04 with the stock kernel):

(XEN) traps.c:3072: GPF (0060): ffff82c4c015e73e -> ffff82c4c022a3ff
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1156e85 (pfn 10fb9)
(XEN) mm.c:2990:d17 Error while pinning mfn 1156e85
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 11ae21b (pfn 39c23)
(XEN) mm.c:2990:d17 Error while pinning mfn 11ae21b
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1156e85 (pfn 10fb9)
(XEN) mm.c:903:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1293:d17 Failure in alloc_l2_table: entry 392
(XEN) mm.c:2095:d17 Error while validating mfn 1157cf3 (pfn 1014b) for type 2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) mm.c:945:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1375:d17 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2095:d17 Error while validating mfn 11ae976 (pfn 394c8) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ac89d (pfn 3b5a1) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2990:d17 Error while pinning mfn 11ac89d
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1156e85 (pfn 10fb9)
(XEN) mm.c:903:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1293:d17 Failure in alloc_l2_table: entry 392
(XEN) mm.c:2095:d17 Error while validating mfn 1157cf3 (pfn 1014b) for type 2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) mm.c:945:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1375:d17 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2095:d17 Error while validating mfn 11ae976 (pfn 394c8) for type 3000000000000000: caf=8000000000000002 taf=3000000000000001
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ad971 (pfn 3a4cd) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2990:d17 Error while pinning mfn 11ad971
(XEN) mm.c:2015:d17 Error pfn 11ae976: rd=ffff8308098ca000, od=0000000000000000, caf=180000000000000, taf=3000000000000001
(XEN) mm.c:618:d17 Could not get page ref for pfn 11ae976
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ac89d (pfn 3b5a1) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2758:d17 Error while installing new baseptr 11ac89d
(XEN) mm.c:2015:d17 Error pfn 11ae976: rd=ffff8308098ca000, od=0000000000000000, caf=180000000000000, taf=3000000000000001
(XEN) mm.c:618:d17 Could not get page ref for pfn 11ae976
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ad971 (pfn 3a4cd) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:3116:d17 Error while installing new mfn 11ad971

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
On Mon, May 27, 2013 at 09:32:51AM +0400, Vasiliy Tolstov wrote:
> 2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> > Is the 3.8.6 crashing at the same point?
>
> Yes, but right now I can't get a trace; maybe after compiling a new Xen
> with the debug printks that you provided.
>
> > That looks to be this (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)
> >
> >         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
> >                                         &bind_virq) != 0)
> >                 BUG();
> >
> > which is odd. Would you be able to instrument evtchn_bind_virq (this is
> > in Xen) with some printks, like this (I haven't compile-tested it):
>
> I'll try it.

Thank you.

>
>
> Another bug happened while I was changing the memory inside a domU
> (Ubuntu LTS Precise 10.04 with the stock kernel):

I don't understand what you mean by 'change memory'. What is it that you
do to 'change memory'?

Is that with a PV guest? HVM guest?

>
> (XEN) traps.c:3072: GPF (0060): ffff82c4c015e73e -> ffff82c4c022a3ff
> [...]
> (XEN) mm.c:3116:d17 Error while installing new mfn 11ad971
>
> -- 
> Vasiliy Tolstov,
> e-mail: v.tolstov@selfip.ru
> jabber: vase@selfip.ru
By "change memory" I mean echo xxx > /sys/class/xxx/xxx/target inside the
domU. I'm using a PV guest.

2013/5/28 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> On Mon, May 27, 2013 at 09:32:51AM +0400, Vasiliy Tolstov wrote:
>> [...]
>> Another bug happened while I was changing the memory inside a domU
>> (Ubuntu LTS Precise 10.04 with the stock kernel):
>
> I don't understand what you mean by 'change memory'. What is it that you
> do to 'change memory'?
>
> Is that with a PV guest? HVM guest?
>>
>> (XEN) traps.c:3072: GPF (0060): ffff82c4c015e73e -> ffff82c4c022a3ff
>> [...]
>> (XEN) mm.c:3116:d17 Error while installing new mfn 11ad971

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
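For anyone scripting the same memory-change test, the `echo xxx > .../target` step is just a write of a decimal string to a sysfs file (presumably the balloon driver's target). A minimal C equivalent is sketched below. The path is left to the caller because the thread elides it, the value's unit is whatever that file expects, and write_sysfs_value() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* C equivalent of `echo VALUE > PATH`: write VALUE in decimal followed by a
 * newline. Returns 0 on success, -1 on failure. The caller supplies the
 * path; no specific sysfs path is assumed here. */
static int write_sysfs_value(const char *path, unsigned long long value)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%llu\n", value) > 0;
    /* stdio buffers the write, so sysfs errors often surface only at fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Checking the result of fclose() matters here: a sysfs store handler that rejects the value usually reports the error when the buffered data is flushed, not at fprintf() time.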
Migration with qemu-xen-traditional:

xen16:~ # xl migrate --debug 21-10887 ib-xen06.kh11.clodo.ru
the global config option vifscript is deprecated, please switch to vif.default.script
the global config option vifscript is deprecated, please switch to vif.default.script
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/631)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/631)
Savefile contains xl domain config
xc: progress: Reloading memory pages: 53248/1048576 5%
xc: progress: Reloading memory pages: 105472/1048576 10%
xc: progress: Reloading memory pages: 157658/1048576 15%
xc: progress: Reloading memory pages: 209882/1048576 20%
xc: progress: Reloading memory pages: 263130/1048576 25%
migration receiver stream contained unexpected data instead of ready message (command run was: exec ssh ib-xen06.kh11.clodo.ru xl migrate-receive -d )
migration target: Transfer complete, requesting permission to start domain.
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading GO message from migration stream
migration target: Failure, destroying our copy.
migration child [15697] not exiting, no longer waiting (exit status will be unreported)
Migration failed, resuming at sender.
migration target: Cleanup OK, granting sender permission to resume.

xl dmesg:

(XEN) event_channel.c:297:d1 d1v0 [evtchn_bind_virq:297], port:3, rc:-17
(XEN) event_channel.c:298:d1 EVTCHNOP failure: error -17

xl console:

[ 981.869689] PM: late freeze of devices complete after 0.073 msecs
[ 981.873833] ------------[ cut here ]------------
[ 981.873833] kernel BUG at /build/buildd-linux_3.2.41-2+deb7u2-amd64-NHQI9B/linux-3.2.41/drivers/xen/events.c:1489!
[ 981.873833] invalid opcode: 0000 [#1] SMP
[ 981.873833] CPU 0
[ 981.873833] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev joydev pcspkr ext3 mbcache jbd xen_blkfront xen_netfront
[ 981.873833]
[ 981.873833] Pid: 6, comm: migration/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.41-2+deb7u2
[ 981.873833] RIP: e030:[<ffffffff8121c4e2>] [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[ 981.873833] RSP: e02b:ffff88001ae99d20 EFLAGS: 00010082
[ 981.873833] RAX: ffffffffffffffef RBX: 0000000000000000 RCX: 0000000000000001
[ 981.873833] RDX: 0000000000000000 RSI: 00000000deadbeef RDI: 00000000deadbeef
[ 981.873833] RBP: 0000000000000000 R08: ffff88001f026e00 R09: ffff88001ae99d48
[ 981.873833] R10: 0000000000013780 R11: 0000000000013780 R12: 0000000000000010
[ 981.873833] R13: 0000000000010dd0 R14: 0000000000010d70 R15: 0000000000000000
[ 981.873833] FS: 00007f1fff8d37a0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 981.873833] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 981.873833] CR2: 000000f8400b5410 CR3: 00000000033ad000 CR4: 0000000000002660
[ 981.873833] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 981.873833] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 981.873833] Process migration/0 (pid: 6, threadinfo ffff88001ae98000, task ffff88001ae8e0c0)
[ 981.873833] Stack:
[ 981.873833] 0000000000013780 0000000000000000 ffff880000000000 0000000000010d70
[ 981.873833] 0000160000000000 0000000000000000 ffff88001affbddc ffffffff810050a2
[ 981.873833] 0000000000013780 ffffea00005ab258 ffffffff810043e3 ffff88001affbe40
[ 981.873833] Call Trace:
[ 981.873833] [<ffffffff810050a2>] ? xen_mc_issue+0x3e/0x50
[ 981.873833] [<ffffffff810043e3>] ? arch_local_irq_restore+0x7/0x8
[ 981.873833] [<ffffffff8121ca3b>] ? xen_suspend+0x73/0x8b
[ 981.873833] [<ffffffff81087d91>] ? stop_machine_cpu_stop+0x89/0xc3
[ 981.873833] [<ffffffff81087d08>] ? queue_stop_cpus_work+0xa5/0xa5
[ 981.873833] [<ffffffff81087b62>] ? cpu_stopper_thread+0xea/0x177
[ 981.873833] [<ffffffff810359d7>] ? arch_local_irq_enable+0x7/0x8
[ 981.873833] [<ffffffff81039854>] ? finish_task_switch+0x88/0xb9
[ 981.873833] [<ffffffff8134c694>] ? __schedule+0x5ac/0x5c3
[ 981.873833] [<ffffffff81087a78>] ? cpu_stop_signal_done+0x2a/0x2a
[ 981.873833] [<ffffffff8105f329>] ? kthread+0x76/0x7e
[ 981.873833] [<ffffffff81354b34>] ? kernel_thread_helper+0x4/0x10
[ 981.873833] [<ffffffff81352bf3>] ? int_ret_from_sys_call+0x7/0x1b
[ 981.873833] [<ffffffff8134dd3c>] ? retint_restore_args+0x5/0x6
[ 981.873833] [<ffffffff81354b30>] ? gs_change+0x13/0x13
[ 981.873833] Code: 74 79 44 89 e7 e8 77 ee ff ff 39 e8 74 02 0f 0b 48 8d 74 24 28 bf 01 00 00 00 89 6c 24 28 89 5c 24 2c e8 19 ec ff ff 85 c0 74 02 <0f> 0b 8b 44 24 30 44 89 e7 89 44 24 14 e8 58 e9 ff ff 0f b7 4c
[ 981.873833] RIP [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[ 981.873833] RSP <ffff88001ae99d20>
[ 981.873833] ---[ end trace 8243bb8e343ac633 ]---
[ 981.873833] ------------[ cut here ]------------
[ 981.873833] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2-amd64-NHQI9B/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x86()
[ 981.873833] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev joydev pcspkr ext3 mbcache jbd xen_blkfront xen_netfront
[ 981.873833] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-4-amd64 #1 Debian 3.2.41-2+deb7u2
[ 981.873833] Call Trace:
[ 981.873833] [<ffffffff81046a55>] ? warn_slowpath_common+0x78/0x8c
[ 981.873833] [<ffffffff8106644f>] ? ktime_get+0x1e/0x86
[ 981.873833] [<ffffffff8106c223>] ? tick_nohz_stop_sched_tick+0x61/0x327
[ 981.873833] [<ffffffff8100d210>] ? cpu_idle+0x72/0xf2
[ 981.873833] [<ffffffff816abb36>] ? start_kernel+0x3b8/0x3c3
[ 981.873833] [<ffffffff816ad4d9>] ? xen_start_kernel+0x412/0x418
[ 981.873833] ---[ end trace 8243bb8e343ac634 ]---

2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> On Sat, May 25, 2013 at 12:15:44AM +0400, Vasiliy Tolstov wrote:
>> 2013/5/24 George Dunlap <George.Dunlap@eu.citrix.com>:
>> >
>> > Did you mean xm save or xl save?
>>
>>
>> In my case, xl save crashes the domU with messages like the following.
>> It crashes domUs running CentOS 2.6.18 and 2.6.32 (xenlinux) kernels,
>> as well as the newer 3.8.6 and 3.4 kernels...
>
> Is the 3.8.6 crashing at the same point?
>>
>> [ 1826.587110] PM: late freeze of devices complete after 0.048 msecs
>> [ 1826.591220] ------------[ cut here ]------------
>> [ 1826.591220] kernel BUG at
>> /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/drivers/xen/events.c:1489!
>
> That looks to be this (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)
>
>         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
>                                         &bind_virq) != 0)
>                 BUG();
>
> which is odd. Would you be able to instrument evtchn_bind_virq (this is
> in Xen) with some printks, like this (I haven't compile-tested it):
>
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index 2d7afc9..c109cee 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -270,24 +270,34 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
>      int port, virq = bind->virq, vcpu = bind->vcpu;
>      long rc = 0;
>
> -    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
> +    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq:%d, rc:%d\n", d->domain_id,
> +         vcpu, __func__, __LINE__, virq, -EINVAL);
>          return -EINVAL;
> -
> -    if ( virq_is_global(virq) && (vcpu != 0) )
> +    }
> +    if ( virq_is_global(virq) && (vcpu != 0) ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq_is_global:%d, rc:%d\n", d->domain_id,
> +         vcpu, __func__, __LINE__, virq_is_global(virq), -EINVAL);
>          return -EINVAL;
> -
> +    }
>      if ( (vcpu
< 0) || (vcpu >= d->max_vcpus) || > - ((v = d->vcpu[vcpu]) == NULL) ) > + ((v = d->vcpu[vcpu]) == NULL) ) { > +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, max_vcpus:%d, rc:%ld\n", d->domain_id, > + vcpu, __func__,__LINE__, v, d->max_vcpus, -ENOENT); > return -ENOENT; > - > + } > spin_lock(&d->event_lock); > > - if ( v->virq_to_evtchn[virq] != 0 ) > + if ( v->virq_to_evtchn[virq] != 0 ) { > +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, evtchn:%d, rc:%ld\n", d->domain_id, > + vcpu, __func__,__LINE__, v->virq_to_evtchn[virq] , -EEXIST); > ERROR_EXIT(-EEXIST); > - > - if ( (port = get_free_port(d)) < 0 ) > + } > + if ( (port = get_free_port(d)) < 0 ) { > +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id, > + vcpu, __func__,__LINE__, port, port); > ERROR_EXIT(port); > - > + } > chn = evtchn_from_port(d, port); > chn->state = ECS_VIRQ; > chn->notify_vcpu_id = vcpu;-- Vasiliy Tolstov, e-mail: v.tolstov@selfip.ru jabber: vase@selfip.ru
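For readers following the debugging, here is a simplified user-space model of the order of checks in the patched evtchn_bind_virq() above. It shows which branch produces which error. All names and sizes (domain_model, NR_VIRQS, NR_PORTS, the rule that only VIRQ 0 is "global") are assumptions made for illustration. This is not Xen code. The one property carried over from the patch is that a non-zero virq_to_evtchn[virq] entry means the VIRQ is already bound and yields -EEXIST. In this model get_free_port() can fail only with -ENOSPC.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of evtchn_bind_virq()'s checks; sizes are illustrative. */
#define NR_VIRQS 24
#define NR_VCPUS 4
#define NR_PORTS 16

struct vcpu_model {
    int virq_to_evtchn[NR_VIRQS];   /* 0 means "not bound" */
};

struct domain_model {
    int max_vcpus;
    struct vcpu_model vcpu[NR_VCPUS];
    int port_in_use[NR_PORTS];
};

/* Assumption: only VIRQ 0 is treated as global in this model. */
static int virq_is_global(int virq)
{
    return virq == 0;
}

static int get_free_port(struct domain_model *d)
{
    for (int p = 1; p < NR_PORTS; p++)  /* port 0 is never handed out here */
        if (!d->port_in_use[p])
            return p;
    return -ENOSPC;
}

static int bind_virq(struct domain_model *d, int virq, int vcpu)
{
    if (virq < 0 || virq >= NR_VIRQS)
        return -EINVAL;
    if (virq_is_global(virq) && vcpu != 0)
        return -EINVAL;
    if (vcpu < 0 || vcpu >= d->max_vcpus || vcpu >= NR_VCPUS)
        return -ENOENT;
    /* The VIRQ is still bound on this vCPU. */
    if (d->vcpu[vcpu].virq_to_evtchn[virq] != 0)
        return -EEXIST;

    int port = get_free_port(d);
    if (port < 0)
        return port;                    /* -ENOSPC in this model, never -EEXIST */
    assert(port != 0);                  /* keeps "0 == unbound" meaningful */
    d->port_in_use[port] = 1;
    d->vcpu[vcpu].virq_to_evtchn[virq] = port;
    return port;
}
```

Binding the same VIRQ on the same vCPU twice returns -EEXIST on the second call. That is the error the guest's BUG() in the resume path trips over.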
On Fri, May 31, 2013 at 08:56:47AM +0400, Vasiliy Tolstov wrote:
> migration with qemu-xen-traditional:
> xen16:~ # xl migrate --debug 21-10887 ib-xen06.kh11.clodo.ru
> the global config option vifscript is deprecated, please switch to
> vif.default.script
> the global config option vifscript is deprecated, please switch to
> vif.default.script
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x0/0x0/631)
> Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/631)
> Savefile contains xl domain config
> xc: progress: Reloading memory pages: 53248/1048576 5%
> xc: progress: Reloading memory pages: 105472/1048576 10%
> xc: progress: Reloading memory pages: 157658/1048576 15%
> xc: progress: Reloading memory pages: 209882/1048576 20%
> xc: progress: Reloading memory pages: 263130/1048576 25%
> migration receiver stream contained unexpected data instead of ready message
> (command run was: exec ssh ib-xen06.kh11.clodo.ru xl migrate-receive -d )
> migration target: Transfer complete, requesting permission to start domain.
> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream
> truncated reading GO message from migration stream
> migration target: Failure, destroying our copy.
> migration child [15697] not exiting, no longer waiting (exit status
> will be unreported)
> Migration failed, resuming at sender.
> migration target: Cleanup OK, granting sender permission to resume.
>
> xl dmesg:
> (XEN) event_channel.c:297:d1 d1v0 [evtchn_bind_virq:297], port:3, rc:-17
> (XEN) event_channel.c:298:d1 EVTCHNOP failure: error -17

The non-debug version tells me it is:

289     if ( (port = get_free_port(d)) < 0 )
290         ERROR_EXIT(port);

Which gets -EEXIST from get_free_port. But get_free_port only returns
-EINVAL, -ENOMEM, and -ENOSPC in failure modes.

But we get -EEXIST? Could you re-run git diff and attach output to
this email? I think you tweaked the debug code a bit so I am looking
at something different?
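The libxl error in the log above says the stream ended before the expected "GO" message arrived. Here is a minimal sketch of a read-exactly loop to show what "file/stream truncated" means. It is written in the spirit of libxl_read_exactly() but it is not libxl's code. The name read_exactly, its signature and the 0/-1 return convention are assumptions. Read this way, the truncation looks like a consequence of the other side giving up earlier ("unexpected data instead of ready message") rather than the root cause.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical read-exactly helper (not libxl's implementation).
 * Returns 0 once `len` bytes have been read, or -1 on a read error
 * or if the stream hits EOF early ("truncated"). */
static int read_exactly(int fd, void *buf, size_t len, const char *what)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL || len == 0);
    while (got < len) {
        ssize_t r = read(fd, p + got, len - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            perror(what);
            return -1;
        }
        if (r == 0) {                   /* EOF before all bytes arrived */
            fprintf(stderr, "file/stream truncated reading %s\n", what);
            return -1;
        }
        got += (size_t)r;               /* short read: keep going */
    }
    return 0;
}
```

A short read on its own is not an error here. Only EOF before `len` bytes counts as truncation.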
2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> The non-debug version tells me it is:
>
> 289     if ( (port = get_free_port(d)) < 0 )
> 290         ERROR_EXIT(port);
>
> Which gets -EEXIST from get_free_port. But get_free_port only returns
> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>
> But we get -EEXIST? Could you re-run git diff and attach output to
> this email? I think you tweaked the debug code a bit so I am looking
> at something different?

Oh, sorry. Yes, I modified your patch to this version:

--- a/xen/common/event_channel.c	2013-05-16 15:05:25.000000000 +0400
+++ b/xen/common/event_channel.c	2013-05-27 10:53:05.000000000 +0400
@@ -271,23 +271,38 @@ static long evtchn_bind_virq(evtchn_bind
     int port, virq = bind->virq, vcpu = bind->vcpu;
     long rc = 0;
 
-    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
+    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) {
+        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq:%d, rc:%ld\n", d->domain_id,
+                 vcpu, __func__,__LINE__, virq, (long)-EINVAL);
         return -EINVAL;
+    }
 
-    if ( virq_is_global(virq) && (vcpu != 0) )
+    if ( virq_is_global(virq) && (vcpu != 0) ) {
+        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq_is_global:%d, rc:%ld\n", d->domain_id,
+                 vcpu, __func__,__LINE__, virq_is_global(virq), (long)-EINVAL);
         return -EINVAL;
+    }
 
     if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         ((v = d->vcpu[vcpu]) == NULL) )
+         ((v = d->vcpu[vcpu]) == NULL) ) {
+        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, max_vcpus:%d, rc:%ld\n", d->domain_id,
+                 vcpu, __func__,__LINE__, d->vcpu[vcpu], d->max_vcpus, (long)-ENOENT);
         return -ENOENT;
+    }
 
     spin_lock(&d->event_lock);
 
-    if ( v->virq_to_evtchn[virq] != 0 )
+    if ( v->virq_to_evtchn[virq] != 0 ) {
+        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
+                 vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
         ERROR_EXIT(-EEXIST);
+    }
 
-    if ( (port = get_free_port(d)) < 0 )
+    if ( (port = get_free_port(d)) < 0 ) {
+        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
+                 vcpu, __func__,__LINE__, port, (long)-EEXIST);
         ERROR_EXIT(port);
+    }
 
     chn = evtchn_from_port(d, port);
     chn->state = ECS_VIRQ;

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
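The source of the confusion lies in the modified patch above. The printk in the get_free_port() branch passes a hard-coded `(long)-EEXIST` instead of `port`, so its "rc" field would read -17 whatever happened. The trustworthy signals are the separate "EVTCHNOP failure: error -17" line and the line numbers (297/298). By our count of the patched hunk, those lines fall in the virq_to_evtchn[virq] != 0 branch. Konrad's lines 289/290 refer to the unpatched file.

Here is a hypothetical helper that maps a return code to the check that can produce it. It assumes Linux errno numbering (EEXIST == 17), which as far as we know Xen uses for these codes too.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of evtchn_bind_virq() failures. */
enum bind_virq_check {
    CHECK_UNKNOWN,
    CHECK_EINVAL_AMBIGUOUS,   /* bad virq, global virq on vcpu != 0,
                                 or get_free_port(), which can also say -EINVAL */
    CHECK_VCPU,               /* -ENOENT: vcpu out of range or missing */
    CHECK_ALREADY_BOUND,      /* -EEXIST: virq_to_evtchn[virq] != 0 */
    CHECK_FREE_PORT           /* -ENOMEM / -ENOSPC: get_free_port() */
};

static enum bind_virq_check failing_check(long rc)
{
    assert(rc < 0);                     /* only meaningful for failures */
    switch (-rc) {
    case EEXIST:
        /* get_free_port() never returns -EEXIST, so only the
         * "already bound" branch can produce it. */
        return CHECK_ALREADY_BOUND;
    case ENOENT:
        return CHECK_VCPU;
    case ENOMEM:
    case ENOSPC:
        return CHECK_FREE_PORT;
    case EINVAL:
        return CHECK_EINVAL_AMBIGUOUS;
    default:
        return CHECK_UNKNOWN;
    }
}
```

Applied to "error -17", the helper returns CHECK_ALREADY_BOUND. That agrees with Konrad's later reading that event channel 3 was still recorded for the VIRQ.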
Konrad Rzeszutek Wilk
2013-Jun-05 18:50 UTC
Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> > The non-debug version tells me it is:
> >
> > 289     if ( (port = get_free_port(d)) < 0 )
> > 290         ERROR_EXIT(port);
> >
> > Which gets -EEXIST from get_free_port. But get_free_port only returns
> > -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
> >
> > But we get -EEXIST? Could you re-run git diff and attach output to
> > this email? I think you tweaked the debug code a bit so I am looking
> > at something different?
>
>
> Oh, sorry. Yes, I modified your patch to this version:

That is OK.

> -    if ( v->virq_to_evtchn[virq] != 0 )
> +    if ( v->virq_to_evtchn[virq] != 0 ) {
> +        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
> +                 vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>          ERROR_EXIT(-EEXIST);

OK, so the value was 3 (event channel), and I am not sure what the virq value
was. But it looks as if somebody did not clear that and we are
tripping over it.

George, have you seen issues with events not being cleared during migration?
George Dunlap
2013-Jun-06 09:23 UTC
Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
On 05/06/13 19:50, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>>> The non-debug version tells me it is:
>>>
>>> 289     if ( (port = get_free_port(d)) < 0 )
>>> 290         ERROR_EXIT(port);
>>>
>>> Which gets -EEXIST from get_free_port. But get_free_port only returns
>>> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>>>
>>> But we get -EEXIST? Could you re-run git diff and attach output to
>>> this email? I think you tweaked the debug code a bit so I am looking
>>> at something different?
>>
>> Oh, sorry. Yes, I modified your patch to this version:
> That is OK.
>> -    if ( v->virq_to_evtchn[virq] != 0 )
>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>> +        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>> +                 vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>          ERROR_EXIT(-EEXIST);
> OK, so the value was 3 (event channel), and I am not sure what the virq value
> was. But it looks as if somebody did not clear that and we are
> tripping over it.
>
> George, have you seen issues with events not being cleared during migration?

I haven't, no. Do you know where the virq is supposed to be cleared?
The BUG() is in restore_cpu_virqs(), but at a quick glance I can't find
a corresponding function to tear down virqs.

 -George
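George's question is about where the binding is supposed to be cleared. Here is a toy model of why that matters. Suppose nothing tears down the hypervisor's VIRQ table between suspend and resume. Then a resume path that re-binds every VIRQ fails with -EEXIST on the first one. In the guest kernel, that failure is where BUG() fires.

Everything here is a hypothetical sketch. virq_table, bind_one, unbind_all and rebind_all are illustrative names, and rebind_all is only loosely modelled on restore_cpu_virqs(). It does not claim to show where the real teardown lives.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical model of one vCPU's VIRQ -> event-channel table as the
 * hypervisor keeps it. 0 means "unbound"; NR_VIRQS is illustrative. */
#define NR_VIRQS 8

struct virq_table {
    int virq_to_evtchn[NR_VIRQS];
    int next_port;
};

static int bind_one(struct virq_table *t, int virq)
{
    assert(virq >= 0 && virq < NR_VIRQS);
    if (t->virq_to_evtchn[virq] != 0)
        return -EEXIST;                 /* same test as evtchn_bind_virq() */
    t->virq_to_evtchn[virq] = ++t->next_port;   /* ports start at 1 */
    return t->virq_to_evtchn[virq];
}

/* The teardown step: forget every binding before the guest rebinds. */
static void unbind_all(struct virq_table *t)
{
    memset(t->virq_to_evtchn, 0, sizeof t->virq_to_evtchn);
}

/* Resume-style rebinding: re-request each VIRQ, stop at the first
 * failure (the point where the guest kernel would call BUG()). */
static int rebind_all(struct virq_table *t, const int *virqs, int n)
{
    for (int i = 0; i < n; i++) {
        int rc = bind_one(t, virqs[i]);
        if (rc < 0)
            return rc;
    }
    return 0;
}
```

Calling rebind_all() twice without unbind_all() in between reproduces the -EEXIST. Adding the teardown step makes the second round succeed.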
George Dunlap
2013-Jun-06 09:25 UTC
Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
On 05/06/13 19:50, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>>> The non-debug version tells me it is:
>>>
>>> 289     if ( (port = get_free_port(d)) < 0 )
>>> 290         ERROR_EXIT(port);
>>>
>>> Which gets -EEXIST from get_free_port. But get_free_port only returns
>>> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>>>
>>> But we get -EEXIST? Could you re-run git diff and attach output to
>>> this email? I think you tweaked the debug code a bit so I am looking
>>> at something different?
>>
>> Oh, sorry. Yes, I modified your patch to this version:
> That is OK.
>> -    if ( v->virq_to_evtchn[virq] != 0 )
>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>> +        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>> +                 vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>          ERROR_EXIT(-EEXIST);
> OK, so the value was 3 (event channel), and I am not sure what the virq value
> was. But it looks as if somebody did not clear that and we are
> tripping over it.
>
> George, have you seen issues with events not being cleared during migration?

The other possibility, of course, is that the virq has been cleared, but
that somehow the kernel is requesting the same virq twice.

 -George
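George's second hypothesis gives the same -EEXIST from a clean hypervisor table. In that case the guest's own list of VIRQs to restore names one VIRQ twice. The toy sketch below shows that the error code alone cannot tell the two hypotheses apart. Logging the virq number as well (which, as Konrad notes, the debug output did not show) would help distinguish them. The function restore_virqs and its failed_index out-parameter are hypothetical, not the kernel's restore_cpu_virqs().

```c
#include <assert.h>
#include <errno.h>

#define NR_VIRQS 8   /* illustrative size */

/* Hypothetical model: start from a freshly cleared binding table and bind
 * each requested VIRQ in order. A repeated entry fails with -EEXIST even
 * though nothing stale was left behind. *failed_index reports which
 * request failed (-1 on success). */
static int restore_virqs(const int *virqs, int n, int *failed_index)
{
    int bound[NR_VIRQS] = {0};          /* clean table: nothing bound yet */

    for (int i = 0; i < n; i++) {
        assert(virqs[i] >= 0 && virqs[i] < NR_VIRQS);
        if (bound[virqs[i]]) {
            *failed_index = i;          /* second request for the same VIRQ */
            return -EEXIST;
        }
        bound[virqs[i]] = 1;
    }
    *failed_index = -1;
    return 0;
}
```

With a request list such as {0, 1, 0}, the third entry fails. A stale-table bug would instead fail on the very first entry.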
Vasiliy Tolstov
2013-Jun-13 11:22 UTC
Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
Any news about this bug? I don't understand why the bug does not appear when
using xend/xm. I think that xl and xm use an identical sequence of operations
when migrating a domain...

2013/6/6 George Dunlap <george.dunlap@eu.citrix.com>:
> On 05/06/13 19:50, Konrad Rzeszutek Wilk wrote:
>>
>> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>>>
>>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>>>>
>>>> The non-debug version tells me it is:
>>>>
>>>> 289     if ( (port = get_free_port(d)) < 0 )
>>>> 290         ERROR_EXIT(port);
>>>>
>>>> Which gets -EEXIST from get_free_port. But get_free_port only returns
>>>> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>>>>
>>>> But we get -EEXIST? Could you re-run git diff and attach output to
>>>> this email? I think you tweaked the debug code a bit so I am looking
>>>> at something different?
>>>
>>>
>>> Oh, sorry. Yes, I modified your patch to this version:
>>
>> That is OK.
>>>
>>> -    if ( v->virq_to_evtchn[virq] != 0 )
>>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>>> +        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>>> +                 vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>>          ERROR_EXIT(-EEXIST);
>>
>> OK, so the value was 3 (event channel), and I am not sure what the virq value
>> was. But it looks as if somebody did not clear that and we are
>> tripping over it.
>>
>> George, have you seen issues with events not being cleared during
>> migration?
>
>
> The other possibility, of course, is that the virq has been cleared, but
> that somehow the kernel is requesting the same virq twice.
>
>  -George

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
Vasiliy Tolstov
2013-Jun-13 11:24 UTC
Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
Konrad, George, do you have any news about this bug? I can test Xen 4.3-rc4,
but if this bug has not been fixed I think my tests can't be productive...

2013/6/5 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>> > The non-debug version tells me it is:
>> >
>> > 289     if ( (port = get_free_port(d)) < 0 )
>> > 290         ERROR_EXIT(port);
>> >
>> > Which gets -EEXIST from get_free_port. But get_free_port only returns
>> > -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>> >
>> > But we get -EEXIST? Could you re-run git diff and attach output to
>> > this email? I think you tweaked the debug code a bit so I am looking
>> > at something different?
>>
>>
>> Oh, sorry. Yes, I modified your patch to this version:
>
> That is OK.
>> -    if ( v->virq_to_evtchn[virq] != 0 )
>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>> +        gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>> +                 vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>          ERROR_EXIT(-EEXIST);
>
> OK, so the value was 3 (event channel), and I am not sure what the virq value
> was. But it looks as if somebody did not clear that and we are
> tripping over it.
>
> George, have you seen issues with events not being cleared during migration?

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
Konrad Rzeszutek Wilk
2013-Jun-13 13:14 UTC
Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
On Thu, Jun 13, 2013 at 03:22:17PM +0400, Vasiliy Tolstov wrote:
> Any news about this bug? I don't understand why the bug does not appear when
> using xend/xm. I think that xl and xm use an identical sequence of operations
> when migrating a domain...

Hey Vasiliy,

I've been busy with another bug in the Xen code and am wrapping it up
now.
Vasiliy Tolstov
2013-Jun-13 13:17 UTC
Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report
2013/6/13 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> Hey Vasiliy,
>
> I've been busy with another bug in the Xen code and am wrapping it up
> now.

Many thanks. I'm waiting =)

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru