thr3ads.net - Xen devel - xen 4.3 test report [May 2013]


Vasiliy Tolstov

2013-May-24 10:40 UTC

xen 4.3 test report

Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
(about 2 hours for 1 GB of memory). But if I start xend and run
xm migrate, the domain is migrated successfully to the destination,
and this takes about 3-6 seconds (I'm using InfiniBand).

Why is this happening?

And a second question: why can't I migrate from 4.1.3 (xend) to
4.3.0-rc2 (xend)?
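
For scale, the two timings quoted above can be turned into average transfer rates. The following is a minimal sketch in C, assuming "1Gb" means 1 GiB and that guest memory is sent exactly once (live migration normally re-sends dirtied pages, so real traffic is somewhat higher). The helper name is made up for illustration and is not part of any Xen tool.

```c
#include <assert.h>
#include <stdint.h>

/* Average transfer rate in KiB/s for `bytes` moved in `seconds`.
 * Hypothetical helper, not part of any Xen tool. */
static double avg_rate_kib_per_s(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / 1024.0 / seconds;
}
```

With 1 GiB, 7200 seconds works out to roughly 146 KiB/s, while 3 to 6 seconds works out to roughly 171 to 341 MiB/s, so the slow path is on the order of 1200 to 2400 times slower.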

--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

George Dunlap

2013-May-24 12:46 UTC


Re: xen 4.3 test report

On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
> (about 2 hours for 1 GB of memory). But if I start xend and run
> xm migrate, the domain is migrated successfully to the destination,
> and this takes about 3-6 seconds (I'm using InfiniBand).
>
> Why is this happening?

Hmm -- I think this has been mentioned a couple of times, but I don't
think anyone has looked into it.  I'll see if I can track it down.

> And a second question: why can't I migrate from 4.1.3 (xend) to
> 4.3.0-rc2 (xend)?

I think migration is only supported for one major release -- does it
work from 4.1 to 4.2, then 4.2 to 4.3?

 -George

Vasiliy Tolstov

2013-May-24 13:15 UTC


Re: xen 4.3 test report

2013/5/24 George Dunlap <dunlapg@umich.edu>:
> Hmm -- I think this has been mentioned a couple of times, but I don't
> think anyone has looked into it.  I'll see if I can track it down.

Thanks!

>> And a second question: why can't I migrate from 4.1.3 (xend) to
>> 4.3.0-rc2 (xend)?
>
> I think migration is only supported for one major release -- does it
> work from 4.1 to 4.2, then 4.2 to 4.3?

No, because migration in Xen 4.2.2 behaves like in 4.3.0: very, very slow =(


--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Konrad Rzeszutek Wilk

2013-May-24 14:11 UTC


Re: xen 4.3 test report

On Fri, May 24, 2013 at 01:46:11PM +0100, George Dunlap wrote:
> On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> > Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
> > (about 2 hours for 1 GB of memory). But if I start xend and run
> > xm migrate, the domain is migrated successfully to the destination,
> > and this takes about 3-6 seconds (I'm using InfiniBand).
> >
> > Why is this happening?
>
> Hmm -- I think this has been mentioned a couple of times, but I don't
> think anyone has looked into it.  I'll see if I can track it down.

I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
locally migrate any 32/64 PV/PVHVM (so four variations) guest, it is
incredibly slow.

(So xm save <..> to an iSCSI disk && xm restore ...)

The 'perf report' shows that dom0 spends most of its time in xen_version
(which is the yield type call).

If the same operation is done, but dom0 is 64-bit, it is quick.

And I think this is the issue that Ian Jackson's nightly test system is
running into, which is that the migration is so slow that it times out.
(This is with real-linux).

Now the oddity is that I saw this with Xen 4.1, but Vasiliy says he
didn't see this with Xen 4.1, so perhaps the issue I am seeing is different.

Hm, I should re-run this test once more with Xen 4.3 just to confirm.

> > And a second question: why can't I migrate from 4.1.3 (xend) to
> > 4.3.0-rc2 (xend)?
>
> I think migration is only supported for one major release -- does it
> work from 4.1 to 4.2, then 4.2 to 4.3?
>
>  -George

George Dunlap

2013-May-24 14:38 UTC


Re: xen 4.3 test report

On Fri, May 24, 2013 at 3:11 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, May 24, 2013 at 01:46:11PM +0100, George Dunlap wrote:
>> On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
>> > Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
>> > (about 2 hours for 1 GB of memory). But if I start xend and run
>> > xm migrate, the domain is migrated successfully to the destination,
>> > and this takes about 3-6 seconds (I'm using InfiniBand).
>> >
>> > Why is this happening?
>>
>> Hmm -- I think this has been mentioned a couple of times, but I don't
>> think anyone has looked into it.  I'll see if I can track it down.
>
> I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
> locally migrate any 32/64 PV/PVHVM (so four variations) guest, it is
> incredibly slow.
>
> (So xm save <..> to an iSCSI disk && xm restore ...)

Did you mean xm save or xl save?

 -George

Vasiliy Tolstov

2013-May-24 20:13 UTC


Re: xen 4.3 test report

2013/5/24 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
> locally migrate any 32/64 PV/PVHVM (so four variations) guest, it is
> incredibly slow.
>
> (So xm save <..> to an iSCSI disk && xm restore ...)
>
> The 'perf report' shows that dom0 spends most of its time in xen_version
> (which is the yield type call).
>
> If the same operation is done, but dom0 is 64-bit, it is quick.
>
> And I think this is the issue that Ian Jackson's nightly test system is
> running into, which is that the migration is so slow that it times out.
> (This is with real-linux).
>
> Now the oddity is that I saw this with Xen 4.1, but Vasiliy says he
> didn't see this with Xen 4.1, so perhaps the issue I am seeing is different.
>
> Hm, I should re-run this test once more with Xen 4.3 just to confirm.

I didn't see this slowdown, maybe because I'm using 4.1.3 from openSUSE,
which has some patches backported and other patches forward-ported
from SUSE...




--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Vasiliy Tolstov

2013-May-24 20:15 UTC


Re: xen 4.3 test report

2013/5/24 George Dunlap <George.Dunlap@eu.citrix.com>:
>
> Did you mean xm save or xl save?

In my case xl save crashes the domU with messages like the following. And the
domU crashes with CentOS 2.6.18 and 2.6.32 (xenlinux) and never 3.8.6 kernel
and 3.4...

[ 1826.587110] PM: late freeze of devices complete after 0.048 msecs
[ 1826.591220] ------------[ cut here ]------------
[ 1826.591220] kernel BUG at /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/drivers/xen/events.c:1489!
[ 1826.591220] invalid opcode: 0000 [#1] SMP
[ 1826.591220] CPU 0
[ 1826.591220] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev ghash_clmulni_intel joydev pcspkr cryptd ext3 mbcache jbd xen_netfront xen_blkfront
[ 1826.591220]
[ 1826.591220] Pid: 6, comm: migration/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.41-2
[ 1826.591220] RIP: e030:[<ffffffff8121c4e2>]  [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[ 1826.591220] RSP: e02b:ffff88001ae99d20  EFLAGS: 00010082
[ 1826.591220] RAX: ffffffffffffffef RBX: 0000000000000000 RCX: 0000000000000001
[ 1826.591220] RDX: 0000000000000000 RSI: 00000000deadbeef RDI: 00000000deadbeef
[ 1826.591220] RBP: 0000000000000000 R08: ffff88001f026e00 R09: ffff88001ae99d48
[ 1826.591220] R10: 0000000000013780 R11: 0000000000013780 R12: 0000000000000010
[ 1826.591220] R13: 0000000000010dd0 R14: 0000000000010d70 R15: 0000000000000000
[ 1826.591220] FS:  00007f8d6f4907a0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 1826.591220] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1826.591220] CR2: 00007f51210a1e60 CR3: 00000000032af000 CR4: 0000000000002660
[ 1826.591220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1826.591220] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1826.591220] Process migration/0 (pid: 6, threadinfo ffff88001ae98000, task ffff88001ae8e0c0)
[ 1826.591220] Stack:
[ 1826.591220]  0000000000013780 0000000000000000 ffff880000000000 0000000000010d70
[ 1826.591220]  0000160000000000 0000000000000000 ffff88001affbddc ffffffff810050a2
[ 1826.591220]  0000000000013780 ffffea000009f0b8 ffffffff810043e3 ffff88001affbe40
[ 1826.591220] Call Trace:
[ 1826.591220]  [<ffffffff810050a2>] ? xen_mc_issue+0x3e/0x50
[ 1826.591220]  [<ffffffff810043e3>] ? arch_local_irq_restore+0x7/0x8
[ 1826.591220]  [<ffffffff8121ca3b>] ? xen_suspend+0x73/0x8b
[ 1826.591220]  [<ffffffff81087d89>] ? stop_machine_cpu_stop+0x89/0xc3
[ 1826.591220]  [<ffffffff81087d00>] ? queue_stop_cpus_work+0xa5/0xa5
[ 1826.591220]  [<ffffffff81087b5a>] ? cpu_stopper_thread+0xea/0x177
[ 1826.591220]  [<ffffffff810359d7>] ? arch_local_irq_enable+0x7/0x8
[ 1826.591220]  [<ffffffff81039854>] ? finish_task_switch+0x88/0xb9
[ 1826.591220]  [<ffffffff8134c634>] ? __schedule+0x5ac/0x5c3
[ 1826.591220]  [<ffffffff81087a70>] ? cpu_stop_signal_done+0x2a/0x2a
[ 1826.591220]  [<ffffffff8105f321>] ? kthread+0x76/0x7e
[ 1826.591220]  [<ffffffff81354ab4>] ? kernel_thread_helper+0x4/0x10
[ 1826.591220]  [<ffffffff81352b73>] ? int_ret_from_sys_call+0x7/0x1b
[ 1826.591220]  [<ffffffff8134dcbc>] ? retint_restore_args+0x5/0x6
[ 1826.591220]  [<ffffffff81354ab0>] ? gs_change+0x13/0x13
[ 1826.591220] Code: 74 79 44 89 e7 e8 77 ee ff ff 39 e8 74 02 0f 0b 48 8d 74 24 28 bf 01 00 00 00 89 6c 24 28 89 5c 24 2c e8 19 ec ff ff 85 c0 74 02 <0f> 0b 8b 44 24 30 44 89 e7 89 44 24 14 e8 58 e9 ff ff 0f b7 4c
[ 1826.591220] RIP  [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[ 1826.591220]  RSP <ffff88001ae99d20>
[ 1826.591220] ---[ end trace 60605833d257c851 ]---
[ 1826.591220] ------------[ cut here ]------------
[ 1826.591220] WARNING: at /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x86()
[ 1826.591220] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev ghash_clmulni_intel joydev pcspkr cryptd ext3 mbcache jbd xen_netfront xen_blkfront
[ 1826.591220] Pid: 0, comm: swapper/0 Tainted: G      D 3.2.0-4-amd64 #1 Debian 3.2.41-2
[ 1826.591220] Call Trace:
[ 1826.591220]  [<ffffffff81046a55>] ? warn_slowpath_common+0x78/0x8c
[ 1826.591220]  [<ffffffff81066447>] ? ktime_get+0x1e/0x86
[ 1826.591220]  [<ffffffff8106c21b>] ? tick_nohz_stop_sched_tick+0x61/0x327
[ 1826.591220]  [<ffffffff8100d210>] ? cpu_idle+0x72/0xf2
[ 1826.591220]  [<ffffffff816abb36>] ? start_kernel+0x3b8/0x3c3
[ 1826.591220]  [<ffffffff816ad4d9>] ? xen_start_kernel+0x412/0x418
[ 1826.591220] ---[ end trace 60605833d257c852 ]---
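
One detail worth reading out of the register dump above: RAX is ffffffffffffffef. On x86-64 a function's return value is left in RAX, so if the register still held the result of the failed hypercall when BUG() fired (an assumption; the oops itself does not say so), it decodes to a small negative number. A minimal sketch in C of that decoding, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a raw 64-bit register value as a two's-complement signed number. */
static long long reg_to_signed(uint64_t reg)
{
    if (reg & (UINT64_C(1) << 63))
        return -(long long)(~reg) - 1;  /* negative value: -(~reg) - 1 */
    return (long long)reg;
}

/* Linux treats values in [-4095, -1] as encoded error codes (MAX_ERRNO). */
static int looks_like_errno(long long v)
{
    return v >= -4095 && v < 0;
}

/* Example: the RAX value printed in the oops above. */
static void rax_example(void)
{
    long long rc = reg_to_signed(UINT64_C(0xffffffffffffffef));
    assert(rc == -17);
    assert(looks_like_errno(rc));
}
```

17 is EEXIST in the Linux errno numbering, so under that assumption the hypercall behind the BUG() returned -EEXIST.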

--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Konrad Rzeszutek Wilk

2013-May-25 11:27 UTC


Re: xen 4.3 test report

On Fri, May 24, 2013 at 03:38:46PM +0100, George Dunlap wrote:
> On Fri, May 24, 2013 at 3:11 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Fri, May 24, 2013 at 01:46:11PM +0100, George Dunlap wrote:
> >> On Fri, May 24, 2013 at 11:40 AM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> >> > Hello. I'm trying 4.3.0-rc2 and live migration is very, very slow
> >> > (about 2 hours for 1 GB of memory). But if I start xend and run
> >> > xm migrate, the domain is migrated successfully to the destination,
> >> > and this takes about 3-6 seconds (I'm using InfiniBand).
> >> >
> >> > Why is this happening?
> >>
> >> Hmm -- I think this has been mentioned a couple of times, but I don't
> >> think anyone has looked into it.  I'll see if I can track it down.
> >
> > I've noticed on Xen 4.1 (and Xen 4.3) that if I use a 32-bit dom0 and
> > locally migrate any 32/64 PV/PVHVM (so four variations) guest, it is
> > incredibly slow.
> >
> > (So xm save <..> to an iSCSI disk && xm restore ...)
>
> Did you mean xm save or xl save?

'xl' for Xen 4.3 and 'xm' for Xen 4.1.

>  -George

Konrad Rzeszutek Wilk

2013-May-25 11:40 UTC


Re: xen 4.3 test report

On Sat, May 25, 2013 at 12:15:44AM +0400, Vasiliy Tolstov wrote:
> 2013/5/24 George Dunlap <George.Dunlap@eu.citrix.com>:
> >
> > Did you mean xm save or xl save?
>
> In my case xl save crashes the domU with messages like the following. And the
> domU crashes with CentOS 2.6.18 and 2.6.32 (xenlinux) and never 3.8.6 kernel
> and 3.4...

Is the 3.8.6 crashing at the same point?

> [ 1826.587110] PM: late freeze of devices complete after 0.048 msecs
> [ 1826.591220] ------------[ cut here ]------------
> [ 1826.591220] kernel BUG at /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/drivers/xen/events.c:1489!

That looks to be this:
(https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)

        if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
                                        &bind_virq) != 0)
                BUG();

which is odd. Would you be able to instrument evtchn_bind_virq (this is
in Xen) with some printks, like this (I haven't compile-tested it):

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 2d7afc9..c109cee 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -270,24 +270,34 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
     int            port, virq = bind->virq, vcpu = bind->vcpu;
     long           rc = 0;
 
-    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
+    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq:%d, rc:%ld\n", d->domain_id,
+        vcpu, __func__,__LINE__, virq, (long)-EINVAL);
         return -EINVAL;
-
-    if ( virq_is_global(virq) && (vcpu != 0) )
+    }
+    if ( virq_is_global(virq) && (vcpu != 0) ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq_is_global:%d, rc:%ld\n", d->domain_id,
+        vcpu, __func__,__LINE__, virq_is_global(virq), (long)-EINVAL);
         return -EINVAL;
-
+    }
     if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         ((v = d->vcpu[vcpu]) == NULL) )
+         ((v = d->vcpu[vcpu]) == NULL) ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, max_vcpus:%d, rc:%ld\n", d->domain_id,
+        vcpu, __func__,__LINE__, v, d->max_vcpus, (long)-ENOENT);
         return -ENOENT;
-
+    }
     spin_lock(&d->event_lock);
 
-    if ( v->virq_to_evtchn[virq] != 0 )
+    if ( v->virq_to_evtchn[virq] != 0 ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, evtchn:%d, rc:%ld\n", d->domain_id,
+        vcpu, __func__,__LINE__, v, v->virq_to_evtchn[virq], (long)-EEXIST);
         ERROR_EXIT(-EEXIST);
-
-    if ( (port = get_free_port(d)) < 0 )
+    }
+    if ( (port = get_free_port(d)) < 0 ) {
+gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
+        vcpu, __func__,__LINE__, port, (long)port);
         ERROR_EXIT(port);
-
+    }
     chn = evtchn_from_port(d, port);
     chn->state          = ECS_VIRQ;
     chn->notify_vcpu_id = vcpu;

Vasiliy Tolstov

2013-May-27 05:32 UTC


Re: xen 4.3 test report

2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> Is the 3.8.6 crashing at the same point?

Yes, but right now I can't get a trace. Maybe I can after compiling a new
Xen with the debug printks that you provided.

> That looks to be this:
> (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)
>
>         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
>                                         &bind_virq) != 0)
>                 BUG();
>
> which is odd. Would you be able to instrument evtchn_bind_virq (this is
> in Xen) with some printks, like this (I haven't compile-tested it):

I'll try it.


Another bug happened while I was changing memory inside the domU (Ubuntu LTS
precise 10.04 with the stock kernel):

(XEN) traps.c:3072: GPF (0060): ffff82c4c015e73e -> ffff82c4c022a3ff
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1156e85 (pfn 10fb9)
(XEN) mm.c:2990:d17 Error while pinning mfn 1156e85
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 11ae21b (pfn 39c23)
(XEN) mm.c:2990:d17 Error while pinning mfn 11ae21b
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1156e85 (pfn 10fb9)
(XEN) mm.c:903:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1293:d17 Failure in alloc_l2_table: entry 392
(XEN) mm.c:2095:d17 Error while validating mfn 1157cf3 (pfn 1014b) for type 2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) mm.c:945:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1375:d17 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2095:d17 Error while validating mfn 11ae976 (pfn 394c8) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ac89d (pfn 3b5a1) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2990:d17 Error while pinning mfn 11ac89d
(XEN) mm.c:2348:d17 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1156e85 (pfn 10fb9)
(XEN) mm.c:903:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1293:d17 Failure in alloc_l2_table: entry 392
(XEN) mm.c:2095:d17 Error while validating mfn 1157cf3 (pfn 1014b) for type 2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) mm.c:945:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1375:d17 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2095:d17 Error while validating mfn 11ae976 (pfn 394c8) for type 3000000000000000: caf=8000000000000002 taf=3000000000000001
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ad971 (pfn 3a4cd) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2990:d17 Error while pinning mfn 11ad971
(XEN) mm.c:2015:d17 Error pfn 11ae976: rd=ffff8308098ca000, od=0000000000000000, caf=180000000000000, taf=3000000000000001
(XEN) mm.c:618:d17 Could not get page ref for pfn 11ae976
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ac89d (pfn 3b5a1) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2758:d17 Error while installing new baseptr 11ac89d
(XEN) mm.c:2015:d17 Error pfn 11ae976: rd=ffff8308098ca000, od=0000000000000000, caf=180000000000000, taf=3000000000000001
(XEN) mm.c:618:d17 Could not get page ref for pfn 11ae976
(XEN) mm.c:969:d17 Attempt to create linear p.t. with write perms
(XEN) mm.c:1434:d17 Failure in alloc_l4_table: entry 0
(XEN) mm.c:2095:d17 Error while validating mfn 11ad971 (pfn 3a4cd) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:3116:d17 Error while installing new mfn 11ad971
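
The "saw"/"exp" values in the "Bad type" lines are page type_info words. A minimal sketch in C of pulling them apart, assuming the 64-bit x86 layout used by Xen headers of that era (type in bits 60-63, "pinned" in bit 59, "validated" in bit 58). These bit positions and all helper names are assumptions for illustration and should be checked against the mm.h of the Xen version in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of a 64-bit type_info word (see the note above). */
static unsigned ti_type(uint64_t ti)      { return (unsigned)(ti >> 60) & 0xfu; }
static unsigned ti_pinned(uint64_t ti)    { return (unsigned)(ti >> 59) & 1u; }
static unsigned ti_validated(uint64_t ti) { return (unsigned)(ti >> 58) & 1u; }

/* Human-readable name for the assumed type codes. */
static const char *ti_type_name(unsigned type)
{
    switch (type) {
    case 0: return "none";
    case 1: return "L1 page table";
    case 2: return "L2 page table";
    case 3: return "L3 page table";
    case 4: return "L4 page table";
    case 5: return "segment descriptor page";
    case 7: return "writable page";
    default: return "other";
    }
}

/* Example: the values from the first "Bad type" line of the log. */
static void bad_type_example(void)
{
    uint64_t saw = UINT64_C(0x7400000000000001);
    uint64_t exp = UINT64_C(0x1000000000000000);
    assert(ti_type(saw) == 7 && ti_validated(saw) == 1 && ti_pinned(saw) == 0);
    assert(ti_type(exp) == 1);
}
```

Under these assumptions, the guest asked Xen to pin as an L1 page table a frame that was still typed as a validated writable page, and the 2000..., 3000... and 4000... values in the later lines are the L2, L3 and L4 page table types.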



--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Konrad Rzeszutek Wilk

2013-May-28 15:31 UTC


Re: xen 4.3 test report

On Mon, May 27, 2013 at 09:32:51AM +0400, Vasiliy Tolstov wrote:
> 2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> > Is the 3.8.6 crashing at the same point?
>
> Yes, but right now I can't get a trace. Maybe I can after compiling a new
> Xen with the debug printks that you provided.
>
> > That looks to be this:
> > (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)
> >
> >         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
> >                                         &bind_virq) != 0)
> >                 BUG();
> >
> > which is odd. Would you be able to instrument evtchn_bind_virq (this is
> > in Xen) with some printks, like this (I haven't compile-tested it):
>
> I'll try it.

Thank you.

> Another bug happened while I was changing memory inside the domU (Ubuntu LTS
> precise 10.04 with the stock kernel):

I don't understand what you mean by 'change memory'. What is it that you
do to 'change memory'?

Is that with a PV guest? HVM guest?

Vasiliy Tolstov

2013-May-28 20:58 UTC


Re: xen 4.3 test report

'Change memory' means running echo xxx > /sys/class/xxx/xxx/target inside the domU.
I'm using a PV guest.

2013/5/28 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> On Mon, May 27, 2013 at 09:32:51AM +0400, Vasiliy Tolstov wrote:
>> 2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>> > Is the 3.8.6 crashing at the same point?
>>
>> Yes, but right now I can't get a trace. Maybe I can after compiling a new
>> Xen with the debug printks that you provided.
>>
>> > That looks to be this:
>> > (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c)
>> >
>> >         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
>> >                                         &bind_virq) != 0)
>> >                 BUG();
>> >
>> > which is odd. Would you be able to instrument evtchn_bind_virq (this is
>> > in Xen) with some printks, like this (I haven't compile-tested it):
>>
>> I'll try it.
>
> Thank you.
>
>> Another bug happened while I was changing memory inside the domU (Ubuntu LTS
>> precise 10.04 with the stock kernel):
>
> I don't understand what you mean by 'change memory'. What is it that you
> do to 'change memory'?
>
> Is that with a PV guest? HVM guest?


-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Vasiliy Tolstov

2013-May-31 04:56 UTC


Re: xen 4.3 test report

Migration with qemu-xen-traditional:
xen16:~ # xl migrate --debug 21-10887 ib-xen06.kh11.clodo.ru
the global config option vifscript is deprecated, please switch to vif.default.script
the global config option vifscript is deprecated, please switch to vif.default.script
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/631)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/631)
 Savefile contains xl domain config
xc: progress: Reloading memory pages: 53248/1048576    5%
xc: progress: Reloading memory pages: 105472/1048576   10%
xc: progress: Reloading memory pages: 157658/1048576   15%
xc: progress: Reloading memory pages: 209882/1048576   20%
xc: progress: Reloading memory pages: 263130/1048576   25%
migration receiver stream contained unexpected data instead of ready message
(command run was: exec ssh ib-xen06.kh11.clodo.ru xl migrate-receive -d )
migration target: Transfer complete, requesting permission to start domain.
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading GO message from migration stream
migration target: Failure, destroying our copy.
migration child [15697] not exiting, no longer waiting (exit status will be unreported)
Migration failed, resuming at sender.
migration target: Cleanup OK, granting sender permission to resume.

xl dmesg:
(XEN) event_channel.c:297:d1 d1v0 [evtchn_bind_virq:297], port:3, rc:-17
(XEN) event_channel.c:298:d1 EVTCHNOP failure: error -17
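
rc:-17 is -EEXIST. In the instrumented function from the patch earlier in this thread, that value is returned only by the `v->virq_to_evtchn[virq] != 0` check, that is, when the VIRQ already has a port bound. The sketch below is a toy C model of just that check, not Xen code; the struct, the table size and the port allocation are invented for illustration. Whether the guest is rebinding a VIRQ whose old binding survived the failed migration is a hypothesis that the thread does not confirm.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NR_VIRQS 24   /* hypothetical table size for this sketch */

/* Toy stand-in for a vCPU's virq -> event-channel port table.
 * Port 0 means "not bound", mirroring the `!= 0` test in the patch. */
struct toy_vcpu {
    int virq_to_evtchn[TOY_NR_VIRQS];
    int next_free_port;
};

/* Bind `virq`; returns the new port (> 0) or a negative errno value. */
static int toy_bind_virq(struct toy_vcpu *v, int virq)
{
    assert(v != NULL);
    if (virq < 0 || virq >= TOY_NR_VIRQS)
        return -EINVAL;
    if (v->virq_to_evtchn[virq] != 0)
        return -EEXIST;            /* already bound: the rc:-17 case */
    v->virq_to_evtchn[virq] = v->next_free_port++;
    return v->virq_to_evtchn[virq];
}
```

In this model, binding the same VIRQ twice without unbinding it in between fails the second time with -EEXIST, which is the shape of the failure the log line reports.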


xl console:
[  981.869689] PM: late freeze of devices complete after 0.073 msecs
[  981.873833] ------------[ cut here ]------------
[  981.873833] kernel BUG at /build/buildd-linux_3.2.41-2+deb7u2-amd64-NHQI9B/linux-3.2.41/drivers/xen/events.c:1489!
[  981.873833] invalid opcode: 0000 [#1] SMP
[  981.873833] CPU 0
[  981.873833] Modules linked in: xenfs snd_pcm snd_page_alloc snd_timer snd coretemp soundcore crc32c_intel evdev joydev pcspkr ext3 mbcache jbd xen_blkfront xen_netfront
[  981.873833]
[  981.873833] Pid: 6, comm: migration/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.41-2+deb7u2
[  981.873833] RIP: e030:[<ffffffff8121c4e2>]  [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[  981.873833] RSP: e02b:ffff88001ae99d20  EFLAGS: 00010082
[  981.873833] RAX: ffffffffffffffef RBX: 0000000000000000 RCX: 0000000000000001
[  981.873833] RDX: 0000000000000000 RSI: 00000000deadbeef RDI: 00000000deadbeef
[  981.873833] RBP: 0000000000000000 R08: ffff88001f026e00 R09: ffff88001ae99d48
[  981.873833] R10: 0000000000013780 R11: 0000000000013780 R12: 0000000000000010
[  981.873833] R13: 0000000000010dd0 R14: 0000000000010d70 R15: 0000000000000000
[  981.873833] FS:  00007f1fff8d37a0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[  981.873833] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  981.873833] CR2: 000000f8400b5410 CR3: 00000000033ad000 CR4: 0000000000002660
[  981.873833] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  981.873833] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  981.873833] Process migration/0 (pid: 6, threadinfo ffff88001ae98000, task ffff88001ae8e0c0)
[  981.873833] Stack:
[  981.873833]  0000000000013780 0000000000000000 ffff880000000000 0000000000010d70
[  981.873833]  0000160000000000 0000000000000000 ffff88001affbddc ffffffff810050a2
[  981.873833]  0000000000013780 ffffea00005ab258 ffffffff810043e3 ffff88001affbe40
[  981.873833] Call Trace:
[  981.873833]  [<ffffffff810050a2>] ? xen_mc_issue+0x3e/0x50
[  981.873833]  [<ffffffff810043e3>] ? arch_local_irq_restore+0x7/0x8
[  981.873833]  [<ffffffff8121ca3b>] ? xen_suspend+0x73/0x8b
[  981.873833]  [<ffffffff81087d91>] ? stop_machine_cpu_stop+0x89/0xc3
[  981.873833]  [<ffffffff81087d08>] ? queue_stop_cpus_work+0xa5/0xa5
[  981.873833]  [<ffffffff81087b62>] ? cpu_stopper_thread+0xea/0x177
[  981.873833]  [<ffffffff810359d7>] ? arch_local_irq_enable+0x7/0x8
[  981.873833]  [<ffffffff81039854>] ? finish_task_switch+0x88/0xb9
[  981.873833]  [<ffffffff8134c694>] ? __schedule+0x5ac/0x5c3
[  981.873833]  [<ffffffff81087a78>] ? cpu_stop_signal_done+0x2a/0x2a
[  981.873833]  [<ffffffff8105f329>] ? kthread+0x76/0x7e
[  981.873833]  [<ffffffff81354b34>] ? kernel_thread_helper+0x4/0x10
[  981.873833]  [<ffffffff81352bf3>] ? int_ret_from_sys_call+0x7/0x1b
[  981.873833]  [<ffffffff8134dd3c>] ? retint_restore_args+0x5/0x6
[  981.873833]  [<ffffffff81354b30>] ? gs_change+0x13/0x13
[  981.873833] Code: 74 79 44 89 e7 e8 77 ee ff ff 39 e8 74 02 0f 0b
48 8d 74 24 28 bf 01 00 00 00 89 6c 24 28 89 5c 24 2c e8 19 ec ff ff
85 c0 74 02 <0f> 0b 8b 44 24 30 44 89 e7 89 44 24 14 e8 58 e9 ff ff 0f
b7 4c
[  981.873833] RIP  [<ffffffff8121c4e2>] xen_irq_resume+0xbd/0x28b
[  981.873833]  RSP <ffff88001ae99d20>
[  981.873833] ---[ end trace 8243bb8e343ac633 ]---
[  981.873833] ------------[ cut here ]------------
[  981.873833] WARNING: at
/build/buildd-linux_3.2.41-2+deb7u2-amd64-NHQI9B/linux-3.2.41/kernel/time/timekeeping.c:265
ktime_get+0x1e/0x86()
[  981.873833] Modules linked in: xenfs snd_pcm snd_page_alloc
snd_timer snd coretemp soundcore crc32c_intel evdev joydev pcspkr ext3
mbcache jbd xen_blkfront xen_netfront
[  981.873833] Pid: 0, comm: swapper/0 Tainted: G      D
3.2.0-4-amd64 #1 Debian 3.2.41-2+deb7u2
[  981.873833] Call Trace:
[  981.873833]  [<ffffffff81046a55>] ? warn_slowpath_common+0x78/0x8c
[  981.873833]  [<ffffffff8106644f>] ? ktime_get+0x1e/0x86
[  981.873833]  [<ffffffff8106c223>] ?
tick_nohz_stop_sched_tick+0x61/0x327
[  981.873833]  [<ffffffff8100d210>] ? cpu_idle+0x72/0xf2
[  981.873833]  [<ffffffff816abb36>] ? start_kernel+0x3b8/0x3c3
[  981.873833]  [<ffffffff816ad4d9>] ? xen_start_kernel+0x412/0x418
[  981.873833] ---[ end trace 8243bb8e343ac634 ]---

2013/5/25 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> On Sat, May 25, 2013 at 12:15:44AM +0400, Vasiliy Tolstov wrote:
>> 2013/5/24 George Dunlap <George.Dunlap@eu.citrix.com>:
>> >
>> > Did you mean xm save or xl save?
>>
>>
>> In my case xl save crashes the domU with messages like the following.
>> The domU crashes with CentOS 2.6.18 and 2.6.32 (xenlinux) kernels, and
>> never with the 3.8.6 and 3.4 kernels...
>
> Is the 3.8.6 crashing at the same point?
>>
>> [ 1826.587110] PM: late freeze of devices complete after 0.048 msecs
>> [ 1826.591220] ------------[ cut here ]------------
>> [ 1826.591220] kernel BUG at /build/buildd-linux_3.2.41-2-amd64-Wvc92F/linux-3.2.41/drivers/xen/events.c:1489!
>
> That looks to be this
> (https://git.kernel.org/cgit/linux/kernel/git/bwh/linux-3.2.y.git/tree/drivers/xen/events.c):
>
>         if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
>                                                 &bind_virq) != 0)
>                         BUG();
>
> which is odd. Would you be able to instrument evtchn_bind_virq (this is
> in Xen) with some printks, like this (I haven't compile-tested it):
>
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index 2d7afc9..c109cee 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -270,24 +270,34 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
>      int            port, virq = bind->virq, vcpu = bind->vcpu;
>      long           rc = 0;
>
> -    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
> +    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq:%d, rc:%ld\n", d->domain_id,
> +       vcpu, __func__,__LINE__, virq, -EINVAL);
>          return -EINVAL;
> -
> -    if ( virq_is_global(virq) && (vcpu != 0) )
> +    }
> +    if ( virq_is_global(virq) && (vcpu != 0) ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq_is_global:%d, rc:%ld\n", d->domain_id,
> +       vcpu, __func__,__LINE__, virq_is_global(virq), -EINVAL);
>          return -EINVAL;
> -
> +    }
>      if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
> -         ((v = d->vcpu[vcpu]) == NULL) )
> +         ((v = d->vcpu[vcpu]) == NULL) ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, max_vcpus:%d, rc:%ld\n", d->domain_id,
> +       vcpu, __func__,__LINE__, v, d->max_vcpus, -ENOENT);
>          return -ENOENT;
> -
> +    }
>      spin_lock(&d->event_lock);
>
> -    if ( v->virq_to_evtchn[virq] != 0 )
> +    if ( v->virq_to_evtchn[virq] != 0 ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, evtchn:%d, rc:%ld\n", d->domain_id,
> +       vcpu, __func__,__LINE__, v->virq_to_evtchn[virq] , -EEXIST);
>          ERROR_EXIT(-EEXIST);
> -
> -    if ( (port = get_free_port(d)) < 0 )
> +    }
> +    if ( (port = get_free_port(d)) < 0 ) {
> +gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
> +       vcpu, __func__,__LINE__, port, port);
>          ERROR_EXIT(port);
> -
> +    }
>      chn = evtchn_from_port(d, port);
>      chn->state          = ECS_VIRQ;
>      chn->notify_vcpu_id = vcpu;


-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Konrad Rzeszutek Wilk

2013-Jun-03 14:08 UTC

Re: xen 4.3 test report

On Fri, May 31, 2013 at 08:56:47AM +0400, Vasiliy Tolstov wrote:
> migration with qemu-xen-traditional:
> xen16:~ # xl migrate --debug 21-10887 ib-xen06.kh11.clodo.ru
> the global config option vifscript is deprecated, please switch to
> vif.default.script
> the global config option vifscript is deprecated, please switch to
> vif.default.script
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x0/0x0/631)
> Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/631)
>  Savefile contains xl domain config
> xc: progress: Reloading memory pages: 53248/1048576    5%
> xc: progress: Reloading memory pages: 105472/1048576   10%
> xc: progress: Reloading memory pages: 157658/1048576   15%
> xc: progress: Reloading memory pages: 209882/1048576   20%
> xc: progress: Reloading memory pages: 263130/1048576   25%
> migration receiver stream contained unexpected data instead of ready message
> (command run was: exec ssh ib-xen06.kh11.clodo.ru xl migrate-receive -d )
> migration target: Transfer complete, requesting permission to start domain.
> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream
> truncated reading GO message from migration stream
> migration target: Failure, destroying our copy.
> migration child [15697] not exiting, no longer waiting (exit status
> will be unreported)
> Migration failed, resuming at sender.
> migration target: Cleanup OK, granting sender permission to resume.
> 
> xl dmesg:
> (XEN) event_channel.c:297:d1 d1v0 [evtchn_bind_virq:297], port:3, rc:-17
> (XEN) event_channel.c:298:d1 EVTCHNOP failure: error -17
The non-debug version tells me it is:

289     if ( (port = get_free_port(d)) < 0 )
290         ERROR_EXIT(port);            

Which gets -EEXIST from get_free_port. But get_free_port only returns
-EINVAL, -ENOMEM, and -ENOSPC in failure modes.

But we get -EEXIST? Could you re-run git diff and attach output to
this email? I think you tweaked the debug code a bit so I am looking
at something different?

Vasiliy Tolstov

2013-Jun-04 12:17 UTC

Re: xen 4.3 test report

2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> The non-debug version tells me it is:
>
> 289     if ( (port = get_free_port(d)) < 0 )
> 290         ERROR_EXIT(port);
>
> Which gets -EEXIST from get_free_port. But get_free_port only returns
> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>
> But we get -EEXIST? Could you re-run git diff and attach output to
> this email? I think you tweaked the debug code a bit so I am looking
> at something different?

Oh, sorry. Yes, I modified your patch to this version:
--- a/xen/common/event_channel.c        2013-05-16 15:05:25.000000000 +0400
+++ b/xen/common/event_channel.c        2013-05-27 10:53:05.000000000 +0400
@@ -271,23 +271,38 @@ static long evtchn_bind_virq(evtchn_bind
     int            port, virq = bind->virq, vcpu = bind->vcpu;
     long           rc = 0;

-    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
+    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) {
+       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq:%d, rc:%ld\n", d->domain_id,
+               vcpu, __func__,__LINE__, virq, (long)-EINVAL);
         return -EINVAL;
+    }

-    if ( virq_is_global(virq) && (vcpu != 0) )
+    if ( virq_is_global(virq) && (vcpu != 0) ) {
+       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], virq_is_global:%d, rc:%ld\n", d->domain_id,
+               vcpu, __func__,__LINE__, virq_is_global(virq), (long)-EINVAL);
         return -EINVAL;
+    }

     if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         ((v = d->vcpu[vcpu]) == NULL) )
+         ((v = d->vcpu[vcpu]) == NULL) ) {
+       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], v:%p, max_vcpus:%d, rc:%ld\n", d->domain_id,
+               vcpu, __func__,__LINE__, d->vcpu[vcpu], d->max_vcpus, (long)-ENOENT);
         return -ENOENT;
+    }

     spin_lock(&d->event_lock);

-    if ( v->virq_to_evtchn[virq] != 0 )
+    if ( v->virq_to_evtchn[virq] != 0 ) {
+       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
+               vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
         ERROR_EXIT(-EEXIST);
+    }

-    if ( (port = get_free_port(d)) < 0 )
+    if ( (port = get_free_port(d)) < 0 ) {
+       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
+               vcpu, __func__,__LINE__, port, (long)-EEXIST);
         ERROR_EXIT(port);
+    }

     chn = evtchn_from_port(d, port);
     chn->state          = ECS_VIRQ;


--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Konrad Rzeszutek Wilk

2013-Jun-05 18:50 UTC

Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> > The non-debug version tells me it is:
> >
> > 289     if ( (port = get_free_port(d)) < 0 )
> > 290         ERROR_EXIT(port);
> >
> > Which gets -EEXIST from get_free_port. But get_free_port only returns
> > -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
> >
> > But we get -EEXIST? Could you re-run git diff and attach output to
> > this email? I think you tweaked the debug code a bit so I am looking
> > at something different?
> 
> 
> Oh, sorry. Yes, I modified your patch to this version:

That is OK.

> -    if ( v->virq_to_evtchn[virq] != 0 )
> +    if ( v->virq_to_evtchn[virq] != 0 ) {
> +       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
> +               vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>          ERROR_EXIT(-EEXIST);

OK, so the value was 3 (event channel), and I am not sure what the virq value
was. But it looks as if somebody did not clear that and we are
tripping over it.

George, have you seen issues with events not being cleared during migration?

George Dunlap

2013-Jun-06 09:23 UTC

Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

On 05/06/13 19:50, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>>> The non-debug version tells me it is:
>>>
>>> 289     if ( (port = get_free_port(d)) < 0 )
>>> 290         ERROR_EXIT(port);
>>>
>>> Which gets -EEXIST from get_free_port. But get_free_port only returns
>>> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>>>
>>> But we get -EEXIST? Could you re-run git diff and attach output to
>>> this email? I think you tweaked the debug code a bit so I am looking
>>> at something different?
>>
>> Oh, sorry. Yes, I modified your patch to this version:
> That is OK.
>> -    if ( v->virq_to_evtchn[virq] != 0 )
>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>> +       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>> +               vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>           ERROR_EXIT(-EEXIST);
> OK, so the value was 3 (event channel), and I am not sure what the virq value
> was. But it looks as if somebody did not clear that and we are
> tripping over it.
>
> George, have you seen issues with events not being cleared during migration?
I haven't, no. Do you know where the virq is supposed to be cleared? The
BUG() is in restore_cpu_virqs(), but at a quick glance I can't find a
corresponding function to tear down virqs.

  -George

George Dunlap

2013-Jun-06 09:25 UTC

Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

On 05/06/13 19:50, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>>> The non-debug version tells me it is:
>>>
>>> 289     if ( (port = get_free_port(d)) < 0 )
>>> 290         ERROR_EXIT(port);
>>>
>>> Which gets -EEXIST from get_free_port. But get_free_port only returns
>>> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>>>
>>> But we get -EEXIST? Could you re-run git diff and attach output to
>>> this email? I think you tweaked the debug code a bit so I am looking
>>> at something different?
>>
>> Oh, sorry. Yes, I modified your patch to this version:
> That is OK.
>> -    if ( v->virq_to_evtchn[virq] != 0 )
>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>> +       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>> +               vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>           ERROR_EXIT(-EEXIST);
> OK, so the value was 3 (event channel), and I am not sure what the virq value
> was. But it looks as if somebody did not clear that and we are
> tripping over it.
>
> George, have you seen issues with events not being cleared during migration?
The other possibility, of course, is that the virq has been cleared, but 
that somehow the kernel is requesting the same virq twice.

  -George

Vasiliy Tolstov

2013-Jun-13 11:22 UTC

Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

Any news about this bug? I don't understand why the bug does not
appear when using xend/xm.
I think that xl and xm use an identical sequence of operations
when migrating a domain...

2013/6/6 George Dunlap <george.dunlap@eu.citrix.com>:
> On 05/06/13 19:50, Konrad Rzeszutek Wilk wrote:
>>
>> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>>>
>>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>>>>
>>>> The non-debug version tells me it is:
>>>>
>>>> 289     if ( (port = get_free_port(d)) < 0 )
>>>> 290         ERROR_EXIT(port);
>>>>
>>>> Which gets -EEXIST from get_free_port. But get_free_port only returns
>>>> -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>>>>
>>>> But we get -EEXIST? Could you re-run git diff and attach output to
>>>> this email? I think you tweaked the debug code a bit so I am looking
>>>> at something different?
>>>
>>>
>>> Oh, sorry. Yes, I modified your patch to this version:
>>
>> That is OK.
>>>
>>> -    if ( v->virq_to_evtchn[virq] != 0 )
>>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>>> +       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>>> +               vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>>           ERROR_EXIT(-EEXIST);
>>
>> OK, so the value was 3 (event channel), and I am not sure what the virq
>> value was. But it looks as if somebody did not clear that and we are
>> tripping over it.
>>
>> George, have you seen issues with events not being cleared during
>> migration?
>
>
> The other possibility, of course, is that the virq has been cleared, but
> that somehow the kernel is requesting the same virq twice.
>
>  -George


-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Vasiliy Tolstov

2013-Jun-13 11:24 UTC

Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

Konrad, George, do you have any news about this bug?
I can test Xen 4.3-rc4, but if this bug has not been fixed I think my
tests can't be productive...

2013/6/5 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> On Tue, Jun 04, 2013 at 04:17:55PM +0400, Vasiliy Tolstov wrote:
>> 2013/6/3 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>> > The non-debug version tells me it is:
>> >
>> > 289     if ( (port = get_free_port(d)) < 0 )
>> > 290         ERROR_EXIT(port);
>> >
>> > Which gets -EEXIST from get_free_port. But get_free_port only returns
>> > -EINVAL, -ENOMEM, and -ENOSPC in failure modes.
>> >
>> > But we get -EEXIST? Could you re-run git diff and attach output to
>> > this email? I think you tweaked the debug code a bit so I am looking
>> > at something different?
>>
>>
>> Oh, sorry. Yes, I modified your patch to this version:
>
> That is OK.
>> -    if ( v->virq_to_evtchn[virq] != 0 )
>> +    if ( v->virq_to_evtchn[virq] != 0 ) {
>> +       gdprintk(XENLOG_WARNING, "d%dv%d [%s:%d], port:%d, rc:%ld\n", d->domain_id,
>> +               vcpu, __func__,__LINE__, v->virq_to_evtchn[virq], (long)-EEXIST);
>>          ERROR_EXIT(-EEXIST);
>
> OK, so the value was 3 (event channel), and I am not sure what the virq value
> was. But it looks as if somebody did not clear that and we are
> tripping over it.
>
> George, have you seen issues with events not being cleared during migration?


-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

Konrad Rzeszutek Wilk

2013-Jun-13 13:14 UTC

Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

On Thu, Jun 13, 2013 at 03:22:17PM +0400, Vasiliy Tolstov wrote:
> Any news about this bug? I don't understand why the bug does not
> appear when using xend/xm.
> I think that xl and xm use an identical sequence of operations
> when migrating a domain...

Hey Vasiliy,

I've been busy with another bug in the Xen code and am wrapping it up
now.

Vasiliy Tolstov

2013-Jun-13 13:17 UTC

Re: Is: events not being cleared during fast migration over InfiniBand Was: Re: xen 4.3 test report

2013/6/13 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> Hey Vasiliy,
>
> I've been busy with another bug in the Xen code and am wrapping it up
> now.

Thanks very much. I'm waiting =)

--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
