=================
STATUS
=================
I ran a heavy create/destroy load test.
The CREDIT scheduler (csched_schedule) checks BUG_ON(!vcpu_running) near the end of the function, and this check fires.
The reason is that atomic_inc(&v->pausecnt) in vcpu_pause() is called without taking the scheduler lock
(spin_lock(&schedule_data[cpu].schedule_lock)).
This lock-less "pausecnt" update lets the vcpu_running state change while
spin_lock_irq(&schedule_data[cpu].schedule_lock) is held in __enter_scheduler(),
and csched_schedule() runs inside that lock.
(A minimal userspace model of this race is sketched after this message.)

=================
REPRODUCE THE ERROR
=================
The problem is reproduced by running a heavy create/destroy loop.

===========
Discussion
===========
The credit scheduler performs a very strict check at the end of csched_schedule().
I see two possible solutions (to be discussed):

1) Remove BUG_ON(!vcpu_running)
I propose a patch that simply removes BUG_ON(!vcpu_running).
I prefer this method for the following reasons:
a) The SEDF and BVT schedulers do not perform this check (the CREDIT scheduler should follow the same policy).
b) The vcpu state can legitimately change while the vcpu is on the runq (vcpu_running => !vcpu_running).
c) This function is the scheduler's main path, so we want to avoid taking an extra lock.
This proposal assumes the pausecnt policy stays loosely locked.

2) Implement a strict lock policy in the scheduler
Update v->pausecnt only while holding schedule_data[cpu].schedule_lock.

>I got a trace log.
>
>(XEN) BUG at sched_credit.c:1075
>(XEN) die_if_kernel: bug check 0
>(XEN) d 0xf0000000041d00c8 domid 7
>(XEN) vcpu 0xf0000000041c0000 vcpu 0
>(XEN)
>(XEN) CPU 1
>(XEN) psr : 0000101008222018 ifs : 8000000000000a98 ip : [<f0000000040375a0>]
>(XEN) ip is at csched_schedule+0x970/0xf70
>(XEN) unat: 0000000000000000 pfs : 0000000000000a98 rsc : 0000000000000003
>(XEN) rnat: 0000121008226018 bsps: f00000000405a6c0 pr : 000000000001aaa9
>(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
>(XEN) csd : 0000000000000000 ssd : 0000000000000000
>(XEN) b0 : f0000000040375a0 b6 : f000000004049c80 b7 : e000000000100800
>(XEN) f6 : 0fffbccccccccc8c00000 f7 : 0ffd9a200000000000000
>(XEN) f8 : 0ffff8000000000000000 f9 : 10002a000000000000000
>(XEN) f10 : 0fffbccccccccc8c00000 f11 : 1003e0000000000000000
>(XEN) r1 : f000000004302c70 r2 : 0000000000005ba9 r3 : f0000000041c7fe8
>(XEN) r8 : 0000000000000000 r9 : 0000000000000000 r10 : 0000000000000000
>(XEN) r11 : 0009804c0270033f r12 : f0000000041c78e0 r13 : f0000000041c0000
>(XEN) r14 : 0000000000000000 r15 : f0000000041119b8 r16 : 0000000000004001
>(XEN) r17 : f000000004105214 r18 : 0000000000001ba9 r19 : f000000004105210
>(XEN) r20 : a00000010095be10 r21 : 0000000000000000 r22 : 0000000000000001
>(XEN) r23 : 0000000000000000 r24 : f0000000041c7e20 r25 : f0000000041c7e28
>(XEN) r26 : 0000000000000000 r27 : 0000000000000000 r28 : 000000000000001d
>(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004114098
>(XEN)
>(XEN) Call Trace:
>(XEN) [<f000000004094820>] show_stack+0x80/0xa0
>(XEN) sp=f0000000041c7500 bsp=f0000000041c1018
>(XEN) [<f000000004075c00>] die_if_kernel+0x80/0xd0
>(XEN) sp=f0000000041c76d0 bsp=f0000000041c0fe0
>(XEN) [<f00000000406b7a0>] ia64_handle_break+0x1d0/0x290
>(XEN) sp=f0000000041c76d0 bsp=f0000000041c0fa0
>(XEN) [<f0000000040934c0>] ia64_leave_kernel+0x0/0x310
>(XEN) sp=f0000000041c76e0 bsp=f0000000041c0fa0
>(XEN) [<f0000000040375a0>] csched_schedule+0x970/0xf70
>(XEN) sp=f0000000041c78e0 bsp=f0000000041c0ee0
>(XEN) [<f00000000403f0b0>] __enter_scheduler+0x150/0x6b0
>(XEN) sp=f0000000041c78f0 bsp=f0000000041c0e60
>(XEN) [<f00000000403f6a0>] do_yield+0x90/0xb0
>(XEN) sp=f0000000041c7910 bsp=f0000000041c0e48
>(XEN) [<f00000000403f970>] do_sched_op_compat+0x120/0x170
>(XEN) sp=f0000000041c7910 bsp=f0000000041c0e18
>(XEN) [<f00000000405a6e0>] ia64_hypercall+0xe50/0xe90
>(XEN) sp=f0000000041c7910 bsp=f0000000041c0db0
>(XEN) [<f00000000406b7f0>] ia64_handle_break+0x220/0x290
>(XEN) sp=f0000000041c7df0 bsp=f0000000041c0d70
>(XEN) [<f0000000040934c0>] ia64_leave_kernel+0x0/0x310
>(XEN) sp=f0000000041c7e00 bsp=f0000000041c0d70
>(XEN) [<e000000000100810>] ???
>(XEN) sp=f0000000041c8000 bsp=f0000000041c0d20
>(XEN) [<a000000100067170>] ???
>(XEN) sp=f0000000041c8000 bsp=f0000000041c0d00
>(XEN) domain_crash_sync called from xenmisc.c:109
>(XEN) Domain 7 (vcpu#0) crashed on cpu#1:
>(XEN) d 0xf0000000041d00c8 domid 7
>(XEN) vcpu 0xf0000000041c0000 vcpu 0
>(XEN)
>(XEN) CPU 1
>(XEN) psr : 00001012083a6010 ifs : 800000000000050a ip : [<e000000000100810>]
>(XEN) ip is at ???
>(XEN) unat: 0000000000000000 pfs : 8000000000000209 rsc : 0000000000000008
>(XEN) rnat: 0000000000000000 bsps: a000000100955028 pr : 000000000001aa85
>(XEN) ldrs: 0000000000700000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
>(XEN) csd : 0000000000000000 ssd : 0000000000000000
>(XEN) b0 : a000000100067170 b6 : a000000100148100 b7 : e000000000100800
>(XEN) f6 : 000000000000000000000 f7 : 1003e28f5c28f5c28f5c3
>(XEN) f8 : 000000000000000000000 f9 : 100068000000000000000
>(XEN) f10 : 1003e0000000000000000 f11 : 1003e0000000000000000
>(XEN) r1 : a000000100d071b0 r2 : 0000000000001000 r3 : 8000000000000209
>(XEN) r8 : a000000100067170 r9 : 0000000000000100 r10 : 0000000000000000
>(XEN) r11 : 0000000000010ac5 r12 : a00000010095bd80 r13 : a000000100954000
>(XEN) r14 : 0000000000000001 r15 : 0000000000000000 r16 : f100000000004c18
>(XEN) r17 : a00000010095bdb0 r18 : a00000010095bdb1 r19 : a00000010095be90
>(XEN) r20 : a00000010095be10 r21 : 0000000000000000 r22 : 0000000000000001
>(XEN) r23 : 0000000000000000 r24 : a000000100b22ae0 r25 : a000000100954f10
>(XEN) r26 : 0000000000000000 r27 : a0000001009550f0 r28 : 000000000000001d
>(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000000
>(XEN) r32 : a00000010095bbc0 r33 : 0000000000000000 r34 : 0000000000000004
>(XEN) r35 : 0000000000000000 r36 : 0000000000000c58 r37 : a0000001009540d0
>(XEN) r38 : ffffffffffff49c0 r39 : 0000000000000000 r40 : a00000010001a930
>(XEN) r41 : 8000000000000307
>
>Thanks,
>Fujita
>> -----Original Message-----
>> From: xen-ia64-devel-bounces@lists.xensource.com
>> [mailto:xen-ia64-devel-bounces@lists.xensource.com] On Behalf Of yo.fujita
>> Sent: Tuesday, June 20, 2006 9:03 AM
>> To: 'You, Yongkang'; 'Alex Williamson'
>> Cc: xen-ia64-devel@lists.xensource.com
>> Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>>
>> Hi Yongkang, Alex,
>>
>> I tried the latest cset 10419 too.
>> And the same problem was reproduced.
>> I think this may be caused by the "credit" scheduler.
>> Now our developers are researching this problem.
>>
>> Setting
>> server    : tiger4
>> dom0mem   : 512M
>> domUmem   : 512M
>> domU cpus : 2
>> sched     : credit
>>
>> Test details
>> create  3/10 hung with dom0.
>> destroy 4/7  hung with dom0.
>>
>> Thanks,
>> Fujita
>> > -----Original Message-----
>> > From: xen-ia64-devel-bounces@lists.xensource.com
>> > [mailto:xen-ia64-devel-bounces@lists.xensource.com] On Behalf Of yo.fujita
>> > Sent: Monday, June 19, 2006 4:21 PM
>> > To: 'You, Yongkang'; 'Alex Williamson'
>> > Cc: xen-ia64-devel@lists.xensource.com
>> > Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> >
>> > Yongkang,
>> >
>> > I have a request.
>> > I'm using a scheduler "credit" for test.
>> > If you're using other scheduler, can you try the "credit"?
>> >
>> > Thanks,
>> > Fujita
>> > > -----Original Message-----
>> > > From: xen-ia64-devel-bounces@lists.xensource.com
>> > > [mailto:xen-ia64-devel-bounces@lists.xensource.com] On Behalf Of yo.fujita
>> > > Sent: Monday, June 19, 2006 4:03 PM
>> > > To: 'You, Yongkang'; 'Alex Williamson'
>> > > Cc: xen-ia64-devel@lists.xensource.com
>> > > Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > >
>> > > Hi Yongkang,
>> > >
>> > > Thanks for your information!
>> > > We also must try the latest changeset.
>> > > If it happens again, the cause is in our environment.
>> > > I'll inform you a result. Please wait for a while.
>> > >
>> > > Thanks,
>> > > Fujita
>> > > > -----Original Message-----
>> > > > From: You, Yongkang [mailto:yongkang.you@intel.com]
>> > > > Sent: Monday, June 19, 2006 3:37 PM
>> > > > To: yo.fujita; Alex Williamson
>> > > > Cc: xen-ia64-devel@lists.xensource.com
>> > > > Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > > >
>> > > > Hi Fujita,
>> > > >
>> > > > I tried the latest Changeset 10419. But I couldn't reproduce this problem in
>> > > > my box. Is it fixed? I create and destroy SMP XenU for more than 100 times.
>> > > > And try to make kernels in xenU 2 times. I didn't meet the xen0 hang.
>> > > >
>> > > > My box has 3072M memory. Xen0 has 512M, xenU has 512M. XenU has been assigned
>> > > > with 2 CPUs.
>> > > >
>> > > > Best Regards,
>> > > > Yongkang (Kangkang) 永康
>> > > >
>> > > > >-----Original Message-----
>> > > > >From: yo.fujita [mailto:yo.fujita@soft.fujitsu.com]
>> > > > >Sent: June 19, 2006 10:27
>> > > > >To: You, Yongkang; 'Alex Williamson'
>> > > > >Cc: xen-ia64-devel@lists.xensource.com
>> > > > >Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > > > >
>> > > > >> Hi Fujita,
>> > > > >>
>> > > > >> Maybe it is not guest image issue. In our nightly testing for XenU, we still
>> > > > >> focus on the UP stability and basic booting/destroying testing. Automatic SMP
>> > > > >> XenU isn't fully added into nightly testing (will add it this week). I noticed
>> > > > >> your report mentioned that the stress testing and keeping creating/destroying
>> > > > >> SMP XenU will catch this issue. I can do some trying to see if I can reproduce.
>> > > > >Hi Yongkang,
>> > > > >
>> > > > >Thanks for your comments.
>> > > > >As you said, I meant the problem is not in the image itself but the stress
>> > > > >of booting domU. In other words, I thought larger size image takes more
>> > > > >stress on Xen.
>> > > > >So I guess a customized image for testing (necessity minimum size) doesn't
>> > > > >cause the problem.
>> > > > >We appreciate it if you can try to reproduce these issues.
>> > > > >
>> > > > >Thanks,
>> > > > >Fujita
>> > > > >
>> > > > >>
>> > > > >> Best Regards,
>> > > > >> Yongkang (Kangkang) 永康
>> > > > >>
>> > > > >> >-----Original Message-----
>> > > > >> >From: yo.fujita [mailto:yo.fujita@soft.fujitsu.com]
>> > > > >> >Sent: June 19, 2006 8:53
>> > > > >> >To: 'Alex Williamson'; You, Yongkang
>> > > > >> >Cc: xen-ia64-devel@lists.xensource.com
>> > > > >> >Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > > > >> >
>> > > > >> >> Thanks Fujita. Is anyone else seeing these hangs booting domU?
>> > > > >> >> The Intel status report on the same changeset has no indication of
>> > > > >> >> dom0 hangs on their tests.
>> > > > >> >Hi Alex, Yongkang,
>> > > > >> >
>> > > > >> >In Fujitsu, it's only me who saw this problem because few developers have
>> > > > >> >started any tests relating to SMP.
>> > > > >> >I think the reason why Intel's results was not match to ours is the
>> > > > >> >difference of stress on booting SMP domU, because our guest image is just
>> > > > >> >a copy of the root file system of native Linux, with nothing edited.
>> > > > >> >
>> > > > >> >My domU environment.
>> > > > >> >Disk size : 5G
>> > > > >> >OS        : RHEL4U2
>> > > > >> >Memory    : 512M
>> > > > >> >CPUs      : 2
>> > > > >> >Yongkang, do you have any comments?
>> > > > >> >
>> > > > >> >> Has anything else changed in the test environment? Thanks,
>> > > > >> >No. We switched this test to SMP from UP two weeks ago.
>> > > > >> >
>> > > > >> >Thanks,
>> > > > >> >Fujita

------------------------------------------------------------
Fujitsu Limited, Platform Technology Development Division, Virtual Systems Development Department
Atsushi Sakai
Email sakaia@jp.fujitsu.com
TEL 7124-4167 (from April 7)

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel
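The race described in the STATUS section above can be reduced to a small standalone model. This is not Xen code: pthreads and C11 atomics stand in for the hypervisor primitives, and the names pausecnt, schedule_lock and the "runnable" check are simply borrowed from the mail. It shows why a predicate tested under schedule_lock can still be invalidated by a lock-free atomic_inc performed elsewhere, which is exactly what csched_schedule() ran into.

/* racemodel.c - userspace model of the pausecnt race (NOT Xen code).
 * Build: gcc -std=c11 -pthread racemodel.c -o racemodel
 *
 * "pauser" plays the role of vcpu_pause(): it bumps pausecnt with a plain
 * atomic increment and never takes the scheduler lock.
 * "scheduler" plays __enter_scheduler()/csched_schedule(): it takes the
 * lock and then tests the "is this vcpu still runnable?" predicate.
 * Holding the lock does not protect the predicate, so the check can fail,
 * for the same reason BUG_ON(!vcpu_running) could fire. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t schedule_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int pausecnt;                 /* 0 = runnable, >0 = paused */

static bool vcpu_runnable(void)             /* stand-in for the vcpu_running check */
{
    return atomic_load(&pausecnt) == 0;
}

static void *pauser(void *arg)
{
    (void)arg;
    atomic_fetch_add(&pausecnt, 1);         /* lock-free, as in vcpu_pause() */
    return NULL;
}

static void *scheduler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&schedule_lock);
    /* The lock is held, yet pausecnt may already have changed (or change
     * right now on another CPU), because the pauser never takes it. */
    if (!vcpu_runnable())
        printf("BUG_ON(!vcpu_running) would have fired here\n");
    else
        printf("vcpu still looked runnable this time (the race is timing-dependent)\n");
    pthread_mutex_unlock(&schedule_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, pauser, NULL);
    pthread_create(&b, NULL, scheduler, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

Which message is printed depends on which thread wins the race, matching the way the real assertion only trips under heavy create/destroy load.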
Steven Hand
2006-Jun-20 10:06 UTC
Re: [Xen-devel] Re: [Xen-ia64-devel] Weekly benchmark results [ww24]
>=================
>STATUS
>=================
>I ran a heavy create/destroy load test.
>The CREDIT scheduler (csched_schedule) checks BUG_ON(!vcpu_running) near the end of the function, and this check fires.
>The reason is that atomic_inc(&v->pausecnt) in vcpu_pause() is called without taking the scheduler lock
>(spin_lock(&schedule_data[cpu].schedule_lock)).
>This lock-less "pausecnt" update lets the vcpu_running state change while
>spin_lock_irq(&schedule_data[cpu].schedule_lock) is held in __enter_scheduler(),
>and csched_schedule() runs inside that lock.

Thanks for this - we have also reproduced. The assertion was in fact bogus,
as you surmised. Have checked in a fix.

cheers,

S.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
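Fujita's second proposal (a strict lock policy) corresponds, in the userspace model shown earlier, to making the pauser take schedule_lock before touching pausecnt. The sketch below illustrates only that alternative; the fix Steven refers to is not reproduced here, and since he calls the assertion bogus it presumably followed the first proposal and simply dropped the BUG_ON.

/* Proposal 2 applied to the model above (not the committed fix): pausecnt
 * is only changed while schedule_lock is held, so the runnable check made
 * by the scheduler thread can no longer be invalidated while it holds the
 * lock. */
static void *pauser(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&schedule_lock);
    atomic_fetch_add(&pausecnt, 1);
    pthread_mutex_unlock(&schedule_lock);
    return NULL;
}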