Guillaume Thouvenin
2006-Jan-10  09:01 UTC
[Xen-devel] sedf scheduler may cause a CPU fatal trap
Hello,
 I played with the sEDF scheduler included in xen-3.0-testing.hg and
everything works fine except for a CPU fatal trap that appeared
several times. Here is what I've done on an SMP (two-processor)
machine:
 I started two unprivileged domains and I compiled a kernel in
each of them using the command:
    # time sh -c "make O=/home/guill/build/k2614 oldconfig \
               && make O=/home/guill/build/k2614"
1) Two domains with default sEDF settings (seems to be best-effort):
          |  domain 1   |  domain 2  |
          |-------------|------------|
     real | 11m43.034s  | 11m46.293s |     
     user | 10m20.220s  | 10m25.140s |
     sys  |  1m08.330s  |  1m09.100s |
           --------------------------
   xentop showed that domain1 was using around 99% of the CPU, and it
   was the same for domain2.
2) Two domains with 20ms/5ms (ie 25% of CPU time) and 20ms/15ms (ie 75%
   of CPU time) with no extra time: 
     xm sched-sedf 1 20000000 5000000 0 0 0
     xm sched-sedf 2 20000000 15000000 0 0 0
  
          |  domain 1   |  domain 2  |
          |-------------|------------|
     real | 45m35.626s  | 15m04.808s |     
     user | 41m04.300s  | 13m37.940s |
     sys  |  4m24.050s  |  1m25.160s |
           --------------------------
   The xentop showed that domain1 was using around 25% of the CPU 
whereas domain2 was using around 75%.
3) Two domains with 20ms/5ms (ie 25% of CPU time) and 20ms/15ms (ie 75%
   of CPU time) with extra time: 
     xm sched-sedf 1 20000000 5000000 0 1 0
     xm sched-sedf 2 20000000 15000000 0 1 0
  
          |  domain 1   |  domain 2  |
          |-------------|------------|
     real | 11m48.687s  | 11m50.909s |     
     user | 10m36.870s  | 10m36.180s |
     sys  |  1m08.320s  |  1m09.540s |
           --------------------------
   With extra time enabled, xentop shows that domain 1 is using
around 97% of the CPU and domain 2 is using around 97% too. 
4) Two domains with 20ms/5ms (ie 25% of CPU time) and 20ms/15ms (ie 75%
   of CPU time) without extra time, but we change the policy when the
   compilation in the second domain finishes: 
     xm sched-sedf 1 20000000 5000000 0 0 0
     xm sched-sedf 2 20000000 15000000 0 0 0
   when the second domain has finished its job:
     xm sched-sedf 1 20000000 0 0 1 0
     xm sched-sedf 2 20000000 0 0 1 0
   
 When I changed the policy, the Xen hypervisor crashed and I got the
following error:
(XEN) CPU:    1
(XEN) EIP:    e008:[<ff108d7e>] __qdivrem+0x4e/0x580
(XEN) EFLAGS: 00010046   CONTEXT: hypervisor
(XEN) eax: 00000001   ebx: 00000000   ecx: 00000000   edx: 00000000
(XEN) esi: c4b40000   edi: 00000004   ebp: 00000000   esp: ff1afd94
(XEN) cr0: 8005003b   cr3: 6d236000
(XEN) ds: e010   es: e010   fs: 0000   gs: 0033   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff1afd94:
(XEN)    00000002 00000001 00007100 ff1afe20 00000989 0000ff1f 00000002 00000009
(XEN)    00000002 00000001 0000c000 ff1afde0 ff1afdfc ff1afe18 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000991 ff1afe38 00000571 00000000
(XEN)    00000000 00000000 00000000 0000ff1f 0000c000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 0000c944 00004000 00000004 00000000 ff1b5e84
(XEN)    ff1b6d84 ffbfa980 ff10c8b0 00000000 c4b40000 00000004 00000000 ff1092ff
(XEN)    c4b40000 00000004 00000000 00000000 00000000 ff1ad080 b46d68de ff1b5e88
(XEN)    ff1b5e80 c4b40000 00000004 ff10e443 c4b40000 00000004 00000000 00000000
(XEN)    ff1afee4 b2a993ef 000012b4 ff10d898 00001000 00000001 ff1b5080 ffbfa980
(XEN)    ff1b5e80 b2aea7e3 000012b4 ff10d8c0 b2aea7e3 000012b4 ff1b5080 00000080
(XEN)    0000efff 0000fe80 e6525499 000012b4 00000001 ff1924a0 b3355354 000012b4
(XEN)    ffbfa988 00000001 ffbfa990 ffbfa998 00000096 00000001 bfb12eb8 00000096
(XEN)    00000000 00000000 ff174010 ff174010 b2aea7e3 000012b4 ff1aff74 ff10ec3b
(XEN)    ff1aff74 b2aea7e3 000012b4 00000033 0000000c 00000000 00000000 ff12111d
(XEN)    0000000c 00000000 00055080 ff1b5080 00000080 00000000 00000001 ff1b5080
(XEN)    ff1affb4 00000000 ff1249ce ff1affb4 ff1affb4 00000020 00000000 00000080
(XEN)    b7efa860 00000005 bfb12eb8 ff10f732 00000005 bfb12eb8 ff1b5080 ff1354c6
(XEN)    b7ef8ff4 00000000 00000001 b7efa860 00000005 bfb12eb8 00000000 000d0000
(XEN)    b7e2e549 00000073 00010286 bfb12e90 0000007b 0000007b 0000007b 00000000
(XEN)    00000033 00000001 ff1b5080
(XEN) Xen call trace:
(XEN)    [<ff108d7e>] __qdivrem+0x4e/0x580
(XEN)    [<ff10c8b0>] runq_comp+0x0/0x70
(XEN)    [<ff1092ff>] __divdi3+0x4f/0xa0
(XEN)    [<ff10e443>] desched_extra_dom+0x1f3/0x210
(XEN)    [<ff10d898>] sedf_do_schedule+0x228/0x260
(XEN)    [<ff10d8c0>] sedf_do_schedule+0x250/0x260
(XEN)    [<ff10ec3b>] __enter_scheduler+0x7b/0x2e0
(XEN)    [<ff12111d>] mod_l1_entry+0x9d/0xf0
(XEN)    [<ff1249ce>] do_general_protection+0xbe/0x180
(XEN)    [<ff10f732>] do_softirq+0x32/0x50
(XEN)    [<ff1354c6>] process_softirqs+0x6/0x8
(XEN)
(XEN) ************************************
(XEN) CPU1 FATAL TRAP 0 (divide error), ERROR_CODE 0000, IN INTERRUPT CONTEXT. 
(XEN) System shutting down -- need manual reset.
(XEN) ************************************
This fatal trap doesn't appear if we use 
  xm sched-sedf 1 20000000 5000000 0 1 0
Has anyone else seen this problem? I can reproduce the bug on my Xeon
x86_64 box, so I can provide more input. 
Hope this helps,
Best regards,
Guillaume
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Guillaume Thouvenin
2006-Jan-13  08:33 UTC
Re: [Xen-devel] sedf scheduler may cause a CPU fatal trap
On Tue, 10 Jan 2006 10:01:13 +0100
Guillaume Thouvenin <guillaume.thouvenin@bull.net> wrote:

> 4) Two domains with 20ms/5ms (ie 25% of CPU time) and 20ms/15ms (ie
> 75% of CPU time) without extra time but we change the politics when
> compilation in the second domain finished:
>   xm sched-sedf 1 20000000 5000000 0 0 0
>   xm sched-sedf 2 20000000 15000000 0 0 0
> when second domain finished its job:
>   xm sched-sedf 1 20000000 0 0 1 0
>   xm sched-sedf 2 20000000 0 0 1 0
>
> when I changed the politics, the xen hypervisor crashed and I get the
> following error:

I forgot to specify that I ran the tests on the latest changeset (8269)
from xen-3.0-testing.

Best regards,
Guillaume
Guillaume Thouvenin
2006-Jan-13  09:59 UTC
Re: [Xen-devel] sedf scheduler may cause a CPU fatal trap
I ran two commands on xen-unstable.hg that produced the CPU fatal
trap. I tested changeset 8571 on a dual-processor x86_64 Xeon with HT
enabled. I started only one unprivileged domain. The two commands are:

# xm sched-sedf 1 20000000 5000000 0 0 0
# xm sched-sedf 1 20000000 0 0 1 0

And I got the following report:

(XEN) CPU:    3
(XEN) RIP:    e010:[<ffff830000110972>] desched_extra_dom+0xc2/0x190
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 00000004c4b40000   rbx: ffff830000ff8f00   rcx: ffff830000ff8f30
(XEN) rdx: 0000000000000000   rsi: 0000000000001000   rdi: 00000044a62c0b62
(XEN) rbp: 00000044a62c0b62   rsp: ffff8300001dbe20   r8:  ffff8300001eef00
(XEN) r9:  ffff8300001e6080   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: ffff830000ff8f10   r13: 0000000000000003   r14: 0000000000000003
(XEN) r15: ffff830000181f80   cr0: 000000008005003b   cr3: 0000000104f07000
(XEN) Xen stack trace from rsp=ffff8300001dbe20:
(XEN)    ffff83000010ff36 0000000000000086 ffff830000ff8f20 ffff83000011a153
(XEN)    0000000000000008 ffff830000ff8f20 ffff830000ff8f30 ffff830000181fa0
(XEN)    0000000000000003 00000044a62c0b62 0000000000000180 ffff8300001e6080
(XEN)    ffff8300001114a8 0000000000000206 ffff830000181e00 ffff88001fa55b68
(XEN)    ffff8300001dbec8 0000000000000003 ffff8300001e60a0 0000000000000003
(XEN)    0000000000000000 0000000000000000 ffffffff8010002c 00000000ffffffff
(XEN)    0000ffffffff8010 ffffffff803affb0 ffff83000011110b 00000000ffffffff
(XEN)    ffff83000011119b ffffffff803affb0 ffff8300001e6080 ffffffff802f7580
(XEN)    ffff83000013806c ffffffff803affb0 0000ffffffff8010 00000000ffffffff
(XEN)    ffffffff8010002c ffffffff802f7580 00000000ffffffff 0000000000000246
(XEN)    0000000000000001 00000000ffffabf1 00000000000002b0 0000000000000000
(XEN)    ffffffff8010fc5f 0000000000000000 0000000000000000 0000000000000001
(XEN)    0000010000000000 ffffffff8010fc5f 000000000000e033 0000000000000246
(XEN)    ffffffff803aff80 000000000000e02b 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000003 ffff8300001e6080
(XEN) Xen call trace:
(XEN)    [<ffff830000110972>] desched_extra_dom+0xc2/0x190
(XEN)    [<ffff83000010ff36>] sedf_do_schedule+0xc6/0x200
(XEN)    [<ffff83000011a153>] context_switch+0x173/0x1a0
(XEN)    [<ffff8300001114a8>] __enter_scheduler+0x78/0x260
(XEN)    [<ffff83000011110b>] do_block+0x7b/0x90
(XEN)    [<ffff83000011119b>] do_sched_op+0x7b/0x110
(XEN)    [<ffff83000013806c>] syscall_enter+0x5c/0x61
(XEN)
(XEN) ************************************
(XEN) CPU3 FATAL TRAP 0 (divide error), ERROR_CODE 0000, IN INTERRUPT CONTEXT.
(XEN) System shutting down -- need manual reset.
(XEN) ************************************

Best regards,
Guillaume
Keir Fraser
2006-Jan-13  10:10 UTC
Re: [Xen-devel] sedf scheduler may cause a CPU fatal trap
On 13 Jan 2006, at 09:59, Guillaume Thouvenin wrote:

> I ran two commands on the xen-unstable.hg that produced the CPU fatal
> trap. I tested the changset 8571 on a x86_64 xeon bi-processors with HT
> enabled. I only started one unprivileged domain. The two commands are:
>
> # xm sched-sedf 1 20000000 5000000 0 0 0
> # xm sched-sedf 1 20000000 0 0 1 0

There's a changeset (8577) queued up in our staging tree that should
hopefully fix this.

 -- Keir
Guillaume Thouvenin
2006-Jan-16  10:30 UTC
Re: [Xen-devel] sedf scheduler may cause a CPU fatal trap
On Fri, 13 Jan 2006 10:10:59 +0000
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:

> On 13 Jan 2006, at 09:59, Guillaume Thouvenin wrote:
>
> > I ran two commands on the xen-unstable.hg that produced the CPU fatal
> > trap. I tested the changset 8571 on a x86_64 xeon bi-processors with HT
> > enabled. I only started one unprivileged domain. The two commands are:
> >
> > # xm sched-sedf 1 20000000 5000000 0 0 0
> > # xm sched-sedf 1 20000000 0 0 1 0
>
> There's a changeset (8577) queued up in our staging tree that should
> hopefully fix this.

I tested changeset 8612 but unfortunately the problem is still there.
Here is the report:

(XEN) CPU:    3
(XEN) RIP:    e010:[<ffff8300001101f2>] desched_extra_dom+0xc2/0x190
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 00000004c4b40000   rbx: ffff830000ff8f00   rcx: ffff830000ff8f30
(XEN) rdx: 0000000000000000   rsi: 0000000000020000   rdi: 00000067879b972c
(XEN) rbp: 00000067879b972c   rsp: ffff8300001dbe20   r8:  ffff8300001eec80
(XEN) r9:  ffff830000fce080   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: ffff830000ff8f10   r13: 0000000000000003   r14: 0000000000000003
(XEN) r15: ffff830000180f80   cr0: 000000008005003b   cr3: 00000001059b2000
(XEN) Xen stack trace from rsp=ffff8300001dbe20:
(XEN)    ffff83000010f7b6 ffff8300001dc080 ffff8300001dc080 ffff83000011a3f3
(XEN)    0000000000000000 ffff830000ff8f20 ffff830000ff8f30 ffff830000180fa0
(XEN)    0000000000000003 00000067879b972c 0000000000000180 ffff830000fce080
(XEN)    ffff830000110d98 000000000007a120 ffff830000180e00 000000000007a120
(XEN)    ffff830000174ba0 0000000000000000 0000000000000003 ffff830000fce0a0
(XEN)    0000000000000000 0000000000000000 ffffffff8010002c 00000000ffffffff
(XEN)    0000ffffffff8010 ffffffff803affb0 ffff8300001109f5 00000000ffffffff
(XEN)    ffff830000110a7b ffffffff803affb0 ffff830000fce080 ffffffff802f7600
(XEN)    ffff8300001384ac ffffffff803affb0 0000ffffffff8010 00000000ffffffff
(XEN)    ffffffff8010002c ffffffff802f7600 00000000ffffffff 0000000000000246
(XEN)    0000000000000001 00000000ffff98b9 0000000000000180 0000000000000000
(XEN)    ffffffff8010fc5f 0000000000000000 0000000000000000 0000000000000001
(XEN)    0000010000000000 ffffffff8010fc5f 000000000000e033 0000000000000246
(XEN)    ffffffff803aff80 000000000000e02b 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000003 ffff830000fce080
(XEN) Xen call trace:
(XEN)    [<ffff8300001101f2>] desched_extra_dom+0xc2/0x190
(XEN)    [<ffff83000010f7b6>] sedf_do_schedule+0xc6/0x200
(XEN)    [<ffff83000011a3f3>] context_switch+0x173/0x1a0
(XEN)    [<ffff830000110d98>] __enter_scheduler+0x78/0x250
(XEN)    [<ffff8300001109f5>] do_block+0x85/0x90
(XEN)    [<ffff830000110a7b>] do_sched_op+0x7b/0x110
(XEN)    [<ffff8300001384ac>] syscall_enter+0x5c/0x61
(XEN)
(XEN) ************************************
(XEN) CPU3 FATAL TRAP 0 (divide error), ERROR_CODE 0000, IN INTERRUPT CONTEXT.
(XEN) System shutting down -- need manual reset.
(XEN) ************************************

Hope this helps,
Best regards,
Guillaume
Guillaume Thouvenin
2006-Jan-18  08:24 UTC
Re: [Xen-devel] sedf scheduler may cause a CPU fatal trap
I tested changeset 8627 and the fatal trap looks a little different
because new messages (produced by linux-2.6.12.6-xenU in domain 1,
or by Xen?) are caught on the serial link. The test is the same: I
start one unprivileged domain and run these two commands:
# xm sched-sedf 1 20000000 5000000 0 0 0
# xm sched-sedf 1 20000000 0 0 1 0
Here is the Xen message:
(XEN) CPU:    3
(XEN) RIP:    e010:[<ffff8300001101f2>] desched_extra_dom+0xc2/0x190
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 00000004c4b40000   rbx: ffff830000ff8f00   rcx: ffff830000ff8f30
(XEN) rdx: 0000000000000000   rsi: 0000000000020000   rdi: 00000065ac412840
(XEN) rbp: 00000065ac412840   rsp: ffff8300001dbe20   r8:  ffff8300001eec80
(XEN) r9:  ffff830000fce080   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: ffff830000ff8f10   r13: 0000000000000003   r14: 0000000000000003Timer
 ISR/0: Time went backwards: delta=-40045272 cpu_delta=49954728 shadow=437002448
223 off=87566263 processed=437130059504 cpu_processed=437040059504
(XEN)  0: 437040059504
r15: ffff830000180f80   cr0: 000000008005003b   cr3: 0000000104c3a000 1: 4370900
64536
 2: 437130064536
(XEN)  3: 436660064536
Xen stack trace from rsp=ffff8300001dbe20:Timer ISR/0: Time went backwards: delt
a=-235630433 cpu_delta=234369567 shadow=437002448223 off=311980971 processed=437
550059504
 cpu_processed=437080059504
(XEN)  0: 437080059504
    1: 437550064536
ffff83000010f7b6  2: 437550064536
ffff8300001dc080  3: 436660064536
ffff8300001dc080 Timer ISR/0: Time went backwards: delta=-369042259 cpu_delta=19
0957741 shadow=437002448223 off=498569094 processed=437870059504ffff83000011a3f3
  cpu_processed=437310059504
 0: 437310059504
(XEN)  1: 437870064536
    2: 437870064536
0000000000000000  3: 436660064536
ffff830000ff8f20 Timer ISR/0: Time went backwards: delta=-507439348 cpu_delta=17
2560652 shadow=437002448223 off=670172008 processed=438180059504ffff830000ff8f30
  cpu_processed=437500059504
ffff830000180fa0  0: 437500059504
 1: 438170064536
(XEN)  2: 438170064536
    3: 436660064536
0000000000000003 Timer ISR/0: Time went backwards: delta=-635830026 cpu_delta=17
4169974 shadow=437002448223 off=841781333 processed=43848005950400000065ac412840
  cpu_processed=437670059504
0000000000000180  0: 437670059504
ffff830000fce080  1: 438480064536
 2: 438480064536
(XEN)  3: 436660064536
   Timer ISR/0: Time went backwards: delta=-765939247 cpu_delta=184060753 shadow
=437002448223 off=1021672118 processed=43879005950ffff830000110d98 4 cpu_process
ed=437840059504
000000000007a120  0: 437840059504
ffff830000180e00  1: 438790064536
000000000007a120  2: 438790064536
 3: 436660064536
(XEN) Timer ISR/0: Time went backwards: delta=-907230826 cpu_delta=182769174 sha
dow=437002448223 off=1200380532 processed=43911005950   4 cpu_processed=43802005
9504
ffff830000174ba0  0: 438020059504
0000000000000000  1: 439110064536
0000000000000003  2: 439110064536
ffff830000fce0a0  3: 436660064536
Timer ISR/0: Time went backwards: delta=-726155685 cpu_delta=183844315 shadow=43
7002448223 off=1381455632 processed=43911005950(XEN) 4 cpu_processed=43820005950
4
    0: 438200059504
0000000000000000  1: 439110064536
0000000000000000  2: 439110064536
ffffffff8010002c  3: 436660064536
00000000ffffffff Timer ISR/0: Time went backwards: delta=-1173955180 cpu_delta=1
76044820 shadow=437002448223 off=1553656202 processed=4397300595
04 cpu_processed=438380059504
(XEN)  0: 438380059504
    1: 439730064536
0000ffffffff8010  2: 439730064536
ffffffff803affb0  3: 436660064536
ffff8300001109f5 Timer ISR/0: Time went backwards: delta=-1301160567 cpu_delta=1
78839433 shadow=437002448223 off=1726450802 processed=440030059500000000ffffffff
 04 cpu_processed=438550059504
 0: 438550059504
(XEN)  1: 440030064536
    2: 440030064536
ffff830000110a7b  3: 436660064536
ffffffff803affb0 ffff830000fce080 ffffffff802f7500
(XEN)    ffff8300001384ac ffffffff803affb0 0000ffffffff8010 00000000ffffffff
(XEN)    ffffffff8010002c ffffffff802f7500 00000000ffffffff 0000000000000246
(XEN)    0000000000000001 00000000ffffb1bd 0000000000000310 0000000000000000
(XEN)    ffffffff8010fc5f 0000000000000000 0000000000000000 0000000000000001
(XEN)    0000010000000000 ffffffff8010fc5f 000000000000e033 0000000000000246
(XEN)    ffffffff803aff80 000000000000e02b 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000003 ffff830000fce080
(XEN) Xen call trace:
(XEN)    [<ffff8300001101f2>] desched_extra_dom+0xc2/0x190
(XEN)    [<ffff83000010f7b6>] sedf_do_schedule+0xc6/0x200
(XEN)    [<ffff83000011a3f3>] context_switch+0x173/0x1a0
(XEN)    [<ffff830000110d98>] __enter_scheduler+0x78/0x250
(XEN)    [<ffff8300001109f5>] do_block+0x85/0x90
(XEN)    [<ffff830000110a7b>] do_sched_op+0x7b/0x110
(XEN)    [<ffff8300001384ac>] syscall_enter+0x5c/0x61
(XEN)
(XEN) ************************************
(XEN) CPU3 FATAL TRAP 0 (divide error), ERROR_CODE 0000, IN INTERRUPT CONTEXT.
(XEN) System shutting down -- need manual reset.
(XEN) ************************************
printk: 3 messages suppressed.
Timer ISR/0: Time went backwards: delta=-2202039443 cpu_delta=927960557 shadow=4
37002448223 off=2944602314 processed=442130059504 cpu_processed=439000059504
 0: 439000059504
 1: 442120064536
 2: 442120064536
 3: 436660064536
Guillaume Thouvenin
2006-Feb-02  12:37 UTC
Re: [Xen-devel] sedf scheduler may cause a CPU fatal trap
On Fri, 13 Jan 2006 10:10:59 +0000
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:

> On 13 Jan 2006, at 09:59, Guillaume Thouvenin wrote:
>
> > I ran two commands on the xen-unstable.hg that produced the CPU fatal
> > trap. I tested the changset 8571 on a x86_64 xeon bi-processors with HT
> > enabled. I only started one unprivileged domain. The two commands are:
> >
> > # xm sched-sedf 1 20000000 5000000 0 0 0
> > # xm sched-sedf 1 20000000 0 0 1 0
>
> There's a changeset (8577) queued up in our staging tree that should
> hopefully fix this.

I tested the latest changeset (8717) and the CPU0 FATAL TRAP is still
present. I have pasted the message generated by Xen at the end of this
email. I can reproduce the bug easily (just run the two xm commands).

Hope this helps,
Guillaume.

(XEN) CPU: 0
(XEN) RIP: e010:[<ffff83000011e108>] desched_extra_dom+0x1c3/0x38a
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) rax: 00000004c4b40000 rbx: ffff830000fe5e80 rcx: ffff830000fe5e80
(XEN) rdx: 0000000000000000 rsi: 0000000000000001 rdi: ffff830000fbe080
(XEN) rbp: ffff8300001ebda8 rsp: ffff8300001ebd50 r8: 00000000deadbeef
(XEN) r9: 00000000deadbeef r10: ffff8300001ebf28 r11: 0000000000000246
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr3: 0000000087f4c000
(XEN) (file=extable.c, line=77) Pre-exception: ffff83000011e108 -> 0000000000000000
(XEN) Xen stack trace from rsp=ffff8300001ebd68:
(XEN) CPU: 1
(XEN) RIP: e010:[<ffff83000011e108>]ffff8300001ebdc0 desched_extra_dom+0x1c3/0x38affff830000fe5e80
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) 0000000000020000 rax: 00000004c4b40000 rbx: ffff830000fe4d00 rcx: ffff830000fe4d00
(XEN) 00000001001ebdc0 rdx: 0000000000000000 rsi: 0000000000000001 rdi: ffff830000fc4080
(XEN)
(XEN) rbp: ffff83000024fda8 rsp: ffff83000024fd50 r8: 00000000deadbeef
(XEN) ffff830000fe5e80 r9: 00000000deadbeef r10: ffff83000024ff28 r11: 0000000000000246
(XEN) ffff830000fbe080 r12: ffffffff8010002c r13: 00000000ffffffff r14: 0000ffffffff8010
(XEN) 000000404424e250 r15: ffffffff803adfb0 cr0: 000000008005003b cr3: 0000000087f4e000
(XEN) ffff8300001f4380 Xen stack trace from rsp=ffff83000024fd68:
(XEN)
(XEN) ffff8300001ebe38 ffff83000024fdc0 ffff83000011e6c3 ffff830000fe4d00 ffff8300001f4380 000000000002 0000 0000000000000000 000000010024fdc0
(XEN)
(XEN) ffff830000fe4d00 ffff8300001ebde8 ffff830000fc4080 ffff830000123c60 000000406cad42ff 000000000000 0001 ffff8300001f4380 0000000000000000
(XEN)
(XEN) ffff8300001ebe08 ffff83000024fe38 ffff830000123c16 ffff83000011e6c3 ffff830000241fa0 ffff8300001f 4400 ffff830000241fb0 0000000000000000
(XEN)
(XEN) ffff8300001ebe38 ffff83000024fde8 ffff830000fe5e80 ffff830000123c60 ffff830000241f90 000000000000 0002 ffff830000241f80 0000000100000000
(XEN)
(XEN) ffff83000024fe08 0000000000214280 ffff830000123c16 000000404424e250 ffff830000ffaea0 ffff8300001e bea8 ffff830000ffaeb0 ffff8300001217ae
(XEN)
(XEN) ffff83000024fe38 0000000000000000 ffff830000fe4d00 ffff8300001ebe58 ffff830000ffae90 aaaaaaaaaaaa aaaa ffff830000ffae80 aaaaaaaaaaaaaaaa
(XEN) 0000000100214300
(XEN) aaaaaaaaaaaaaaaa 000000406cad42ff aaaaaaaaaaaaaaaa ffff83000024fea8 000000404424e250 ffff83000012 17ae
(XEN) 0000000000ffaf80 0000000000000000
(XEN) 0000000000000000 ffff83000024fe58 ffff830000fbe080 aaaaaaaaaaaaaaaa ffff830000fbe0f8 aaaaaaaaaaaa aaaa ffff830000fbe080
(XEN)
(XEN) ffff8300001ebec8 aaaaaaaaaaaaaaaa ffff8300001212e5 aaaaaaaaaaaaaaaa ffff8300001ebee8 000000406cad 42ff ffff830000fbe080 0000000100ffbd00
(XEN) ffff8300001ebf08
(XEN) 0000000000000000 ffff8300001213a2 ffff830000fc4080 00000040bd20b201 ffff830000fc40f8 0000000100fb e0a0 ffff830000fc4080
(XEN) 0000000000000000
(XEN) ffff83000024fec8 0000000000000000 ffff8300001212e5 00000001deadbeef ffff83000024fee8 ffff830000fb e080 ffff830000fc4080
(XEN)
(XEN) ffff83000024ff08 00007cffffe140b7 ffff8300001213a2 ffff830000159260 000000408a76b981 ffffffff8010 d0ca 0000000100fc40a0 0000000000000006
(XEN)
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000001dead beef 0000000000000000 ffff830000fc4080
(XEN) 0000000000000000
(XEN) 00007cffffdb00b7 0000000000000000 ffff830000159260 0000000000000246 ffffffff8010d0ca 000000000000 0004 0000000000000006
(XEN)
(XEN) ffffffff803adfb0 00000000ffffac03 0000ffffffff8010 00000000ffffac03 00000000ffffffff 000000000000 0000 ffffffff8010002c ffffffff8010d0ca
(XEN)
(XEN) 0000000000000000 ffffffff802f6d00 0000000000000000 00000000ffffffff 0000000000000001 000000000000 0246 0000010000000000 0000000000000008
(XEN) ffffffff8010d0ca
(XEN) 00000000ffffac47 000000000000e033 00000000ffffac47 0000000000000246 0000000000000000 ffff88000002 7f28 ffffffff8010d0ca
(XEN) 000000000000e02b
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 000000000000 0000 0000010000000000
(XEN)
(XEN) Xen call trace:
(XEN) ffffffff8010d0ca [<ffff83000011e108>]000000000000e033 desched_extra_dom+0x1c3/0x38a
(XEN) 0000000000000246 [<ffff83000011e6c3>]ffffffff803adf68 sedf_do_schedule+0x123/0x3c6
(XEN)
(XEN) 000000000000e02b [<ffff8300001217ae>]0000000000000000 __enter_scheduler+0x106/0x39b
(XEN) 0000000000000000 [<ffff8300001212e5>]0000000000000000 do_block+0xb4/0xbb
(XEN)
(XEN) [<ffff8300001213a2>]Xen call trace:
(XEN) do_sched_op+0x4d/0xcf
(XEN) [<ffff830000159260>][<ffff83000011e108>] syscall_enter+0xa0/0xfa
(XEN) desched_extra_dom+0x1c3/0x38a
(XEN)
(XEN) [<ffff83000011e6c3>]************************************
(XEN) sedf_do_schedule+0x123/0x3c6
(XEN) CPU0 FATAL TRAP 0 (divide error), ERROR_CODE 0000, IN INTERRUPT CONTEXT.
(XEN) [<ffff8300001217ae>]System shutting down -- need manual reset.
(XEN) ************************************
(XEN) __enter_scheduler+0x106/0x39b
Ryan Harper
2006-Feb-22  21:39 UTC
[Xen-devel] [PATCH] xen: fix empty slice bug in sedf_adjdom()
Whenever the slice of a domU is set to 0, sedf_adjdom() sets extraweight
to 0.  Later, in desched_extra_dom(), if extraweight is not set, the
vcpu's score is calculated with this:
 /*domain was running in L1 extraq => score is inverse of
   utilization and is used somewhat incremental!*/
   if ( !inf->extraweight )
       /*NB: use fixed point arithmetic with 10 bits*/
       inf->score[EXTRA_UTIL_Q] = (inf->period << 10) /
           inf->slice;
Which can result in a divide by zero.
The attached patch adds a comment and an additional sanity check to
prevent this case from crashing Xen.
-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
diffstat output:
 sched_sedf.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
diff -r 697fac283c9e xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c	Wed Feb 22 19:11:23 2006
+++ b/xen/common/sched_sedf.c	Wed Feb 22 15:02:21 2006
@@ -1610,10 +1610,10 @@
             /*time driven domains*/
             for_each_vcpu(p, v) {
                 /* sanity checking! */
-                if(cmd->u.sedf.slice > cmd->u.sedf.period )
+                if(!cmd->u.sedf.slice || cmd->u.sedf.slice > cmd->u.sedf.period)
                     return -EINVAL;
                 EDOM_INFO(v)->weight = 0;
-                EDOM_INFO(v)->extraweight = 0;
+                EDOM_INFO(v)->extraweight = 0; /* disabling extra weight requires non-zero slice */
                 EDOM_INFO(v)->period_orig = 
                     EDOM_INFO(v)->period   = cmd->u.sedf.period;
                 EDOM_INFO(v)->slice_orig  = 