I am seeing a problem with Dom0 crashing on x86_64 whenever I create a DomU. I've done some more testing, and it appears that this problem is somehow related to networking. Dom0 crashes as soon as the networking services are started when DomU is coming up. As an experiment, I brought up DomU without networking, and it stayed up. As soon as I started DomU with networking enabled, however, Dom0 crashed. Below is the trace:

Unable to handle kernel paging request at ffffc20000036000 RIP:
 <ffffffff802afff9>{net_rx_action+1209}
PGD 13e4067 PUD 13e3067 PMD 13e2067 PTE 0
Oops: 0000 [1]
CPU 0
Modules linked in: thermal processor fan button battery ac
Pid: 2712, comm: sshd Not tainted 2.6.12-xen0
RIP: e030:[<ffffffff802afff9>] <ffffffff802afff9>{net_rx_action+1209}
RSP: e02b:ffff88000290d7f8  EFLAGS: 00010202
RAX: ffffc20000035ff0 RBX: ffff88000de9bb60 RCX: 00000000000000ff
RDX: 0000000000000001 RSI: ffffc20000036000 RDI: 000000000000000e
RBP: ffff88000b5f7c80 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000010c1a06e
R13: ffffffff804df7c0 R14: 0000000000000072 R15: ffffffff804e8800
FS:  00002aaaac231040(0000) GS:ffffffff8050ae80(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process sshd (pid: 2712, threadinfo ffff88000290c000, task ffff88000c02ef30)
Stack: ffff88000de9bb60 0000000080397db8 0000000000000001 ffff88000290d840
       ffff88000a93d380 ffffffff8014bd98 0000000000000000 ffff88000c02ef30
       ffffffff8013000e ffffffff80355f6c
Call Trace: <ffffffff8014bd98>{mempool_alloc+152} <ffffffff8013000e>{proc_opensys+30}
       <ffffffff80355f6c>{nf_iterate+92} <ffffffff80397a50>{br_nf_pre_routing_finish+0}
       <ffffffff80356b4d>{nf_hook_slow+125} <ffffffff80397a50>{br_nf_pre_routing_finish+0}
       <ffffffff803984d1>{br_nf_pre_routing+1793} <ffffffff8014bceb>{mempool_free+171}
       <ffffffff80355f6c>{nf_iterate+92} <ffffffff80393850>{br_handle_fra
[remainder of trace garbled on the serial console]

-- 
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM

"The wisest men follow their own direction."
Euripides
On Tue, Jul 12, 2005 at 01:09:09PM -0500, David F Barrera wrote:
> I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
> DomU. I've done some more testing, and it appears that this problem is
> somehow related to networking. Dom0 crashes as soon as the networking
> services are started when DomU is coming up. As an experiment, I
> brought up DomU without networking, and it stayed up. As soon as I
> started DomU with networking enabled, however, Dom0 crashed. Below is
> the trace:

Hi David,

I'm quite confused by the other reports. Your latest "Daily Xen build" and Paul Larson's reply suggest that this bug was fixed. Also, is this on SLES9 userspace?

Cheers,

-- 
Vincent Hanquez
On Wed, 2005-07-13 at 00:44 +0200, Vincent Hanquez wrote:
> On Tue, Jul 12, 2005 at 01:09:09PM -0500, David F Barrera wrote:
> > I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
> > DomU. I've done some more testing, and it appears that this problem is
> > somehow related to networking. Dom0 crashes as soon as the networking
> > services are started when DomU is coming up. As an experiment, I
> > brought up DomU without networking, and it stayed up. As soon as I
> > started DomU with networking enabled, however, Dom0 crashed. Below is
> > the trace:
>
> Hi David,
>
> I'm quite confused by the other reports. Your latest "Daily Xen build" and
> Paul Larson's reply suggest that this bug was fixed.

Vincent,

I understand. My report did suggest that the problem was fixed; however, it was incorrect, as I later found out. It turns out that the DomU I had created did not have networking set up properly, so the VM only seemed functional. When I corrected the networking setup and started a DomU, Dom0 crashed. By the way, I tried again today, and the same thing is happening: Dom0 is crashing. This is the trace that I see on the serial console:

Unable to handle kernel NULL pointer dereference at 0000000000000c20 RIP:
 <ffffffff80118aba>{do_page_fault+426}
PGD d313067 PUD d312067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in: thermal processor fan button battery ac
Pid: 0, comm: swapper Not tainted 2.6.12-xen0
RIP: e030:[<ffffffff80118aba>] <ffffffff80118aba>{do_page_fault+426}
RSP: e02b:ffffffff8054ba00  EFLAGS: 00010202
RAX: 00000000013e4067 RBX: 0000000000000c20 RCX: 0000000000000000
RDX: 0000000000000067 RSI: 00000000093e4067 RDI: ffff800000000000
RBP: 0000000000000c20 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: ffffc20000036000 R14: 0000000000000000 R15: ffffffff8054bb00
FS:  0000000000000000(0000) GS:ffffffff80537b80(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff8054a000, task ffffffff80435680)
Stack: ffff88000f414000 fff [remainder of trace garbled on the serial console]

> Also, is this on SLES9 userspace?
>
> Cheers,

-- 
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM

"The wisest men follow their own direction."
Euripides
David F Barrera wrote:
> This is the trace that I see on the serial console:
>
> Unable to handle kernel NULL pointer dereference at 0000000000000c20 RIP:
>  <ffffffff80118aba>{do_page_fault+426}
> PGD d313067 PUD d312067 PMD 0
> Oops: 0000 [1]
> CPU 0
> Modules linked in: thermal processor fan button battery ac
> Pid: 0, comm: swapper Not tainted 2.6.12-xen0
> RIP: e030:[<ffffffff80118aba>] <ffffffff80118aba>{do_page_fault+426}
> RSP: e02b:ffffffff8054ba00  EFLAGS: 00010202
> RAX: 00000000013e4067 RBX: 0000000000000c20 RCX: 0000000000000000
> RDX: 0000000000000067 RSI: 00000000093e4067 RDI: ffff800000000000
> RBP: 0000000000000c20 R08: 00000000000000ff R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: ffffc20000036000 R14: 0000000000000000 R15: ffffffff8054bb00
> FS:  0000000000000000(0000) GS:ffffffff80537b80(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> Process swapper (pid: 0, threadinfo ffffffff8054a000, task ffffffff80435680)
> Stack: ffff88000f414000 fff

It is caused by the checkin of changeset 5648, "Remove non-ISO attributes from public headers"
(http://xenbits.xensource.com/xen-unstable.hg?cmd=changeset;node=2b6c1a8098078f7e53de7cf72227fddf01f0b2b6).
Actually, on x86_64 xenlinux, only the change to xen/include/public/io/netif.h caused this issue; the other parts of the changeset are OK. After reverting the changes to this file, the issue is gone, but we still need a clean patch for it. We also found that, on i386 xenlinux, mmap001 of LTP crashes domU; I suspect that is also introduced by this changeset.

-Xin
(Xen 3.0) Running an infinite loop in dom0 causes all other domains to get _zero_ CPU. Even pings to them suddenly stop. Is there a magic "xm sedf" command that I can use to work around this bug?

Rob
Rob,

That's interesting! Could you provide me with a bit more details? For example, your timing parameters for dom0 and domU. Could you also press Ctrl-A three times and then show the scheduler runqueues with 'r'?

Thanks,
Stephan

> (Xen 3.0) Running an infinite loop in dom0 causes all other domains to
> get _zero_ CPU. Even pings to them suddenly stop. Is there a magic "xm
> sedf" command that I can use to work around this bug?
>
> Rob
Stephan Diestelhorst wrote:
> Rob,
>  That's interesting! Could you provide me with a bit more details?
> For example, your timing parameters for dom0 and domU.
> Could you also press Ctrl-A three times and then show the scheduler
> runqueues with 'r'?

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) Scheduler: Simple EDF Scheduler (sedf)
(XEN) NOW=0x00003659C0E09949
(XEN) CPU[00] now=59759117961863
(XEN) RUNQ rq ff18cf80 n: ffbf7e04, p: ffbf7e04
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=59759130001778 w=0 c=1818020375509 sc=-181402283 xtr(yes)=507177757997 ew=0 (27%)
(XEN)
(XEN) WAITQ rq ff18cf88 n: ff1bfe04, p: ff1bfe04
(XEN)   0: 1.0 has=F p=100000000 sl=0 ddl=59759240002401 w=0 c=2936327068131 sc=131072 xtr(yes)=2936327068131 ew=1 (100%)
(XEN)
(XEN) EXTRAQ (penalty) rq ff18cf90 n: ffbf7e0c, p: ffbf7e0c
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=59759130001778 w=0 c=1818020375509 sc=-181402283 xtr(yes)=507177757997 ew=0 (27%)
(XEN)
(XEN) EXTRAQ (utilization) rq ff18cf98 n: ffbf7e14, p: ff1bfe14
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=59759130001778 w=0 c=1818020375509 sc=-181402283 xtr(yes)=507177757997 ew=0 (27%)
(XEN)   1: 1.0 has=F p=100000000 sl=0 ddl=59759240002401 w=0 c=2936327068131 sc=131072 xtr(yes)=2936327068131 ew=1 (100%)
(XEN)
(XEN) not on Q
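The dump above shows dom0 with period 20 ms and slice 15 ms, while dom1 has period 100 ms and slice 0, i.e. dom1 has no guaranteed CPU at all and depends entirely on extra time. One possible workaround, sketched here only as an assumption (it presumes the positional xm sedf syntax of that era, with domain, period, slice, latency, extratime flag and weight, and time values in nanoseconds as in the dump), would be to give dom1 a real slice; check the xm documentation of your build before relying on it:

    # hypothetical workaround, not from the thread: reserve 20 ms of every
    # 100 ms for domain 1, keeping extra time enabled
    xm sedf 1 100000000 20000000 0 1 0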
Thanks! What is your dom0 tight loop doing? Heavy I/O? It looks like dom0 does a lot of I/O, which means it unblocks immediately and then gets a higher priority on the L1 extraq, without any bound on execution time, i.e. slice length. Can you confirm this?

Stephan

> Stephan Diestelhorst wrote:
> > Rob,
> >  That's interesting! Could you provide me with a bit more details?
> > For example, your timing parameters for dom0 and domU.
> > Could you also press Ctrl-A three times and then show the scheduler
> > runqueues with 'r'?
>
> [snip: scheduler dump quoted in full in the previous message]
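To make the failure mode suspected above concrete, here is a tiny, purely illustrative C simulation. It is not the sedf code; it only shows how a policy that always prefers whatever sits on the extra-time queue, with no bound on how long it may run, starves a periodic domain in exactly the way Rob describes:

    /* Toy illustration only -- NOT the sedf scheduler. It models a decision
     * policy that always favours a domain sitting on the extra-time queue. */
    #include <stdio.h>

    struct dom { const char *name; long ran_ms; int on_extraq; };

    int main(void)
    {
        struct dom dom0 = { "dom0", 0, 1 };  /* busy-looping, always wants extra time */
        struct dom dom1 = { "dom1", 0, 0 };  /* periodic guest waiting for its slice */

        for (int tick = 0; tick < 1000; tick++) {      /* 1 ms scheduling decisions */
            /* Flawed policy: anything on the extra-time queue wins outright,
             * with no cap on accumulated execution time. */
            struct dom *pick = dom0.on_extraq ? &dom0 : &dom1;
            pick->ran_ms++;
        }
        printf("%s ran %ld ms, %s ran %ld ms\n",
               dom0.name, dom0.ran_ms, dom1.name, dom1.ran_ms);
        /* Prints: dom0 ran 1000 ms, dom1 ran 0 ms -- dom1 appears comatose. */
        return 0;
    }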
Stephan Diestelhorst wrote:
> Thanks! What is your dom0 tight loop doing? Heavy I/O?
> It looks like dom0 does a lot of I/O, which means it unblocks immediately and
> then gets a higher priority on the L1 extraq, without any bound on execution
> time, i.e. slice length. Can you confirm this?
>
> Stephan

The loop was just while(1); There was also a python script running that was reading from xentrace. So no heavy I/O, at least no disk/network I/O. But that doesn't matter anyway. Here is another dump where dom0 is doing absolutely nothing besides while(1); and dom1 is completely comatose during this time.

Rob

(XEN) Scheduler: Simple EDF Scheduler (sedf)
(XEN) NOW=0x0000004AC8DEB496
(XEN) CPU[00] now=321199566451
(XEN) RUNQ rq ff18cf80 n: ff18cf80, p: ff18cf80
(XEN)
(XEN) WAITQ rq ff18cf88 n: ffbf7e04, p: ff1bfe04
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=321220001780 w=0 c=34111278898 sc=451020117 xtr(yes)=11730339611 ew=0 (34%)
(XEN)   1: 1.0 has=F p=100000000 sl=0 ddl=321370002222 w=0 c=20938650825 sc=131072 xtr(yes)=20938650825 ew=1 (100%)
(XEN)
(XEN) EXTRAQ (penalty) rq ff18cf90 n: ffbf7e0c, p: ffbf7e0c
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=321220001780 w=0 c=34111278898 sc=451020117 xtr(yes)=11730339611 ew=0 (34%)
(XEN)
(XEN) EXTRAQ (utilization) rq ff18cf98 n: ff1bfe14, p: ffbf7e14
(XEN)   0: 1.0 has=F p=100000000 sl=0 ddl=321370002222 w=0 c=20938650825 sc=131072 xtr(yes)=20938650825 ew=1 (100%)
(XEN)   1: 0.0 has=T p=20000000 sl=15000000 ddl=321220001780 w=0 c=34111278898 sc=451020117 xtr(yes)=11730339611 ew=0 (34%)
(XEN)
(XEN) not on Q
Just another question: what kind of hardware are you running on? Might this be related in any way to the strange IA64 bugs?

> Stephan Diestelhorst wrote:
> > Thanks! What is your dom0 tight loop doing? Heavy I/O?
> > It looks like dom0 does a lot of I/O, which means it unblocks immediately
> > and then gets a higher priority on the L1 extraq, without any bound on
> > execution time, i.e. slice length. Can you confirm this?
> >
> > Stephan
>
> The loop was just while(1); There was also a python script running
> that was reading from xentrace. So no heavy I/O, at least no
> disk/network I/O. But that doesn't matter anyway. Here is another dump
> where dom0 is doing absolutely nothing besides while(1); and dom1 is
> completely comatose during this time.
>
> Rob
>
> [snip: scheduler dump quoted in full in the previous message]
> Stephan Diestelhorst wrote:
> > Thanks! What is your dom0 tight loop doing? Heavy I/O?
> > It looks like dom0 does a lot of I/O, which means it unblocks immediately
> > and then gets a higher priority on the L1 extraq, without any bound on
> > execution time, i.e. slice length. Can you confirm this?
> >
> > Stephan
>
> The loop was just while(1); There was also a python script running
> that was reading from xentrace. So no heavy I/O, at least no
> disk/network I/O. But that doesn't matter anyway. Here is another dump
> where dom0 is doing absolutely nothing besides while(1); and dom1 is
> completely comatose during this time.

Thanks for the dump! It looks as if something is seriously broken, as the scheduler thinks that dom0 needs compensation for time lost while blocked... Strange! I'll try to reproduce and fix the bug, probably tonight, but definitely tomorrow!

Thanks,
Stephan

> [snip: scheduler dump quoted in full in the previous message]
Stephan Diestelhorst wrote:
> Just another question: what kind of hardware are you running on?

Vanilla x86. A 2.6 GHz P4 in a consumer-grade HP box.

> Might this be related in any way to the strange IA64 bugs?

Well, Dan Magenheimer sits at the desk right next to mine, so it's quite possible it's something contagious. ;)

Rob
This bug is caused by the size of netif_tx_request_t/netif_rx_response_t on x86_64, which uses 8-byte alignment. When the packed attribute is removed by changeset 5648, their sizes grow from 12 to 16 bytes, and netif_tx_interface_t/netif_rx_interface_t then overflows a page.

We have three ways to resolve this bug:

1. Add __attribute__((packed)) back to the definitions of the two structures.

2. Add #pragma pack(4) to netif.h, as in:

diff -r 1d026c7023d2 xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h     Thu Jul 14 23:48:06 2005
+++ b/xen/include/public/io/netif.h     Fri Jul 15 19:17:52 2005
@@ -8,6 +8,10 @@
 
 #ifndef __XEN_PUBLIC_IO_NETIF_H__
 #define __XEN_PUBLIC_IO_NETIF_H__
+
+#ifdef __x86_64__
+#pragma pack(4)
+#endif
 
 typedef struct netif_tx_request {
     memory_t addr;   /* Machine address of packet. */

3. Define a smaller value on x86_64 for NETIF_TX_RING_SIZE/NETIF_RX_RING_SIZE, 128?

Keir, which one do you prefer?

-Xin

Li, Xin B wrote:
> David F Barrera wrote:
> > This is the trace that I see on the serial console:
> >
> > [snip: oops trace quoted in full earlier in the thread]
>
> It is caused by the checkin of changeset 5648, "Remove non-ISO attributes
> from public headers"
> (http://xenbits.xensource.com/xen-unstable.hg?cmd=changeset;node=2b6c1a8098078f7e53de7cf72227fddf01f0b2b6).
> Actually, on x86_64 xenlinux, only the change to xen/include/public/io/netif.h
> caused this issue; the other parts of the changeset are OK. After reverting
> the changes to this file, the issue is gone, but we still need a clean patch
> for it. We also found that, on i386 xenlinux, mmap001 of LTP crashes domU;
> I suspect that is also introduced by this changeset.
>
> -Xin
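To see the size change outside the Xen tree, here is a minimal standalone sketch. It uses stdint typedefs as stand-ins for the Xen types (so it is an illustration, not the real netif.h), but it reproduces the same 12-byte versus 16-byte difference on x86_64:

    /* Illustrative only: stand-ins for the Xen types, not the real header. */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint64_t memory_t;             /* 8-byte machine address on x86_64 */

    struct rx_response_packed {            /* as it was before changeset 5648 */
        memory_t addr;
        uint16_t csum_valid:1;
        uint16_t id:15;
        int16_t  status;
    } __attribute__((packed));             /* 8 + 2 + 2 = 12 bytes, no padding */

    struct rx_response_natural {           /* after the attribute was removed */
        memory_t addr;
        uint16_t csum_valid:1;
        uint16_t id:15;
        int16_t  status;
    };                                     /* padded to 8-byte alignment = 16 bytes */

    int main(void)
    {
        printf("packed:  %zu bytes\n", sizeof(struct rx_response_packed));   /* 12 */
        printf("natural: %zu bytes\n", sizeof(struct rx_response_natural));  /* 16 */
        return 0;
    }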
On 15 Jul 2005, at 09:07, Li, Xin B wrote:
> 3. Define a smaller value on x86_64 for
> NETIF_TX_RING_SIZE/NETIF_RX_RING_SIZE, 128?
>
> Keir, which one do you prefer?

This one is fine for now. I'll add it in, with a comment that it can be removed when we switch to grant tables for netfront/netback. That will get the 8-byte memory_t out of the structures and relax the natural-alignment restrictions.

-- Keir
This patch fixes the bug where networking in an x86_64 domU crashes dom0.

The bug is caused by the size of netif_tx_request_t/netif_rx_response_t on x86_64, which uses 8-byte alignment. When the packed attribute is removed by changeset 5648, their sizes grow from 12 to 16 bytes, and netif_tx_interface_t/netif_rx_interface_t then overflows a page.

Signed-off-by: Xin Li <xin.b.li@intel.com>
Signed-off-by: Xiaofeng Ling <xiaofeng.lingi@intel.com>

-Xin

diff -r 1d026c7023d2 xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h     Thu Jul 14 23:48:06 2005
+++ b/xen/include/public/io/netif.h     Fri Jul 15 19:55:23 2005
@@ -21,11 +21,11 @@
     s8       status;
 } netif_tx_response_t;
 
-typedef struct {
+typedef struct netif_rx_request {
     u16      id;            /* Echoed in response message.     */
 } netif_rx_request_t;
 
-typedef struct {
+typedef struct netif_rx_response {
     memory_t addr;          /* Machine address of packet.      */
     u16      csum_valid:1;  /* Protocol checksum is validated? */
     u16      id:15;
@@ -46,8 +46,13 @@
 #define MASK_NETIF_RX_IDX(_i) ((_i)&(NETIF_RX_RING_SIZE-1))
 #define MASK_NETIF_TX_IDX(_i) ((_i)&(NETIF_TX_RING_SIZE-1))
 
+#ifdef __x86_64__
+#define NETIF_TX_RING_SIZE 128
+#define NETIF_RX_RING_SIZE 128
+#else
 #define NETIF_TX_RING_SIZE 256
 #define NETIF_RX_RING_SIZE 256
+#endif
 
 /* This structure must fit in a memory page. */
 typedef struct netif_tx_interface {
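As a quick back-of-the-envelope check on the chosen constant (illustrative only; it assumes 4096-byte pages and the 16-byte naturally aligned entries discussed above, and ignores the few extra bytes used by the ring indices):

    #include <stdio.h>

    int main(void)
    {
        /* Before the patch: 256 naturally aligned 16-byte entries already fill
         * a 4096-byte page on their own, so the interface structure spills over. */
        printf("256 entries * 16 bytes = %d\n", 256 * 16);   /* 4096 */

        /* After the patch: 128 entries take half a page, leaving room for the
         * producer/consumer indices and event fields. */
        printf("128 entries * 16 bytes = %d\n", 128 * 16);   /* 2048 */

        /* The old 256-entry ring was fine while entries were the packed
         * 12-byte versions. */
        printf("256 entries * 12 bytes = %d\n", 256 * 12);   /* 3072 */

        return 0;
    }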
Awesome! The patch works... I can now do networking in an x86-64 domU without crashing Xen.

On Fri, 2005-07-15 at 16:26 +0800, Li, Xin B wrote:
> This patch fixes the bug where networking in an x86_64 domU crashes dom0.
>
> The bug is caused by the size of netif_tx_request_t/netif_rx_response_t
> on x86_64, which uses 8-byte alignment. When the packed attribute is removed
> by changeset 5648, their sizes grow from 12 to 16 bytes, and
> netif_tx_interface_t/netif_rx_interface_t then overflows a page.
>
> Signed-off-by: Xin Li <xin.b.li@intel.com>
> Signed-off-by: Xiaofeng Ling <xiaofeng.lingi@intel.com>
>
> [snip: patch quoted in full in the previous message]

-- 
Jerone Young
IBM Linux Technology Center
jyoung5@us.ibm.com
512-838-1157 (T/L: 678-1157)
> > Might this be related in any way to the strange IA64 bugs?
>
> Well, Dan Magenheimer sits at the desk right next to mine, so it's quite
> possible it's something contagious. ;)

Heh. You think Xen's code is virulent? I wouldn't go that far, although it has infected quite a number of people. :-)

Seriously: unfortunately I can't reproduce the bug properly at the moment, because the recent builds don't work properly for me. But as far as I can see (i.e. just dom0 testing) something seems to be wrong. I guess it is something outside the core scheduler, as the problem doesn't happen with my older (bk repo) version, and nothing has changed from that to the current code (at least not inside sedf). That said, I think I should introduce measures to make the scheduler invulnerable to that kind of external fault!

Stephan
Hi Stephan,

Could you please tell me whether SEDF is available in the -testing tree, or only in the -unstable tree so far?

Thanks.
Xuehai
It is not available in the -testing tree, but I can create a patch if needed!

Stephan

> Hi Stephan,
> Could you please tell me whether SEDF is available in the -testing tree, or
> only in the -unstable tree so far?
> Thanks.
> Xuehai
Stephan Diestelhorst
2005-Jul-22 17:20 UTC
greedy dom0 in sedf fixed! (Re: [Xen-devel] More on sedf scheduler)
On Thursday, 14 July 2005 19:16, Rob Gardner wrote:
> Stephan Diestelhorst wrote:
> > Thanks! What is your dom0 tight loop doing? Heavy I/O?
> > It looks like dom0 does a lot of I/O, which means it unblocks immediately
> > and then gets a higher priority on the L1 extraq, without any bound on
> > execution time, i.e. slice length. Can you confirm this?
> >
> > Stephan
>
> The loop was just while(1); There was also a python script running that
> was reading from xentrace. So no heavy I/O, at least no disk/network
> I/O. But that doesn't matter anyway. Here is another dump where dom0 is
> doing absolutely nothing besides while(1); and dom1 is completely
> comatose during this time.
>
> Rob

This should be fixed in the current repository!

Stephan
Rob Gardner
2005-Jul-22 17:45 UTC
Re: greedy dom0 in sedf fixed! (Re: [Xen-devel] More on sedf scheduler)
Stephan Diestelhorst wrote:
> > The loop was just while(1); There was also a python script running that
> > was reading from xentrace. So no heavy I/O, at least no disk/network
> > I/O. But that doesn't matter anyway. Here is another dump where dom0 is
> > doing absolutely nothing besides while(1); and dom1 is completely
> > comatose during this time.
> >
> > Rob
>
> This should be fixed in the current repository!
>
> Stephan

Thanks, will give it a try soon.

Rob