I am seeing a problem with Dom0 crashing on x86_64 whenever I create a DomU. I've done some more testing, and it appears that this problem is somehow related to networking. Dom0 crashes as soon as the networking services are started when DomU is coming up. As an experiment, I brought up DomU without networking, and it stayed up. As soon as I started DomU with networking enabled, however, Dom0 crashed. Below is the trace:

Unable to handle kernel paging request at ffffc20000036000 RIP:
 <ffffffff802afff9>{net_rx_action+1209}
PGD 13e4067 PUD 13e3067 PMD 13e2067 PTE 0
Oops: 0000 [1]
CPU 0
Modules linked in: thermal processor fan button battery ac
Pid: 2712, comm: sshd Not tainted 2.6.12-xen0
RIP: e030:[<ffffffff802afff9>] <ffffffff802afff9>{net_rx_action+1209}
RSP: e02b:ffff88000290d7f8  EFLAGS: 00010202
RAX: ffffc20000035ff0 RBX: ffff88000de9bb60 RCX: 00000000000000ff
RDX: 0000000000000001 RSI: ffffc20000036000 RDI: 000000000000000e
RBP: ffff88000b5f7c80 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000010c1a06e
R13: ffffffff804df7c0 R14: 0000000000000072 R15: ffffffff804e8800
FS:  00002aaaac231040(0000) GS:ffffffff8050ae80(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process sshd (pid: 2712, threadinfo ffff88000290c000, task ffff88000c02ef30)
Stack: ffff88000de9bb60 0000000080397db8 0000000000000001 ffff88000290d840
       ffff88000a93d380 ffffffff8014bd98 0000000000000000 ffff88000c02ef30
       ffffffff8013000e ffffffff80355f6c
Call Trace: <ffffffff8014bd98>{mempool_alloc+152} <ffffffff8013000e>{proc_opensys+30}
       <ffffffff80355f6c>{nf_iterate+92} <ffffffff80397a50>{br_nf_pre_routing_finish+0}
       <ffffffff80356b4d>{nf_hook_slow+125} <ffffffff80397a50>{br_nf_pre_routing_finish+0}
       <ffffffff803984d1>{br_nf_pre_routing+1793} <ffffffff8014bceb>{mempool_free+171}
       <ffffffff80355f6c>{nf_iterate+92} <ffffffff80393850>{br_handle_fra
[remainder of trace garbled on the serial console]

-- 
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM

"The wisest men follow their own direction."
Euripides
On Tue, Jul 12, 2005 at 01:09:09PM -0500, David F Barrera wrote:
> I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
> DomU. I've done some more testing, and it appears that this problem is
> somehow related to networking. Dom0 crashes as soon as the networking
> services are started when DomU is coming up. As an experiment, I
> brought up DomU without networking, and it stayed up. As soon as I
> started DomU with networking enabled, however, Dom0 crashed. Below is
> the trace:

Hi David,

I'm quite confused by the other reports. Your latest "Daily Xen build" and Paul Larson's reply suggest that this bug was fixed. Also, is this on SLES9 userspace?

Cheers,

-- 
Vincent Hanquez
On Wed, 2005-07-13 at 00:44 +0200, Vincent Hanquez wrote:
> On Tue, Jul 12, 2005 at 01:09:09PM -0500, David F Barrera wrote:
> > I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
> > DomU. I've done some more testing, and it appears that this problem is
> > somehow related to networking. Dom0 crashes as soon as the networking
> > services are started when DomU is coming up. As an experiment, I
> > brought up DomU without networking, and it stayed up. As soon as I
> > started DomU with networking enabled, however, Dom0 crashed. Below is
> > the trace:
>
> Hi David,
>
> I'm quite confused by the other reports. Your latest "Daily Xen build" and
> Paul Larson's reply suggest that this bug was fixed.

Vincent,

I understand. My report did suggest that the problem was fixed; however, it was incorrect, as I later found out. It turns out that the DomU I had created did not have networking set up properly, so the VM only seemed functional. When I corrected the networking setup and started a DomU, Dom0 crashed. By the way, I tried again today, and the same thing is happening: Dom0 is crashing. This is the trace that I see on the serial console:

Unable to handle kernel NULL pointer dereference at 0000000000000c20 RIP:
 <ffffffff80118aba>{do_page_fault+426}
PGD d313067 PUD d312067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in: thermal processor fan button battery ac
Pid: 0, comm: swapper Not tainted 2.6.12-xen0
RIP: e030:[<ffffffff80118aba>] <ffffffff80118aba>{do_page_fault+426}
RSP: e02b:ffffffff8054ba00  EFLAGS: 00010202
RAX: 00000000013e4067 RBX: 0000000000000c20 RCX: 0000000000000000
RDX: 0000000000000067 RSI: 00000000093e4067 RDI: ffff800000000000
RBP: 0000000000000c20 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: ffffc20000036000 R14: 0000000000000000 R15: ffffffff8054bb00
FS:  0000000000000000(0000) GS:ffffffff80537b80(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff8054a000, task ffffffff80435680)
Stack: ffff88000f414000 fff [remainder of trace garbled on the serial console]

> Also, is this on SLES9 userspace?
>
> Cheers,

-- 
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM

"The wisest men follow their own direction."
Euripides
David F Barrera wrote:
> This is the trace that I see on the serial console:
>
> Unable to handle kernel NULL pointer dereference at 0000000000000c20 RIP:
>  <ffffffff80118aba>{do_page_fault+426}
> PGD d313067 PUD d312067 PMD 0
> Oops: 0000 [1]
> CPU 0
> Modules linked in: thermal processor fan button battery ac
> Pid: 0, comm: swapper Not tainted 2.6.12-xen0
> RIP: e030:[<ffffffff80118aba>] <ffffffff80118aba>{do_page_fault+426}
> RSP: e02b:ffffffff8054ba00  EFLAGS: 00010202
> RAX: 00000000013e4067 RBX: 0000000000000c20 RCX: 0000000000000000
> RDX: 0000000000000067 RSI: 00000000093e4067 RDI: ffff800000000000
> RBP: 0000000000000c20 R08: 00000000000000ff R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: ffffc20000036000 R14: 0000000000000000 R15: ffffffff8054bb00
> FS:  0000000000000000(0000) GS:ffffffff80537b80(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> Process swapper (pid: 0, threadinfo ffffffff8054a000, task ffffffff80435680)
> Stack: ffff88000f414000 fff

It is caused by the checkin of changeset 5648, "Remove non-ISO attributes from public headers"
(http://xenbits.xensource.com/xen-unstable.hg?cmd=changeset;node=2b6c1a8098078f7e53de7cf72227fddf01f0b2b6).
Actually, on x86_64 xenlinux, only the change to xen/include/public/io/netif.h caused this issue; the other parts of the changeset are OK. After reverting the changes to this file, the issue is gone, but we still need a clean patch for it. We also found that, on i386 xenlinux, mmap001 of LTP crashes domU; I suspect that is also introduced by this changeset.

-Xin
(Xen 3.0) Running an infinite loop in dom0 causes all other domains to get _zero_ CPU. Even pings to them suddenly stop. Is there a magic "xm sedf" command that I can use to work around this bug?

Rob
Rob,

That's interesting! Could you provide me with a bit more details? For example, your timing parameters for dom0 and domU. Could you also press Ctrl-A three times and then show the scheduler runqueues with 'r'?

Thanks,
Stephan

> (Xen 3.0) Running an infinite loop in dom0 causes all other domains to
> get _zero_ CPU. Even pings to them suddenly stop. Is there a magic "xm
> sedf" command that I can use to work around this bug?
>
> Rob
Stephan Diestelhorst wrote:
> Rob,
>  That's interesting! Could you provide me with a bit more details?
> For example, your timing parameters for dom0 and domU.
> Could you also press Ctrl-A three times and then show the scheduler
> runqueues with 'r'?

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) Scheduler: Simple EDF Scheduler (sedf)
(XEN) NOW=0x00003659C0E09949
(XEN) CPU[00] now=59759117961863
(XEN) RUNQ rq ff18cf80 n: ffbf7e04, p: ffbf7e04
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=59759130001778 w=0 c=1818020375509 sc=-181402283 xtr(yes)=507177757997 ew=0 (27%)
(XEN)
(XEN) WAITQ rq ff18cf88 n: ff1bfe04, p: ff1bfe04
(XEN)   0: 1.0 has=F p=100000000 sl=0 ddl=59759240002401 w=0 c=2936327068131 sc=131072 xtr(yes)=2936327068131 ew=1 (100%)
(XEN)
(XEN) EXTRAQ (penalty) rq ff18cf90 n: ffbf7e0c, p: ffbf7e0c
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=59759130001778 w=0 c=1818020375509 sc=-181402283 xtr(yes)=507177757997 ew=0 (27%)
(XEN)
(XEN) EXTRAQ (utilization) rq ff18cf98 n: ffbf7e14, p: ff1bfe14
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=59759130001778 w=0 c=1818020375509 sc=-181402283 xtr(yes)=507177757997 ew=0 (27%)
(XEN)   1: 1.0 has=F p=100000000 sl=0 ddl=59759240002401 w=0 c=2936327068131 sc=131072 xtr(yes)=2936327068131 ew=1 (100%)
(XEN)
(XEN) not on Q
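The dump above shows dom0 with period 20 ms and slice 15 ms, while dom1 has period 100 ms and slice 0, i.e. dom1 has no guaranteed CPU at all and depends entirely on extra time. One possible workaround, sketched here only as an assumption (it presumes the positional xm sedf syntax of that era, with domain, period, slice, latency, extratime flag and weight, and time values in nanoseconds as in the dump), would be to give dom1 a real slice; check the xm documentation of your build before relying on it:

    # hypothetical workaround, not from the thread: reserve 20 ms of every
    # 100 ms for domain 1, keeping extra time enabled
    xm sedf 1 100000000 20000000 0 1 0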
Thanks! What is your dom0 tight loop doing? Heavy I/O? It looks like dom0 does a lot of I/O, which means it unblocks immediately and then gets a higher priority on the L1 extraq, without any bound on execution time, i.e. slice length. Can you confirm this?

Stephan

> Stephan Diestelhorst wrote:
> > Rob,
> >  That's interesting! Could you provide me with a bit more details?
> > For example, your timing parameters for dom0 and domU.
> > Could you also press Ctrl-A three times and then show the scheduler
> > runqueues with 'r'?
>
> [snip: scheduler dump quoted in full in the previous message]
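To make the failure mode suspected above concrete, here is a tiny, purely illustrative C simulation. It is not the sedf code; it only shows how a policy that always prefers whatever sits on the extra-time queue, with no bound on how long it may run, starves a periodic domain in exactly the way Rob describes:

    /* Toy illustration only -- NOT the sedf scheduler. It models a decision
     * policy that always favours a domain sitting on the extra-time queue. */
    #include <stdio.h>

    struct dom { const char *name; long ran_ms; int on_extraq; };

    int main(void)
    {
        struct dom dom0 = { "dom0", 0, 1 };  /* busy-looping, always wants extra time */
        struct dom dom1 = { "dom1", 0, 0 };  /* periodic guest waiting for its slice */

        for (int tick = 0; tick < 1000; tick++) {      /* 1 ms scheduling decisions */
            /* Flawed policy: anything on the extra-time queue wins outright,
             * with no cap on accumulated execution time. */
            struct dom *pick = dom0.on_extraq ? &dom0 : &dom1;
            pick->ran_ms++;
        }
        printf("%s ran %ld ms, %s ran %ld ms\n",
               dom0.name, dom0.ran_ms, dom1.name, dom1.ran_ms);
        /* Prints: dom0 ran 1000 ms, dom1 ran 0 ms -- dom1 appears comatose. */
        return 0;
    }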
Stephan Diestelhorst wrote:
> Thanks! What is your dom0 tight loop doing? Heavy I/O?
> It looks like dom0 does a lot of I/O, which means it unblocks immediately and
> then gets a higher priority on the L1 extraq, without any bound on execution
> time, i.e. slice length. Can you confirm this?
>
> Stephan

The loop was just while(1); There was also a python script running that was reading from xentrace. So no heavy I/O, at least no disk/network I/O. But that doesn't matter anyway. Here is another dump where dom0 is doing absolutely nothing besides while(1); and dom1 is completely comatose during this time.

Rob

(XEN) Scheduler: Simple EDF Scheduler (sedf)
(XEN) NOW=0x0000004AC8DEB496
(XEN) CPU[00] now=321199566451
(XEN) RUNQ rq ff18cf80 n: ff18cf80, p: ff18cf80
(XEN)
(XEN) WAITQ rq ff18cf88 n: ffbf7e04, p: ff1bfe04
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=321220001780 w=0 c=34111278898 sc=451020117 xtr(yes)=11730339611 ew=0 (34%)
(XEN)   1: 1.0 has=F p=100000000 sl=0 ddl=321370002222 w=0 c=20938650825 sc=131072 xtr(yes)=20938650825 ew=1 (100%)
(XEN)
(XEN) EXTRAQ (penalty) rq ff18cf90 n: ffbf7e0c, p: ffbf7e0c
(XEN)   0: 0.0 has=T p=20000000 sl=15000000 ddl=321220001780 w=0 c=34111278898 sc=451020117 xtr(yes)=11730339611 ew=0 (34%)
(XEN)
(XEN) EXTRAQ (utilization) rq ff18cf98 n: ff1bfe14, p: ffbf7e14
(XEN)   0: 1.0 has=F p=100000000 sl=0 ddl=321370002222 w=0 c=20938650825 sc=131072 xtr(yes)=20938650825 ew=1 (100%)
(XEN)   1: 0.0 has=T p=20000000 sl=15000000 ddl=321220001780 w=0 c=34111278898 sc=451020117 xtr(yes)=11730339611 ew=0 (34%)
(XEN)
(XEN) not on Q
Just another question: what kind of hardware are you running on? Might this be related in any way to the strange IA64 bugs?

> Stephan Diestelhorst wrote:
> > Thanks! What is your dom0 tight loop doing? Heavy I/O?
> > It looks like dom0 does a lot of I/O, which means it unblocks immediately
> > and then gets a higher priority on the L1 extraq, without any bound on
> > execution time, i.e. slice length. Can you confirm this?
> >
> > Stephan
>
> The loop was just while(1); There was also a python script running
> that was reading from xentrace. So no heavy I/O, at least no
> disk/network I/O. But that doesn't matter anyway. Here is another dump
> where dom0 is doing absolutely nothing besides while(1); and dom1 is
> completely comatose during this time.
>
> Rob
>
> [snip: scheduler dump quoted in full in the previous message]
> Stephan Diestelhorst wrote:
> > Thanks! What is your dom0 tight loop doing? Heavy I/O?
> > It looks like dom0 does a lot of I/O, which means it unblocks immediately
> > and then gets a higher priority on the L1 extraq, without any bound on
> > execution time, i.e. slice length. Can you confirm this?
> >
> > Stephan
>
> The loop was just while(1); There was also a python script running
> that was reading from xentrace. So no heavy I/O, at least no
> disk/network I/O. But that doesn't matter anyway. Here is another dump
> where dom0 is doing absolutely nothing besides while(1); and dom1 is
> completely comatose during this time.

Thanks for the dump! It looks as if something is seriously broken, as the scheduler thinks that dom0 needs compensation for time lost while blocked... Strange! I'll try to reproduce and fix the bug, probably tonight, but definitely tomorrow!

Thanks,
Stephan

> [snip: scheduler dump quoted in full in the previous message]
Stephan Diestelhorst wrote:
> Just another question: what kind of hardware are you running on?

Vanilla x86. A 2.6 GHz P4 in a consumer-grade HP box.

> Might this be related in any way to the strange IA64 bugs?

Well, Dan Magenheimer sits at the desk right next to mine, so it's quite possible it's something contagious. ;)

Rob
This bug is caused by the size of netif_tx_request_t/netif_rx_response_t on x86_64, which uses 8-byte alignment. When the packed attribute is removed by changeset 5648, their sizes grow from 12 to 16 bytes, and netif_tx_interface_t/netif_rx_interface_t then overflows a page.

We have three ways to resolve this bug:

1. Add __attribute__((packed)) back to the definitions of the two structures.

2. Add #pragma pack(4) to netif.h, as in:

diff -r 1d026c7023d2 xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h     Thu Jul 14 23:48:06 2005
+++ b/xen/include/public/io/netif.h     Fri Jul 15 19:17:52 2005
@@ -8,6 +8,10 @@
 
 #ifndef __XEN_PUBLIC_IO_NETIF_H__
 #define __XEN_PUBLIC_IO_NETIF_H__
+
+#ifdef __x86_64__
+#pragma pack(4)
+#endif
 
 typedef struct netif_tx_request {
     memory_t addr;   /* Machine address of packet. */

3. Define a smaller value on x86_64 for NETIF_TX_RING_SIZE/NETIF_RX_RING_SIZE, 128?

Keir, which one do you prefer?

-Xin

Li, Xin B wrote:
> David F Barrera wrote:
> > This is the trace that I see on the serial console:
> >
> > [snip: oops trace quoted in full earlier in the thread]
>
> It is caused by the checkin of changeset 5648, "Remove non-ISO attributes
> from public headers"
> (http://xenbits.xensource.com/xen-unstable.hg?cmd=changeset;node=2b6c1a8098078f7e53de7cf72227fddf01f0b2b6).
> Actually, on x86_64 xenlinux, only the change to xen/include/public/io/netif.h
> caused this issue; the other parts of the changeset are OK. After reverting
> the changes to this file, the issue is gone, but we still need a clean patch
> for it. We also found that, on i386 xenlinux, mmap001 of LTP crashes domU;
> I suspect that is also introduced by this changeset.
>
> -Xin
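To see the size change outside the Xen tree, here is a minimal standalone sketch. It uses stdint typedefs as stand-ins for the Xen types (so it is an illustration, not the real netif.h), but it reproduces the same 12-byte versus 16-byte difference on x86_64:

    /* Illustrative only: stand-ins for the Xen types, not the real header. */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint64_t memory_t;             /* 8-byte machine address on x86_64 */

    struct rx_response_packed {            /* as it was before changeset 5648 */
        memory_t addr;
        uint16_t csum_valid:1;
        uint16_t id:15;
        int16_t  status;
    } __attribute__((packed));             /* 8 + 2 + 2 = 12 bytes, no padding */

    struct rx_response_natural {           /* after the attribute was removed */
        memory_t addr;
        uint16_t csum_valid:1;
        uint16_t id:15;
        int16_t  status;
    };                                     /* padded to 8-byte alignment = 16 bytes */

    int main(void)
    {
        printf("packed:  %zu bytes\n", sizeof(struct rx_response_packed));   /* 12 */
        printf("natural: %zu bytes\n", sizeof(struct rx_response_natural));  /* 16 */
        return 0;
    }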
On 15 Jul 2005, at 09:07, Li, Xin B wrote:
> 3. Define a smaller value on x86_64 for
> NETIF_TX_RING_SIZE/NETIF_RX_RING_SIZE, 128?
>
> Keir, which one do you prefer?

This one is fine for now. I'll add it in, with a comment that it can be removed when we switch to grant tables for netfront/netback. That will get the 8-byte memory_t out of the structures and relax the natural-alignment restrictions.

-- Keir
This patch fixes the bug where networking in an x86_64 domU crashes dom0.

The bug is caused by the size of netif_tx_request_t/netif_rx_response_t on x86_64, which uses 8-byte alignment. When the packed attribute is removed by changeset 5648, their sizes grow from 12 to 16 bytes, and netif_tx_interface_t/netif_rx_interface_t then overflows a page.

Signed-off-by: Xin Li <xin.b.li@intel.com>
Signed-off-by: Xiaofeng Ling <xiaofeng.lingi@intel.com>

-Xin

diff -r 1d026c7023d2 xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h     Thu Jul 14 23:48:06 2005
+++ b/xen/include/public/io/netif.h     Fri Jul 15 19:55:23 2005
@@ -21,11 +21,11 @@
     s8       status;
 } netif_tx_response_t;
 
-typedef struct {
+typedef struct netif_rx_request {
     u16      id;            /* Echoed in response message.     */
 } netif_rx_request_t;
 
-typedef struct {
+typedef struct netif_rx_response {
     memory_t addr;          /* Machine address of packet.      */
     u16      csum_valid:1;  /* Protocol checksum is validated? */
     u16      id:15;
@@ -46,8 +46,13 @@
 #define MASK_NETIF_RX_IDX(_i) ((_i)&(NETIF_RX_RING_SIZE-1))
 #define MASK_NETIF_TX_IDX(_i) ((_i)&(NETIF_TX_RING_SIZE-1))
 
+#ifdef __x86_64__
+#define NETIF_TX_RING_SIZE 128
+#define NETIF_RX_RING_SIZE 128
+#else
 #define NETIF_TX_RING_SIZE 256
 #define NETIF_RX_RING_SIZE 256
+#endif
 
 /* This structure must fit in a memory page. */
 typedef struct netif_tx_interface {
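As a quick back-of-the-envelope check on the chosen constant (illustrative only; it assumes 4096-byte pages and the 16-byte naturally aligned entries discussed above, and ignores the few extra bytes used by the ring indices):

    #include <stdio.h>

    int main(void)
    {
        /* Before the patch: 256 naturally aligned 16-byte entries already fill
         * a 4096-byte page on their own, so the interface structure spills over. */
        printf("256 entries * 16 bytes = %d\n", 256 * 16);   /* 4096 */

        /* After the patch: 128 entries take half a page, leaving room for the
         * producer/consumer indices and event fields. */
        printf("128 entries * 16 bytes = %d\n", 128 * 16);   /* 2048 */

        /* The old 256-entry ring was fine while entries were the packed
         * 12-byte versions. */
        printf("256 entries * 12 bytes = %d\n", 256 * 12);   /* 3072 */

        return 0;
    }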
Awesome! The patch works... I can now do networking in an x86-64 domU without crashing Xen.

On Fri, 2005-07-15 at 16:26 +0800, Li, Xin B wrote:
> This patch fixes the bug where networking in an x86_64 domU crashes dom0.
>
> The bug is caused by the size of netif_tx_request_t/netif_rx_response_t
> on x86_64, which uses 8-byte alignment. When the packed attribute is removed
> by changeset 5648, their sizes grow from 12 to 16 bytes, and
> netif_tx_interface_t/netif_rx_interface_t then overflows a page.
>
> Signed-off-by: Xin Li <xin.b.li@intel.com>
> Signed-off-by: Xiaofeng Ling <xiaofeng.lingi@intel.com>
>
> [snip: patch quoted in full in the previous message]

-- 
Jerone Young
IBM Linux Technology Center
jyoung5@us.ibm.com
512-838-1157 (T/L: 678-1157)
> > Might this be related in any way to the strange IA64 bugs?
>
> Well, Dan Magenheimer sits at the desk right next to mine, so it's quite
> possible it's something contagious. ;)

Heh. You think Xen's code is virulent? I wouldn't go that far, although it has infected quite a number of people. :-)

Seriously: unfortunately I can't reproduce the bug properly at the moment, because the recent builds don't work properly for me. But as far as I can see (i.e. just dom0 testing) something seems to be wrong. I guess it is something outside the core scheduler, as the problem doesn't happen with my older (bk repo) version, and nothing has changed from that to the current code (at least not inside sedf). That said, I think I should introduce measures to make the scheduler invulnerable to that kind of external fault!

Stephan
Hi Stephan,

Could you please tell me whether SEDF is available in the -testing tree, or only in the -unstable tree so far?

Thanks.
Xuehai
It is not available in the -testing tree, but I can create a patch if needed!

Stephan

> Hi Stephan,
> Could you please tell me whether SEDF is available in the -testing tree, or
> only in the -unstable tree so far?
> Thanks.
> Xuehai
Stephan Diestelhorst
2005-Jul-22 17:20 UTC
greedy dom0 in sedf fixed! (Re: [Xen-devel] More on sedf scheduler)
On Thursday, 14 July 2005 19:16, Rob Gardner wrote:
> Stephan Diestelhorst wrote:
> > Thanks! What is your dom0 tight loop doing? Heavy I/O?
> > It looks like dom0 does a lot of I/O, which means it unblocks immediately
> > and then gets a higher priority on the L1 extraq, without any bound on
> > execution time, i.e. slice length. Can you confirm this?
> >
> > Stephan
>
> The loop was just while(1); There was also a python script running that
> was reading from xentrace. So no heavy I/O, at least no disk/network
> I/O. But that doesn't matter anyway. Here is another dump where dom0 is
> doing absolutely nothing besides while(1); and dom1 is completely
> comatose during this time.
>
> Rob

This should be fixed in the current repository!

Stephan
Rob Gardner
2005-Jul-22 17:45 UTC
Re: greedy dom0 in sedf fixed! (Re: [Xen-devel] More on sedf scheduler)
Stephan Diestelhorst wrote:
> > The loop was just while(1); There was also a python script running that
> > was reading from xentrace. So no heavy I/O, at least no disk/network
> > I/O. But that doesn't matter anyway. Here is another dump where dom0 is
> > doing absolutely nothing besides while(1); and dom1 is completely
> > comatose during this time.
> >
> > Rob
>
> This should be fixed in the current repository!
>
> Stephan

Thanks, will give it a try soon.

Rob