thr3ads.net - Xen users - [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o,

Tom Brown

2005-Dec-08 08:45 UTC

[Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram

This seems to be a repeatable crash: just do some disk-intensive work in
domU and then type "sync" :(
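
The report does not say which workload was used. As a hypothetical stand-in for "disk intensive stuff", a minimal Python sketch that writes a scratch file and then calls os.sync() (which issues the same system call as the sync command) might look like this. The function name, sizes, and directory are assumptions, not part of the original report.

```python
import os
import tempfile


def write_then_sync(directory: str, total_bytes: int, chunk_bytes: int = 1 << 20) -> int:
    """Write total_bytes of zeros to a scratch file in directory, then sync.

    Hypothetical reproduction helper; the original report names no workload.
    Returns the number of bytes written.
    """
    chunk = bytes(chunk_bytes)  # a reusable buffer of zero bytes
    written = 0
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        while written < total_bytes:
            step = min(chunk_bytes, total_bytes - written)
            scratch.write(chunk[:step])
            written += step
        scratch.flush()  # push Python's buffer to the kernel page cache
        os.sync()        # ask the kernel to write dirty pages to disk
    return written
```

For example, `write_then_sync("/var/tmp", 2 * 1024**3)` would write 2 GiB inside the domU before syncing; whether that is enough to trigger the crash is untested.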

The box is a dual Opteron 720, with 8 gig of RAM, one domU and (duh) one
dom0, both with approx. 500 meg of RAM allocated.

The box has remote power control, serial console, and I can provide
developer access if it helps. The kernel was compiled locally (CentOS 4.2
amd64 in both domU and dom0).

The box seems stable under raw Linux 2.6.14.2, but does generate occasional
MCE messages pointing at the northbridge/GART... I spent a day researching
that, and didn't come to any conclusion other than it could be a bogus
report specific to amd64 systems with > 4 gig RAM. There is an IBM page to
that effect for an older RHE system... The box has a 3ware controller and
SATA drives.

Anyhow, any help would be appreciated. I'm probably going to try to see
if the PAE stuff is more stable... but obviously not tonight.

In theory this is a 3.0.0 box, but might be 3.0-testing...

This is pretty Greek to me, but given that it seems reproducible, I should
be able to produce any other info required...?

Or should I be dumping this into bugzilla?

-Tom
>From root@localhost.localdomain Thu Dec  8 00:33:19 2005
Date: Thu, 8 Dec 2005 00:21:56 -0800
From: root <root@localhost.localdomain>
To: tbrown@baremetal.com
Subject: oops.2.ksymoops
ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12.6-xen0/ (default)
     -m /boot/System.map-2.6.12.6-xen0 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at ffff88001e61b000 RIP:
<ffffffff80220bfb>{memcpy+11}
Oops: 0003 [1]
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffffffff80525d50  EFLAGS: 00010246
RAX: ffff88001e61b000 RBX: 000000000000500c RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff8800040a2000 RDI: ffff88001e61b000
RBP: 0000000000000002 R08: 0000000000000002 R09: ffff8800040a2000
R10: ffff8800040a2000 R11: 0000000000000246 R12: 0000000000000000
R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
FS:  00002aaaaaac9360(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
CS:  e033 DS: 0000 ES: 0000
Stack: ffffffff8011a094 ffff8800016a55e8 0000000000000000 ffff880005ac42d8
       ffffffff8011a2cd ffff8800016a55e8 0000000000000000 0000000100000000
       ffff8800147221c0 0000000000000001
Call Trace:<ffffffff8011a094>{__sync_single+100}
<ffffffff8011a2cd>{unmap_single+109}
       <ffffffff8011aa40>{swiotlb_unmap_sg+192}
<ffffffff802eb517>{tw_interrupt+1799}
       <ffffffff8014cd9d>{handle_IRQ_event+61}
<ffffffff8014ce87>{__do_IRQ+167}
       <ffffffff80114dc4>{do_IRQ+52}
<ffffffff8010d958>{evtchn_do_upcall+136}
       <ffffffff80111e7d>{do_hypervisor_callback+17}
<ffffffff8010f793>{xen_idle+83}
       <ffffffff8010f793>{xen_idle+83}
<ffffffff8010f7cf>{cpu_idle+31}
       <ffffffff8052671f>{start_kernel+495}
<ffffffff80526193>{_sinittext+403}
Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90

>>RIP; ffffffff80220bfb <memcpy+b/b0>   <====
>>RAX; ffff88001e61b000
<__start___xen_guest+ffff88001e612144/ffffffff800f7144>
>>RSI; ffff8800040a2000
<__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>RDI; ffff88001e61b000
<__start___xen_guest+ffff88001e612144/ffffffff800f7144>
>>R09; ffff8800040a2000
<__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>R10; ffff8800040a2000
<__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>R13; ffff800000000000
<__start___xen_guest+ffff7fffffff7144/ffffffff800f7144>
>>R14; 7fffffffffffffff
<__start___xen_guest+7fffffffffff7143/ffffffff800f7144>
>>R15; 6db6db6db6db6db7
<__start___xen_guest+6db6db6db6dadefb/ffffffff800f7144>
Trace; ffffffff8011a094 <__sync_single+64/70>
Trace; ffffffff8011aa40 <swiotlb_unmap_sg+c0/e0>
Trace; ffffffff8014cd9d <handle_IRQ_event+3d/80>
Trace; ffffffff80114dc4 <do_IRQ+34/50>
Trace; ffffffff80111e7d <do_hypervisor_callback+11/18>
Trace; ffffffff8010f793 <xen_idle+53/70>
Trace; ffffffff8052671f <start_kernel+1ef/200>

Code;  ffffffff80220bfb <memcpy+b/b0>
0000000000000000 <_RIP>:
Code;  ffffffff80220bfb <memcpy+b/b0>   <====
   0:   f3 48 a5                  repz movsq %ds:(%rsi),%es:(%rdi)   <====
Code;  ffffffff80220bfe <memcpy+e/b0>
   3:   89 d1                     mov    %edx,%ecx
Code;  ffffffff80220c00 <memcpy+10/b0>
   5:   f3 a4                     repz movsb %ds:(%rsi),%es:(%rdi)
Code;  ffffffff80220c02 <memcpy+12/b0>
   7:   c3                        retq
Code;  ffffffff80220c03 <memcpy+13/b0>
   8:   66                        data16
Code;  ffffffff80220c04 <memcpy+14/b0>
   9:   66                        data16
Code;  ffffffff80220c05 <memcpy+15/b0>
   a:   66                        data16
Code;  ffffffff80220c06 <memcpy+16/b0>
   b:   90                        nop
Code;  ffffffff80220c07 <memcpy+17/b0>
   c:   66                        data16
Code;  ffffffff80220c08 <memcpy+18/b0>
   d:   66                        data16
Code;  ffffffff80220c09 <memcpy+19/b0>
   e:   66                        data16
Code;  ffffffff80220c0a <memcpy+1a/b0>
   f:   90                        nop
Code;  ffffffff80220c0b <memcpy+1b/b0>
  10:   66                        data16
Code;  ffffffff80220c0c <memcpy+1c/b0>
  11:   66                        data16
Code;  ffffffff80220c0d <memcpy+1d/b0>
  12:   66                        data16
Code;  ffffffff80220c0e <memcpy+1e/b0>
  13:   90                        nop

CR2: ffff88001e61b000
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
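
For anyone reading the dump above, here is a minimal sketch of how the "Oops: 0003" code and the registers can be decoded. It assumes the standard x86 page-fault error-code layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode); the function name is only illustrative, and the values are copied from the report above.

```python
# Decode the "Oops: 0003" error code and the size of the faulting copy.
# Assumes the standard x86 page-fault error-code layout; names are illustrative.

def decode_page_fault(error_code: int) -> dict:
    """Split an x86 page-fault error code into its low three flag bits."""
    return {
        "protection_violation": bool(error_code & 0x1),  # 0 would mean page not present
        "write_access": bool(error_code & 0x2),          # 0 would mean a read
        "user_mode": bool(error_code & 0x4),             # 0 means kernel mode
    }


# Values copied from the oops above.
error_code = 0x0003
rcx = 0x200               # remaining repeat count for "repz movsq" (8 bytes per move)
rdi = 0xFFFF88001E61B000  # destination pointer of the copy
cr2 = 0xFFFF88001E61B000  # faulting address reported by the CPU

flags = decode_page_fault(error_code)
bytes_to_copy = rcx * 8

print(flags)
print(f"copy length: {bytes_to_copy} bytes")
print(f"fault on the first destination write: {rdi == cr2}")
```

Read this way, the fault looks like a kernel-mode write that was refused on a present page, right at the start of a 4096-byte (one page) memcpy called from the swiotlb unmap path. That is an interpretation of the dump, not a confirmed diagnosis.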


>From root@localhost.localdomain Thu Dec  8 00:43:16 2005
Date: Thu, 8 Dec 2005 00:40:51 -0800
From: root <root@localhost.localdomain>
To: tbrown@baremetal.com
Subject: tmpx3.ksymoops

ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12.6-xen0/ (default)
     -m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at ffff88001e527000 RIP:
<ffffffff80220bfb>{memcpy+11}
Oops: 0003 [1]
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffffffff80525d50  EFLAGS: 00010246
RAX: ffff88001e527000 RBX: 0000000000003968 RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff880003550000 RDI: ffff88001e527000
RBP: 0000000000000002 R08: 0000000000000002 R09: ffff880003550000
R10: ffff880003550000 R11: 0000000000000246 R12: 0000000000000000
R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
FS:  00002aaaabe8f280(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
CS:  e033 DS: 0000 ES: 0000
Stack: ffffffff8011a094 ffff8800016a2088 ffffffff00000000 ffff880005ac42d8
       ffffffff8011a2cd ffff8800016a2088 ffffffff00000000 0000000100000000
       ffff8800078caf20 0000000000000001
Call Trace:<ffffffff8011a094>{__sync_single+100}
<ffffffff8011a2cd>{unmap_single+109}
       <ffffffff8011aa40>{swiotlb_unmap_sg+192}
<ffffffff802eb517>{tw_interrupt+1799}
       <ffffffff8014cd9d>{handle_IRQ_event+61}
<ffffffff8014ce87>{__do_IRQ+167}
       <ffffffff80114dc4>{do_IRQ+52}
<ffffffff8010d958>{evtchn_do_upcall+136}
       <ffffffff80111e7d>{do_hypervisor_callback+17}
<ffffffff8010f793>{xen_idle+83}
       <ffffffff8010f793>{xen_idle+83}
<ffffffff8010f7cf>{cpu_idle+31}
       <ffffffff8052671f>{start_kernel+495}
<ffffffff80526193>{_sinittext+403}
Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90

>>RIP; ffffffff80220bfb <bitmap_parse+bb/210>   <====
>>RAX; ffff88001e527000
<phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
>>RSI; ffff880003550000
<phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>RDI; ffff88001e527000
<phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
>>R09; ffff880003550000
<phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>R10; ffff880003550000
<phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>R13; ffff800000000000
<phys_startup_64+ffff7fffffefff00/ffffffff7fffff00>
>>R14; 7fffffffffffffff
<phys_startup_64+7fffffffffeffeff/ffffffff7fffff00>
>>R15; 6db6db6db6db6db7
<phys_startup_64+6db6db6db6cb6cb7/ffffffff7fffff00>
Trace; ffffffff8011a094 <touch_nmi_watchdog+4/30>
Trace; ffffffff8011aa40 <pin_2_irq+60/130>
Trace; ffffffff8014cd9d <kfifo_init+8d/90>
Trace; ffffffff80114dc4 <pda_init+94/110>
Trace; ffffffff80111e7d <handle_lost_ticks+13d/170>
Trace; ffffffff8010f793 <oops_begin+23/70>
Trace; ffffffff8052671f <__log_buf+e15f/20000>

Code;  ffffffff80220bfb <bitmap_parse+bb/210>
0000000000000000 <_RIP>:
Code;  ffffffff80220bfb <bitmap_parse+bb/210>   <====
   0:   f3 48 a5                  repz movsq %ds:(%rsi),%es:(%rdi)   <====
Code;  ffffffff80220bfe <bitmap_parse+be/210>
   3:   89 d1                     mov    %edx,%ecx
Code;  ffffffff80220c00 <bitmap_parse+c0/210>
   5:   f3 a4                     repz movsb %ds:(%rsi),%es:(%rdi)
Code;  ffffffff80220c02 <bitmap_parse+c2/210>
   7:   c3                        retq
Code;  ffffffff80220c03 <bitmap_parse+c3/210>
   8:   66                        data16
Code;  ffffffff80220c04 <bitmap_parse+c4/210>
   9:   66                        data16
Code;  ffffffff80220c05 <bitmap_parse+c5/210>
   a:   66                        data16
Code;  ffffffff80220c06 <bitmap_parse+c6/210>
   b:   90                        nop
Code;  ffffffff80220c07 <bitmap_parse+c7/210>
   c:   66                        data16
Code;  ffffffff80220c08 <bitmap_parse+c8/210>
   d:   66                        data16
Code;  ffffffff80220c09 <bitmap_parse+c9/210>
   e:   66                        data16
Code;  ffffffff80220c0a <bitmap_parse+ca/210>
   f:   90                        nop
Code;  ffffffff80220c0b <bitmap_parse+cb/210>
  10:   66                        data16
Code;  ffffffff80220c0c <bitmap_parse+cc/210>
  11:   66                        data16
Code;  ffffffff80220c0d <bitmap_parse+cd/210>
  12:   66                        data16
Code;  ffffffff80220c0e <bitmap_parse+ce/210>
  13:   90                        nop

CR2: ffff88001e527000
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
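
Note that this second report was decoded with `-m /usr/src/linux/System.map (default)`, and ksymoops labels the RIP as bitmap_parse+bb while the kernel's own line says memcpy+11. That suggests the map did not match the running kernel. A minimal sketch of the lookup ksymoops performs shows how the same address lands on different names. The two start addresses are derived from the reports above (RIP minus the printed offset); the neighbouring entries are hypothetical filler.

```python
import bisect


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address.

    symbols must be a list of (start_address, name) sorted by start_address.
    Returns None if the address is below the first symbol.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return f"{name}+0x{address - start:x}"


# memcpy and bitmap_parse starts are derived from the dumps (RIP minus offset).
# The "hypothetical_*" entries are invented so the lookup has neighbours to skip.
matching_map = sorted([
    (0xFFFFFFFF80220B00, "hypothetical_prev_symbol"),
    (0xFFFFFFFF80220BF0, "memcpy"),
    (0xFFFFFFFF80220CA0, "hypothetical_next_symbol"),
])
mismatched_map = sorted([
    (0xFFFFFFFF80220B40, "bitmap_parse"),
    (0xFFFFFFFF80220D50, "hypothetical_next_symbol"),
])

rip = 0xFFFFFFFF80220BFB
print(resolve(matching_map, rip))
print(resolve(mismatched_map, rip))
```

If that reading is right, the symbol names in this second decode (bitmap_parse, touch_nmi_watchdog, pin_2_irq and so on) should be disregarded in favour of the first decode, which used /boot/System.map-2.6.12.6-xen0.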


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Ian Pratt

2005-Dec-08 12:03 UTC

RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram

Looking at the oops message, this is with a 3ware card, right?
We've had at least one other report of them causing problems on systems
with >4GB enabled (or maybe it was you?)

Ian

Tom Brown

2005-Dec-08 17:23 UTC

RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o,> 4 gig ram

On Thu, 8 Dec 2005, Ian Pratt wrote:
>
> Looking at the oops message, this is with a 3ware card, right?
> We've had at least one other report of them causing problems on systems
> with >4GB enabled (or maybe it was you?)

Yes, I am (or was) using a 3ware controller, since the onboard SATA
controller is a Marvell one, and the native Linux drivers seem flaky as
can be. I've only had the 3ware card for about a week, and don't believe I
have written about it. I was going to ask on this list about the MCE,
since I don't know of a better list for asking questions about big Opteron
systems.

-Tom
>
> Ian
>
> > This seems to be a repeatable crash. just do some disk
> > intensive stuff in domU and then type "sync" :(
> >
> > The box is a dual Opteron 720, with 8 gig of RAM, one domU
> > and (duh) one dom0, both with approx. 500 meg of RAM allocated.
<snip>



Matt Ayres

2005-Dec-09 15:17 UTC

Re: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o,> 4 gig ram

Tom Brown wrote:
> On Thu, 8 Dec 2005, Ian Pratt wrote:
>
>> Looking at the oops message, this is with a 3ware card, right?
>> We've had at least one other report of them causing problems on systems
>> with >4GB enabled (or maybe it was you?)
>
> Yes, I am (or was) using a 3ware controller, since the onboard SATA
> controller is a Marvell one, and the native Linux drivers seem flaky as
> can be. I've only had the 3ware card for about a week, and don't believe I
> have written about it. I was going to ask on this list about the MCE,
> since I don't know of a better list for asking questions about big Opteron
> systems.

I was the other person who reported a crash with 3ware (via bugzilla, #402).
I was told the 3ware driver does not look >4GB safe. I run the same
hardware specs under other (non-Xen) kernels with PAE and >4GB and have
absolutely no trouble, though. I am still in a re-testing phase since
3.0.0 was announced. I have yet to put customers on this setup (which
is where whatever can happen, will happen).


Tom Brown

2005-Dec-09 18:49 UTC

Re: [Xen-users] (3ware) xen 3.0 amd64 crash... seems to be tied into disk i/o,> 4 gig ram

On Fri, 9 Dec 2005, Matt Ayres wrote:
> Tom Brown wrote:
> > On Thu, 8 Dec 2005, Ian Pratt wrote:
> >
> >> Looking at the oops message, this is with a 3ware card, right?
> >> We've had at least one other report of them causing problems on systems
> >> with >4GB enabled (or maybe it was you?)
> >
> > Yes, I am (or was) using a 3ware controller, since the onboard SATA
> > controller is a Marvell one, and the native Linux drivers seem flaky as
> > can be. I've only had the 3ware card for about a week, and don't believe I
> > have written about it. I was going to ask on this list about the MCE,
> > since I don't know of a better list for asking questions about big Opteron
> > systems.
>
> I was the other person who reported a crash with 3ware (via bugzilla, #402).
> I was told the 3ware driver does not look >4GB safe. I run the same
> hardware specs under other (non-Xen) kernels with PAE and >4GB and have
> absolutely no trouble, though. I am still in a re-testing phase since
> 3.0.0 was announced. I have yet to put customers on this setup (which
> is where whatever can happen, will happen).

I got a response back from 3ware, as I forwarded Ian's note to them...
My card(s) is (are) an 8506-4LP.

I don't know much about 'the use of the IOMMU'. I do know that at
least one of my kernels was bitching about the BIOS settings for the
IOMMU, and it seems to be an active area of Linux development. It
seems to be some sort of aperture, possibly used as temporary
storage space for information destined for addresses > 4 gig? It
also appears to be tied into the onboard (northbridge) GART stuff. I
believe the engineer's MCE/memory comments apply to the controller
memory, as I would expect a clearer MCE message if we got an ECC
fault on main memory.
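
To make the > 4 gig issue concrete: a card that only does 32-bit DMA cannot address memory above 4 GiB directly, so the kernel either remaps the buffer through the IOMMU/GART aperture or copies it through a low "bounce" buffer (the swiotlb functions in the call trace are that software copy path). Here is a minimal sketch of the addressability check, with hypothetical buffer addresses. It deliberately ignores Xen's translation between pseudo-physical and machine addresses, and the names are illustrative rather than kernel APIs.

```python
DMA_32BIT_MASK = (1 << 32) - 1  # highest bus address a 32-bit-only DMA engine can reach


def needs_bounce(phys_addr: int, length: int, dma_mask: int = DMA_32BIT_MASK) -> bool:
    """Return True if any byte of the buffer lies above what the device can address."""
    if length <= 0:
        raise ValueError("length must be positive")
    last_byte = phys_addr + length - 1
    return last_byte > dma_mask


GIB = 1 << 30
PAGE = 4096

# Hypothetical buffers on an 8 GiB machine.
print(needs_bounce(1 * GIB, PAGE))             # entirely below 4 GiB
print(needs_bounce(6 * GIB, PAGE))             # entirely above 4 GiB
print(needs_bounce(4 * GIB - PAGE, PAGE))      # last page below the boundary
print(needs_bounce(4 * GIB - PAGE, 2 * PAGE))  # straddles the boundary
```

On a box where dom0 has only about 500 meg but the machine has 8 gig, the pages handed to dom0 may well sit above the 4 GiB line, which would explain why the bounce path is exercised at all. That is a guess, though, not something the traces prove.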

If anyone can recommend a good "big Linux" or amd64 Linux list,
I'd be happier posting this stuff there... although I expect
there are more "high-powered techs" on this list than on most
others... of course there are a lot of "very junior" sysadmins
here too :-)

   Date: Thu, 8 Dec 2005 14:32:58 -0800
   From: David Graas <@amcc.com>
   To: Tom Brown <tbrown@baremetal.com>
   Subject: RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk
       i/o, > 4 gig ram (fwd)

   Tom, I am not sure if this helps or not but your e-mail did concern me
   so I checked with some of our engineers on this. Their reply is below...

   "We have supported > 4GB RAM, even with the 3w-xxxx driver and
   5000/6000/7000/8000 series cards on amd64 through the use of the IOMMU
   GART (Northbridge memory aperture) before the Opteron even shipped to
   customers, even though these cards only do 32-bit DMA operations.

   We have supported > 4GB RAM with the 9500-S and 9550-SX since day one.
   We also support PAE (Physical Address Extensions) mode with the 9500-S
   and 9550-SX, so you can DMA to > 4GB of RAM on a 32-bit Pentium Pro or
   higher architecture.

   Also, he is getting MCE error messages (Machine Check Exceptions) from
   his Northbridge, which could mean bad memory."

   It might be worth a call to our support group (800-840-6055) to see if
   there is a fix on this situation.

   David Graas
   Corporate Sales Manager
   3ware - an AMCC company
   Direct 408-542-8670
   Mobile 650-269-2972
   Fax 408-542-8602
   @amcc.com
   www.amcc.com

I've invalidated David's address, but it is first initial, lastname if
you want to write to him...
