Tom Brown
2005-Dec-08 08:45 UTC
[Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
This seems to be a repeatable crash: just do some disk-intensive stuff in domU and then type "sync" :(

The box is a dual Opteron 720 with 8 GB of RAM, one domU and (duh) one dom0, both with approx. 500 MB of RAM allocated.

The box has remote power control and a serial console, and I can provide developer access if it helps. The kernel was compiled locally (on CentOS 4.2 amd64, domU and dom0).

The box seems stable under raw Linux 2.6.14.2, but it does generate occasional MCE messages pointing at the northbridge/GART... I spent a day researching that and didn't come to any conclusion other than that it could be a bogus report specific to amd64 systems with > 4 GB of RAM. There is an IBM page to that effect for an older RHEL system... The box has a 3ware controller and SATA drives.

Anyhow, any help would be appreciated. I'm probably going to try to see if the PAE stuff is more stable... but obviously not tonight.

In theory this is a 3.0.0 box, but it might be 3.0-testing...

This is pretty Greek to me, but given that it seems reproducible, I should be able to produce any other info required...?

Or should I be dumping this into bugzilla?

-Tom

From root@localhost.localdomain Thu Dec 8 00:33:19 2005
Date: Thu, 8 Dec 2005 00:21:56 -0800
From: root <root@localhost.localdomain>
To: tbrown@baremetal.com
Subject: oops.2.ksymoops

ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.
Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12.6-xen0/ (default)
     -m /boot/System.map-2.6.12.6-xen0 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at ffff88001e61b000 RIP:
<ffffffff80220bfb>{memcpy+11}
Oops: 0003 [1]
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffffffff80525d50 EFLAGS: 00010246
RAX: ffff88001e61b000 RBX: 000000000000500c RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff8800040a2000 RDI: ffff88001e61b000
RBP: 0000000000000002 R08: 0000000000000002 R09: ffff8800040a2000
R10: ffff8800040a2000 R11: 0000000000000246 R12: 0000000000000000
R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
FS: 00002aaaaaac9360(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
CS: e033 DS: 0000 ES: 0000
Stack: ffffffff8011a094 ffff8800016a55e8 0000000000000000 ffff880005ac42d8
       ffffffff8011a2cd ffff8800016a55e8 0000000000000000 0000000100000000
       ffff8800147221c0 0000000000000001
Call Trace:<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109}
       <ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799}
       <ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167}
       <ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136}
       <ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83}
       <ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31}
       <ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}
Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90

>>RIP; ffffffff80220bfb <memcpy+b/b0>   <====

>>RAX; ffff88001e61b000 <__start___xen_guest+ffff88001e612144/ffffffff800f7144>
>>RSI; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>RDI; ffff88001e61b000 <__start___xen_guest+ffff88001e612144/ffffffff800f7144>
>>R09; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>R10; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>R13; ffff800000000000 <__start___xen_guest+ffff7fffffff7144/ffffffff800f7144>
>>R14; 7fffffffffffffff <__start___xen_guest+7fffffffffff7143/ffffffff800f7144>
>>R15; 6db6db6db6db6db7 <__start___xen_guest+6db6db6db6dadefb/ffffffff800f7144>

Trace; ffffffff8011a094 <__sync_single+64/70>
Trace; ffffffff8011aa40 <swiotlb_unmap_sg+c0/e0>
Trace; ffffffff8014cd9d <handle_IRQ_event+3d/80>
Trace; ffffffff80114dc4 <do_IRQ+34/50>
Trace; ffffffff80111e7d <do_hypervisor_callback+11/18>
Trace; ffffffff8010f793 <xen_idle+53/70>
Trace; ffffffff8052671f <start_kernel+1ef/200>

Code; ffffffff80220bfb <memcpy+b/b0>
0000000000000000 <_RIP>:
Code; ffffffff80220bfb <memcpy+b/b0>   <====
   0: f3 48 a5     repz movsq %ds:(%rsi),%es:(%rdi)   <====
Code; ffffffff80220bfe <memcpy+e/b0>
   3: 89 d1        mov %edx,%ecx
Code; ffffffff80220c00 <memcpy+10/b0>
   5: f3 a4        repz movsb %ds:(%rsi),%es:(%rdi)
Code; ffffffff80220c02 <memcpy+12/b0>
   7: c3           retq
Code; ffffffff80220c03 <memcpy+13/b0>
   8: 66           data16
Code; ffffffff80220c04 <memcpy+14/b0>
   9: 66           data16
Code; ffffffff80220c05 <memcpy+15/b0>
   a: 66           data16
Code; ffffffff80220c06 <memcpy+16/b0>
   b: 90           nop
Code; ffffffff80220c07 <memcpy+17/b0>
   c: 66           data16
Code; ffffffff80220c08 <memcpy+18/b0>
   d: 66           data16
Code; ffffffff80220c09 <memcpy+19/b0>
   e: 66           data16
Code; ffffffff80220c0a <memcpy+1a/b0>
   f: 90           nop
Code; ffffffff80220c0b <memcpy+1b/b0>
  10: 66           data16
Code; ffffffff80220c0c <memcpy+1c/b0>
  11: 66           data16
Code; ffffffff80220c0d <memcpy+1d/b0>
  12: 66           data16
Code; ffffffff80220c0e <memcpy+1e/b0>
  13: 90           nop

CR2: ffff88001e61b000
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!

From root@localhost.localdomain Thu Dec 8 00:43:16 2005
Date: Thu, 8 Dec 2005 00:40:51 -0800
From: root <root@localhost.localdomain>
To: tbrown@baremetal.com
Subject: tmpx3.ksymoops

ksymoops 2.4.11 on x86_64 2.6.12.6-xen0. Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12.6-xen0/ (default)
     -m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at ffff88001e527000 RIP:
<ffffffff80220bfb>{memcpy+11}
Oops: 0003 [1]
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffffffff80525d50 EFLAGS: 00010246
RAX: ffff88001e527000 RBX: 0000000000003968 RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff880003550000 RDI: ffff88001e527000
RBP: 0000000000000002 R08: 0000000000000002 R09: ffff880003550000
R10: ffff880003550000 R11: 0000000000000246 R12: 0000000000000000
R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
FS: 00002aaaabe8f280(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
CS: e033 DS: 0000 ES: 0000
Stack: ffffffff8011a094 ffff8800016a2088 ffffffff00000000 ffff880005ac42d8
       ffffffff8011a2cd ffff8800016a2088 ffffffff00000000 0000000100000000
       ffff8800078caf20 0000000000000001
Call Trace:<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109}
       <ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799}
       <ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167}
       <ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136}
       <ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83}
       <ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31}
       <ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}
Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90

>>RIP; ffffffff80220bfb <bitmap_parse+bb/210>   <====

>>RAX; ffff88001e527000 <phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
>>RSI; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>RDI; ffff88001e527000 <phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
>>R09; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>R10; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>R13; ffff800000000000 <phys_startup_64+ffff7fffffefff00/ffffffff7fffff00>
>>R14; 7fffffffffffffff <phys_startup_64+7fffffffffeffeff/ffffffff7fffff00>
>>R15; 6db6db6db6db6db7 <phys_startup_64+6db6db6db6cb6cb7/ffffffff7fffff00>

Trace; ffffffff8011a094 <touch_nmi_watchdog+4/30>
Trace; ffffffff8011aa40 <pin_2_irq+60/130>
Trace; ffffffff8014cd9d <kfifo_init+8d/90>
Trace; ffffffff80114dc4 <pda_init+94/110>
Trace; ffffffff80111e7d <handle_lost_ticks+13d/170>
Trace; ffffffff8010f793 <oops_begin+23/70>
Trace; ffffffff8052671f <__log_buf+e15f/20000>

Code; ffffffff80220bfb <bitmap_parse+bb/210>
0000000000000000 <_RIP>:
Code; ffffffff80220bfb <bitmap_parse+bb/210>   <====
   0: f3 48 a5     repz movsq %ds:(%rsi),%es:(%rdi)   <====
Code; ffffffff80220bfe <bitmap_parse+be/210>
   3: 89 d1        mov %edx,%ecx
Code; ffffffff80220c00 <bitmap_parse+c0/210>
   5: f3 a4        repz movsb %ds:(%rsi),%es:(%rdi)
Code; ffffffff80220c02 <bitmap_parse+c2/210>
   7: c3           retq
Code; ffffffff80220c03 <bitmap_parse+c3/210>
   8: 66           data16
Code; ffffffff80220c04 <bitmap_parse+c4/210>
   9: 66           data16
Code; ffffffff80220c05 <bitmap_parse+c5/210>
   a: 66           data16
Code; ffffffff80220c06 <bitmap_parse+c6/210>
   b: 90           nop
Code; ffffffff80220c07 <bitmap_parse+c7/210>
   c: 66           data16
Code; ffffffff80220c08 <bitmap_parse+c8/210>
   d: 66           data16
Code; ffffffff80220c09 <bitmap_parse+c9/210>
   e: 66           data16
Code; ffffffff80220c0a <bitmap_parse+ca/210>
   f: 90           nop
Code; ffffffff80220c0b <bitmap_parse+cb/210>
  10: 66           data16
Code; ffffffff80220c0c <bitmap_parse+cc/210>
  11: 66           data16
Code; ffffffff80220c0d <bitmap_parse+cd/210>
  12: 66           data16
Code; ffffffff80220c0e <bitmap_parse+ce/210>
  13: 90           nop

CR2: ffff88001e527000
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
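A note on the two decodes above: they resolve the same RIP (ffffffff80220bfb) to different names, memcpy+b when run with -m /boot/System.map-2.6.12.6-xen0 and bitmap_parse+bb when run with the default /usr/src/linux/System.map. The kernel's own oops text says memcpy+11 both times, so the second map probably does not match the running kernel (an inference, not something stated in the thread). As a minimal sketch of how such a lookup behaves, the Python below resolves an address against a sorted symbol table. The two-entry tables are illustrative stand-ins whose base addresses are back-computed from the symbol+offset pairs printed above; they are not real System.map contents, and resolve() is a hypothetical helper, not part of ksymoops.

```python
import bisect


def resolve(symbols, addr):
    """Return 'name+0xoff' for the nearest symbol at or below addr.

    symbols is a list of (address, name) pairs sorted by address.
    Returns None if addr is below every symbol in the table.
    """
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


# Assumption: base = printed address minus printed offset, taken from
# the first decode (memcpy+b, __sync_single+64).
MAP_BOOT = sorted([
    (0xFFFFFFFF8011A030, "__sync_single"),
    (0xFFFFFFFF80220BF0, "memcpy"),
])

# Assumption: same arithmetic applied to the second decode
# (bitmap_parse+bb, touch_nmi_watchdog+4).
MAP_SRC = sorted([
    (0xFFFFFFFF8011A090, "touch_nmi_watchdog"),
    (0xFFFFFFFF80220B40, "bitmap_parse"),
])

RIP = 0xFFFFFFFF80220BFB

print(resolve(MAP_BOOT, RIP))  # memcpy+0xb
print(resolve(MAP_SRC, RIP))   # bitmap_parse+0xbb
```

The same address lands in a different symbol as soon as the table changes, which is why the first decode (made with the explicitly specified map) is the one to trust when the two disagree.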
_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
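For readers decoding the oopses in this message: on x86, the low three bits of the page-fault error code are commonly documented as protection violation vs. not-present (bit 0), write vs. read (bit 1), and user vs. kernel mode (bit 2). The Python below is a rough sketch with hypothetical helper names that applies this to "Oops: 0003" and to the register values from the first dump. The byte count relies on the disassembly shown in the dump (repz movsq followed by mov %edx,%ecx and repz movsb); reading RAX as the original destination pointer is an assumption based on memcpy's usual calling convention.

```python
# Low three bits of the x86 page-fault error code.
PF_PROT = 0x1   # set: protection violation; clear: page not present
PF_WRITE = 0x2  # set: write access; clear: read access
PF_USER = 0x4   # set: fault taken in user mode; clear: kernel mode


def decode_pf_error(code):
    """Translate the low bits of a page-fault error code into words."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


def copy_summary(rax, rdi, rcx, rdx, cr2):
    """Summarise an interrupted 'rep movsq' style copy from its registers."""
    return {
        # RCX counts 8-byte words still to copy; the low 32 bits of RDX
        # hold the byte tail that the later 'rep movsb' would copy.
        "bytes_remaining": rcx * 8 + (rdx & 0xFFFFFFFF),
        "fault_at_destination": cr2 == rdi,
        # Assumption: RAX holds the original destination pointer.
        "nothing_copied_yet": rax == rdi,
    }


print(decode_pf_error(0x0003))
print(copy_summary(
    rax=0xFFFF88001E61B000,
    rdi=0xFFFF88001E61B000,
    rcx=0x200,
    rdx=0x0,
    cr2=0xFFFF88001E61B000,
))
```

Under those assumptions both dumps describe a kernel-mode write that faulted on the first store of a 4096-byte copy into a page that was present but not writable. That describes the symptom only; it does not by itself say why the page was not writable.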
Ian Pratt
2005-Dec-08 12:03 UTC
RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
Looking at the oops message, this is with a 3ware card, right?
We've had at least one other report of them causing problems on systems with >4GB enabled (or maybe it was you?)

Ian

> This seems to be a repeatable crash. just do some disk
> intensive stuff in domU and then type "sync" :(
>
> The box is a dual opteron 720, with 8 gig of ram, one domU
> and (duh) one dom0, both with approx 500 meg of RAM allocated.
>
> The box has remote power control, serial console, and I can
> provide developer access if it helps. Kernel was compiled
> locally (on centos 4.2 amd64 domU and dom0)
>
> Box seems stable under raw linux 2.6.14.2, but does generate
> occasional MCE messages pointing at the northbridge/GART...
> I spent a day researching that, and didn't come to any
> conclusion other than it could be a bogus report specific to
> amd64 systems with > 4gig ram. there is an IBM page to that
> effect for an older RHEL system... box has a 3ware controller
> and SATA drives.
>
> Anyhow, any help would be appreciated. I'm probably going to
> try to see if the PAE stuff is more stable... but obviously
> not tonight.
>
> In theory this is a 3.0.0 box, but might be 3.0-testing...
>
> This is pretty greek to me, but given that it seems
> reproducible, I should be able to produce any other info required...?
>
> Or should I be dumping this into bugzilla?
>
> -Tom
>
> From root@localhost.localdomain Thu Dec 8 00:33:19 2005
> Date: Thu, 8 Dec 2005 00:21:56 -0800
> From: root <root@localhost.localdomain>
> To: tbrown@baremetal.com
> Subject: oops.2.ksymoops
> ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.
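Ian's question appears to rest on the tw_interrupt frame in the call trace. As a small sketch of how one might pull the function names out of such a trace, the Python below parses the `<address>{symbol+offset}` frames copied from the oops in the first message. The regular expression and helper name are hypothetical, and treating the `tw_` prefix as a sign of the 3ware driver is an assumption here (one that the follow-up in this thread confirms for this box).

```python
import re

# Matches frames of the form <ffffffff8011a094>{__sync_single+100};
# the offset after '+' is printed in decimal by the kernel.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{([A-Za-z0-9_]+)\+(\d+)\}")

CALL_TRACE = (
    "<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109} "
    "<ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799} "
    "<ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167} "
    "<ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136} "
    "<ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83} "
    "<ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31} "
    "<ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}"
)


def parse_call_trace(text):
    """Return a list of (address, symbol, offset) tuples, innermost first."""
    return [(int(addr, 16), name, int(off))
            for addr, name, off in FRAME_RE.findall(text)]


frames = parse_call_trace(CALL_TRACE)
names = [name for _, name, _ in frames]

# Innermost frames: the path from the interrupt handler down to memcpy.
print(" <- ".join(names[:4]))

# Assumption: a 'tw_' prefix marks a frame from the 3ware driver.
print([n for n in names if n.startswith("tw_")])
```

Read innermost first, the trace shows the faulting memcpy being reached from the controller's interrupt handler by way of the swiotlb unmap path, which is consistent with the fault happening while completed I/O data was being copied back.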
Tom Brown
2005-Dec-08 17:23 UTC
RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
On Thu, 8 Dec 2005, Ian Pratt wrote:

> Looking at the oops message, this is with a 3ware card, right?
> We've had at least one other report of them causing problems on systems
> with >4GB enabled (or maybe it was you?)

Yes, I am (or was) using a 3ware controller, since the onboard SATA controller is a Marvell one, and the native Linux drivers seem flaky as can be. I've only had the 3ware card for about a week, and don't believe I have written about it. I was going to ask on this list about the MCE, since I don't know of a better list for asking questions about big Opteron systems.

-Tom

> Ian
>
> > This seems to be a repeatable crash. just do some disk
> > intensive stuff in domU and then type "sync" :(
> >
> > The box is a dual opteron 720, with 8 gig of ram, one domU
> > and (duh) one dom0, both with approx 500 meg of RAM allocated.

<snip>
Matt Ayres
2005-Dec-09 15:17 UTC
Re: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
Tom Brown wrote:
> On Thu, 8 Dec 2005, Ian Pratt wrote:
>
>> Looking at the oops message, this is with a 3ware card, right?
>> We've had at least one other report of them causing problems on systems
>> with >4GB enabled (or maybe it was you?)
>
> Yes, I am (or was) using a 3ware controller, since the on board sata
> controller is a marvell one, and the native linux drivers seem flaky as
> can be. I've only had the 3ware card for about a week, and don't believe I
> have written about it. I was going to ask on this list about the MCE,
> since I don't know of a better list for asking questions about big opteron
> systems.
>

I was the other person who reported (via bugzilla, #402) a crash with 3ware. I was told the 3ware driver does not look >4GB safe. I run the same hardware specs using other (non-Xen) kernels with PAE and >4GB and have absolutely no trouble, though. I am still in a re-testing phase since 3.0.0 was announced. I have yet to put customers on this setup (which is where whatever can happen, will happen).
Tom Brown
2005-Dec-09 18:49 UTC
Re: [Xen-users] (3ware) xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
On Fri, 9 Dec 2005, Matt Ayres wrote:

> Tom Brown wrote:
> > On Thu, 8 Dec 2005, Ian Pratt wrote:
> >
> >> Looking at the oops message, this is with a 3ware card, right?
> >> We've had at least one other report of them causing problems on systems
> >> with >4GB enabled (or maybe it was you?)
> >
> > Yes, I am (or was) using a 3ware controller, since the on board sata
> > controller is a marvell one, and the native linux drivers seem flaky as
> > can be. I've only had the 3ware card for about a week, and don't believe I
> > have written about it. I was going to ask on this list about the MCE,
> > since I don't know of a better list for asking questions about big opteron
> > systems.
>
> I was the other who reported (via bugzilla, #402) a crash with 3ware. I
> was told the 3ware driver does not look >4GB safe. I run the same
> hardware specs using other (non-xen) kernels using PAE and >4GB and have
> absolutely no troubles though. I am still in a re-testing phase since
> 3.0.0 was announced. I have yet to put customers on this setup (which
> is where whatever can happen, will happen).

I got a response back from 3ware, as I forwarded Ian's note to them...

My card(s) is (are) an 8506-4LP.

I don't know much about 'the use of the IOMMU'. I do know that at least one of my kernels was bitching about the BIOS settings for the IOMMU, and it seems to be an active area of Linux development. It seems to be some sort of aperture, possibly used as temporary storage space for information destined for addresses > 4 GB? It also appears to be tied into the onboard (northbridge) GART stuff.

I believe the engineer's MCE/memory comments apply to the controller memory, as I would expect a clearer MCE message if we got an ECC fault on main memory.

If anyone can recommend a good "big linux" or amd64 Linux list, I'd be happier posting this stuff there... although I expect there are more "high powered techs" on this list than on most others... of course there are a lot of "very junior" sys admins here too :-)

Date: Thu, 8 Dec 2005 14:32:58 -0800
From: David Graas <@amcc.com>
To: Tom Brown <tbrown@baremetal.com>
Subject: RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram (fwd)

Tom,

I am not sure if this helps or not, but your e-mail did concern me, so I checked with some of our engineers on this. Their reply is below...

"We have supported > 4GB ram, even with the 3w-xxxx driver and 5000/6000/7000/8000 series cards on amd64 through the use of the IOMMU GART (Northbridge memory aperture) before the Opteron even shipped to customers, even though these cards only do 32-bit DMA operations. We have supported > 4GB ram with the 9500-S and 9550-SX since day one. We also support PAE (Physical Address Extensions) mode with the 9500-S and 9550-SX, so you can DMA to > 4GB of ram on a 32-bit Pentium Pro or higher architecture. Also, he is getting MCE error messages (Machine Check Exceptions) from his Northbridge, which could mean bad memory."

It might be worth a call to our support group (800-840-6055) to see if there is a fix for this situation.

David Graas
Corporate Sales Manager
3ware - an AMCC company
Direct 408-542-8670
Mobile 650-269-2972
Fax 408-542-8602
@amcc.com
www.amcc.com

I've invalidated David's address, but it is first initial, last name if you want to write to him...
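The engineers' point that these cards "only do 32-bit DMA operations" can be made concrete with a little arithmetic. The Python below is a rough sketch, not kernel code: it checks whether a buffer at a given physical address is reachable through a 32-bit DMA mask. If it is not, something has to stand in between, either an IOMMU/GART remapping as the 3ware reply describes, or a software bounce buffer (the swiotlb functions in the call trace earlier in the thread belong to such a layer). The function name and the example addresses are hypothetical.

```python
DMA_32BIT_MASK = (1 << 32) - 1  # highest address a 32-bit DMA engine can reach


def needs_bounce(phys_addr, length, dma_mask=DMA_32BIT_MASK):
    """Return True if any byte of the buffer lies above the device's DMA mask.

    The buffer covers [phys_addr, phys_addr + length); its last byte is at
    phys_addr + length - 1.
    """
    return phys_addr + length - 1 > dma_mask


GIB = 1 << 30
PAGE = 4096

# Hypothetical buffers on a machine with 8 GB of RAM.
print(needs_bounce(1 * GIB, PAGE))              # below 4 GB
print(needs_bounce(6 * GIB, PAGE))              # above 4 GB
print(needs_bounce((1 << 32) - PAGE, PAGE))     # ends exactly at the 4 GB line
print(needs_bounce((1 << 32) - 2048, PAGE))     # straddles the 4 GB line
```

On a box with 8 GB of RAM, roughly half of physical memory fails this check, which is why the copy-back path shows up on ordinary disk I/O and why a problem there would only appear once more than 4 GB is enabled.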