Hello everyone,

We are trying to set up a Xen system based on the latest stable vanilla Linux kernel (3.1.5) and hypervisor & tools (4.1.2).

When we started testing with the iSCSI target, we experienced reproducible system crashes and immediate reboots in the initiator system (Xen dom0).

When we afterwards tried to copy the domU system to a local disk drive, the same crash happened, so it's probably not really iSCSI-related.

Some facts:
- The crash happens during network I/O. It happens reliably, within minutes of using either iSCSI or NFS.
- The whole system crashes and reboots. We can see there is a stack trace on screen, but it's gone too quickly to read.
- It happens in dom0. No domU needs to run to reproduce it.
- We tried with noacpi and nolapic, with the same result.
- When booting the same kernel without the hypervisor, it runs stable!
- The crash is reproducible on 2 different amd64 machines (see below).
- We are still using xm and xend (but that should not make any difference, since we don't start a single domU).
- The iSCSI target is LIO (Linux 3.1.5) with 4k blocksize.
- MTU 9000
- The Xen system seems to run stable without NFS or iSCSI access!

Hardware (tested on 2 machines):
- AMD Athlon(tm) 64 Processor 2800+ on Asus K8V
  Marvell 88E8001 Gigabit Ethernet (rev 13)
- AMD Athlon(tm) 64 Processor 3700+ on Asus A8N-E (nforce4)
  nVidia CK804 Ethernet (rev a3)
  as well as: Intel 82571EB Gigabit Ethernet (rev 06)

We are testing on a gentoo system, but installed the Xen tools and hypervisor from source, as well as the Linux kernel.

We don't have any experience in kernel debugging, so I don't know if there is any way to grab the stack trace.

Any suggestions about what we can do to get this thing stable? Or rather, on how or what to test so we can provide more usable information?

Thanks in advance,
- peter.
-- Peter Gansterer PARADIGMA Unternehmensberatung GmbH Mariahilferstraße 47/1/3 A-1060 Wien Tel: 0043-(0)1-585 49 72 http://www.paradigma.net Firmenbuchnummer: FN 134564 p Rechtsform: GmbH Firmenbuchgericht: Handelsgericht Wien
Hi again,

It seems to be somehow iSCSI-related after all. Today's tests showed:
- We could not reproduce the crash using NFS only. (Although it DID crash when copying the system via NFS; maybe we had accessed an iSCSI device before and the kernel was left in some erroneous state.)
- It still reliably crashes when accessing iSCSI.
- It's still stable with iSCSI and the same kernel without Xen.
- We tried the "noirqbalance" option, which had no effect.

We are in the process of preparing a 2.6.38 system for comparison. Any input is still very welcome.

thx,
- peter.

On Mon, December 19, 2011 15:43, Peter Gansterer wrote:
> [...]
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@lists.xensource.com
> http://lists.xensource.com/xen-users
2011/12/19 Peter Gansterer <peter.gansterer@paradigma.net>:
> Hi again,
>
> It seems to be somehow iSCSI-related after all.
> Today's tests showed:
> - We could not reproduce the crash using NFS only.
>   (Altho it DID crash when copying the system via NFS ... maybe we had
>   accessed an iSCSI device before and the kernel was left in some
>   erroneous state).
> - It still reliably crashes when accessing iSCSI.

Please try setting panic=100 on the kernel command line. If you're lucky, this will result in a 100 second delay prior to the dom0 restart. But I didn't have success when I tested that. I'm assuming that Xen auto-reacts on a dom0 crash. Anyway, please give it a try.

As to your actual problem: Sorry, no idea.
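A minimal sketch of where these options would go, assuming a GRUB legacy boot entry (the thread does not say which bootloader is in use). Under Xen the hypervisor sits on the `kernel` line and the dom0 Linux kernel on the first `module` line, so `panic=100` would belong on the `module` line. The title, file paths and root device are placeholders, not taken from the thread. `noreboot` is a Xen hypervisor option that asks Xen not to reboot automatically after an error; it is shown only as something that may be worth trying alongside `panic=`, since the reboot may be triggered by Xen rather than by the dom0 kernel.

```
# /boot/grub/grub.conf  (GRUB legacy assumed; title, paths and root device are placeholders)
title Xen 4.1.2 / Linux 3.1.5
root (hd0,0)
# Hypervisor line: noreboot asks Xen not to reboot automatically after an error
kernel /boot/xen.gz noreboot
# dom0 kernel line: panic=100 asks Linux to wait 100 seconds before rebooting on a panic
module /boot/vmlinuz-3.1.5 root=/dev/sda3 panic=100
```

Whether either option keeps the trace on screen long enough to read depends on where the crash actually originates, so this is a starting point rather than a guaranteed fix.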
Short update on our issue:
- panic=100 does not seem to work with Xen.
- 2.6.38 (Xen patches by gentoo) runs stable with iSCSI!

So it seems to be either an issue that was introduced with the 3.x kernels, or something already fixed in the gentoo/suse patches that didn't make it upstream ...

Can you please tell me what would be the best way to inform the developers?
- Where can I post the issue?
- What information will be required?

thx,
- peter.

On Mon, December 19, 2011 21:34, Florian Heigl wrote:
> [...]
On Tue, Dec 20, 2011 at 04:20:56PM +0100, Peter Gansterer wrote:
> Short update on our issue:
>
> - panic=100 does not seem to work with Xen.
> - 2.6.38 (Xen patches by gentoo) runs stable with iSCSI!
>
> So it seems to be an issue that occurs since 3.x kernels or it is
> something already fixed in gentoo/suse patches that didn't make it
> upstream ...
>
> Can you please tell me, what would be the best way to inform developers?

Capture a full crash/oops log using a serial console:
http://wiki.xen.org/xenwiki/XenSerialConsole

> - Where can I post the issue?

To the xen-devel mailing list.

> - What information will be required?

Full serial console log as text, including Xen hypervisor (xen.gz) output and dom0 Linux kernel output, all the way from boot to the point where it crashes.

-- Pasi
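For orientation, a hedged sketch of a boot entry that sends both hypervisor and dom0 output to a serial port, again assuming GRUB legacy. The COM port, baud rate, file paths and root device are assumptions and need to match the actual hardware and installation; the options themselves (`com1=`, `console=`, `loglvl=`, `guest_loglvl=`, `sync_console`, `noreboot` for Xen, and `console=hvc0`, `earlyprintk=xen` for a pvops dom0 kernel) are standard Xen and Linux command-line parameters.

```
# /boot/grub/grub.conf  (GRUB legacy assumed; COM port, paths and root device are placeholders)
title Xen 4.1.2 / Linux 3.1.5 (serial console)
root (hd0,0)
# Xen: log to COM1 at 115200 8n1 and to VGA, with all hypervisor and guest messages enabled
kernel /boot/xen.gz com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all sync_console noreboot
# dom0: hvc0 is the Xen console, which Xen relays to the serial port configured above
module /boot/vmlinuz-3.1.5 root=/dev/sda3 console=hvc0 earlyprintk=xen
```

A second machine connected with a null-modem cable and running a terminal program at the same speed can then save the output to a text file, from boot up to the crash, which is the log requested above.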