Volker A. Brandt
2009-Sep-02 17:21 UTC
SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
Hi all!

I am running SXCE 121 on a dual quad-core X2200M2 (64 bit of course). During an installation of a NetBSD 5.0.1 PVM domU, the entire machine crashed with a kernel panic. Here's what I managed to salvage from the LOM console of the machine:

Sep 2 18:55:19 glaurung genunix: /xpvd/xdb@41,51712 (xdb5) offline
Sep 2 18:55:19 glaurung genunix: /xpvd/xdb@41,51728 (xdb6) offline
Sep 2 18:55:19 glaurung genunix: /xpvd/xnb@41,0 (xnbo3) offline
Sep 2 18:56:36 glaurung genunix: /xpvd/xdb@42,51712 (xdb5) online
Sep 2 18:56:36 glaurung genunix: /xpvd/xdb@42,51728 (xdb6) online
Sep 2 18:56:37 glaurung genunix: /xpvd/xnb@42,0 (xnbo3) online

Xen panic[dom=0xffff8300defb2080/vcpu=0xffff8300dfcdc080]: Xen BUG at mm.c:101

ffff828c8024f558 xpv:do_invalid_op+4d1
ffff828c8024f5f8 xpv:handle_exception+45
ffff828c8024f6d8 xpv:map_pages_to_xen+471
ffff828c8024f768 xpv:get_page_from_l1e+464
ffff828c8024f7c8 xpv:ptwr_emulated_update+142
ffff828c8024f848 xpv:ptwr_emulated_cmpxchg+a9
ffff828c8024f878 xpv:x86_emulate+525f
ffff828c8024fe68 xpv:ptwr_do_page_fault+12e
ffff828c8024fed8 xpv:do_page_fault+208
ffff828c8024ff18 xpv:handle_exception+45
ffffff003e031a90 unix:hati_pte_map+123 ()
ffffff003e031b10 unix:hati_load_common+15d ()
ffffff003e031bd0 unix:hat_devload+198 ()
ffffff003e031c80 xnb:xnb_to_peer+12c ()
ffffff003e031d40 xnb:xnb_copy_to_peer+64d ()
ffffff003e031d70 xnbo:xnbo_from_mac+20 ()
ffffff003e031db0 mac:mac_promisc_dispatch_one+5f ()
ffffff003e031e10 mac:mac_promisc_client_dispatch+8c ()
ffffff003e031e90 mac:mac_rx_srs_drain+117 ()
ffffff003e031f20 mac:mac_rx_srs_process+1db ()
ffffff003e032010 mac:mac_tx_send+519 ()
ffffff003e032080 mac:mac_tx_single_ring_mode+f4 ()
ffffff003e032110 mac:mac_tx+302 ()
ffffff003e032170 dld:str_mdata_fastpath_put+a4 ()
ffffff003e032270 ip:udp_xmit+806 ()
ffffff003e032310 ip:udp_send_data+3b3 ()
ffffff003e032470 ip:udp_output_v4+9f9 ()
ffffff003e0324f0 ip:udp_send_not_connected+eb ()
ffffff003e032550 ip:udp_wput+fe ()
ffffff003e0325c0 unix:putnext+21e ()
ffffff003e032600 rpcmod:rpcmodwput+9b ()
ffffff003e032620 rpcmod:rmm_wput+1e ()
ffffff003e032690 unix:put+1aa ()
ffffff003e0326f0 rpcmod:svc_clts_ksend+1da ()
ffffff003e032770 rpcmod:svc_sendreply+56 ()
ffffff003e032a70 nfssrv:common_dispatch+6b2 ()
ffffff003e032a90 nfssrv:rfs_dispatch+2d ()
ffffff003e032b70 rpcmod:svc_getreq+19c ()
ffffff003e032bd0 rpcmod:svc_run+16b ()
ffffff003e032c00 rpcmod:svc_do_run+81 ()
ffffff003e032ec0 nfs:nfssys+765 ()
ffffff003e032f10 unix:brand_sys_syscall32+1cd ()

syncing file systems... done
dumping to /dev/zvol/dsk/BOOTDISK/dump, offset 65536, content: kernel
panic[cpu0]/thread=ffffff08e137bc20: Illegally issued hypercall 2 during panic!
dump aborted: please record the above information!

Here is the XML file I have used to "virsh define" the domain:

<domain type='xen'>
  <name>nbsd-01</name>
  <uuid>4a872c6d-cb4c-431c-6899-8e925bbc2001</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <bootloader></bootloader>
  <os>
    <type>linux</type>
    <kernel>/var/tmp/netbsd.gz</kernel>
    <initrd>/var/tmp/install.fs</initrd>
  </os>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <distro>
    <type>unix</type>
    <variant>openbsd4</variant>
  </distro>
  <devices>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/zvol/dsk/DATADISK/vol-nbsd-01'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='file'/>
      <source file='/xvm/img/amd64cd-5.0.1.iso'/>
      <target dev='xvdb' bus='xen'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <mac address='00:bb:c0:00:20:01'/>
      <source bridge='public0'/>
      <script path='/usr/lib/xen/scripts/vif-vnic'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
  </devices>
</domain>

(HTML entities inserted for the poor forum readers :-)

Note that even though I had the NetBSD install ISO specified as a disk, I ended up running the actual installation via NFS because the names on the image would not map correctly (some RockRidge problem I guess).

In /var/log/xen/xend.log I see only INFO and DEBUG messages dating from before the panic. The crash occurred during unpacking of the distribution sets (.tar.gz) via NFS3 from localhost:/tmp.

Has anyone seen this crash before? What could I do to keep the box from crashing while I continue my experiments?

Thanks -- Volker

--
------------------------------------------------------------------------
Volker A. Brandt                  Consulting and Support for Sun Solaris
Brandt & Brandt Computer GmbH                   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim                     Email: vab@bb-c.de
Commercial register: Amtsgericht Bonn, HRB 10513           Shoe size: 45
Managing directors: Rainer J. H. Brandt and Volker A. Brandt
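The panic trace above mixes hypervisor frames (module xpv) with dom0 kernel frames (unix, xnb, mac, ip and so on), and the point where one gives way to the other is where dom0 trapped into Xen. The following is a minimal Python sketch for splitting such a trace, assuming each frame has the form "address module:symbol+hexoffset" as in the console output. The function names and the short SAMPLE excerpt are illustrative only and not part of any Solaris or Xen tool.

```python
import re

# One frame looks like "ffffff003e031c80 xnb:xnb_to_peer+12c ()".
# In the trace above the trailing "()" appears on dom0 frames only.
# Assumption: the first column is a 16-digit hex stack address.
FRAME_RE = re.compile(r"([0-9a-f]{16})\s+(\w+):(\w+)\+([0-9a-f]+)")


def parse_frames(text):
    """Return (address, module, symbol, offset) tuples in trace order."""
    return [
        (addr, module, symbol, int(offset, 16))
        for addr, module, symbol, offset in FRAME_RE.findall(text)
    ]


def split_hypervisor(frames):
    """Split frames into hypervisor ("xpv") frames and dom0 kernel frames."""
    xen = [f for f in frames if f[1] == "xpv"]
    dom0 = [f for f in frames if f[1] != "xpv"]
    return xen, dom0


# Short excerpt copied from the trace above.
SAMPLE = """
ffff828c8024fed8 xpv:do_page_fault+208
ffff828c8024ff18 xpv:handle_exception+45
ffffff003e031a90 unix:hati_pte_map+123 ()
ffffff003e031b10 unix:hati_load_common+15d ()
ffffff003e031bd0 unix:hat_devload+198 ()
ffffff003e031c80 xnb:xnb_to_peer+12c ()
"""

if __name__ == "__main__":
    xen, dom0 = split_hypervisor(parse_frames(SAMPLE))
    # The first dom0 frame after the Xen frames is where dom0 trapped into Xen.
    print("last dom0 frame before entering Xen:", dom0[0][1] + ":" + dom0[0][2])
```

Run against the full trace, the first dom0 frame is unix:hati_pte_map, reached from xnb:xnb_to_peer via hat_devload, so the fault was raised while the xnb backend driver was mapping a page.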
David Edmondson
2009-Sep-03 13:29 UTC
Re: SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
Volker,

I haven't tried running NetBSD as a domU for a long time.

It seems that the domU is using 'page flip' mode to transfer network packets from dom0 to the guest domain. None of the guests that we regularly use take this path any more (they use 'hypervisor copy' mode), so it's possible that the code in the Solaris driver has gone rotten.

It would be interesting if you could get a more detailed stack trace from the system using the dump or kernel debugger. Instructions on how to reproduce the problem (where to get the images, etc.) might also help.
Mark Johnson
2009-Sep-03 13:30 UTC
Re: SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
Volker A. Brandt wrote:
> Hi all!
> Note that even though I had the NetBSD install ISO specified as a disk,
> I ended up running the actual installation via NFS because the names
> on the image would not map correctly (some RockRidge problem I guess).
>
> In /var/log/xen/xend.log I see only INFO and DEBUG messages dating
> from before the panic. The crash occurred during unpacking of the
> distribution sets (.tar.gz) via NFS3 from localhost:/tmp.
>
> Has anyone seen this crash before? What could I do to keep the box
> from crashing while I continue my experiments?

It's being looked at... Current theory is a bug in dom0 when the guest is using page flip mode.

MRJ
Volker A. Brandt
2009-Sep-03 14:00 UTC
Re: SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
Hi David, hi Mark!

Thanks for taking the time to investigate.

> It seems that the domU is using 'page flip' mode to transfer network
> packets from dom0 to the guest domain. None of the guests that we
> regularly use take this path any more (they use 'hypervisor copy'
> mode), so it's possible that the code in the Solaris driver has gone
> rotten.

Hmmm... is that 'hypervisor copy' method the better one? Is 'page flip' deprecated? If so, maybe the NetBSD people can pick this up in the long run.

Note that I could reproduce the problem using either the NFS or the ftp installation method. The panic would occur only after substantial network traffic, i.e. not after the first packet.

> It would be interesting if you could get a more detailed stack trace
> from the system using the dump or kernel debugger. Instructions on how
> to reproduce the problem (where to get the images, etc.) might also
> help.

Will do... but it might take a day or two. The box is jumpstarting SXCE build 122 as I write. I'll certainly try to reproduce it under 122.

Regards -- Volker
David Edmondson
2009-Sep-03 14:31 UTC
Re: SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
On 3 Sep 2009, at 3:00pm, Volker A. Brandt wrote:
>> It seems that the domU is using 'page flip' mode to transfer network
>> packets from dom0 to the guest domain. None of the guests that we
>> regularly use take this path any more (they use 'hypervisor copy'
>> mode), so it's possible that the code in the Solaris driver has gone
>> rotten.
>
> Hmmm... is that 'hypervisor copy' method the better one?

It places lower load on dom0 and also helps with PV drivers for fully-virtualised guest domains.

> Is 'page flip' deprecated?

Yes.

> If so, maybe the NetBSD people can pick this up in the long run.

I had a quick look at the NetBSD source and couldn't see hypervisor copy support. It's not difficult to add.
Volker A. Brandt
2009-Sep-04 08:48 UTC
Re: SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
> > Hmmm... is that 'hypervisor copy' method the better one?
>
> It places lower load on dom0 and also helps with PV drivers for
> fully-virtualised guest domains.
>
> > Is 'page flip' deprecated?
>
> Yes.

OK, so we want NetBSD to switch in the long run, but Solaris Xen still shouldn't panic if 'page flip' is being used. :-)

> I had a quick look at the NetBSD source and couldn't see hypervisor
> copy support. It's not difficult to add.

I can't speak for the NetBSD developers, but I'll post a message in the appropriate mailing list.

Meanwhile, here's how to reproduce the panic. My environment is an X2200M2:

System Configuration: Sun Microsystems Sun Fire X2200 M2 with Quad Core Processor
BIOS Configuration: Sun Microsystems S39_3D12 10/06/2008
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

==== Processor Sockets ===================================

Version                                      Location Tag
-------------------------------------------- --------------------------
Quad-Core AMD Opteron(tm) Processor 2347 HE  CPU 1
Quad-Core AMD Opteron(tm) Processor 2347 HE  CPU 2

I am using SXCE Build #122:

# uname -a
SunOS glaurung 5.11 snv_122 i86pc i386 i86xpv

and the xVM version that came with it:

v3.3.2-xvm chgset 'Tue Aug 18 03:21:41 2009 -0700 18433:7e735e9e9bf6'

The NetBSD domU will live on a zvol:

# zfs get all DATADISK/vol-nbsd-01
NAME                  PROPERTY              VALUE                  SOURCE
DATADISK/vol-nbsd-01  type                  volume                 -
DATADISK/vol-nbsd-01  creation              Wed Sep 2 14:00 2009   -
DATADISK/vol-nbsd-01  used                  8G                     -
DATADISK/vol-nbsd-01  available             894G                   -
DATADISK/vol-nbsd-01  referenced            134M                   -
DATADISK/vol-nbsd-01  compressratio         1.00x                  -
DATADISK/vol-nbsd-01  reservation           none                   default
DATADISK/vol-nbsd-01  volsize               8G                     -
DATADISK/vol-nbsd-01  volblocksize          8K                     -
DATADISK/vol-nbsd-01  checksum              on                     default
DATADISK/vol-nbsd-01  compression           off                    default
DATADISK/vol-nbsd-01  readonly              off                    default
DATADISK/vol-nbsd-01  shareiscsi            off                    default
DATADISK/vol-nbsd-01  copies                1                      default
DATADISK/vol-nbsd-01  refreservation        8G                     local
DATADISK/vol-nbsd-01  primarycache          all                    default
DATADISK/vol-nbsd-01  secondarycache        all                    default
DATADISK/vol-nbsd-01  usedbysnapshots       0                      -
DATADISK/vol-nbsd-01  usedbydataset         134M                   -
DATADISK/vol-nbsd-01  usedbychildren        0                      -
DATADISK/vol-nbsd-01  usedbyrefreservation  7.87G                  -
DATADISK/vol-nbsd-01  logbias               latency                default

To set up the NetBSD installation environment, get the following two files:

ftp://ftp.netbsd.org/pub/NetBSD/5.0.1/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-5.0.1/amd64/installation/cdrom/boot.iso

and place them somewhere on the dom0 box (I chose /var/tmp).

Then create the domU. I used an XML file since I went through several iterations. Here it is (changed slightly from my first email):

<domain type='xen'>
  <name>nbsd-01</name>
  <uuid>4a872c6d-cb4c-431c-6899-8e925bbc2001</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <bootloader></bootloader>
  <os>
    <type>linux</type>
    <kernel>/var/tmp/netbsd-INSTALL_XEN3_DOMU.gz</kernel>
    <initrd>/var/tmp/boot.iso</initrd>
  </os>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <distro>
    <type>unix</type>
    <variant>openbsd4</variant>
  </distro>
  <devices>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/zvol/dsk/DATADISK/vol-nbsd-01'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <interface type='bridge'>
      <mac address='00:bb:c0:00:20:01'/>
      <source bridge='public0'/>
      <script path='/usr/lib/xen/scripts/vif-vnic'/>
      <target dev='vif-1.0'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
  </devices>
</domain>

(Sorry to the forum readers.)

About the distro variant: I wasn't really too sure what to pick, so I just selected unix/openbsd4 since it seemed the closest match.

The source network interface for the bridge is an aggregation:

# dladm show-link
LINK        CLASS    MTU    STATE    OVER
nge0        phys     1500   up       --
nge1        phys     1500   up       --
bge0        phys     1500   up       --
bge1        phys     1500   up       --
public0     aggr     1500   up       bge0 nge0

Time to create and start up the domain:

# virsh define nbsd-01.xml
Domain nbsd-01 defined from nbsd-01.xml

# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0 32196     8     r-----    546.1
nbsd-01                                           1024     1                 0.0

# virsh start nbsd-01
Domain nbsd-01 started

# virsh dominfo nbsd-01
Id:             2
Name:           nbsd-01
UUID:           4a872c6d-cb4c-431c-6899-8e925bbc2001
OS Type:        linux
State:          idle
CPU(s):         1
CPU time:       3.8s
Max memory:     1048576 kB
Used memory:    1047608 kB
Autostart:      disable

# virsh console nbsd-01

Welcome to sysinst, the NetBSD-5.0.1 system installation tool. This
menu-driven tool is designed to help you install NetBSD to a hard disk,
or upgrade an existing NetBSD system, with a minimum of work.
In the following menus type the reference letter (a, b, c, ...) to
select an item, or type CTRL+N/CTRL+P to select the next/previous item.
The arrow keys and Page-up/Page-down may also work.
Activate the current selection from the menu by typing the enter key.

 +---------------------------------------------+
 |>a: Installation messages in English         |
 | b: Messages d'installation en français      |
 | c: Installation auf Deutsch                 |
 | d: Komunikaty instalacyjne w jezyku polskim |
 | e: Mensajes de instalación en castellano    |
 +---------------------------------------------+

Just follow the prompts to install NetBSD on the hard disk. Partition sizes etc. do not really matter; just use the entire disk. For networking, I use a fixed MAC address and DHCP; I have verified that the panic will eventually happen with a static IP address, too.

After network configuration, select the "ftp" installation method. You will end up at this screen:

The following are the ftp site, directory, user, and password that will
be used. If "user" is "ftp", then the password is not needed.

>a: Host                  ftp.NetBSD.org
 b: Base directory        pub/NetBSD/NetBSD-5.0.1
 c: Set directory         /amd64/binary/sets
 d: User                  ftp
 e: Password
 f: Proxy
 g: Transfer directory    /usr/INSTALL
 h: Delete after install  No
 x: Get Distribution

Just select "Get Distribution"; all values are correct. Then watch the installation sets being retrieved via ftp. Eventually, the physical box will panic and reboot.

I have a local copy of the files that I can ftp to, with the same result. I've also tried NFS installation from a local share on the dom0, again with the same result.

There is no crash dump, and nothing in /var/xen/dump. I will play around with the kernel debugger later if I have the time.

Regards -- Volker
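Since the domain definition went through several iterations, it may be convenient to generate it rather than edit it by hand. The following is a rough Python sketch using only the standard library, with values copied from the XML above. The function name build_domain and its parameters are made up for illustration. The bootloader, distro and interface target elements are left out for brevity, so the output is a simplified variant and not an exact copy of the file used here.

```python
import xml.etree.ElementTree as ET


def build_domain(name, uuid, mem_kib, kernel, initrd, zvol, mac, bridge):
    """Build a PV domain definition similar to the one posted above."""
    dom = ET.Element("domain", type="xen")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "uuid").text = uuid
    ET.SubElement(dom, "memory").text = str(mem_kib)
    ET.SubElement(dom, "currentMemory").text = str(mem_kib)
    ET.SubElement(dom, "vcpu").text = "1"

    # The install kernel and its ramdisk image are passed in directly.
    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type").text = "linux"
    ET.SubElement(os_el, "kernel").text = kernel
    ET.SubElement(os_el, "initrd").text = initrd

    ET.SubElement(dom, "clock", offset="localtime")
    for event in ("on_poweroff", "on_reboot", "on_crash"):
        ET.SubElement(dom, event).text = "destroy"

    devices = ET.SubElement(dom, "devices")
    disk = ET.SubElement(devices, "disk", type="block", device="disk")
    ET.SubElement(disk, "driver", name="phy")
    ET.SubElement(disk, "source", dev=zvol)
    ET.SubElement(disk, "target", dev="xvda", bus="xen")

    iface = ET.SubElement(devices, "interface", type="bridge")
    ET.SubElement(iface, "mac", address=mac)
    ET.SubElement(iface, "source", bridge=bridge)
    ET.SubElement(iface, "script", path="/usr/lib/xen/scripts/vif-vnic")

    console = ET.SubElement(devices, "console", type="pty")
    ET.SubElement(console, "target", port="0")
    return dom


if __name__ == "__main__":
    dom = build_domain(
        name="nbsd-01",
        uuid="4a872c6d-cb4c-431c-6899-8e925bbc2001",
        mem_kib=1048576,
        kernel="/var/tmp/netbsd-INSTALL_XEN3_DOMU.gz",
        initrd="/var/tmp/boot.iso",
        zvol="/dev/zvol/dsk/DATADISK/vol-nbsd-01",
        mac="00:bb:c0:00:20:01",
        bridge="public0",
    )
    # Write the result to a file and pass it to "virsh define" as above.
    print(ET.tostring(dom, encoding="unicode"))
```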
Frank van der Linden
2009-Sep-10 19:27 UTC
Re: SXCE 121 Kernel Panic while installing NetBSD 5.0.1 PVM DomU
Volker A. Brandt wrote:
> Hi all!
>
> I am running SXCE 121 on a dual quad-core X2200M2 (64 bit of course).
> During an installation of a NetBSD 5.0.1 PVM domU, the entire machine
> crashed with a kernel panic. Here's what I managed to salvage from
> the LOM console of the machine:
>
> Sep 2 18:55:19 glaurung genunix: /xpvd/xdb@41,51712 (xdb5) offline
> Sep 2 18:55:19 glaurung genunix: /xpvd/xdb@41,51728 (xdb6) offline
> Sep 2 18:55:19 glaurung genunix: /xpvd/xnb@41,0 (xnbo3) offline
> Sep 2 18:56:36 glaurung genunix: /xpvd/xdb@42,51712 (xdb5) online
> Sep 2 18:56:36 glaurung genunix: /xpvd/xdb@42,51728 (xdb6) online
> Sep 2 18:56:37 glaurung genunix: /xpvd/xnb@42,0 (xnbo3) online
> Xen panic[dom=0xffff8300defb2080/vcpu=0xffff8300dfcdc080]: Xen BUG at mm.c:101
>

That's an assertion failure in Xen; in this case it unexpectedly finds a level 3 page table entry with the PSE bit set. I can't immediately say why this happens, but it's definitely a bug in Xen itself, not in the Solaris dom0.

I can take this up on the xen-devel list. If there's an easy way to reproduce it, that'd be great!

- Frank
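For readers unfamiliar with the flag mentioned above: on x86, PSE (page size) is bit 7 of an upper-level page table entry and marks the entry as a large-page mapping rather than a pointer to a lower-level table. The following is a small Python sketch of that check. The entry values are invented for illustration and are not taken from this panic, and the sketch says nothing about what the Xen assertion itself tests.

```python
PAGE_PRESENT = 1 << 0  # bit 0: entry is valid
# Bit 7: page-size flag in upper-level entries. In a level-1 entry the
# same bit position has a different meaning (PAT).
PAGE_PSE = 1 << 7


def has_pse(entry):
    """Return True if a present page table entry has the PSE bit set."""
    return bool(entry & PAGE_PRESENT) and bool(entry & PAGE_PSE)


if __name__ == "__main__":
    # Hypothetical entry values: identical except for bit 7.
    for entry in (0xDEAD1067, 0xDEAD10E7):
        print(hex(entry), "PSE set" if has_pse(entry) else "PSE clear")
```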