Jean-Pascal Mazzilli - Sun Microsystems
2006-Sep-08 09:15 UTC
Failed to create a solaris DOMU using NFS root
Hello,

I've installed an OpenSolaris dom0 (snv_44) on an x4100 server and booted a 32-bit Xen kernel. I'm trying to create an OpenSolaris domU (snv_44) by following the howto at http://www.opensolaris.org/os/community/xen/howto/create-osox-domu/.

I did not manage to create a diskless environment using SUNWCXall. I created one with SUNWCreq instead and added the two additional packages SUNWbind and SUNWbindr. I then upgraded the diskless environment using BFU osox-bfu-2006-08-16.

Note that the howto is wrong here. The command

    dlserver# mkdir /export/root/<domU-hostname>/platform/kernel/i86xen

should be replaced by:

    dlserver# mkdir -p /export/root/<domU-hostname>/platform/i86xen/kernel/

Finally, I tried to create the Xen domain using the following configuration file:

    memory = 256
    name = "t1-30-zone1"
    kernel = "/export/root/t1-30-zone1/platform/i86xen/kernel/unix"
    extra = "/platform/i86xen/kernel/unix -B console=xen"
    ramdisk = "/export/root/t1-30-zone1/platform/i86pc/boot_archive"
    nics = 1
    ip = "10.17.1.53"
    vif = [ 'bridge=xenbr0' ]
    nfs_root = "l6-g-8:/export/root/t1-30-zone1"
    nfs_server = "10.17.1.97"
    restart = 'none'

I launched the following command:

    xm create -c ./t1-30-zone1.py
    Using config file "./t1-30-zone1.py".
    Warning: The nics option is deprecated. Please use an empty vif entry instead:

      vif = [ '' ]

    Started domain t1-30-zone1
    network interface name 'eth0' replaced with 'xnf0'
    WARNING: cpu0: no workaround for erratum 123
    SunOS Release 5.11 Version matrix-aug 32-bit
    Copyright 1983-2006 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    DEBUG enabled

    xm list
    Name          ID  Mem(MiB)  VCPUs  State   Time(s)
    Domain-0       0       512      4  r-----    110.2
    t1-30-zone1    4       256      1  -b----      0.4

The domain seems to be blocked at this point. How can I debug this kind of situation?

Thanks in advance,
Jean-Pascal

--
Jean-Pascal Mazzilli - Sun Microsystems
E-Mail : Jean-Pascal.Mazzilli@sun.com
Tel : +33 (0)4 76 18 80 42
> memory = 256
> name = "t1-30-zone1"
> kernel = "/export/root/t1-30-zone1/platform/i86xen/kernel/unix"
> extra = "/platform/i86xen/kernel/unix -B console=xen"
> ramdisk = "/export/root/t1-30-zone1/platform/i86pc/boot_archive"
> nics = 1
> ip = "10.17.1.53"
> vif = [ 'bridge=xenbr0' ]
> nfs_root = "l6-g-8:/export/root/t1-30-zone1"
> nfs_server = "10.17.1.97"
> restart = 'none'

Could you try adding the following to the config file and see if it makes a difference?

    netmask = "255.255.255.0"

This message posted from opensolaris.org
Jean-Pascal Mazzilli - Sun Microsystems
2006-Sep-11 15:18 UTC
Re: Re: Failed to create a solaris DOMU using NFS root
I was too optimistic in my previous e-mail. The boot now gets a little further, but it seems to be blocked in NFS processing. Pinging 10.17.1.53 works. Do you have any hints for debugging this kind of problem?

Thanks in advance.

The snoop traces below show the problem.

    0.00000 t1-30-zone1 -> (broadcast) ARP C Who is 10.17.1.53, t1-30-zone1 ?
    0.00132 t1-30-zone1 -> (broadcast) ARP C Who is 10.17.1.97, l6-g-8 ?
    0.00152 l6-g-8 -> t1-30-zone1 ARP R 10.17.1.97, l6-g-8 is 0:14:4f:20:6d:bc
    0.00175 t1-30-zone1 -> l6-g-8 PORTMAP C GETPORT prog=100005 (MOUNT) vers=1 proto=UDP
    0.00204 l6-g-8 -> t1-30-zone1 PORTMAP R GETPORT port=32827
    0.00226 t1-30-zone1 -> l6-g-8 MOUNT1 C Mount /export/root/t1-30-zone1
    0.00766 l6-g-8 -> t1-30-zone1 MOUNT1 R Mount OK FH=3A01
    0.00832 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1023 Syn Seq=85676521 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
    0.00850 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Syn Ack=85676522 Seq=3010341098 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
    0.00867 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1023 Ack=3010341099 Seq=85676522 Len=0 Win=49640
    0.00883 t1-30-zone1 -> l6-g-8 NFS C NULL2
    0.00907 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85676586 Seq=3010341099 Len=0 Win=49576
    0.00920 l6-g-8 -> t1-30-zone1 NFS R NULL2
    0.00934 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1023 Ack=3010341127 Seq=85676586 Len=0 Win=49640
    0.00952 t1-30-zone1 -> l6-g-8 NFS_ACL C GETATTR2 FH=3A01
    0.00979 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85676682 Seq=3010341127 Len=0 Win=49640
    0.00979 l6-g-8 -> t1-30-zone1 NFS_ACL R GETATTR2 OK
    0.01000 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1023 Ack=3010341227 Seq=85676682 Len=0 Win=49640
    0.01010 t1-30-zone1 -> l6-g-8 NFS C STATFS2 FH=3A01
    .....
    1.08436 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Push Ack=85678578 Seq=3010343191 Len=1460 Win=49640
    1.99199 t1-30-zone1 -> (broadcast) ARP C Who is 10.17.1.53, t1-30-zone1 ?
    ...
    180.02716 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 0 for 4096
    180.08229 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010347287 Len=0 Win=49640
    238.40684 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010343191 Len=1460 Win=49640
    298.41615 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010343191 Len=1460 Win=49640
    358.47565 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010343191 Len=1460 Win=49640
    360.13701 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 0 for 2048
    360.19534 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678794 Seq=3010347287 Len=0 Win=49640
    418.48708 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678794 Seq=3010343191 Len=1460 Win=49640
    478.49648 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678794 Seq=3010343191 Len=1460 Win=49640
    540.13680 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 0 for 1024
    540.13708 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Rst Seq=3010343191 Len=0 Win=0
    550.13667 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1022 Syn Seq=220156562 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
    550.13695 l6-g-8 -> t1-30-zone1 TCP D=1022 S=2049 Syn Ack=220156563 Seq=3145372700 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
    550.13718 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1022 Ack=3145372701 Seq=220156563 Len=0 Win=49640
    550.13735 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 0 for 512
    550.13748 l6-g-8 -> t1-30-zone1 TCP D=1022 S=2049 Ack=220156671 Seq=3145372701 Len=0 Win=49532
    550.13790 l6-g-8 -> t1-30-zone1 NFS R READ2 OK (512 bytes)
    550.13808 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1022 Ack=3145373317 Seq=220156671 Len=0 Win=49640
    550.16264 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 512 for 512
    550.16278 l6-g-8 -> t1-30-zone1 TCP D=1022 S=2049 Ack=220156779 Seq=3145373317 Len=0 Win=49640
    550.16306 l6-g-8 -> t1-30-zone1 NFS R READ2 OK (512 bytes)
    550.16323 t1-30-zone1 -> l6-g-8 TCP D=2049 S=1022 Ack=3145373933 Seq=220156779 Len=0 Win=49640
    550.16337 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 1024 for 512
    ...

The console output is:

    l6-g-5:root:bash$ xm create -c ./t1-30-zone1.py
    Using config file "./t1-30-zone1.py".
    Error: Kernel image does not exist: /export/root/t1-30-zone1/platform/i86xen/kernel/amd64/unix
    l6-g-5:root:bash$ mount -a
    mount: /tmp is already mounted or swap is busy
    mount: /dev/dsk/c0t3d0s0 is already mounted or /export is busy
    l6-g-5:root:bash$ xm create -c ./t1-30-zone1.py
    Using config file "./t1-30-zone1.py".
    Started domain t1-30-zone1
    network interface name 'eth0' replaced with 'xnf0'
    module /platform/i86xen/kernel/amd64/unix: text at [0xfffffffffb800000, 0xfffffffffb93984b] data at 0xfffffffffbc00000
    module /kernel/amd64/genunix: text at [0xfffffffffb939850, 0xfffffffffbbc1b37] data at 0xfffffffffbc80c60
    WARNING: cpu0: no workaround for erratum 123
    SunOS Release 5.11 Version matrix-aug 64-bit
    Copyright 1983-2006 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    DEBUG enabled
    xen v3.0.2-3-sun chgset 'Mon Aug 14 06:30:03 2006 -0700 9671:7419a66ebaeb'
    features: 10766c2<cpuid,sse3,nx,asysc,sse2,sse,cx8,pae,mmx,cmov,tsc>
    Using default device instance data
    cpu0: initialized cpu module 'cpu.generic'
    mem = 524288K (0x20000000)
    root nexus = i86xen
    pseudo0 at root
    pseudo0 is /pseudo
    scsi_vhci0 at root
    scsi_vhci0 is /scsi_vhci
    pseudo-device: dld0
    dld0 is /pseudo/dld@0
    xendev0 at root
    IP address: 10.17.1.53
    IP netmask: 255.255.255.0
    IP router: 10.17.1.1
    NFS server: l6-g-8 (10.17.1.97)
    NFS path: /export/root/t1-30-zone1
    NFS2 server l6-g-8 not responding still trying
    NFS2 server l6-g-8 ok
    xencons@0, xencons0
    xencons0 is /xendev/xencons@0
    boot scratch memory used: 0x466f1170
    cpu0: x86 (AuthenticAMD family 15 model 33 step 2 clock 2200 MHz)
    cpu0: Dual Core AMD Opteron(tm) Processor 275
    workaround applied for cpu erratum #121
    NOTICE: cpqhpc: 64-bit driver module not found
    pseudo-device: devinfo0
    devinfo0 is /pseudo/devinfo@0
    Hostname: t1-30-zone1

Jean-Pascal Mazzilli - Sun Microsystems wrote:
> thanks for your help.
>
> I've updated the xen config file as follows:
>
> memory = 512
> name = "t1-30-zone1"
> kernel = "/export/root/t1-30-zone1/platform/i86xen/kernel/amd64/unix"
> extra = "/platform/i86xen/kernel/amd64/unix -v"
> ramdisk = "/export/root/t1-30-zone1/platform/i86pc/boot_archive"
> ip = "10.17.1.53"
> netmask = "255.255.255.0"
> hostname = "t1-30-zone1"
> gateway = "10.17.1.1"
> vif = [ 'mac=0:14:4f:1f:c3:a' ]
> nfs_root = "l6-g-8:/export/root/t1-30-zone1"
> nfs_server = "10.17.1.97"
> on_shutdown = "destroy"
> on_reboot = "restart"
> on_crash = "destroy"
>
> and it works fine now.
>
> Jean-Pascal
>
> Mark Johnson wrote:
>
>>> memory = 256
>>> name = "t1-30-zone1"
>>> kernel = "/export/root/t1-30-zone1/platform/i86xen/kernel/unix"
>>> extra = "/platform/i86xen/kernel/unix -B console=xen"
>>> ramdisk = "/export/root/t1-30-zone1/platform/i86pc/boot_archive"
>>> nics = 1
>>> ip = "10.17.1.53"
>>> vif = [ 'bridge=xenbr0' ]
>>> nfs_root = "l6-g-8:/export/root/t1-30-zone1"
>>> nfs_server = "10.17.1.97"
>>> restart = 'none'
>>
>> Could you try adding the following to the config file
>> and see if it makes a difference?
>>
>> netmask = "255.255.255.0"

--
Jean-Pascal Mazzilli - Sun Microsystems
E-Mail : Jean-Pascal.Mazzilli@sun.com
Tel : +33 (0)4 76 18 80 42
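Several settings changed at once between the first config file and the updated one, so it can help to list the differences mechanically before guessing which one mattered. The following is only a minimal sketch: it assumes that xm config files are plain Python assignments (the file in this thread is named t1-30-zone1.py), and the helper names load_config and diff_configs are hypothetical, not part of Xen. The two config texts are copied from the messages in this thread.

```python
def load_config(text):
    """Evaluate an xm-style config (plain Python assignments) into a dict.

    Assumption: the config contains only literal assignments, as in this thread.
    """
    namespace = {}
    exec(text, {"__builtins__": {}}, namespace)
    return namespace


def diff_configs(old, new):
    """Return (added, removed, changed) settings between two config dicts."""
    added = {k: new[k] for k in new if k not in old}
    removed = {k: old[k] for k in old if k not in new}
    changed = {k: (old[k], new[k]) for k in new if k in old and old[k] != new[k]}
    return added, removed, changed


ORIGINAL = """
memory = 256
name = "t1-30-zone1"
kernel = "/export/root/t1-30-zone1/platform/i86xen/kernel/unix"
extra = "/platform/i86xen/kernel/unix -B console=xen"
ramdisk = "/export/root/t1-30-zone1/platform/i86pc/boot_archive"
nics = 1
ip = "10.17.1.53"
vif = [ 'bridge=xenbr0' ]
nfs_root = "l6-g-8:/export/root/t1-30-zone1"
nfs_server = "10.17.1.97"
restart = 'none'
"""

UPDATED = """
memory = 512
name = "t1-30-zone1"
kernel = "/export/root/t1-30-zone1/platform/i86xen/kernel/amd64/unix"
extra = "/platform/i86xen/kernel/amd64/unix -v"
ramdisk = "/export/root/t1-30-zone1/platform/i86pc/boot_archive"
ip = "10.17.1.53"
netmask = "255.255.255.0"
hostname = "t1-30-zone1"
gateway = "10.17.1.1"
vif = [ 'mac=0:14:4f:1f:c3:a' ]
nfs_root = "l6-g-8:/export/root/t1-30-zone1"
nfs_server = "10.17.1.97"
on_shutdown = "destroy"
on_reboot = "restart"
on_crash = "destroy"
"""

added, removed, changed = diff_configs(load_config(ORIGINAL), load_config(UPDATED))
print("added:  ", sorted(added))
print("removed:", sorted(removed))
print("changed:", sorted(changed))
```

The listing only shows what differs between the two files. The thread does not establish which of these changes, if any, was responsible for the boot getting further.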
* Jean-Pascal.Mazzilli@Sun.COM [2006-09-11 16:18:11]
> 540.13680 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 0 for 1024
> 540.13708 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Rst Seq=3010343191 Len=0 Win=0

This is interesting. I wonder why the server decided to reset the connection?

How long did you wait for it to boot? I've seen it take 2 minutes for the first boot of a new diskless client.

Is network traffic continuing to flow?

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
Jean-Pascal Mazzilli - Sun Microsystems
2006-Sep-12 07:57 UTC
Re: Failed to create a solaris DOMU using NFS root
David Edmondson wrote:
> * Jean-Pascal.Mazzilli@Sun.COM [2006-09-11 16:18:11]
>
>> 540.13680 t1-30-zone1 -> l6-g-8 NFS C READ2 FH=5D88 at 0 for 1024
>> 540.13708 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Rst Seq=3010343191 Len=0 Win=0
>
> This is interesting. I wonder why the server decided to reset the
> connection?

What we have observed is the following: at the beginning of the domU boot, the NFS connection works correctly. The root filesystem /export/root/t1-30-zone1 is mounted and NFS processing is fine.

We are not very familiar with the diskless Solaris boot, but we suppose that the xnf0 interface is replumbed at an early stage of the boot. We think this replumb breaks the NFS/TCP connection, which leads to several NFS retries and finally to a connection reset from the server side. Just a guess ;-)

> How long did you wait for it to boot? I've seen it take 2 minutes for
> the first boot of a new diskless client.

We did not manage to complete the boot. The last message that appears on the domU console is:

    Hostname: t1-30-zone1

> Is network traffic continuing to flow?

Yes, ICMP traffic flows continuously during the boot process.

> dme.

Any clues for debugging the NFS problem?

Thanks,
Jean-Pascal
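One way to look at the stalled connection is to pick out the data segments that the server sends more than once in the snoop summary. The sketch below is a hedged illustration in Python, not a snoop feature: the function name find_retransmissions and the regular expression are assumptions written for the summary-line layout seen in this thread, and only lines that snoop summarises as TCP carry a Seq/Len pair, so segments summarised as NFS are not visible to it. The sample lines are copied from the trace posted earlier in the thread.

```python
import re
from collections import defaultdict

# Matches snoop summary lines such as:
#   238.40684 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=... Seq=... Len=1460 Win=...
TCP_LINE = re.compile(
    r"^\s*(?P<time>\d+\.\d+)\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+TCP\s+"
    r".*?\bSeq=(?P<seq>\d+)\s+Len=(?P<length>\d+)"
)


def find_retransmissions(lines):
    """Group data-carrying TCP segments by (src, dst, seq); keep those seen more than once."""
    seen = defaultdict(list)
    for line in lines:
        match = TCP_LINE.match(line)
        if match is None or int(match.group("length")) == 0:
            continue  # not a TCP summary line, or a pure ACK/SYN/RST without data
        key = (match.group("src"), match.group("dst"), int(match.group("seq")))
        seen[key].append(float(match.group("time")))
    return {key: times for key, times in seen.items() if len(times) > 1}


SAMPLE = """\
1.08436 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Push Ack=85678578 Seq=3010343191 Len=1460 Win=49640
180.08229 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010347287 Len=0 Win=49640
238.40684 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010343191 Len=1460 Win=49640
298.41615 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010343191 Len=1460 Win=49640
358.47565 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678686 Seq=3010343191 Len=1460 Win=49640
418.48708 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678794 Seq=3010343191 Len=1460 Win=49640
478.49648 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Ack=85678794 Seq=3010343191 Len=1460 Win=49640
540.13708 l6-g-8 -> t1-30-zone1 TCP D=1023 S=2049 Rst Seq=3010343191 Len=0 Win=0
""".splitlines()

retransmitted = find_retransmissions(SAMPLE)
for (src, dst, seq), times in retransmitted.items():
    print(f"{src} -> {dst} Seq={seq} sent {len(times)} times at {times}")
```

On the sample lines this flags the 1460-byte segment with Seq=3010343191, which the server keeps resending until it resets the connection. That is consistent with the client not acknowledging that segment, but the excerpt alone does not show why.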