I have a number of boxes which have been running well w/ 5.4-stable. The
last kernel and userland builds on those boxes were on May 26th, from a
CVSUP done that same day.
Last night I did another CVSUP and the usual buildworld/buildkernel/
installkernel/installworld/mergemaster, and the boxes wouldn't come back up,
hanging in sbwait state at the point where they would normally do their
first net access.
On one box, this is a batch of NFS mounts; on the other it is the initial
ntpdate query. At this point the boxes don't respond to pings and sit
forever (at least an hour) not doing anything. Typing ^C on the console aborts
the hanging process, and then the startup proceeds. With a ping going from
another system, I see that the net doesn't come up on the problem boxes until
several seconds into the execution of the next network-related command - it
is almost as if the previous ifconfig (done from rc.conf) didn't have any
effect (despite the "link up" message on the console). The other oddity is
that I get a "rpc.lockd: 100024 RPC: Port mapper failure" right after lockd
starts, even though other net commands have completed successfully.
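To check whether the hang depends on the link state at the moment of the first
network access, the boot could be made to wait until ifconfig reports the link
as active. This is only a diagnostic sketch, not a fix: the function name
wait_for_link, the IFCONFIG override variable and the 30-try default are
assumptions for illustration, and it relies on ifconfig printing
"status: active" once the link is up.

```sh
#!/bin/sh
# Hypothetical helper: poll an interface until ifconfig reports
# "status: active", or give up after a number of one-second tries.
# Usage: wait_for_link IFNAME [TRIES]
wait_for_link() {
    ifname=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # IFCONFIG may name a replacement command (used for testing);
        # it defaults to the real ifconfig.
        if ${IFCONFIG:-ifconfig} "$ifname" 2>/dev/null |
            grep -q 'status: active'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example: wait_for_link em0 30 || echo "em0: still no carrier"
```

If the first network command still hangs in sbwait after this returns
successfully, link negotiation timing alone would not explain the problem.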
Booting from kernel.old (the May 26th one) boots normally, so it is a
kernel issue, not something in userland.
Hardware is a Tyan S2721-533 (dual Xeon) board w/ onboard Intel "em"
Gigabit Ethernet. To keep this message short, I'm omitting the dmesg output,
but it can be found at http://www.tmk.com/raidzilla.
Here are the relevant parts of the console output from the two systems:
first system:
/dev/da1s3h: clean, 199137891 free (819 frags, 24892134 blocks, 0.0% fragmentation)
Setting hostname: rz1.tmk.com.
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=b<RXCSUM,TXCSUM,VLAN_MTU>
inet 204.141.35.40 netmask 0xffffff00 broadcast 204.141.35.255
inet6 fe80::2e0:81ff:fe28:94d6%em0 prefixlen 64 tentative scopeid 0x1
ether 00:e0:81:28:94:d6
media: Ethernet 1000baseTX <full-duplex> (autoselect)
status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
add net default: gateway 204.141.35.1
Additional routing options:.
Starting devd.
Mounting NFS file systems:em0: Link is up 1000 Mbps Full Duplex
load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.01 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.01 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.56 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.56 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.07 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
^CScript /etc/rc.d/mountcritremote interrupted
Starting syslogd.
Checking for core dump on /dev/da0s1b...
savecore: no dumps found
Setting date via ntp.
Looking for host gate.tmk.com and service ntp
host found : gate.tmk.com
Looking for host 204.141.35.61 and service ntp
host found : gate.tmk.com
Looking for host 199.224.0.146 and service ntp
host found : ns1.ispnetinc.net
Looking for host 199.224.0.154 and service ntp
host found : ns2.ispnetinc.net
Looking for host 204.141.40.135 and service ntp
host found : ns3.ispnetinc.net
8 Jun 00:30:34 ntpdate[416]: step time server 199.224.0.146 offset 1.526419 sec
Starting rpcbind.
NFS access cache time=2
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting mountd.
Starting nfsd.
Starting statd.
Starting lockd.
Jun 8 00:30:35 rz1 rpc.lockd: 100024 RPC: Port mapper failure
Starting usbd.
Starting local daemons:.
Updating motd.
Starting ntpd.
Starting rwhod.
Configuring syscons: blanktime screensaver.
second system:
/dev/da1s3h: clean, 199127439 free (1095 frags, 24890793 blocks, 0.0% fragmentation)
Setting hostname: rz2.tmk.com.
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=b<RXCSUM,TXCSUM,VLAN_MTU>
inet 204.141.35.41 netmask 0xffffff00 broadcast 204.141.35.255
inet6 fe80::2e0:81ff:fe2e:67b0%em0 prefixlen 64 tentative scopeid 0x1
ether 00:e0:81:2e:67:b0
media: Ethernet autoselect
status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
add net default: gateway 204.141.35.1
Additional routing options:.
Starting devd.
Mounting NFS file systems:.
Starting syslogd.
Checking for core dump on /dev/da0s1b...
savecore: no dumps found
Setting date via ntp.
Looking for host gate.tmk.com and service ntp
em0: Link is up 1000 Mbps Full Duplex
load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.69 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.64 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.64 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.02 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
^CScript /etc/rc.d/ntpdate interrupted
Starting rpcbind.
NFS access cache time=2
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting mountd.
Starting nfsd.
Starting statd.
Starting lockd.
Jun 8 00:58:41 rz2 rpc.lockd: 100024 RPC: Port mapper failure
Starting usbd.
Starting local daemons:.
Updating motd.
Starting ntpd.
Starting rwhod.
Configuring syscons: blanktime screensaver.
Here is the list of kernel files changed since the previous (working)
kernel:
find . -newer /tmp/date
./alpha/alpha/busdma_machdep.c
./amd64/amd64/identcpu.c
./amd64/amd64/machdep.c
./conf/NOTES
./conf/files
./conf/options
./dev/ata/ata-chipset.c
./dev/ata/ata-pci.h
./dev/bktr/bktr_reg.h
./dev/sound/pci/ich.c
./kern/uipc_socket.c
./kern/kern_event.c
./kern/sysv_shm.c
./kern/subr_unit.c
./modules/netgraph/device
./modules/netgraph/device/Makefile
./modules/netgraph/Makefile.inc
./modules/udbp/Makefile
./netgraph/bluetooth/drivers/ubt
./netgraph/bluetooth/drivers/ubt/ng_ubt.c
./netgraph/ng_eiface.c
./netgraph/netgraph.h
./netgraph/ng_base.c
./netgraph/ng_iface.c
./netgraph/ng_device.c
./netgraph/ng_ksocket.c
./netinet/tcp_subr.c
./netinet/ip_divert.c
./netinet/ip_icmp.c
./netinet/ip_icmp.h
./netinet6/ipsec.c
./sys/systm.h
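For anyone who wants to reproduce or narrow the list above, here is a rough
sketch of the same find approach, restricted to a few subdirectories. The
function name changed_in, the timestamp given to touch, the /usr/src/sys path
and the choice of subdirectories are all assumptions for illustration; only
the "find . -newer /tmp/date" command itself comes from this report.

```sh
#!/bin/sh
# List regular files under a source tree that are newer than a reference
# file, looking only in the subdirectories named as extra arguments.
# REFFILE should be an absolute path, since the function changes directory.
# Usage: changed_in SRCDIR REFFILE SUBDIR...
changed_in() {
    srcdir=$1
    ref=$2
    shift 2
    for sub in "$@"; do
        # Skip subdirectories that do not exist in this tree.
        [ -d "$srcdir/$sub" ] || continue
        (cd "$srcdir" && find "./$sub" -type f -newer "$ref")
    done
}

# Example (timestamp and paths are assumptions):
#   touch -t 200505260000 /tmp/date
#   changed_in /usr/src/sys /tmp/date kern netinet dev/em
```

Restricting the search this way just shortens the list to the code most likely
to be involved in network startup; it does not identify the offending change.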
Any ideas? Also, if there is additional information needed, let me know
what to do and I'll report back.
Terry Kennedy http://www.tmk.com
terry@tmk.com New York, NY USA