Environment:

* i386, 7.1 Prerelease (updated today) with a custom UP kernel, ULE scheduler
* KDE 3.5.10
* NIC does not share interrupts with another device
* See below for configuration files

Symptoms:

* I can trigger this lockup reliably by starting ktorrent.  After a short
  while (one to two minutes), it locks up.  Other commands, e.g., netstat, also
  lock up.
* The console generates "nfe0: watchdog timeout" error messages.
* The system becomes unusable and must be rebooted.

Attempted Work-arounds:

* I have replaced the NIC.  No change, except that the console now generates
  "dc0: watchdog timeout".
* I have tried an SMP kernel.  No change.

Attempted Diagnosis:

If I break into DDB, the 'ps' output shows a number of processes that seem to
be locked related to udp.

[irq18:dc0]  L  *udp
ktorrent     L  *udpinp
hald         L  *udp
ntpd         L  *udp

Unfortunately, I am rapidly getting out of my depth here.  I have no idea how
to go about further analyzing this problem and would appreciate help.

Cheers,

-- Norbert.


/boot/loader.conf:

loader_logo=beastie
verbose_loading="YES"
cpufreq_load="YES"
geom_gpt_load="YES"
hwpmc_load="YES"

# File systems
cd9660_load="YES"
msdosfs_load="YES"

# NIC support (MII provides common controller code)
miibus_load="YES"
if_dc_load=YES
pflog_load="YES"
procfs_load="YES"

# USB
ugen_load="YES"
uhid_load="YES"
ukbd_load="YES"
umass_load="YES"
ums_load="YES"

# Linux
linprocfs_load="YES"
linux_load="YES"
nvidia_load="YES"
pseudo_load="YES"
random_load="YES"
snd_hda_load="YES"

# SYSV support
sysvmsg_load="YES"
sysvsem_load="YES"
sysvshm_load="YES"

# For gamin
kern.maxfiles="25000"

# For ZFS
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="160M"
vfs.zfs.arc_min="100M"
vfs.zfs.vdev.cache.size="5M"
vfs.zfs.debug=1
vfs.zfs.prefetch_disable="1"


Kernel Config:

machine         i386
cpu             I686_CPU
ident           NGP

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
options         KDB                     # kernel debugger (just in case)
options         KDB_TRACE
options         DDB                     # kernel debugger (just in case)

options         SCHED_ULE               # ULE scheduler
options         PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         SCTP                    # Stream Control Transmission Protocol
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         UFS_GJOURNAL            # Enable gjournal-based UFS journaling
options         MD_ROOT                 # MD is a potential root device
options         COMPAT_43               # Compatible with BSD 4.3 [KEEP THIS!]
options         COMPAT_FREEBSD4         # Compatible with FreeBSD4
options         COMPAT_FREEBSD5         # Compatible with FreeBSD5
options         COMPAT_FREEBSD6         # Compatible with FreeBSD6
options         KTRACE                  # ktrace(1) support
options         STACK                   # stack(9) support
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         AHC_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output.  Adds ~128k to driver.
options         AHD_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output.  Adds ~215k to driver.
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.
options         STOP_NMI                # Stop CPUS using NMI instead of IPI
options         HWPMC_HOOKS             # hwpmc(4) performance measurements support.
                                        # also needs device or kernel module
#option         KVA_PAGES=512           # bigger kernel address space (2GB) for ZFS
                                        # (conflicts with nvidia-driver)

# Alternate Queuing of network packets
options         ALTQ
options         ALTQ_CBQ                # Class Bases Queuing (CBQ)
options         ALTQ_RED                # Random Early Detection (RED)
options         ALTQ_RIO                # RED In/Out
options         ALTQ_HFSC               # Hierarchical Packet Scheduler (HFSC)
options         ALTQ_PRIQ               # Priority Queuing (PRIQ)
#options        ALTQ_NOPCC              # Required for SMP build

device          apic                    # I/O APIC

# Bus support.
device          eisa
device          pci

# ATA and ATAPI devices
device          ata
device          atadisk                 # ATA disk drives
device          ataraid                 # ATA RAID drives
device          atapicd                 # ATAPI CDROM drives
options         ATA_STATIC_ID           # Static device numbering

device          atapicam                # SCSI emulation for ATA
device          scbus
device          cd                      # SCSI CD (for atapicam)
device          da                      # SCSI disk (for umass)
device          pass

# atkbdc0 controls both the keyboard and the PS/2 mouse
device          atkbdc                  # AT keyboard controller
device          atkbd                   # AT keyboard
device          psm                     # PS/2 mouse

device          kbdmux                  # keyboard multiplexer

device          vga                     # VGA video card driver

device          splash                  # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device          sc
option          SC_HISTORY_SIZE=1000
# normal output
options         SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options         SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
# kernel messages
options         SC_KERNEL_CONS_ATTR=(FG_LIGHTRED|BG_BLACK)
options         SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)

# Add suspend/resume support for the i8254.
device          pmtimer

# Parallel port
device          ppc
device          ppbus                   # Parallel port bus (required)
#device         lpt                     # Printer
#device         plip                    # TCP/IP over parallel
#device         ppi                     # Parallel port interface device
#device         vpo                     # Requires scbus and da

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device          miibus                  # MII bus support

# Pseudo devices.
device          loop                    # Network loopback
device          ether                   # Ethernet support
device          sl                      # Kernel SLIP
device          ppp                     # Kernel PPP
device          tun                     # Packet tunnel.
device          pty                     # Pseudo-ttys (telnet etc)
device          md                      # Memory "disks"
device          gif                     # IPv6 and IPv4 tunneling
device          faith                   # IPv6-to-IPv4 relaying (translation)

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device          bpf                     # Berkeley packet filter

# USB support (specific devices loaded as modules)
device          uhci                    # UHCI PCI->USB interface
device          ohci                    # OHCI PCI->USB interface
device          ehci                    # EHCI PCI->USB interface (USB 2.0)
device          usb                     # cannot be module -- otherwise compile errors

On Sun, 2008-09-14 at 12:19 -0700, Norbert Papke wrote:
> Symptoms:
>
> * I can trigger this lockup reliably by starting ktorrent.  After a short
> while (one to two minutes), it locks up.  Other commands, e.g., netstat, also
> lock up.
> * The console generates "nfe0: watchdog timeout" error messages.
> * The system becomes unusable and must be rebooted.

> Attempted Diagnosis:
>
> If I break into DDB, the 'ps' output shows a number of processes that seem to
> be locked related to udp.
>
> [irq18:dc0]  L  *udp
> ktorrent     L  *udpinp
> hald         L  *udp
> ntpd         L  *udp
>
> Unfortunately, I am rapidly getting out of my depth here.  I have no idea how
> to go about further analyzing this problem and would appreciate help.

Can you add:

options WITNESS
options WITNESS_SKIPSPIN

to your kernel, recompile and wait for the problem to happen again?  When it
does, issue "sh alllocks" from the debugger and make a note of the output.

This will probably show that two locks are held, "Giant" and "udp", along with
the thread that holds each of them.  Take the ID of the thread that holds the
"udp" lock, and enter "tr 100150" (where 100150 is the thread ID).  This
should hopefully provide enough info to figure out what is happening.

Thanks,

Gavin
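
For reference, the two options requested above might be added to the posted
kernel configuration along these lines.  This is only a sketch: it assumes the
config file is named after its ident (NGP), which the thread does not confirm,
and the comments are explanatory additions rather than part of the original
request.

```
# Added to the custom kernel config (file name NGP is an assumption,
# taken from the "ident NGP" line in the posted config)
options         WITNESS                 # record and check lock acquisition order
options         WITNESS_SKIPSPIN        # do not track spin mutexes (less overhead)
```

Since the posted config already contains KDB and DDB, no further debugger
options should be needed for the "sh alllocks" and "tr" commands mentioned
above.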