Hello,

I'm having a problem with FreeBSD 5.3 here. The system slowly freezes.

It starts with one application that just locks up. Other applications
still work, but when I switch to them and do stuff in them, they usually
lock up after a few seconds as well. Starting new processes or logging in
at a physical console does not work anymore, and after about 30 secs the
whole system is frozen. Nothing is printed to the first physical console
or the logs. This has happened both under load and while the system was
mostly idle (just me irc'ing).

Now, I realize that this description is very vague, but maybe you can tell
me how to even start debugging this? There's no panic, i.e. no kernel dump
I could analyze.

I'm no kernel developer, but if I had to guess it sounds like a scheduler
problem, i.e. some table being overwritten.

I've attached my dmesg for reference.

Greetings
Benjamin
-------------- next part --------------
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE-p2 #20: Thu Dec 2 03:52:21 CET 2004
maxlor@merlin:/usr/obj/usr/src/sys/MERLIN
WARNING: MPSAFE network stack disabled, expect reduced performance.
ACPI APIC Table: <Nvidia AWRDACPI>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3500+ (2210.09-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0xff0 Stepping = 0
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
real memory = 1073676288 (1023 MB)
avail memory = 1036861440 (988 MB)
ioapic0 <Version 1.1> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <Nvidia AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf0-0xcf3,0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xf5002000-0xf5002fff irq 22 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xf5003000-0xf5003fff irq 21 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xf5004000-0xf50040ff irq 20 at device 2.2 on pci0
ehci0: [GIANT-LOCKED]
ehci_pci_attach: companion usb0
ehci_pci_attach: companion usb1
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: <EHCI (generic) USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 8 ports with 8 removable, self powered
pci0: <bridge, PCI-unknown> at device 5.0 (no driver attached)
pci0: <multimedia, audio> at device 6.0 (no driver attached)
atapci0: <nVidia nForce3 Pro UDMA133 controller> port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 8.0 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <GENERIC ATA controller> port 0xdc00-0xdc0f,0xb60-0xb63,0x960-0x967,0xbe0-0xbe3,0x9e0-0x9e7 irq 20 at device 9.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pcib1: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci1: <ACPI PCI bus> on pcib1
nvidia0: <GeForce FX 5200> mem 0xe8000000-0xefffffff,0xf0000000-0xf0ffffff irq 16 at device 0.0 on pci1
nvidia0: [GIANT-LOCKED]
pcib2: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcm0: <Creative EMU10K1> port 0xa000-0xa01f irq 18 at device 8.0 on pci2
pcm0: <SigmaTel STAC9708/11 AC97 Codec>
pci2: <multimedia, video> at device 9.0 (no driver attached)
pci2: <multimedia> at device 9.1 (no driver attached)
re0: <RealTek 8169S Single-chip Gigabit Ethernet> port 0xa800-0xa8ff mem 0xf3000000-0xf30000ff irq 16 at device 13.0 on pci2
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S media interface> on miibus0
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
re0: Ethernet address: 00:11:09:65:fc:0e
re0: [GIANT-LOCKED]
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <Standard parallel printer port> port 0x778-0x77b,0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model MouseMan+, device ID 0
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xd6000-0xd6fff,0xd4000-0xd57ff,0xd0000-0xd3fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
Timecounter "TSC" frequency 2210091664 Hz quality 800
Timecounters tick every 10.000 msec
acd0: DVDROM <Pioneer DVD-ROM ATAPIModel DVD-106S 0122/E1.22> at ata0-master UDMA66
acd1: DVDR <PLEXTOR DVDR PX-708A/1.07> at ata1-master UDMA33
ad4: 152627MB <WDC WD1600JD-00HBB0/08.02D08> [310101/16/63] at ata2-master UDMA33
cd0 at ata0 bus 0 target 0 lun 0
cd0: <PIONEER DVD-ROM DVD-106 1.22> Removable CD-ROM SCSI-0 device
cd0: 66.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
cd1 at ata1 bus 0 target 0 lun 0
cd1: <PLEXTOR DVDR PX-708A 1.07> Removable CD-ROM SCSI-0 device
cd1: 33.000MB/s transfers
cd1: Attempt to query device size failed: NOT READY, Medium not present - tray closed
Mounting root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /var was not properly dismounted
NVRM: AGP cannot be enabled on this combination of the AMD CPU and OS kernel
NVRM: kernel upgrade recommended.
NVRM: AGP cannot be enabled on this combination of the AMD CPU and OS kernel
NVRM: kernel upgrade recommended.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20041223/389ab0d7/attachment.bin
On Thu, 2004-Dec-23 04:08:39 +0100, Benjamin Lutz wrote:
>I'm having a problem with FreeBSD 5.3 here. The system slowly freezes.
>
>It starts with one application that just locks up. Other applications
>still work, but when I switch to them and do stuff in them, they usually
>lock up after a few seconds as well. Starting new processes or logging in
>at a physical console does not work anymore, and after about 30 secs the
>whole system is frozen. Nothing is printed to the first physical console
>or the logs. This has happened both under load and while the system was
>mostly idle (just me irc'ing).

Can you do a 'ps axl' as the system freezes? Of particular interest would
be the WCHAN for the frozen processes.

What still works when the system is frozen? Can you switch VTYs? Do
pings work (from another system)?

To actually solve the problem, you're going to need to enable DDB. See
http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
Again, a "ps" with the system frozen is the first step.

--
Peter Jeremy
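For anyone who manages to save a 'ps axl' listing while the machine is going down, the WCHAN check suggested above can be sketched in a few lines of Python. This is only an illustration, not a tool from the thread: the function name `tally_wchan` and the sample listing are made up, and the parser assumes the header row names the wait-channel column either WCHAN or MWCHAN and that every column before the command is non-empty and contains no spaces.

```python
from collections import Counter

def tally_wchan(ps_output: str) -> Counter:
    """Count processes per wait channel in `ps axl`-style text (hypothetical helper)."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    # Assumption: the wait-channel column is labelled WCHAN or MWCHAN.
    for name in ("MWCHAN", "WCHAN"):
        if name in header:
            col = header.index(name)
            break
    else:
        raise ValueError("no WCHAN/MWCHAN column in header")
    counts = Counter()
    for ln in lines[1:]:
        # Split into at most as many fields as the header has, so a
        # command line containing spaces stays in the last field.
        fields = ln.split(None, len(header) - 1)
        # "-" means the process is not sleeping on anything; skip it.
        if len(fields) > col and fields[col] != "-":
            counts[fields[col]] += 1
    return counts

# Made-up listing for illustration only; not output from the affected machine.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0     1     0   0   8  0   708   372 wait   ILs   ??    0:00.01 /sbin/init --
 1001   612   600   0  -4  0 52340 30112 ufs    D     ??    1:02.33 editor
 1001   640   600   0  -4  0 18220  9020 ufs    D     ??    0:10.02 xterm
 1001   655   640   0   5  0  1720  1300 ttyin  Is+   p0    0:00.05 -csh (csh)
"""

if __name__ == "__main__":
    for wchan, n in tally_wchan(SAMPLE).most_common():
        print(f"{n:4d}  {wchan}")
```

A wait channel that keeps gaining processes from one listing to the next is the one worth reporting to the list.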
On Thu, 23 Dec 2004, Benjamin Lutz wrote:

> I'm having a problem with FreeBSD 5.3 here. The system slowly freezes.
>
> It starts with one application that just locks up. Other applications
> still work, but when I switch to them and do stuff in them, they usually
> lock up after a few seconds as well. Starting new processes or logging in
> at a physical console does not work anymore, and after about 30 secs the
> whole system is frozen. Nothing is printed to the first physical console
> or the logs. This has happened both under load and while the system was
> mostly idle (just me irc'ing).
>
> Now, I realize that this description is very vague, but maybe you can tell
> me how to even start debugging this? There's no panic, i.e. no kernel dump
> I could analyze.
>
> I'm no kernel developer, but if I had to guess it sounds like a scheduler
> problem, i.e. some table being overwritten.
>
> I've attached my dmesg for reference.

This is actually fairly symptomatic of a deadlock, either due to a leaked
lock, a literal lock deadlock, or a resource deadlock. If you can get to
the console, either by switching away from X or via a serial console,
compile your kernel with DDB+KDB, break to the debugger, and do the
following:

  ps
  show threads
  show lockedvnods

You might also try building with INVARIANTS and WITNESS support, and see
if the failure mode becomes an assertion failure instead of a wedge. With
WITNESS compiled in, you can also get more extensive debugging information
using "show locks" and "show witness". Ideally, with a serial console, you
can copy and paste the results of these commands into an e-mail. If you
don't have a serial console, it's a bit more laborious: however, what
you're looking for is lots of threads blocked in similar wait channels in
the ps output.
You'll see lots of output like this:

db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  586 c168adc8    0   585   585 0000002 [SLPQ ttyin 0xc13e1c10][SLP] cu
  585 c16ca000    0   559   585 0004002 [SLPQ ttyin 0xc13e5410][SLP] cu
  559 c16867e0    0   558   559 0004002 [SLPQ pause 0xc1686814][SLP] csh
  558 c16869d8    0     1   558 0004102 [SLPQ wait 0xc16869d8][SLP] login
  557 c1686bd0    0     1   557 0004002 [SLPQ ttyin 0xc13ee810][SLP] getty
  556 c1686dc8    0     1   556 0004002 [SLPQ ttyin 0xc13f4c10][SLP] getty
                                        ^^^^^^^^^^^^^^ this stuff

What we want to know is what the common entries in the "wmesg" column are,
particularly for processes that are known to be in a wedge state. If doing
this by hand, we don't need the output of "show threads", but knowing how
many lines and what sort of lines appear in "show lockedvnods" would be
useful.

You can find some reasonable documentation on how to get started on kernel
debugging in the handbook. I'm not sure it addresses live debugging via
DDB in great detail, so I guess I'll take a look and flesh it out some
over the holidays if there isn't enough information there.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
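If the DDB session is captured over a serial console, counting the common "wmesg" entries does not have to be done by eye. The following is a minimal Python sketch under the assumption that each row carries a "[SLPQ <wmesg> 0x<wchan>]" field exactly as in the sample above; the function name `tally_wmesg` is invented for this illustration, and rows in other states are simply ignored.

```python
import re
from collections import Counter

# Matches the sleep-queue field of a DDB "ps" row, e.g. "[SLPQ ttyin 0xc13e1c10]".
# Assumption: the wmesg itself contains no whitespace.
SLPQ_RE = re.compile(r"\[SLPQ\s+(\S+)\s+0x[0-9a-fA-F]+\]")

def tally_wmesg(ddb_ps_output: str) -> Counter:
    """Count how many rows of DDB `ps` output sleep on each wmesg."""
    counts = Counter()
    for line in ddb_ps_output.splitlines():
        match = SLPQ_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The example rows quoted in the message above.
SAMPLE = """\
db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  586 c168adc8    0   585   585 0000002 [SLPQ ttyin 0xc13e1c10][SLP] cu
  585 c16ca000    0   559   585 0004002 [SLPQ ttyin 0xc13e5410][SLP] cu
  559 c16867e0    0   558   559 0004002 [SLPQ pause 0xc1686814][SLP] csh
  558 c16869d8    0     1   558 0004102 [SLPQ wait 0xc16869d8][SLP] login
  557 c1686bd0    0     1   557 0004002 [SLPQ ttyin 0xc13ee810][SLP] getty
  556 c1686dc8    0     1   556 0004002 [SLPQ ttyin 0xc13f4c10][SLP] getty
"""

if __name__ == "__main__":
    for wmesg, n in tally_wmesg(SAMPLE).most_common():
        print(f"{n:4d}  {wmesg}")
```

On a healthy system most entries are harmless ones such as ttyin or wait; in a wedge, the interesting result is an unusual wmesg shared by many of the stuck processes.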
Well, this is unexpected. I enabled the debugging options as instructed
(INVARIANT_SUPPORT, INVARIANTS, WITNESS, DDB, GDB), rebooted, and then got
this on the next boot:

KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE-p2 #20: Thu Dec 2 03:52:21 CET 2004
maxlor@merlin:/usr/obj/usr/src/sys/MERLIN
WARNING: WITNESS option enabled, expect reduced performance.
WARNING: MPSAFE network stack disabled, expect reduced performance.
ACPI APIC Table: <Nvidia AWRDACPI>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3500+ (2210.09-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0xff0 Stepping = 0
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
real memory = 1073676288 (1023 MB)
avail memory = 1036861440 (988 MB)
ioapic0 <Version 1.1> irqs 0-23 on motherboard
panic: spin lock rm.mutex_mtx not in order list
KDB: enter: panic
[thread 0]
Stopped at kdb_enter+0x30: leave
db> ps axl
Symbol not found
db> show threads
  100011 (0xc268c190) fork_trampoline() at fork_trampoline
  100035 (0xc26ec7d0) fork_trampoline() at fork_trampoline
  100034 (0xc26ec640) fork_trampoline() at fork_trampoline
  100033 (0xc26ec320) fork_trampoline() at fork_trampoline
  (...)
  100001 (0xc2689190) fork_trampoline() at fork_trampoline
  0 (0xc06fbbc0) kdb_enter(c06a4be9,c0700d00,c06a857d,c1021c60,100) at kdb_enter+0x30
db> show lockedvnods
Locked vnodes
db> show witness
(too much info for me to type, since I haven't got the serial console
running yet; I cannot, however, find any mention of an rm mutex)

So... what do I do about this?

Greetings
Benjamin
Hello,

It's been two months since this thread was started. To recap, I
repeatedly encountered what was identified by the gurus here as a
filesystem deadlock, which froze my machine.

Now, after some time, I have an additional observation to add: The
deadlock *only* happens when Quanta (KDE HTML Editor) is running (for some
time, say, 1-5 hours). Since it's the only application causing this, it
appears to me that Quanta is somehow misbehaving (and maybe hitting a weak
spot in FreeBSD that isn't relevant normally because apps don't misbehave
this way).

I'm short on time at the moment (lots of big compsci tests :) ), but I'll
try to find out what the last thing is that Quanta does before it crashes
the system. In the meantime... I'd be interested in knowing whether other
folks here can reproduce this behaviour. (System is FreeBSD-5.3-RELEASE,
newest KDE from ports).

Thanks :)
Benjamin