Hi all!

I'm having problems no end on my 4.8-STABLE box, and I hope you can help me.

My processes get lots of signals (mostly 6, 10 and 11), and my kernel dumps
core very often. I have the core dumps from the 4 latest crashes. See
attachment for more info.
Notice the common values for IdlePTD and initial pcb. Does this mean anything?

There are other problems as well:

1) Some time ago KDE was hanging in disk wait state forever.

2) When rebuilding the world, make sometimes fails, but when I retry, it
either works, or fails elsewhere. I have not seen this on -STABLE, only on
5.0-RELEASE (on the same computer).

3) Programs are frequently closing when they shouldn't, but without any signs
that they received a signal. I haven't managed to catch one with gdb yet.

4) X11 (or KDE) locks up very often, only a hard reset brings the OS back.

5) The absolutely strangest thing I've ever seen happened today:
The console screensaver was running, and when I pressed a key, everything
locked up (I could not even ping the box).
The view was locked to ttyv0, the cursor disappeared. When I tried to switch
to another terminal (with ALT + Function key), it either beeped or did
nothing. It was completely random, sometimes it beeped, sometimes not.

I have already tested the memory (no errors). There weren't any problems with
4.6-RELEASE on my old box.

FreeBSD is awesome, but why is it not working for me?
Please help me.

Daniela
Oops, looks like my mailer didn't send the attachment. Here it is again.

-------------- next part --------------
IdlePTD at phsyical address 0x005ed000
initial pcb at physical address 0x004ff9c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x3baf84
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc022461b
stack pointer           = 0x10:0xe48afdac
frame pointer           = 0x10:0xe48afdac
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 18162 (pkg_info)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 67 28 9 4
done


IdlePTD at phsyical address 0x005ee000
initial pcb at physical address 0x005009c0
panicstr: timeout table full
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x8966de8e
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0237273
stack pointer           = 0x10:0xdf26cd74
frame pointer           = 0x10:0xdf26cd80
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1068 (kdeinit)
interrupt mask          = net tty bio cam
trap number             = 12
panic: page fault

syncing disks... panic: timeout table full


IdlePTD at phsyical address 0x005ee000
initial pcb at physical address 0x005009c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x68cac
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc03df6ac
stack pointer           = 0x10:0xde88eed8
frame pointer           = 0x10:0xde88eee0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (pagedaemon)
interrupt mask          = net tty bio cam
trap number             = 12
panic: page fault

syncing disks... 67 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
giving up on 2 buffers


IdlePTD at phsyical address 0x005ee000
initial pcb at physical address 0x005009c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x2a7d98
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc034b0a6
stack pointer           = 0x10:0xe4456bf8
frame pointer           = 0x10:0xe4456bf8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9941 (mtree)
interrupt mask          = bio
trap number             = 12
panic: page fault

syncing disks... 83 15 10 10 10 10 10 10 10 14
done
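
The four summaries above can also be compared mechanically. What follows is only a minimal Python sketch: the helper name parse_panics and the dictionary keys are made up for illustration, and the three fields per crash are copied by hand from the attachment. It lists the faulting instruction pointer, fault address and process for each panic and counts how many distinct instruction pointers there are.

```python
import re

# Three fields per crash, copied by hand from the attachment above.
SUMMARIES = """
fault virtual address = 0x3baf84
instruction pointer = 0x8:0xc022461b
current process = 18162 (pkg_info)
fault virtual address = 0x8966de8e
instruction pointer = 0x8:0xc0237273
current process = 1068 (kdeinit)
fault virtual address = 0x68cac
instruction pointer = 0x8:0xc03df6ac
current process = 2 (pagedaemon)
fault virtual address = 0x2a7d98
instruction pointer = 0x8:0xc034b0a6
current process = 9941 (mtree)
"""

# One match per panic: fault address, instruction pointer, pid, process name.
PANIC_RE = re.compile(
    r"fault virtual address\s*=\s*(0x[0-9a-f]+)"
    r".*?instruction pointer\s*=\s*0x8:(0x[0-9a-f]+)"
    r".*?current process\s*=\s*(\d+) \((\w+)\)",
    re.DOTALL,
)


def parse_panics(text):
    """Return one dict per panic summary found in text (hypothetical helper)."""
    return [
        {"fault_va": int(va, 16), "ip": int(ip, 16), "pid": int(pid), "proc": name}
        for va, ip, pid, name in PANIC_RE.findall(text)
    ]


panics = parse_panics(SUMMARIES)
for p in panics:
    print(f"{p['proc']:<12} ip={p['ip']:#010x} fault_va={p['fault_va']:#010x}")
print("distinct instruction pointers:", len({p["ip"] for p in panics}))
```

Every crash faults at a different kernel address and in a different process. That fits the hardware suspicion raised in the replies, but it does not prove it.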
On Sun, 10 Aug 2003 22:27:51 +0000, Daniela said:

> Hi all!
>
> I'm having problems no end on my 4.8-STABLE box, and I hope you can help me.
>
> My processes get lots of signals (mostly 6, 10 and 11), and my kernel dumps
> core very often. I have the core dumps from the 4 latest crashes. See
> attachment for more info.
> Notice the common values for IdlePTD and initial pcb. Does this mean anything?
>
> There are other problems as well:
>
> 1) Some time ago KDE was hanging in disk wait state forever.
>
> 2) When rebuilding the world, make sometimes fails, but when I retry, it
> either works, or fails elsewhere. I have not seen this on -STABLE, only on
> 5.0-RELEASE (on the same computer).
>
> 3) Programs are frequently closing when they shouldn't, but without any signs
> that they received a signal. I haven't managed to catch one with gdb yet.
>
> 4) X11 (or KDE) locks up very often, only a hard reset brings the OS back.
>
> 5) The absolutely strangest thing I've ever seen happened today:
> The console screensaver was running, and when I pressed a key, everything
> locked up (I could not even ping the box).
> The view was locked to ttyv0, the cursor disappeared. When I tried to switch
> to another terminal (with ALT + Function key), it either beeped or did
> nothing. It was completely random, sometimes it beeped, sometimes not.
>
>
> I have already tested the memory (no errors). There weren't any problems with
> 4.6-RELEASE on my old box.
>
> FreeBSD is awesome, but why is it not working for me?
> Please help me.
>
> Daniela

Do you have more memory you can try? How about a power supply? Maybe you
should even rule out the hard drive?

I have FBSD 4.8-p1 running on 20+ servers that I can say with (near) 100%
certainty take more of a beating in a day than your desktop, and they're all
solid as a rock.

Potentially helpful information would include machine hardware configuration,
third party drivers, etc. Basically anything that isn't part of the base.

Do you get any errors in syslog from the ata subsystem (assuming you're using
ata disks) or anything else for that matter?

I have been wrong before, but this *really* sounds like a hardware problem.

--
Sam
On Sun, 10 Aug 2003, Daniela wrote:

> I'm having problems no end on my 4.8-STABLE box, and I hope you can help me.
>
> My processes get lots of signals (mostly 6, 10 and 11), and my kernel dumps
> core very often. I have the core dumps from the 4 latest crashes. See
> attachment for more info.
> Notice the common values for IdlePTD and initial pcb. Does this mean anything?

Not particularly ...

These types of problems are generally caused by:

1. Bad memory
2. Overheated processor, or some similar CPU fault
3. Bad memory

> 1) Some time ago KDE was hanging in disk wait state forever.

'inode'?

> 2) When rebuilding the world, make sometimes fails, but when I retry, it
> either works, or fails elsewhere. I have not seen this on -STABLE, only on
> 5.0-RELEASE (on the same computer).

5.0-R isn't very representative. You'd have to check against -current.

> I have already tested the memory (no errors). There weren't any problems with
> 4.6-RELEASE on my old box.

How did you test the memory? Generally, short of using a hardware SIMM tester
it's very difficult to identify bad modules. memtest86 run over a period of
several hours can sometimes work. BIOS "tests" don't count.

--
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
I'd like to emphasize that memtest86 doesn't catch lots of memory problems.
Just last week I was having trouble compiling mozilla, so I ran memtest86
overnight. Nothing showed up. But "make buildworld" repeatedly failed on
compiler signal 11 errors at about 20% complete. Using "make buildworld", I
was able to isolate a bad DIMM, and now "make buildworld" and building mozilla
run to completion (multiple times).

Whenever possible, I run with parity/ECC on the motherboard and the memory
modules.

I'm hoping a hardware/memory/motherboard expert will chime in. How can
manufacturers continue to make PCs without memory checking? With today's
standards of 128-256MB in a PC, isn't it just a matter of time until a bit
gets flipped the wrong way? Are manufacturers hoping that the bad bit will go
unnoticed in multi-media? Is there something in today's non-parity memory
modules that helps ensure reliable data?

Until I hear otherwise, I'll continue to spend extra for the redundant,
error-checking memories.

Thanks
-robert gray

Wes Peters <wes@softweyr.com> Sun, 10 Aug 2003 23:31:57 PDT says:
>>
>> Well the problem with testing memory with software is that it's not
>> necessarily possible to hammer it hard enough to trigger the problem.
>> If you can reproduce it easily you might try cycling out one dimm and
>> then trying to crash it. If removing a dimm fixes it then you probably
>> took out the bad one.
>
>In fact, many people in the FreeBSD community feel the best memory test of
>all is to 'make world' several times. I have experienced this myself
>only once, but after returning the SIMM module to the vendor he verified
>it was bad using a hardware tester. The replacement SIMM has been in for
>5 months now and the machine has been marvelously stable, as I expect
>from FreeBSD.
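
The "build the world several times" test described above amounts to running one heavy command repeatedly and noting which runs fail. Here is a minimal Python sketch of that loop. The helper names stress_runs and failures are hypothetical, and the /usr/src path and run count in the comment are assumptions, not settings taken from anyone's setup in this thread.

```python
import subprocess
import sys


def stress_runs(cmd, runs=3, cwd=None):
    """Run cmd `runs` times and return a list of (run_number, exit_status)."""
    results = []
    for n in range(1, runs + 1):
        proc = subprocess.run(
            cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results.append((n, proc.returncode))
    return results


def failures(results):
    """Keep only the runs whose exit status was non-zero."""
    return [(n, status) for n, status in results if status != 0]


if __name__ == "__main__":
    # On the suspect machine this might be (assumed path and run count):
    #   stress_runs(["make", "buildworld"], runs=5, cwd="/usr/src")
    # A trivial command stands in here so the sketch runs anywhere.
    demo = stress_runs([sys.executable, "-c", "pass"], runs=2)
    print("results:", demo)
    print("failed runs:", failures(demo))
```

A run that fails at a different point each time, or fails once and then succeeds, matches the pattern reported earlier in the thread. The loop only records the exit status of make, so the build output still has to be read to see which tool died and on what signal.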