About 10 days ago one of my personal machines started hanging at random. This is the first bit of instability I've ever experienced on this machine (2+ years running).

FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008 root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386

After about 2 weeks of watching it carefully I've learned almost nothing. It's not a disk failure (AFAIK), it's not CPU overheating (now running healthd without complaints), and it's not tied to any particular network traffic. However, it does appear to accompany heavy CPU/disk activity. It usually dies when indexing my websites at night (but not always) and it sometimes dies when compiling programs. Heavy disk activity alone isn't enough to do the job, as backups proceed without problems. Heavy CPU by itself isn't enough either. But if I start compiling things and keep going a while, it will eventually hang.

My best guess is that geom is having a problem and locking up. There's no log entry before failure to back this idea up, but I think this because during boot I see the following:

ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
GEOM_MIRROR: Device gm0 created (id=575427344).
GEOM_MIRROR: Device gm0: provider ad0 detected.
ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
GEOM_MIRROR: Device gm0: provider ad1 detected.
GEOM_MIRROR: Device gm0: provider ad1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0: rebuilding provider ad0.

Every time it is rebuilding ad0. Every single boot in the last two weeks.

Is there any way to get more logging from geom, to confirm or deny this theory?

Is there anything else I should be looking at?

FWIW, this never happened before the p11 patch to 6.2. I don't know if that is related or not.

Obviously, I can't upgrade to 6.3 if heavy CPU/disk activity kills the system.

No, I don't have any other insights. I'm not prone to posting "duh help me please!" posts, so I'm quite a bit frustrated by this one.

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness
Jo Rhett wrote:
> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running)
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing. It's not a disk failure (AFAIK) it's not cpu overheat (now
> running healthd without complaints) it's not based on any given network
> traffic... however it does appear to accompany heavy cpu/disk
> activity. It usually dies when indexing my websites at night (but not
> always) and it sometimes dies when compiling programs. Just heavy disk
> isn't enough to do the job, as backups proceed without problems. Heavy
> cpu by itself isn't enough to do it either. But if I start compiling
> things and keep going a while, it will eventually hang.
>
> My best guess is that geom is having a problem and locking up. There's
> no log entry before failure to back this idea up, but I think this
> because during boot I see the following:
>
> ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0. Every single boot in the last two weeks.
>
> Is there any way to get more logging from geom, to confirm or deny this
> theory?

Just a guess, but try setting kern.geom.debugflags to a value greater than 0.
This certainly spews out far more geom info; how helpful it will be is another matter...

Vince

>
> Is there anything else I should be looking at?
>
> FWIW, this never happened before the p11 patch to 6.2. I don't know if
> that is related or not.
>
> Obviously, I can't upgrade to 6.3 if heavy cpu/disk activity kills the
> system.
>
> No, I don't have any other insights. I'm not prone to posting "duh help
> me please!" posts, so I'm quite a bit frustrated by this one.
>
On Fri, 11 Jul 2008 09:59:33 +0200, Jo Rhett <hostmaster@netconsonance.com> wrote:
> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running)
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing. It's not a disk failure (AFAIK) it's not cpu overheat (now
> running healthd without complaints) it's not based on any given network
> traffic... however it does appear to accompany heavy cpu/disk
> activity. It usually dies when indexing my websites at night (but not
> always) and it sometimes dies when compiling programs. Just heavy disk
> isn't enough to do the job, as backups proceed without problems. Heavy
> cpu by itself isn't enough to do it either. But if I start compiling
> things and keep going a while, it will eventually hang.
>
> My best guess is that geom is having a problem and locking up. There's
> no log entry before failure to back this idea up, but I think this
> because during boot I see the following:
>
> ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0. Every single boot in the last two
> weeks.
>
> Is there any way to get more logging from geom, to confirm or deny this
> theory?
>
> Is there anything else I should be looking at?
>
> FWIW, this never happened before the p11 patch to 6.2. I don't know if
> that is related or not.
>
> Obviously, I can't upgrade to 6.3 if heavy cpu/disk activity kills the
> system.
>
> No, I don't have any other insights. I'm not prone to posting "duh help
> me please!" posts, so I'm quite a bit frustrated by this one.

You can try going into the kernel debugger to see where it is hanging.
Debugging via a serial cable is also very easy.
I don't know the details, but there is a lot of info in the FreeBSD Handbook. Search Google for 'freebsd handbook kernel debug'.

Ronald.
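Short of attaching a kernel debugger, the kern.geom.debugflags suggestion made earlier in the thread is the cheaper thing to try first. The value is a bitmask, so "greater than 0" covers several different kinds of output. The sketch below is only an illustration: the bit meanings are the ones listed in the geom(4) manual page as I recall them, and should be checked against the man page on the release actually in use; the helper names `describe_debugflags` and `sysctl_command` are made up for this example.

```python
# Bit meanings as listed in geom(4); treat them as an assumption and
# verify against the manual page of the release you are running.
GEOM_DEBUG_FLAGS = {
    0x01: "G_T_TOPOLOGY (trace topology change events)",
    0x02: "G_T_BIO (trace buffer I/O requests)",
    0x04: "G_T_ACCESS (trace access checks)",
    0x10: "allow foot shooting (permit writes to providers that are in use)",
    0x40: "G_F_DISKIOCTL",
    0x80: "G_F_CTLDUMP (dump gctl requests)",
}


def describe_debugflags(value):
    """Return the description of every known bit that is set in value."""
    return [name for bit, name in sorted(GEOM_DEBUG_FLAGS.items()) if value & bit]


def sysctl_command(value):
    """Build (but do not run) the sysctl invocation that would set the flags."""
    return ["sysctl", f"kern.geom.debugflags={value}"]


if __name__ == "__main__":
    # Show what a few candidate values would enable before touching the system.
    for value in (1, 3, 7):
        print(" ".join(sysctl_command(value)))
        for line in describe_debugflags(value):
            print("    " + line)
```

Tracing I/O requests (bit 0x02) on a busy mirror is likely to produce a great deal of console output, so starting with the topology bit alone seems the more cautious choice.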
On Fri, Jul 11, 2008 at 12:59:33AM -0700, Jo Rhett wrote:
> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running)
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing. It's not a disk failure (AFAIK) it's not cpu overheat (now
> running healthd without complaints) it's not based on any given
> network traffic... however it does appear to accompany heavy cpu/disk
> activity. It usually dies when indexing my websites at night (but not
> always) and it sometimes dies when compiling programs. Just heavy
> disk isn't enough to do the job, as backups proceed without
> problems. Heavy cpu by itself isn't enough to do it either. But if
> I start compiling things and keep going a while, it will eventually
> hang.

> Is there anything else I should be looking at?

Power supply or motherboard would be my first guess.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
On Fri, Jul 11, 2008 at 12:59:33AM -0700, Jo Rhett wrote:
> My best guess is that geom is having a problem and locking up.
> There's no log entry before failure to back this idea up, but I think
> this because during boot I see the following:
>
> ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0. Every single boot in the last two
> weeks.

That just means that it halted without a proper shutdown. If it crashes, the mirror isn't stopped properly, so it's marked dirty, so it must rebuild it.

It is the precise analogy of finding all the file systems dirty on boot and fscking them, following a crash.

  -- Clifton

-- 
Clifton Royston -- cliftonr@iandicomputing.com / cliftonr@lava.net
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
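If the rebuild message is simply the mirror's equivalent of a dirty file system, then every occurrence in the logs should line up with a hang or other unclean shutdown rather than explain one. A rough way to check is to pull the rebuild messages out of the saved kernel messages and compare them with the known hang times. The sketch below assumes the message format shown in the boot output quoted above; the function name `find_rebuilds` is made up for this example.

```python
import re

# Sample taken from the boot messages quoted in this thread.
SAMPLE_BOOT_LOG = """\
ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
GEOM_MIRROR: Device gm0 created (id=575427344).
GEOM_MIRROR: Device gm0: provider ad0 detected.
ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
GEOM_MIRROR: Device gm0: provider ad1 detected.
GEOM_MIRROR: Device gm0: provider ad1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0: rebuilding provider ad0.
"""

# Matches only the "rebuilding provider" message, capturing the mirror
# name and the provider being resynchronised.
REBUILD_RE = re.compile(r"GEOM_MIRROR: Device (\S+): rebuilding provider (\S+)\.")


def find_rebuilds(log_text):
    """Return a (mirror, provider) pair for every rebuild message in log_text."""
    return REBUILD_RE.findall(log_text)


if __name__ == "__main__":
    for mirror, provider in find_rebuilds(SAMPLE_BOOT_LOG):
        print(f"mirror {mirror} rebuilt provider {provider}")
```

On a live system the same function could be fed the contents of the syslog file that receives kernel messages (commonly /var/log/messages, though that depends on the local syslog configuration).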
Jo Rhett <hostmaster@netconsonance.com> wrote:
 > About 10 days ago one of my personal machines started hanging at
 > random. This is the first bit of instability I've ever experienced on
 > this machine (2+ years running)
 >
 > FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
 > 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
 > root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
 >
 > After about 2 weeks of watching it carefully I've learned almost
 > nothing. It's not a disk failure (AFAIK) it's not cpu overheat (now
 > running healthd without complaints) it's not based on any given
 > network traffic... however it does appear to accompany heavy cpu/disk
 > activity. It usually dies when indexing my websites at night (but not
 > always) and it sometimes dies when compiling programs. Just heavy
 > disk isn't enough to do the job, as backups proceed without
 > problems. Heavy cpu by itself isn't enough to do it either. But if
 > I start compiling things and keep going a while, it will eventually
 > hang.

I had exactly the same problems on a machine a few months ago. It had also been running for about two years, then started freezing when there was high CPU + disk activity.

It turned out that the power supply went weak (either the power supply itself or the voltage regulators on the mainboard). Replacing PS + mainboard solved the problem.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht München,
HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"C++ is the only current language making COBOL look good."
        -- Bertrand Meyer
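Since both reports describe hangs that need CPU and disk load at the same time, a small, repeatable load generator may be handier for testing a replacement power supply than waiting for the nightly indexing run. The following is a minimal sketch under that assumption, not a method anyone in the thread used: it hashes random data (CPU) and forces each chunk to disk with fsync (disk) for a fixed time. The function name `combined_load` and its parameters are invented for this example.

```python
import hashlib
import os
import tempfile
import time


def combined_load(seconds, chunk_size=1 << 20, directory=None):
    """Alternate CPU work (SHA-256) and synchronous disk writes for roughly
    `seconds` seconds and return the number of chunks processed.

    `directory` selects where the scratch file lives; pass a path on the
    mirror under test, since the default temp directory may be memory-backed.
    """
    deadline = time.monotonic() + seconds
    digest = hashlib.sha256()
    chunks = 0
    with tempfile.TemporaryFile(dir=directory) as scratch:
        while time.monotonic() < deadline:
            data = os.urandom(chunk_size)
            digest.update(data)          # CPU work
            scratch.seek(0)              # overwrite in place so the file stays small
            scratch.write(data)          # disk work
            scratch.flush()
            os.fsync(scratch.fileno())   # push the write down to the device
            chunks += 1
    return chunks


if __name__ == "__main__":
    print("chunks processed:", combined_load(seconds=60))
```

A single process like this alternates between the two kinds of work rather than saturating both, so running several copies in parallel is probably closer to a compile job. If the machine hangs under it, that points at the hardware at least as much as at geom.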
-------------- Original message ----------------------
From: "Ben Kaduk" <minimarmot@gmail.com>
> On Wed, Jul 16, 2008 at 5:40 PM, Jo Rhett <hostmaster@netconsonance.com> wrote:
> > On Jul 11, 2008, at 4:48 AM, Ronald Klop wrote:
> >>
> >> You can try going into the kernel debugger to see where it is hanging.
> >> Debugging via a serial cable is also very easy.
> >> I don't know the details, but there is a lot of info in the Freebsd
> >> handbook. Put this in google 'freebsd handbook kernel debug'.
> >
> >
> > Thanks for the reply. I'm familiar with these options, but as the system is
> > currently running GENERIC and trying to compile a kernel would guarantee to
> > cause the problem to occur... I could probably keep hacking at it until I
> > finally get everything compiled, but...
> >
> > Ugh. I guess this option doesn't appeal very much. Are there any other
> > options available?
> >
>
> You don't need to compile the kernel on the same machine that you use it
> on -- you can copy the compiled kernel into /boot/kernel.new
>

But how do you handle differences in what is on the board when the two machines don't have exactly identical hardware?

SJK
www.sulima.com

> -Ben Kaduk
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"