Here is some extra information: this is what the kernel logged today. All I
did was turn powersave on and then off again.
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 7 times
kernel: wi0: device timeout
kernel: wi0: timeout in wi_seek to 152/0
kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
kernel: wi0: xmit failed
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 6 times
kernel: wi0: bad alloc 152 != 128, cur 0 nxt 0
kernel: wi0: record read mismatch, rid=fd42, got=fd41
kernel: wi0: record read mismatch, rid=fdc1, got=fd42
kernel: wi0: record read mismatch, rid=fd41, got=fdc1
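
In case it helps with triage, here is a rough way to tally these messages.
It is only a sketch: the function name wi0_tally is made up for illustration,
and it assumes that syslog's "last message repeated N times" line always
refers to the line directly above it.

```shell
#!/bin/sh
# Tally wi0 kernel messages read from stdin.
# Assumption: a "last message repeated N times" line refers to the
# line immediately before it, so N is added to that message's count.
wi0_tally() {
    awk '
        /kernel: wi0: / {
            last = $0
            sub(/^.*kernel: wi0: /, "", last)   # keep only the driver message
            count[last]++
            next
        }
        /last message repeated [0-9]+ times/ {
            if (last != "") {
                n = $0
                sub(/^.*last message repeated /, "", n)
                sub(/ times.*$/, "", n)
                count[last] += n
            }
            next
        }
        { last = "" }   # any other line breaks the "repeated" chain
        END { for (m in count) printf "%d\t%s\n", count[m], m }
    ' | sort -rn
}
```

Run as `wi0_tally < /var/log/messages` (assuming the default FreeBSD log
location), it prints one "count<TAB>message" line per distinct wi0 message,
most frequent first.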
On 6/27/06, freebsd-stable-request@freebsd.org
<freebsd-stable-request@freebsd.org> wrote:
> Today's Topics:
>
> 1. Re: force panic of remote server ... possible? (Ed Maste)
> 2. Re: force panic of remote server ... possible? (Ed Maste)
> 3. Re: vinum to gvinum help (Mark Linimon)
> 4. Re: Setting up GEOM mirror (Mike Jakubik)
> 5. Re: What denotes a 'blocked' process? (Marc G. Fournier)
> 6. RE: vinum to gvinum help (Wilde, Donald)
> 7. Re: What denotes a 'blocked' process? (Kostik Belousov)
> 8. Re: vmstat 'b' (disk busy?) field keeps climbing ...
> (Marc G. Fournier)
> 9. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Dmitry Pryanishnikov)
> 10. Re: vmstat 'b' (disk busy?) field keeps climbing ...
> (Max Laier)
> 11. Re: kernel can't find root filesystem (Michael Proto)
> 12. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 13. Re: Gigabit ethernet very slow. (Matthew D. Fuller)
> 14. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Wilko Bulte)
> 15. Re: wi0 down when print a lot of data to screen over ssh
> (Michael Proto)
> 16. Re: kernel can't find root filesystem (M.Hirsch)
> 17. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 18. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Wilko Bulte)
> 19. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 20. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Wilko Bulte)
> 21. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Dmitry Pryanishnikov)
> 22. Re: vmstat 'b' (disk busy?) field keeps climbing ...
> (Marc G. Fournier)
> 23. Re: What denotes a 'blocked' process? (Marc G. Fournier)
> 24. RE: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Michael Butler)
> 25. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 26. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Wilko Bulte)
> 27. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 28. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 29. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
> 30. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Dmitry Pryanishnikov)
> 31. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Steven Hartland)
> 32. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> (Thomas Nystr?m)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 26 Jun 2006 10:52:38 -0400
> From: Ed Maste <emaste@phaedrus.sandvine.ca>
> Subject: Re: force panic of remote server ... possible?
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626145238.GA22081@sandvine.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 01:06:14PM +0100, Gavin Atkinson wrote:
>
> > On Mon, 2006-06-26 at 08:55 -0300, Marc G. Fournier wrote:
> > > For the server that I'm fighting with right now, where Dmitry pointed out
> > > that it looks like a deadlock issue ... I have dumpdev/savecore enabled,
> > > is there some way of forcing it to panic when I know I actually have the
> > > deadlock, so that it will dump a core?
> >
> > You can enter the debugger by setting the (badly named) debug.kdb.enter
> > sysctl to 1, although I can't guarantee that'll trigger a dump and
> > reboot. Do you have a serial console?
>
> From some of your other messages, I believe this is a remote machine?
> Unless you can access an attached keyboard, or have a serial console,
> debug.kdb.enter will leave the machine sitting in ddb with no way to
> get out. Also, if you have a PS/2 keyboard (that is, one handled by
> the atkbd(4) driver) ddb will not accept any input on 6.1 or HEAD.
> (There is some discussion of this issue on the freebsd-current list.)
> Before using ddb on a remote machine I would suggest testing it out
> with the same release locally.
>
> For your original question -- I'm not sure which release it first
> appeared in (and it may be only in -CURRENT), but if it exists you
> can use:
>
> $ sysctl -d debug.kdb.panic
> debug.kdb.panic: set to panic the kernel
>
> -ed
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 26 Jun 2006 13:32:37 -0400
> From: Ed Maste <emaste@phaedrus.sandvine.ca>
> Subject: Re: force panic of remote server ... possible?
> To: "Marc G. Fournier" <scrappy@hub.org>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626173237.GA53085@sandvine.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 01:06:14PM +0100, Gavin Atkinson wrote:
>
> > On Mon, 2006-06-26 at 08:55 -0300, Marc G. Fournier wrote:
> > > For the server that I'm fighting with right now, where Dmitry pointed out
> > > that it looks like a deadlock issue ... I have dumpdev/savecore enabled,
> > > is there some way of forcing it to panic when I know I actually have the
> > > deadlock, so that it will dump a core?
> >
> > You can enter the debugger by setting the (badly named) debug.kdb.enter
> > sysctl to 1, although I can't guarantee that'll trigger a dump and
> > reboot. Do you have a serial console?
>
> From some of your other messages, I believe this is a remote machine?
> Unless you can access an attached keyboard, or have a serial console,
> debug.kdb.enter will leave the machine sitting in ddb with no way to
> get out. Also, if you have a PS/2 keyboard (that is, one handled by
> the atkbd(4) driver) ddb will not accept any input on 6.1 or HEAD.
> (There is some discussion of this issue on the freebsd-current list.)
> Before using ddb on a remote machine I would suggest testing it out
> with the same release locally.
>
> For your original question -- I'm not sure which release it first
> appeared in (and it may be only in -CURRENT), but if it exists you
> can use:
>
> $ sysctl -d debug.kdb.panic
> debug.kdb.panic: set to panic the kernel
>
> -ed
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 26 Jun 2006 14:33:19 -0500
> From: linimon@lonesome.com (Mark Linimon)
> Subject: Re: vinum to gvinum help
> To: Sven Willenberger <sven@dmv.com>
> Cc: Roland Smith <rsmith@xs4all.nl>, freebsd-stable
> <freebsd-stable@freebsd.org>
> Message-ID: <20060626193319.GC909@soaustin.net>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 02:15:24PM -0400, Sven Willenberger wrote:
> > this is a production server that can at best stand an hour or so of
> > downtime.
>
> IMHO there are no 5.2.1 upgrade options that can be accomplished in even
> a small number of hours. The kernel libraries were all updated for 5.3;
> and hundreds, if not more, ports were updated. Since the 5.3 release,
> there have been thousands, if not tens of thousands, of commits to the
> ports tree, many of which make major infrastructural changes.
>
> Either going to 5.5 or 6.1 at this point should (also IMHO) be a complete
> reinstall on a staging system, with some tough testing there to show that
> the upgrade will work for your applications.
>
> Otherwise I think you're asking for some serious grief here.
>
> mcl
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 26 Jun 2006 15:03:54 -0400
> From: Mike Jakubik <mikej@rogers.com>
> Subject: Re: Setting up GEOM mirror
> To: Vivek Khera <vivek@khera.org>
> Cc: freebsd-stable <freebsd-stable@freebsd.org>
> Message-ID: <44A02F9A.4080606@rogers.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Vivek Khera wrote:
> >
> > On Jun 25, 2006, at 2:14 PM, Mike Jakubik wrote:
> >>
> >> The problem with these instructions is that they don't take into
> >> account the last sector. You may very well end up writing the
> >> metadata on the file system.
> >>
> >
> > When was the last time you fdisk'd a disk and it used the last sector
> > on the drive? I always end up with a bunch of extra space that didn't
> > fit into the round numbers of the file system.
> >
>
> Hopefully never :) Just mentioning this as a precaution.
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 26 Jun 2006 12:44:17 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: What denotes a 'blocked' process?
> To: freebsd-stable@freebsd.org
> Message-ID: <20060626124226.Y1114@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Marc G. Fournier wrote:
>
> >
> > Just upgraded to June 15th sources, started up all the processes, and am
> > already at 29 blocked processes ...
> >
> > I've checked for states D, E and L ... nothing ...
> >
> > Actually, let's go one better ... attached is a complete list of my process
> > table (MWCHAN, STATE, COMMAND) ... right now, vmstat is showing:
> >
> > 1 33 0 6381952 177944 1695 0 0 0 1601 0 1 0 416 50012 1657 14 14 72
> > 1 33 2 6376440 181744 2013 0 0 0 2172 0 3 0 448 68528 1629 17 15 68
> > 4 33 0 6385484 178364 1944 0 3 0 1758 0 8 0 420 57698 1221 17 14 69
> > 23 46 0 6463664 149528 5294 29 4 2 4659 0 37 0 505 44758 3040 27 28 45
> > 4 34 1 6424904 169660 4216 16 7 0 4047 0 211 0 1002 47502 5769 42 30 28
> > 1 35 0 6453992 167388 2414 0 9 0 2265 0 44 0 535 62932 3160 18 18 64
> > 7 33 0 6443672 168100 1642 0 0 0 1652 0 5 0 448 51974 2163 15 15 70
> >
> > So, according to this, there should be 33 processes blocked somewhere ...
> > STATEs D/E/L all show nothing ... even state R (long shot) is showing 3-4
> > processes, and that's it ...
> >
> > This kernel is actually worse than the last, in that the last, on a reboot,
> > I'd see 4-5 blocked, and then it would slowly rise over the course of 24
> > hours, not start at 33 and rise from there ...
>
> Wow, in less than 1 hour, I'm up to 60 blocked, barely 1 runnable:
>
> 0 60 0 7016076 187424 2527 0 0 0 1722 0 5 0 320 7921 2140 24 19 57
> 0 60 0 7027436 185124 581 0 1 0 428 0 9 0 303 3214 2425 5 9 86
> 0 60 0 7053368 183060 217 4 1 0 130 0 71 0 453 1748 1157 6 4 90
> 1 60 1 7050848 183556 4 0 0 7 27 0 21 0 307 965 857 1 4 94
> 0 60 2 7050860 183652 2 0 0 0 6 0 0 0 256 829 1030 2 3 95
> 0 60 0 7051028 183348 28 1 2 0 11 0 3 0 307 944 855 3 3 95
> 0 60 1 7056876 182248 136 0 0 0 66 0 8 0 285 1190 945 1 4 95
>
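
A small sketch for tracking that second column without eyeballing it. It
assumes the FreeBSD vmstat layout quoted above, where "b" is the second
field; max_blocked is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Print the largest value seen in vmstat's "b" (blocked) column.
# Assumption: "b" is the second whitespace-separated field, as in the
# samples quoted above. Header lines are skipped because their second
# field is not a plain number.
max_blocked() {
    awk '$2 ~ /^[0-9]+$/ && $2 + 0 > max { max = $2 + 0 }
         END { print max + 0 }'
}
```

Something like `vmstat -w 5 -c 12 | max_blocked` would then report the peak
over one minute of samples.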
> And nadda in ps:
>
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"' ; ps aux | wc -l
> PID PPID F MWCHAN TT STAT TIME COMMAND
> 2 0 204 - ?? DL 0:00.45 [g_event]
> 3 0 204 - ?? DL 0:04.87 [g_up]
> 4 0 204 - ?? DL 0:06.19 [g_down]
> 5 0 204 - ?? DL 0:00.00 [thread taskq]
> 6 0 204 - ?? DL 0:00.00 [kqueue taskq]
> 7 0 204 - ?? DL 0:00.00 [acpi_task0]
> 8 0 204 - ?? DL 0:00.00 [acpi_task1]
> 9 0 204 - ?? DL 0:00.00 [acpi_task2]
> 10 0 204 ktrace ?? DL 0:00.00 [ktrace]
> 15 0 204 - ?? DL 0:00.68 [yarrow]
> 25 0 204 psleep ?? DL 0:00.70 [pagedaemon]
> 26 0 204 psleep ?? DL 0:00.00 [vmdaemon]
> 27 0 20c pgzero ?? DL 0:14.43 [pagezero]
> 28 0 204 psleep ?? DL 0:00.14 [bufdaemon]
> 29 0 204 vlruwt ?? DL 0:00.15 [vnlru]
> 30 0 204 syncer ?? DL 0:10.29 [syncer]
> 31 0 204 sdflus ?? DL 0:00.68 [softdepflush]
> 32 0 204 - ?? DL 0:03.28 [schedcpu]
> 1170
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^E/ || $6 == "STAT"' ; ps aux | wc -l
> PID PPID F MWCHAN TT STAT TIME COMMAND
> 1174
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^L/ || $6 == "STAT"' ; ps aux | wc -l
> PID PPID F MWCHAN TT STAT TIME COMMAND
> 12 0 20c Giant ?? LL 0:08.16 [swi4: clock]
> 1170
> pluto#
>
> Something *has* to be leaking here somewhere ... :(
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org MSN . scrappy@hub.org
> Yahoo . yscrappy Skype: hub.org ICQ . 7615664
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 26 Jun 2006 13:12:58 -0600
> From: "Wilde, Donald" <dwilde@sandia.gov>
> Subject: RE: vinum to gvinum help
> To: "freebsd-stable" <freebsd-stable@freebsd.org>
> Message-ID:
> <040DF00BF960A24897B5B3EFBE63FE8A026B10B8@ES20SNLNT.srn.sandia.gov>
> Content-Type: text/plain; charset=us-ascii
>
>
>
> -----Original Message-----
> From: owner-freebsd-stable@freebsd.org
> [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Sven Willenberger
> Sent: Monday, June 26, 2006 12:15 PM
> To: Roland Smith
> Cc: freebsd-stable
> Subject: Re: vinum to gvinum help
>
> On Mon, 2006-06-26 at 19:15 +0200, Roland Smith wrote:
> > On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
> > > I have an i386 system currently running 5.2.1-RELEASE with a vinum
> > > mirror array (2 drives comprising /usr ). I want to upgrade this to
> > > 5.5-RELEASE which, if I understand correctly, no longer supports
> > > vinum arrays. Would simply changing /boot/loader.conf to read
> > > gvinum_load instead of vinum_load work or would the geom layer
> > > prevent this from working properly? If not, is there a recommended
> > > way of upgrading a vinum array to a gvinum or gmirror array?
> >
> > Lots of things have changed between 5.2.1 and 5.5. I think it would be
> > best to make a backup and do a clean reinstall.
> >
> > Roland
>
> Sadly this may not be an option; this is a production server that can at
> best stand an hour or so of downtime. Between all the custom symlinked
> directories, applications, etc, plus the sheer volume of data that would
> need to be backed up, an in-place upgrade would be infinitely more
> desirable. If it comes to the point of having to back up and do a fresh
> install I suspect I would be using the 6.x series anyway. I was really
> hoping that some way of upgrading in-place were available for vinum.
>
> Sven
>
> DSW> Sven, your best bet will be to build a set of disks off-line and
> then swap them in. That's the only way you can be sure to do it right.
> Ask yourself if the cost of finding and building a mule is worth more
> than the pain of screwing up.
>
> It _is_ well worth doing, there were many things that were still unglued
> in 5.2.1.
> --
> Don Wilde Org 01737 505-844-1126
> Earth Halted: Please reboot to continue
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 26 Jun 2006 23:05:15 +0300
> From: Kostik Belousov <kostikbel@gmail.com>
> Subject: Re: What denotes a 'blocked' process?
> To: "Marc G. Fournier" <scrappy@hub.org>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626200515.GL79678@deviant.kiev.zoral.com.ua>
> Content-Type: text/plain; charset="us-ascii"
>
> On Mon, Jun 26, 2006 at 12:44:17PM -0300, Marc G. Fournier wrote:
> > On Mon, 26 Jun 2006, Marc G. Fournier wrote:
> >
> > >
> > >Just upgraded to June 15th sources, started up all the processes, and am
> > >already at 29 blocked processes ...
> > >
> > >I've checked for states D, E and L ... nothing ...
> > >
> > >Actually, let's go one better ... attached is a complete list of my
> > >process table (MWCHAN, STATE, COMMAND) ... right now, vmstat is showing:
> > >
> > >1 33 0 6381952 177944 1695 0 0 0 1601 0 1 0 416 50012 1657 14 14 72
> > >1 33 2 6376440 181744 2013 0 0 0 2172 0 3 0 448 68528 1629 17 15 68
> > >4 33 0 6385484 178364 1944 0 3 0 1758 0 8 0 420 57698 1221 17 14 69
> > >23 46 0 6463664 149528 5294 29 4 2 4659 0 37 0 505 44758 3040 27 28 45
> > >4 34 1 6424904 169660 4216 16 7 0 4047 0 211 0 1002 47502 5769 42 30 28
> > >1 35 0 6453992 167388 2414 0 9 0 2265 0 44 0 535 62932 3160 18 18 64
> > >7 33 0 6443672 168100 1642 0 0 0 1652 0 5 0 448 51974 2163 15 15 70
> > >
> > >So, according to this, there should be 33 processes blocked somewhere ...
> > >STATEs D/E/L all show nothing ... even state R (long shot) is showing 3-4
> > >processes, and that's it ...
> > >
> > >This kernel is actually worse than the last, in that the last, on a
> > >reboot, I'd see 4-5 blocked, and then it would slowly rise over the course
> > >of 24 hours, not start at 33 and rise from there ...
> >
> > Wow, in less than 1 hour, I'm up to 60 blocked, barely 1 runnable:
> >
> > 0 60 0 7016076 187424 2527 0 0 0 1722 0 5 0 320 7921 2140 24 19 57
> > 0 60 0 7027436 185124 581 0 1 0 428 0 9 0 303 3214 2425 5 9 86
> > 0 60 0 7053368 183060 217 4 1 0 130 0 71 0 453 1748 1157 6 4 90
> > 1 60 1 7050848 183556 4 0 0 7 27 0 21 0 307 965 857 1 4 94
> > 0 60 2 7050860 183652 2 0 0 0 6 0 0 0 256 829 1030 2 3 95
> > 0 60 0 7051028 183348 28 1 2 0 11 0 3 0 307 944 855 3 3 95
> > 0 60 1 7056876 182248 136 0 0 0 66 0 8 0 285 1190 945 1 4 95
> >
> > And nadda in ps:
> >
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"' ; ps aux | wc -l
> > PID PPID F MWCHAN TT STAT TIME COMMAND
> > 2 0 204 - ?? DL 0:00.45 [g_event]
> > 3 0 204 - ?? DL 0:04.87 [g_up]
> > 4 0 204 - ?? DL 0:06.19 [g_down]
> > 5 0 204 - ?? DL 0:00.00 [thread taskq]
> > 6 0 204 - ?? DL 0:00.00 [kqueue taskq]
> > 7 0 204 - ?? DL 0:00.00 [acpi_task0]
> > 8 0 204 - ?? DL 0:00.00 [acpi_task1]
> > 9 0 204 - ?? DL 0:00.00 [acpi_task2]
> > 10 0 204 ktrace ?? DL 0:00.00 [ktrace]
> > 15 0 204 - ?? DL 0:00.68 [yarrow]
> > 25 0 204 psleep ?? DL 0:00.70 [pagedaemon]
> > 26 0 204 psleep ?? DL 0:00.00 [vmdaemon]
> > 27 0 20c pgzero ?? DL 0:14.43 [pagezero]
> > 28 0 204 psleep ?? DL 0:00.14 [bufdaemon]
> > 29 0 204 vlruwt ?? DL 0:00.15 [vnlru]
> > 30 0 204 syncer ?? DL 0:10.29 [syncer]
> > 31 0 204 sdflus ?? DL 0:00.68 [softdepflush]
> > 32 0 204 - ?? DL 0:03.28 [schedcpu]
> > 1170
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^E/ || $6 == "STAT"' ; ps aux | wc -l
> > PID PPID F MWCHAN TT STAT TIME COMMAND
> > 1174
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^L/ || $6 == "STAT"' ; ps aux | wc -l
> > PID PPID F MWCHAN TT STAT TIME COMMAND
> > 12 0 20c Giant ?? LL 0:08.16 [swi4: clock]
> > 1170
> > pluto#
> >
> > Something *has* to be leaking here somewhere ... :(
>
> Dumb unmotivated question: do you have nfs exports on this machine ?
>
> ------------------------------
>
> Message: 8
> Date: Mon, 26 Jun 2006 15:25:49 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: Kostik Belousov <kostikbel@gmail.com>
> Cc: freebsd-stable@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>
> Message-ID: <20060626152345.M1114@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> I think I might have found *at least* one of the problems, and that being
> the excessively high blocked states while ps isn't finding anything ...
>
> MySQL
>
> We just recently started allowing clients to run a MySQL server *within*
> their vServer ... in a drastic move, I just shut them all down on pluto,
> and blocked drop'd from ~86 down to 5 in a matter of moments ...
> restarting them all has it climbing once more, being up around 22 already
> ...
>
> I'm going to go with that theory for now, and keep an eye on things ...
>
> Just curious as to why, even with -H, it's not showing any blocked states
> within ps though ... ?
>
> Thx
>
>
> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>
> > On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:
> >> On Mon, 26 Jun 2006, Kostik Belousov wrote:
> >>
> >>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> >>
> >> Yes, kernel sources, it seems, from May 25th, according to my /usr/src
> >> tree ...
> >>
> >>> BTW, do you use snapshots ?
> >>
> >> Not that I've explicitly enabled ...
> >>
> >>> I think that without ddb access, diagnosing and debugging the problem
> >>> would be quite hard.
> >>
> >> Would it be a simple matter of:
> >>
> >> CTL-ALT-ESC
> >> panic
> >>
> >> to get it to dump core? Or would more be involved? Would a core dump
> >> even work?
> > Core dumps are somewhat inconvenient in this situation. Better: when
> > sending a report to me, follow my advice in
> >
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org MSN . scrappy@hub.org
> Yahoo . yscrappy Skype: hub.org ICQ . 7615664
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 27 Jun 2006 00:01:08 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Robert Watson <rwatson@freebsd.org>
> Cc: freebsd-acpi@freebsd.org, freebsd-stable@freebsd.org,
> Pete French <petefrench@ticketswitch.com>
> Message-ID: <20060626235355.Q95667@atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> Hello!
>
> On Mon, 26 Jun 2006, Robert Watson wrote:
> > I think this is a useful activity, especially if you've already run extensive
> > memory testing on the box. If you haven't yet done that, I encourage you to
> > take a break from buildworld's and make sure the memory tests pass. I spent
> > several months on and off trying to track down a bug a few years ago, which
> > turned out to be a one bit error in memory on the box. It would appear and
>
> This is precisely the task which hardware ECC solves: to correct any
> single-bit memory error and to detect 2-bit and most multi-bit errors. I
> prefer ECC-capable hardware even for a home PC; for a server it's a must IMHO.
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail: dmitry@atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 10
> Date: Mon, 26 Jun 2006 22:44:18 +0200
> From: Max Laier <max@love2party.net>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: freebsd-stable@freebsd.org
> Cc: Kostik Belousov <kostikbel@gmail.com>, Dmitry Morozovsky
> <marck@rinet.ru>
> Message-ID: <200606262244.25505.max@love2party.net>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
> > I think I might have found *at least* one of the problems, and that being
> > the excessively high blocked states while ps isn't finding anything ...
> >
> > MySQL
> >
> > We just recently started allowing clients to run a MySQL server *within*
> > their vServer ... in a drastic move, I just shut them all down on pluto,
> > and blocked drop'd from ~86 down to 5 in a matter of moments ...
> > restarting them all has it climbing once more, being up around 22 already
> > ...
> >
> > I'm going to go with that theory for now, and keep an eye on things ...
> >
> > Just curious as to why, even with -H, it's not showing any blocked states
> > within ps though ... ?
>
> The "blocked" column also shows processes that have objects "paging". Most
> likely you are *short* on memory. In order to relieve the pressure,
> program .text pages are freed and need to be refetched from disk whenever
> the respective code is being executed.
>
> If you allow every vServer to run its own MySQL with all the libraries etc.,
> it's clear what is killing you! Add more memory or make sure that .text pages
> can be reused by several processes. As far as I understand, the vServers will
> all see a different source and thus not share buffers or the like.
>
> > Thx
> >
> > On Mon, 26 Jun 2006, Kostik Belousov wrote:
> > > On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:
> > >> On Mon, 26 Jun 2006, Kostik Belousov wrote:
> > >>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> > >>
> > >> Yes, kernel sources, it seems, from May 25th, according to my /usr/src
> > >> tree ...
> > >>
> > >>> BTW, do you use snapshots ?
> > >>
> > >> Not that I've explicitly enabled ...
> > >>
> > >>> I think that without ddb access, diagnosing and debugging the problem
> > >>> would be quite hard.
> > >>
> > >> Would it be a simple matter of:
> > >>
> > >> CTL-ALT-ESC
> > >> panic
> > >>
> > >> to get it to dump core? Or would more be involved? Would a core dump
> > >> even work?
> > >
> > > Core dumps are somewhat inconvenient in this situation. Better: when
> > > sending a report to me, follow my advice in
> > >
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
> > ----
> > Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> > Email . scrappy@hub.org MSN . scrappy@hub.org
> > Yahoo . yscrappy Skype: hub.org ICQ . 7615664
>
> --
> /"\ Best regards, | mlaier@freebsd.org
> \ / Max Laier | ICQ #67774661
> X http://pf4freebsd.love2party.net/ | mlaier@EFnet
> / \ ASCII Ribbon Campaign | Against HTML Mail and News
>
> ------------------------------
>
> Message: 11
> Date: Mon, 26 Jun 2006 17:18:59 -0400
> From: Michael Proto <mike@jellydonut.org>
> Subject: Re: kernel can't find root filesystem
> To: freebsd-stable@freebsd.org
> Message-ID: <44A04F43.2090400@jellydonut.org>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Robert Ames wrote:
> >> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> >>
> >> I had the same problem with 6.1. But only on some occasions, not
> >> always (iirc).
> >> The installations I made over the last weeks had all very different
> >> environments and deployment methods.
> >> I can't tell anymore when it happens and when not because I simply
> >> added the below loader.conf setting to my postinstall-script.
> >>
> >> Add "vfs.root.mountfrom=ufs:da0s1" to /boot/loader.conf to fix it.
> >
> > Thank you. That solves my problem even though it seems more like
> > a workaround than an actual solution. But I'll take it. :-)
> >
> > Also, someone responded asking if I had a valid entry in /etc/fstab
> > for the root filesystem.
> >
> > foo# cat /etc/fstab
> > # Device     Mountpoint FStype Options   Dump Pass#
> > /dev/da0s1a  /          ufs    rw        1    1
> > /dev/da0s1b  none       swap   sw        0    0
> > /dev/da1s1d  /local     ufs    rw        2    2
> > /dev/cd0     /cdrom     cd9660 ro,noauto 0    0
> >
>
> If I'm not mistaken, you could also try to (re)install the boot0 loader:
>
> boot0cfg /dev/da0
>
>
> -Proto
>
>
> ------------------------------
>
> Message: 12
> Date: Mon, 26 Jun 2006 23:21:22 +0200
> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A04FD2.1030001@hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> ECC is a way to mask broken hardware. I'd rather have my hardware fail
> directly when it does first, so I can replace it _immediately_
> What's your hardware good for if it passes a "test", but fails in
> production?
>
> ECC is totally overrated.
>
> (sorry, couldn't resist...)
>
> M.
>
>
> ------------------------------
>
> Message: 13
> Date: Mon, 26 Jun 2006 14:32:26 -0500
> From: "Matthew D. Fuller" <fullermd@over-yonder.net>
> Subject: Re: Gigabit ethernet very slow.
> To: Michael Vince <mv@thebeastie.org>
> Cc: freebsd-stable@freebsd.org, performance@freebsd.org,
> Nikolas Britton <nikolas.britton@gmail.com>,
> Sean Bryant <bryants@gmail.com>
> Message-ID: <20060626193226.GF74292@over-yonder.net>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 05:05:26PM +1000 I heard the voice of
> Michael Vince, and lo! it spake thus:
> >
> > According to pftop (with modulate state rules) I am able to get
> > about 85megs/sec when I don't have dd running. dd does indeed eat a
> > fair amount of cpu (40%) on the AMD64 6-stable machine.
>
> dd does ridiculously small (512 byte?) read/writes, so it's gotta do a
> LOT of system calls and a lot of context switching when you don't give
> it a bigger blocksize.
>
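
To put rough numbers on that, here is a back-of-the-envelope sketch. It
assumes dd issues exactly one read and one write per block with no short
reads, and that the default block size is the 512 bytes mentioned above;
dd_syscalls is a hypothetical helper, not part of dd.

```shell
#!/bin/sh
# Rough count of read(2) + write(2) calls needed to copy a number of
# bytes at a given block size.
# Assumption: one read and one write per block, no short reads.
dd_syscalls() {
    bytes=$1
    bs=$2
    blocks=$(( (bytes + bs - 1) / bs ))   # round up to whole blocks
    echo $(( blocks * 2 ))                # one read plus one write each
}
```

For 1 GiB, `dd_syscalls 1073741824 512` gives 4194304 calls, while
`dd_syscalls 1073741824 1048576` (the equivalent of bs=1m) gives 2048.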
>
> --
> Matthew Fuller (MF4839) | fullermd@over-yonder.net
> Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
> On the Internet, nobody can hear you scream.
>
>
> ------------------------------
>
> Message: 14
> Date: Mon, 26 Jun 2006 23:26:54 +0200
> From: Wilko Bulte <wb@freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>,
> freebsd-stable@FreeBSD.ORG
> Message-ID: <20060626212654.GB93703@freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:21:22PM +0200, M.Hirsch wrote..
> > ECC is a way to mask broken hardware. I'd rather have my hardware fail
> > directly when it does first, so I can replace it _immediately_
> > What's your hardware good for if it passes a "test", but fails in
> > production?
> >
> > ECC is totally overrated.
>
> Balderdash.
>
> Following your rationale you want your bank account data to be
> silently corrupted by hardware with bit errors? Be my guest, give
> me ECC any day.
>
> Proper hardware will log the ECC errors, a proper OS tailored to that
> hardware will log and notify the sysadmins.
>
> That is how it should be done.
>
> Wilko
>
> --
> Wilko Bulte wilko@FreeBSD.org
>
>
> ------------------------------
>
> Message: 15
> Date: Mon, 26 Jun 2006 17:28:54 -0400
> From: Michael Proto <mike@jellydonut.org>
> Subject: Re: wi0 down when print a lot of data to screen over ssh
> To: freebsd-stable@freebsd.org
> Message-ID: <44A05196.1070708@jellydonut.org>
> Content-Type: text/plain; charset=UTF-8
>
> Ren Zhen wrote:
> > wi0 goes down when I run a program that prints a lot of data to
> > stdout; it also goes down when I use zmrx-zmtx.
> >
> > kernel says:
> > kernel: wi0: timeout in wi_seek to 152/0
> > last message repeated 7 times
> > kernel: wi0: device timeout
> > kernel: wi0: timeout in wi_seek to 152/0
> > kernel: wi0: link state changed to DOWN
> >
> > another time kernel says:
> > kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
> > kernel: wi0: xmit failed
> > kernel: wi0: timeout in wi_seek to 128/0
> > last message repeated 3 times
> >
>
> I used to see similar behavior with wi0 on my ThinkPad A30p (IBM High
> Rate Wireless, PRISM 2.5) when powersave was enabled via ifconfig (I
> believe it may be on by default, not sure about that). If you disable
> powersave via 'ifconfig wi0 -powersave' do you still see the problem?
>
>
> -Proto
>
>
> ------------------------------
>
> Message: 16
> Date: Mon, 26 Jun 2006 23:31:58 +0200
> From: "M.Hirsch" <M.Hirsch@gmx.de>
> Subject: Re: kernel can't find root filesystem
> To: Michael Proto <mike@jellydonut.org>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A0524E.900@gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sorry, doesn't help.
>
> There is some kind of bug hiding somewhere in 6.1 where it does not
> auto-detect the root partition under certain circumstances. Can't tell
> when it worked last, as the last distro I consider "stable" was 4.X...
> (sorry for the rant...)
>
> I am not using (and don't want to use...) boot0 at all.
> Well, I tried, but it didn't help the situation anyways...
>
> It should work with the standard MBR and boot code ("/boot/mbr" and
> "/boot/boot"), right?
> i.e. fdisk -B and bsdlabel -B without further params should do the job
> to get the system bootstrapped.
> But it does not.
>
> M.
>
> >If I'm not mistaken, you could also try to (re)install the boot0 loader:
> >
> >boot0cfg /dev/da0
> >
> >
> >-Proto
> >_______________________________________________
> >freebsd-stable@freebsd.org mailing list
> >http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
> >
> >
> >
>
>
>
> ------------------------------
>
> Message: 17
> Date: Mon, 26 Jun 2006 23:37:18 +0200
> From: "M.Hirsch" <M.Hirsch@gmx.de>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Wilko Bulte <wb@freebie.xs4all.nl>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A0538E.6090906@gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Nope,
>
> I'd like my bank data to be stored on a system that does ECC, no question.
> But please, on hard disk level (RAID; that is _permanent_), not in the
> RAM of a single node.
>
> If memory gets corrupted, please, raise a kernel panic... Even if
> there's ECC in place.
>
> Counter question:
> Would you like your bank account data to be stored on a medium where one
> failure can be corrected, two can be detected, but three go unnoticed?
> How unlikely is that, if you've got some hardware that is really /broken/?
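
To make the "one corrected, two detected, three unnoticed" behaviour concrete,
here is a minimal sketch of a SECDED code. It is my own illustration, not code
from anyone in this thread, and it uses a toy extended Hamming (8,4) code; real
ECC memory uses wider codes (an assumption worth stating), but the distance
argument is the same. The helper names encode, decode and flip are made up for
the example.

```python
# Extended Hamming (8,4): single-error-correct, double-error-detect (SECDED).
# Positions 1..7 hold a Hamming(7,4) codeword; position 0 is overall parity.

def encode(nibble):
    """Encode 4 data bits (int 0..15) into a list of 8 bits."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d   # data bit positions
    code[1] = code[3] ^ code[5] ^ code[7]    # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]    # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]    # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2              # make the total parity even
    return code

def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                  # XOR of positions of set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: the decoder assumes exactly one and repairs it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two flips, detectable only.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

def flip(code, *positions):
    """Return a copy of code with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out

word = encode(0b1011)
print(decode(flip(word, 5)))        # one flip: repaired
print(decode(flip(word, 2, 6)))     # two flips: reported, not repaired
print(decode(flip(word, 1, 2, 4)))  # three flips: "corrected" to wrong data
```

The third case is the one the counter question is about: three flips look like
a single-bit error to the decoder, so it reports a correction and hands back
the wrong nibble.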
>
> I know this is a rather random thing to happen.
> Still, I think ECC memory is overrated. Better have it fail immediately.
> _With a kernel panic, please_
>
> M.
>
> Wilko Bulte schrieb:
>
> >Balderdash.
> >
> >Following your rationale you want your bank account data
> >silently be corrupted by hardware with bit errors? Be my guest, give
> >me ECC any day.
> >
> >Proper hardware will log the ECC errors, a proper OS tailored to that
> >hardware will log and notify the sysadmins.
> >
> >That is how it should be done.
> >
> >Wilko
> >
> >
> >
>
>
>
> ------------------------------
>
> Message: 18
> Date: Mon, 26 Jun 2006 23:45:35 +0200
> From: Wilko Bulte <wb@freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@gmx.de>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626214535.GA94015@freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:37:18PM +0200, M.Hirsch wrote..
> > Nope,
> >
> > I'd like my bank data to be stored on a system that does ECC, no
> > question.
> > But please, on hard disk level (RAID; that is _permanent_), not in the
> > RAM of a single node.
> >
> > If memory gets corrupted, please, raise a kernel panic... Even if
>
> You *can't* panic if it is just a single bit error in a user page. You
> will never know there was a corruption. If that was a page holding your
> account data, you are toast.
>
> > there's ECC in place.
>
> Of course not. You only panic once you have no other options left.
> Proper hardware with ECC gives you these options. I am not talking
> consumer grade crap here of course.
>
> > Counter question:
> > Would you like your bank account data to be stored on a medium where one
> > failure can be corrected, two can be detected, but three go unnoticed?
> > How unlikely is that, if you've got some hardware that is really
> > /broken/?
>
> Very unlikely. There is enough hardware design done after all these
> years that this kind of problem can be prevented.
>
> > I know this is a rather random thing to happen.
> > Still, I think ECC memory is overrated. Better have it fail immediately.
> > _With a kernel panic, please_
>
> As said, you can't
>
> >
> > M.
> >
> > Wilko Bulte schrieb:
> >
> > >Balderdash.
> > >
> > >Following your rationale you want your bank account data
> > >silently be corrupted by hardware with bit errors? Be my guest, give
> > >me ECC any day.
> > >
> > >Proper hardware will log the ECC errors, a proper OS tailored to that
> > >hardware will log and notify the sysadmins.
> > >
> > >That is how it should be done.
> > >
> > >Wilko
> > >
> > >
> > >
> --- end of quoted text ---
>
> --
> Wilko Bulte wilko@FreeBSD.org
>
>
> ------------------------------
>
> Message: 19
> Date: Tue, 27 Jun 2006 00:11:03 +0200
> From: "M.Hirsch" <M.Hirsch@gmx.de>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Michael Butler <imb@protected-networks.net>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A05B77.1030200@gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> <snip>
>
> > .. So the logs are there, all that's required is a utility to read them
> >and, optionally, alert the administrator to the event,
> >
> >
> >
> No, I think a panic _should_ occur, even if there was a correctable
> error. Not "when there's no other option left".
> Maybe make it optional via a kernel option.
> There are much less-significant problems that can cause a panic.
>
> Sure, you may be one of the few people out there who knows how to
> correctly run a _BSD_ system...
> There's few of yous out there, ;)
>
> M.
>
>
> ------------------------------
>
> Message: 20
> Date: Tue, 27 Jun 2006 00:18:04 +0200
> From: Wilko Bulte <wb@freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@gmx.de>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626221804.GA94278@freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Jun 27, 2006 at 12:11:03AM +0200, M.Hirsch wrote..
> > <snip>
> >
> > >.. So the logs are there, all that's required is a utility to read them
> > >and, optionally, alert the administrator to the event,
> > >
> > >
> > >
> > No, I think a panic _should_ occur, even if there was a correctable
> > error. Not "when there's no other option left".
>
> You really have never seen a machine used for serious business apparently.
>
> > Maybe make it optional via a kernel option.
> > There are much less-significant problems that can cause a panic.
>
> panics like that should be eradicated, adding more nonsensical panics
> is not what we need.
>
> > Sure, you may be one of the few people out there who knows how to
> > correctly run a _BSD_ system...
> > There's few of yous out there, ;)
> >
> > M.
> --- end of quoted text ---
>
> --
> Wilko Bulte wilko@FreeBSD.org
>
>
> ------------------------------
>
> Message: 21
> Date: Tue, 27 Jun 2006 01:22:47 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060627011512.N95667@atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> Hello!
>
> On Mon, 26 Jun 2006, M.Hirsch wrote:
> > ECC is a way to mask broken hardware. I rather have my hardware fail
> > directly when it does first, so I can replace it _immediately_
>
> You got it backwards. If your data has any value to you, then you don't
> want to miss any single-bit error in it, do you? If you're running hardware
> w/o ECC, your single-bit error in your data will go to the disk unnoticed,
> and you'll lose your data. With ECC, hardware will correct it. In the (rare)
> case of a multiple-bit error, ECC logic will generate an NMI for you, so
> you'll notice and "replace it _immediately_" instead of two weeks later,
> when your archive won't extract.
>
> > What's your hardware good for if it passes a "test", but fails in
> > production?
>
> That is how RAM manifests single-bit errors: you run a memory test and
> it won't catch them; later, in production, you'll miss the error because
> nothing provides an extra sanity check of your data.
>
> > ECC is totally overrated.
>
> Only by the people who don't understand its point!
>
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail: dmitry@atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 22
> Date: Mon, 26 Jun 2006 18:55:17 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: Max Laier <max@love2party.net>
> Cc: Kostik Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org,
> Dmitry Morozovsky <marck@rinet.ru>
> Message-ID: <20060626185437.I1114@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Max Laier wrote:
>
> > On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
> >> I think I might have found *at least* one of the problems, and that being
> >> the excessively high blocked states while ps isn't finding anything ...
> >>
> >> MySQL
> >>
> >> We just recently started allowing clients to run a MySQL server *within*
> >> their vServer ... in a drastic move, I just shut them all down on pluto,
> >> and blocked drop'd from ~86 down to 5 in a matter of moments ...
> >> restarting them all has it climbing once more, being up around 22 already
> >> ...
> >>
> >> I'm going to go with that theory for now, and keep an eye on things ...
> >>
> >> Just curious as to why, even with -H, it's not showing any blocked states
> >> within ps though ... ?
> >
> > The "blocked" column shows also processes that have objects "paging".
> > Most likely you are *short* on memory. In order to relieve the pressure
> > program .text pages are free'ed and need to be refetched from disc
> > whenever the respective code is being executed.
>
> 'k, but shouldn't the OS be doing any swapping, if this was the case? I'm
> getting <1M of swappage when the blocked pages are really high ...
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org MSN . scrappy@hub.org
> Yahoo . yscrappy Skype: hub.org ICQ . 7615664
>
>
> ------------------------------
>
> Message: 23
> Date: Mon, 26 Jun 2006 18:54:08 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: What denotes a 'blocked' process?
> To: Kostik Belousov <kostikbel@gmail.com>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626185338.D1114@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>
> > Dumb unmotivated question: do you have nfs exports on this machine ?
>
> neither nfs nor mountd are currently running ...
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org MSN . scrappy@hub.org
> Yahoo . yscrappy Skype: hub.org ICQ . 7615664
>
>
> ------------------------------
>
> Message: 24
> Date: Mon, 26 Jun 2006 18:02:38 -0400
> From: "Michael Butler" <imb@protected-networks.net>
> Subject: RE: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "'Wilko Bulte'" <wb@freebie.xs4all.nl>, "'M.Hirsch'"
> <M.Hirsch@gmx.de>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <000001c6996c$3eab9df0$ad0d510a@toshi>
> Content-Type: text/plain; charset="us-ascii"
>
> > Of course not. You only panic once you have no other options left.
> > Proper hardware with ECC gives you these options. I am not talking
> > consumer grade crap here of course.
>
> I agree that no panic should occur if the error was correctable and it
> should when it isn't.
>
> However, *real* equipment will log a corrected error .. from an aging Dell
> 1-U server ..
>
> Handle 0x0024, DMI type 15, 33 bytes
> System Event Log
>     Area Length: 4096 bytes
>     Header Start Offset: 0x0000
>     Header Length: 16 bytes
>     Data Start Offset: 0x0010
>     Access Method: Memory-mapped physical 32-bit address
>     Access Address: 0xFFF33000
>     Status: Valid, Not Full
>     Change Token: 0x00000000
>     Header Format: Type 1
>     Supported Log Type Descriptors: 5
>     Descriptor 1: POST error
>     Data Format 1: POST results bitmap
>     Descriptor 2: Parity memory error
>     Data Format 2: Multiple-event
>     Descriptor 3: I/O channel block
>     Data Format 3: Multiple-event
>     Descriptor 4: Single-bit ECC memory error
>     Data Format 4: Multiple-event
>     Descriptor 5: Multi-bit ECC memory error
>     Data Format 5: Multiple-event
>
> .. So the logs are there, all that's required is a utility to read them
> and, optionally, alert the administrator to the event,
>
> Michael Butler, CISSP
> Security Architect
> Protected Networks
> http://www.protected-networks.net
>
>
>
> ------------------------------
>
> Message: 25
> Date: Mon, 26 Jun 2006 23:54:53 +0200
> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Wilko Bulte <wb@freebie.xs4all.nl>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A057AD.7050700@hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Ok, sorry. Misunderstanding here.
> My point was, along what has been posted here in this thread:
> "An ECC error should raise a kernel panic immediately, not only a
> message in the log files."
> Any hardware showing ECC errors should be replaced asap..
> Make them lazy admins do what they're getting paid for...
>
> Correct, you can't (quickly) detect this without ECC hardware, of course.
> But I keep reading about "ECC" being the solution to broken RAM sticks...
>
> Since FreeBSD panics on creating simple malloc() vnodes, it should do so
> on ECC errors first.
> Different mission, I guess ;)
> (And different problems with the recent fricking code...)
>
> M.
>
>
> ------------------------------
>
> Message: 26
> Date: Tue, 27 Jun 2006 00:02:06 +0200
> From: Wilko Bulte <wb@freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626220206.GA94183@freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:54:53PM +0200, M.Hirsch wrote..
> > Ok, sorry. Misunderstanding here.
> > My point was, along what has been posted here in this thread:
> > "An ECC error should raise a kernel panic immediately, not only a
> > message in the log files."
> > Any hardware showing ECC errors should be replaced asap..
>
> Yes, but keep in mind that ASAP often means "during a scheduled
> maintenance window". Which can be months away in some cases.
>
> > Make them lazy admins do what they're getting paid for...
> >
> > Correct, you can't (quickly) detect this without ECC hardware, of
> > course.
>
> Skip the 'quickly', you need ECC, full stop. Otherwise you will not
> detect it until it is way too late. I can tell you from personal experience
> that customers hate nothing more than undetected data corruption. ECC
> RAM is only part of the fix of course. ECC better be end to end, but it
> hardly is..
>
> > But I keep reading about "ECC" being the solution to broken RAM
> > sticks...
>
> Not really of course. But there are OS-es that simply map pages with
> known problems into a "do not use" list.
>
> > Since FreeBSD panics on creating simple malloc() vnodes, it should do so
> > on ECC errors first.
> > Different mission, I guess ;)
> > (And different problems with the recent fricking code...)
> >
> > M.
> --- end of quoted text ---
>
> --
> Wilko Bulte wilko@FreeBSD.org
>
>
> ------------------------------
>
> Message: 27
> Date: Tue, 27 Jun 2006 00:33:39 +0200
> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Wilko Bulte <wb@freebie.xs4all.nl>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A060C3.8090008@hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Wilko Bulte schrieb:
>
> >You really have never seen a machine used for serious business apparently.
> >
> >
> >
> Depends on what you define "serious business"...
> Yes, I am rather new to FreeBSD (2y+)
> I am just trying to setup a /stable/ cluster of six machines right now.
> For over a week straight.
> 4.11 works perfectly. But support is going to be dropped very soon, so
> that's a bad option for me right now.
>
> Over all, the system is /only/ supposed to handle a few hundred hits per
> second. (but including dynamic stuff like php...)
>
> Dunno if that (or what else) is "serious business" for you.
> Which version would you suggest for "serious business"?
>
> Anyways, my point stands: I rather have any of my nodes panic than
> carrying the risk of creating invalid data...
> One in a billion can be high probability, soon... (just planning for the
> future...)
>
> >panics like that should be eradicated, adding more nonsensical panics
> >is not what we need.
> >
> >
> uh, I would not call hardware failure "nonsensical panics". I guess I
> must have misunderstood you...
>
> M.
>
>
> ------------------------------
>
> Message: 28
> Date: Tue, 27 Jun 2006 00:39:47 +0200
> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A06233.1090704@hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dmitry Pryanishnikov schrieb:
>
> >
> > Hello!
> >
> > On Mon, 26 Jun 2006, M.Hirsch wrote:
> >
> >> ECC is a way to mask broken hardware. I rather have my hardware fail
> >> directly when it does first, so I can replace it _immediately_
> >
> >
> > You got it backwards. If your data has any value to you, then you
> > don't want to miss any single-bit error in it, do you? If you're running
> > hardware w/o ECC, your single-bit error in your data will go to the disk
> > unnoticed, and you'll lose your data. With ECC, hardware will correct it.
> > In the (rare) case of a multiple-bit error, ECC logic will generate an NMI
> > for you, so you'll notice and "replace it _immediately_" instead of two
> > weeks later, when your archive won't extract.
> >
> Nope, I am right on track.
> I do not want to lose any data. So I'd prefer a ECC error to raise a
> panic so I can replace the hardware ASAP.
> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an
> effort than "just" acquiring a new box...
>
> >> What's your hardware good for if it passes a "test", but fails in
> >> production?
> >
> >
> > That is how RAM manifests single-bit errors: you run a memory test and
> > it won't catch them; later, in production, you'll miss the error because
> > nothing provides an extra sanity check of your data.
>
> Ok...
> Does the standard fs, UFS2, do "extra sanity checks", then?
>
> M.
>
>
> ------------------------------
>
> Message: 29
> Date: Tue, 27 Jun 2006 00:51:56 +0200
> From: "M.Hirsch" <M.Hirsch@gmx.de>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A0650C.7020806@gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> > Ok...
> > Does the standard fs, UFS2, do "extra sanity checks", then?
> >
> Sorry, replying to myself...
> No, this does not matter.
> If the OS thinks the data is ok, UFS will write OK data...
>
> So, let me rephrase this:
> How can I make sure there is no broken hardware in my cluster?
> I am not looking for workarounds, like ECC. I want the box to break
> immediately once any single component goes wrong...
>
>
>
> ------------------------------
>
> Message: 30
> Date: Tue, 27 Jun 2006 01:57:17 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060627014335.E87535@atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Tue, 27 Jun 2006, M.Hirsch wrote:
> >> On Mon, 26 Jun 2006, M.Hirsch wrote:
> >>> ECC is a way to mask broken hardware. I rather have my hardware fail
> >>> directly when it does first, so I can replace it _immediately_
> >>
> >>
> >> You got it backwards. If your data has any value to you, then you don't
> >>
> > Nope, I am right on track.
> > I do not want to lose any data. So I'd prefer a ECC error to raise a
> > panic so I can replace the hardware ASAP.
>
> When you wrote "ECC is a way to mask broken hardware", you were plain
> wrong. If you're using hardware w/o ECC, it just can't tell whether an
> error is present or absent. So ECC _is_ the way to detect (not mask)
> broken hardware.
>
> If you want the ECC corrector to raise an NMI on corrected errors (as well
> as uncorrectable ones), just set the appropriate bit in the control
> register - every Intel ECC-capable chipset allows it. But if we're speaking
> about a production environment, such behaviour (abnormal termination on a
> _corrected_ error) is unacceptable.
>
> > Don't get me wrong, but tracking bugs in FreeBSD is quite more of an
> > effort than "just" acquiring a new box...
>
> I don't see the connection between this sentence and ECC (which is a
> hardware option).
>
> > Does the standard fs, UFS2, do "extra sanity checks", then?
>
> Ditto. And don't forget that _every_ data sector on HDD _is_ checked
> with CRC. As well as ATA data transfers in UDMA modes. As well as data
> in CPU cache. Extra check gives extra reliability.
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail: dmitry@atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 31
> Date: Mon, 26 Jun 2006 23:59:02 +0100
> From: "Steven Hartland" <killing@multiplay.co.uk>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>, "Dmitry Pryanishnikov"
> <dmitry@atlantis.dp.ua>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <005401c69974$217f8860$b3db87d4@multiplay.co.uk>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> reply-type=response
>
> M.Hirsch wrote:
> > Ok...
> > Does the standard fs, UFS2, do "extra sanity checks", then?
>
> My advice would be: don't feed the troll.
>
> Steve
>
>
>
>
>
> ------------------------------
>
> Message: 32
> Date: Tue, 27 Jun 2006 01:09:03 +0200
> From: Thomas Nyström <thn@saeab.se>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A0690F.8040005@saeab.se>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> M.Hirsch wrote:
> > Any hardware showing ECC errors should be replaced asap..
>
> No. ALL memory will sooner or later show a single-bit error.
>
> Several years ago I was checking this during my work at Ericsson.
> There was a discussion if ECC should be present in the GSM-base-stations
> or not. I had a special test-software running in several units looking
> for soft-errors. Soft errors are bits that are flipped spontaneously in
> the memory. When the bit is rewritten it will work OK again, no
> permanent damage to the memory and no need to replace the memory.
>
> During my test period (I think it was 6-8 months) I saw four occasions
> when this occurred (total amount of memory 96 MB).
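
For scale, the figures above can be turned into a rough rate. This is only
arithmetic on the numbers Thomas quotes (4 events, 96 MB, 6 to 8 months); the
function name soft_error_rate is mine, and nothing here says how the rate
would carry over to other memory sizes or newer parts.

```python
def soft_error_rate(events, memory_mb, months):
    """Observed soft errors per MB of memory per month of testing."""
    return events / (memory_mb * months)

# Test length was "6-8 months", so evaluate both ends of that range.
for months in (6, 8):
    rate = soft_error_rate(4, 96, months)
    print(f"{months} months: {rate:.4f} soft errors per MB-month")
```

That works out to roughly 0.005 to 0.007 soft errors per MB-month for that
particular test setup.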
>
> ECC is intended to fix this: It will correct a single bit fault and
> allow the system to continue uninterrupted.
>
> Of course this event should be logged and if it occurs several times
> at the same place then it is time to replace the memory.
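
The "log it, and replace the memory once the same place keeps failing" policy
can be sketched in a few lines. This is a hypothetical illustration, not an
existing FreeBSD utility: the class name CorrectedErrorLog, the location
strings, and the threshold of 3 (my reading of "several times") are all
assumptions.

```python
from collections import Counter

class CorrectedErrorLog:
    """Count corrected ECC errors per location and flag repeat offenders."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed value for "several times"
        self.counts = Counter()

    def record(self, location):
        """Log one corrected error; return True once the location has
        reached the threshold and the module should be replaced."""
        self.counts[location] += 1
        return self.counts[location] >= self.threshold

log = CorrectedErrorLog(threshold=3)
for where in ["DIMM0:0x1f00", "DIMM1:0x0040", "DIMM0:0x1f00", "DIMM0:0x1f00"]:
    if log.record(where):
        print(f"replace memory at {where}")
```

Isolated soft errors at different places never reach the threshold, so the
machine keeps running; only a location that fails repeatedly gets flagged.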
>
> Of course memory should be better these days but.... knock on wood....
>
> /thn [20 years as HW-designer, FreeBSD since 3.0]
>
> --
> ---------------------------------------------------------------
> Svensk Aktuell Elektronik AB Thomas Nyström
> Box 10 Phone: +46 8 35 92 85
> S-191 21 Sollentuna Fax: +46 8 35 92 86
> Sweden Email: thn@saeab.se
> ---------------------------------------------------------------
>
>
> ------------------------------
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
> End of freebsd-stable Digest, Vol 164, Issue 4
> **********************************************
>