wi0 down when print a lot of data to screen over ssh

Ren Zhen fblist at gmail.com
Tue Jun 27 14:33:06 UTC 2006


Here is some extra information: this is what the kernel logged today. All I
did was turn powersave on and then off again.
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 7 times
kernel: wi0: device timeout
kernel: wi0: timeout in wi_seek to 152/0
kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
kernel: wi0: xmit failed
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 6 times
kernel: wi0: bad alloc 152 != 128, cur 0 nxt 0
kernel: wi0: record read mismatch, rid=fd42, got=fd41
kernel: wi0: record read mismatch, rid=fdc1, got=fd42
kernel: wi0: record read mismatch, rid=fd41, got=fdc1
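
To make the log above easier to compare between runs, here is a minimal
sketch (Python, not something I ran on the affected box) that tallies the
wi0 messages. The names tally and mismatches are only illustrative, and the
sketch assumes that a "last message repeated N times" line always refers to
the line directly before it.

```python
import re
from collections import Counter

# The kernel messages quoted above, copied verbatim.
LOG = """\
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 7 times
kernel: wi0: device timeout
kernel: wi0: timeout in wi_seek to 152/0
kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
kernel: wi0: xmit failed
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 6 times
kernel: wi0: bad alloc 152 != 128, cur 0 nxt 0
kernel: wi0: record read mismatch, rid=fd42, got=fd41
kernel: wi0: record read mismatch, rid=fdc1, got=fd42
kernel: wi0: record read mismatch, rid=fd41, got=fdc1
"""

REPEAT = re.compile(r"last message repeated (\d+) times")
MISMATCH = re.compile(r"record read mismatch, rid=(\w+), got=(\w+)")


def tally(log):
    """Count each wi0 message, expanding syslog's repeat lines.

    Assumption: a repeat line applies to the message just before it.
    """
    counts = Counter()
    previous = None
    for line in log.splitlines():
        line = line.strip()
        if not line:
            continue
        repeat = REPEAT.fullmatch(line)
        if repeat:
            if previous is not None:
                counts[previous] += int(repeat.group(1))
            continue
        # Drop the "kernel: wi0: " prefix so identical messages group together.
        previous = line.split("wi0: ", 1)[-1]
        counts[previous] += 1
    return counts


def mismatches(log):
    """Return (requested rid, returned rid) pairs from the mismatch lines."""
    return MISMATCH.findall(log)


counts = tally(LOG)
pairs = mismatches(LOG)
# True when every mismatch returned the rid requested by the previous read.
lagging = all(got == prev_rid
              for (prev_rid, _), (_, got) in zip(pairs, pairs[1:]))

for message, n in counts.most_common():
    print(n, message)
print("reads lag by one request:", lagging)
```

On this log the wi_seek timeout adds up to 16 occurrences. In the last three
lines each mismatch returns the rid that the previous read asked for, which
might mean the card's replies are one request behind, but that is only a
guess on my part.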


On 6/27/06, freebsd-stable-request at freebsd.org <freebsd-stable-request at freebsd.org> wrote:
>
> Today's Topics:
>
>    1. Re: force panic of remote server ... possible? (Ed Maste)
>    2. Re: force panic of remote server ... possible? (Ed Maste)
>    3. Re: vinum to gvinum help (Mark Linimon)
>    4. Re: Setting up GEOM mirror (Mike Jakubik)
>    5. Re: What denotes a 'blocked' process? (Marc G. Fournier)
>    6. RE: vinum to gvinum help (Wilde, Donald)
>    7. Re: What denotes a 'blocked' process? (Kostik Belousov)
>    8. Re: vmstat 'b' (disk busy?) field keeps climbing ...
>       (Marc G. Fournier)
>    9. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Dmitry Pryanishnikov)
>   10. Re: vmstat 'b' (disk busy?) field keeps climbing ... (Max Laier)
>   11. Re: kernel can't find root filesystem (Michael Proto)
>   12. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   13. Re: Gigabit ethernet very slow. (Matthew D. Fuller)
>   14. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   15. Re: wi0 down when print a lot of data to screen over ssh
>       (Michael Proto)
>   16. Re: kernel can't find root filesystem (M.Hirsch)
>   17. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   18. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   19. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   20. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   21. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Dmitry Pryanishnikov)
>   22. Re: vmstat 'b' (disk busy?) field keeps climbing ...
>       (Marc G. Fournier)
>   23. Re: What denotes a 'blocked' process? (Marc G. Fournier)
>   24. RE: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Michael Butler)
>   25. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   26. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   27. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   28. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   29. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   30. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Dmitry Pryanishnikov)
>   31. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Steven Hartland)
>   32. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Thomas Nyström)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 26 Jun 2006 10:52:38 -0400
> From: Ed Maste <emaste at phaedrus.sandvine.ca>
> Subject: Re: force panic of remote server ... possible?
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060626145238.GA22081 at sandvine.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 01:06:14PM +0100, Gavin Atkinson wrote:
>
> > On Mon, 2006-06-26 at 08:55 -0300, Marc G. Fournier wrote:
> > > For the server that I'm fighting with right now, where Dmitry pointed out
> > > that it looks like a deadlock issue ... I have dumpdev/savecore enabled,
> > > is there some way of forcing it to panic when I know I actually have the
> > > deadlock, so that it will dump a core?
> >
> > You can enter the debugger by setting the (badly named) debug.kdb.enter
> > sysctl to 1, although I can't guarantee that'll trigger a dump and
> > reboot.  Do you have a serial console?
>
> From some of your other messages, I believe this is a remote machine?
> Unless you can access an attached keyboard, or have a serial console,
> debug.kdb.enter will leave the machine sitting in ddb with no way to
> get out.  Also, if you have a PS/2 keyboard (that is, one handled by
> the atkbd(4) driver) ddb will not accept any input on 6.1 or HEAD.
> (There is some discussion of this issue on the freebsd-current list.)
> Before using ddb on a remote machine I would suggest testing it out
> with the same release locally.
>
> For your original question -- I'm not sure which release it first
> appeared in (and it may be only in -CURRENT), but if it exists you
> can use:
>
> $ sysctl -d debug.kdb.panic
> debug.kdb.panic: set to panic the kernel
>
> -ed
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 26 Jun 2006 14:33:19 -0500
> From: linimon at lonesome.com (Mark Linimon)
> Subject: Re: vinum to gvinum help
> To: Sven Willenberger <sven at dmv.com>
> Cc: Roland Smith <rsmith at xs4all.nl>,    freebsd-stable
>         <freebsd-stable at freebsd.org>
> Message-ID: <20060626193319.GC909 at soaustin.net>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 02:15:24PM -0400, Sven Willenberger wrote:
> > this is a production server that can at best stand an hour or so of
> > downtime.
>
> IMHO there are no 5.2.1 upgrade options that can be accomplished in even
> a small number of hours.  The kernel libraries were all updated for 5.3;
> and hundreds, if not more, ports were updated.  Since the 5.3 release,
> there have been thousands, if not tens of thousands, of commits to the
> ports tree, many of which make major infrastructural changes.
>
> Either going to 5.5 or 6.1 at this point should (also IMHO) be a complete
> reinstall on a staging system, with some tough testing there to show that
> the upgrade will work for your applications.
>
> Otherwise I think you're asking for some serious grief here.
>
> mcl
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 26 Jun 2006 15:03:54 -0400
> From: Mike Jakubik <mikej at rogers.com>
> Subject: Re: Setting up GEOM mirror
> To: Vivek Khera <vivek at khera.org>
> Cc: freebsd-stable <freebsd-stable at freebsd.org>
> Message-ID: <44A02F9A.4080606 at rogers.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Vivek Khera wrote:
> >
> > On Jun 25, 2006, at 2:14 PM, Mike Jakubik wrote:
> >>
> >> The problem with these instructions is that they don't take in to
> >> account the last sector. You may very well end up writing the
> >> metadata on the file system.
> >>
> >
> > When was the last time you fdisk'd a disk and it used the last sector
> > on the drive? I always end up with a bunch of extra space that didn't
> > fit into the round numbers of the file system.
> >
>
> Hopefully never :) Just mentioning this as a precaution.
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 26 Jun 2006 12:44:17 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy at hub.org>
> Subject: Re: What denotes a 'blocked' process?
> To: freebsd-stable at freebsd.org
> Message-ID: <20060626124226.Y1114 at ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Marc G. Fournier wrote:
>
> >
> > Just upgraded to June 15th sources, started up all the processes, and am
> > already at 29 blocked processes ...
> >
> > I've checked for states D, E and L ... nothing ...
> >
> > Actually, let's go one better ... attached is a complete list of my process
> > table (MWCHAN, STATE, COMMAND) ... right now, vmstat is showing:
> >
> > 1 33 0 6381952 177944 1695   0   0   0 1601   0   1   0  416 50012 1657 14 14 72
> > 1 33 2 6376440 181744 2013   0   0   0 2172   0   3   0  448 68528 1629 17 15 68
> > 4 33 0 6385484 178364 1944   0   3   0 1758   0   8   0  420 57698 1221 17 14 69
> > 23 46 0 6463664 149528 5294  29   4   2 4659   0  37   0  505 44758 3040 27 28 45
> > 4 34 1 6424904 169660 4216  16   7   0 4047   0 211   0 1002 47502 5769 42 30 28
> > 1 35 0 6453992 167388 2414   0   9   0 2265   0  44   0  535 62932 3160 18 18 64
> > 7 33 0 6443672 168100 1642   0   0   0 1652   0   5   0  448 51974 2163 15 15 70
> >
> > So, according to this, there should be 33 processes blocked somewhere ...
> > STATEs D/E/L all show nothing ... even state R (long shot) is showing 3-4
> > processes, and that's it ...
> >
> > This kernel is actually worse than the last, in that the last, on a reboot,
> > I'd see 4-5 blocked, and then it would slowly rise over the course of 24
> > hours, not start at 33 and rise from there ...
>
> Wow, in less than 1 hour, I'm up to 60 blocked, barely 1 runnable:
>
>   0 60 0 7016076 187424 2527   0   0   0 1722   0   5   0  320 7921 2140 24 19 57
>   0 60 0 7027436 185124  581   0   1   0 428   0   9   0  303 3214 2425  5  9 86
>   0 60 0 7053368 183060  217   4   1   0 130   0  71   0  453 1748 1157  6  4 90
>   1 60 1 7050848 183556    4   0   0   7  27   0  21   0  307  965 857  1  4 94
>   0 60 2 7050860 183652    2   0   0   0   6   0   0   0  256  829 1030  2  3 95
>   0 60 0 7051028 183348   28   1   2   0  11   0   3   0  307  944 855  3  3 95
>   0 60 1 7056876 182248  136   0   0   0  66   0   8   0  285 1190 945  1  4 95
>
> And nadda in ps:
>
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"' ; ps aux | wc -l
>    PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
>      2     0     204 -       ??  DL     0:00.45 [g_event]
>      3     0     204 -       ??  DL     0:04.87 [g_up]
>      4     0     204 -       ??  DL     0:06.19 [g_down]
>      5     0     204 -       ??  DL     0:00.00 [thread taskq]
>      6     0     204 -       ??  DL     0:00.00 [kqueue taskq]
>      7     0     204 -       ??  DL     0:00.00 [acpi_task0]
>      8     0     204 -       ??  DL     0:00.00 [acpi_task1]
>      9     0     204 -       ??  DL     0:00.00 [acpi_task2]
>     10     0     204 ktrace  ??  DL     0:00.00 [ktrace]
>     15     0     204 -       ??  DL     0:00.68 [yarrow]
>     25     0     204 psleep  ??  DL     0:00.70 [pagedaemon]
>     26     0     204 psleep  ??  DL     0:00.00 [vmdaemon]
>     27     0     20c pgzero  ??  DL     0:14.43 [pagezero]
>     28     0     204 psleep  ??  DL     0:00.14 [bufdaemon]
>     29     0     204 vlruwt  ??  DL     0:00.15 [vnlru]
>     30     0     204 syncer  ??  DL     0:10.29 [syncer]
>     31     0     204 sdflus  ??  DL     0:00.68 [softdepflush]
>     32     0     204 -       ??  DL     0:03.28 [schedcpu]
>      1170
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^E/ || $6 == "STAT"' ; ps aux | wc -l
>    PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
>      1174
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^L/ || $6 == "STAT"' ; ps aux | wc -l
>    PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
>     12     0     20c Giant   ??  LL     0:08.16 [swi4: clock]
>      1170
> pluto#
>
> Something *has* to be leaking here somewhere ... :(
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy at hub.org                              MSN . scrappy at hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 26 Jun 2006 13:12:58 -0600
> From: "Wilde, Donald" <dwilde at sandia.gov>
> Subject: RE: vinum to gvinum help
> To: "freebsd-stable" <freebsd-stable at freebsd.org>
> Message-ID:
>         <040DF00BF960A24897B5B3EFBE63FE8A026B10B8 at ES20SNLNT.srn.sandia.gov>
> Content-Type: text/plain; charset=us-ascii
>
>
>
> -----Original Message-----
> From: owner-freebsd-stable at freebsd.org
> [mailto:owner-freebsd-stable at freebsd.org] On Behalf Of Sven Willenberger
> Sent: Monday, June 26, 2006 12:15 PM
> To: Roland Smith
> Cc: freebsd-stable
> Subject: Re: vinum to gvinum help
>
> On Mon, 2006-06-26 at 19:15 +0200, Roland Smith wrote:
> > On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
> > > I have an i386 system currently running 5.2.1-RELEASE with a vinum
> > > mirror array (2 drives comprising /usr ). I want to upgrade this to
> > > 5.5-RELEASE which, if I understand correctly, no longer supports
> > > > vinum arrays. Would simply changing /boot/loader.conf to read
> > > gvinum_load instead of vinum_load work or would the geom layer
> > > prevent this from working properly? If not, is there a recommended
> > > way of upgrading a vinum array to a gvinum or gmirror array?
> >
> > > Lots of things have changed between 5.2.1 and 5.5. I think it would be
> > > best to make a backup and do a clean reinstall.
> >
> > Roland
>
> Sadly this may not be an option; this is a production server that can at
> best stand an hour or so of downtime. Between all the custom symlinked
> directories, applications, etc, plus the sheer volume of data that would
> need to be backed up, an in-place upgrade would be infinitely more
> desirable. If it comes to the point of having to back up and do a fresh
> install I suspect I would be using the 6.x series anyway. I was really
> hoping that some way of upgrading in-place were available for vinum.
>
> Sven
>
> DSW> Sven, your best bet will be to build a set of disks off-line and
> then swap them in. That's the only way you can be sure to do it right.
> Ask yourself if the cost of finding and building a mule is worth more
> than the pain of screwing up.
>
> It _is_ well worth doing, there were many things that were still unglued
> in 5.2.1.
> --
> Don Wilde    Org 01737    505-844-1126
> Earth Halted: Please reboot to continue
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 26 Jun 2006 23:05:15 +0300
> From: Kostik Belousov <kostikbel at gmail.com>
> Subject: Re: What denotes a 'blocked' process?
> To: "Marc G. Fournier" <scrappy at hub.org>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060626200515.GL79678 at deviant.kiev.zoral.com.ua>
> Content-Type: text/plain; charset="us-ascii"
>
> On Mon, Jun 26, 2006 at 12:44:17PM -0300, Marc G. Fournier wrote:
> > On Mon, 26 Jun 2006, Marc G. Fournier wrote:
> >
> > >
> > >Just upgraded to June 15th sources, started up all the processes, and am
> > >already at 29 blocked processes ...
> > >
> > >I've checked for states D, E and L ... nothing ...
> > >
> > >Actually, let's go one better ... attached is a complete list of my
> > >process table (MWCHAN, STATE, COMMAND) ... right now, vmstat is showing:
> > >
> > >1 33 0 6381952 177944 1695   0   0   0 1601   0   1   0  416 50012 1657 14 14 72
> > >1 33 2 6376440 181744 2013   0   0   0 2172   0   3   0  448 68528 1629 17 15 68
> > >4 33 0 6385484 178364 1944   0   3   0 1758   0   8   0  420 57698 1221 17 14 69
> > >23 46 0 6463664 149528 5294  29   4   2 4659   0  37   0  505 44758 3040 27 28 45
> > >4 34 1 6424904 169660 4216  16   7   0 4047   0 211   0 1002 47502 5769 42 30 28
> > >1 35 0 6453992 167388 2414   0   9   0 2265   0  44   0  535 62932 3160 18 18 64
> > >7 33 0 6443672 168100 1642   0   0   0 1652   0   5   0  448 51974 2163 15 15 70
> > >
> > >So, according to this, there should be 33 processes blocked somewhere ...
> > >STATEs D/E/L all show nothing ... even state R (long shot) is showing 3-4
> > >processes, and that's it ...
> > >
> > >This kernel is actually worse than the last, in that the last, on a
> > >reboot, I'd see 4-5 blocked, and then it would slowly rise over the course
> > >of 24 hours, not start at 33 and rise from there ...
> >
> > Wow, in less than 1 hour, I'm up to 60 blocked, barely 1 runnable:
> >
> >  0 60 0 7016076 187424 2527   0   0   0 1722   0   5   0  320 7921 2140 24 19 57
> >  0 60 0 7027436 185124  581   0   1   0 428   0   9   0  303 3214 2425  5  9 86
> >  0 60 0 7053368 183060  217   4   1   0 130   0  71   0  453 1748 1157  6  4 90
> >  1 60 1 7050848 183556    4   0   0   7  27   0  21   0  307  965 857  1  4 94
> >  0 60 2 7050860 183652    2   0   0   0   6   0   0   0  256  829 1030  2  3 95
> >  0 60 0 7051028 183348   28   1   2   0  11   0   3   0  307  944 855  3  3 95
> >  0 60 1 7056876 182248  136   0   0   0  66   0   8   0  285 1190 945  1  4 95
> >
> > And nadda in ps:
> >
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"' ; ps aux | wc -l
> >   PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
> >     2     0     204 -       ??  DL     0:00.45 [g_event]
> >     3     0     204 -       ??  DL     0:04.87 [g_up]
> >     4     0     204 -       ??  DL     0:06.19 [g_down]
> >     5     0     204 -       ??  DL     0:00.00 [thread taskq]
> >     6     0     204 -       ??  DL     0:00.00 [kqueue taskq]
> >     7     0     204 -       ??  DL     0:00.00 [acpi_task0]
> >     8     0     204 -       ??  DL     0:00.00 [acpi_task1]
> >     9     0     204 -       ??  DL     0:00.00 [acpi_task2]
> >    10     0     204 ktrace  ??  DL     0:00.00 [ktrace]
> >    15     0     204 -       ??  DL     0:00.68 [yarrow]
> >    25     0     204 psleep  ??  DL     0:00.70 [pagedaemon]
> >    26     0     204 psleep  ??  DL     0:00.00 [vmdaemon]
> >    27     0     20c pgzero  ??  DL     0:14.43 [pagezero]
> >    28     0     204 psleep  ??  DL     0:00.14 [bufdaemon]
> >    29     0     204 vlruwt  ??  DL     0:00.15 [vnlru]
> >    30     0     204 syncer  ??  DL     0:10.29 [syncer]
> >    31     0     204 sdflus  ??  DL     0:00.68 [softdepflush]
> >    32     0     204 -       ??  DL     0:03.28 [schedcpu]
> >     1170
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^E/ || $6 == "STAT"' ; ps aux | wc -l
> >   PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
> >     1174
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^L/ || $6 == "STAT"' ; ps aux | wc -l
> >   PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
> >    12     0     20c Giant   ??  LL     0:08.16 [swi4: clock]
> >     1170
> > pluto#
> >
> > Something *has* to be leaking here somewhere ... :(
>
> Dumb unmotivated question: do you have nfs exports on this machine ?
>
> ------------------------------
>
> Message: 8
> Date: Mon, 26 Jun 2006 15:25:49 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy at hub.org>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: Kostik Belousov <kostikbel at gmail.com>
> Cc: freebsd-stable at freebsd.org, Dmitry Morozovsky <marck at rinet.ru>
> Message-ID: <20060626152345.M1114 at ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> I think I might have found *at least* one of the problems, and that being
> the excessively high blocked states while ps isn't finding anything ...
>
> MySQL
>
> We just recently started allowing clients to run a MySQL server *within*
> their vServer ... in a drastic move, I just shut them all down on pluto,
> and blocked drop'd from ~86 down to 5 in a matter of moments ...
> restarting them all has it climbing once more, being up around 22 already
> ...
>
> I'm going to go with that theory for now, and keep an eye on things ...
>
> Just curious as to why, even with -H, it's not showing any blocked states
> within ps though ... ?
>
> Thx
>
>
> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>
> > On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:
> >> On Mon, 26 Jun 2006, Kostik Belousov wrote:
> >>
> >>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> >>
> >> Yes, kernel sources, it seems, from May 25th, according to my /usr/src
> >> tree ...
> >>
> >>> BTW, do you use snapshots ?
> >>
> >> Not that I've explicitly enabled ...
> >>
> >>> I think that without ddb access, diagnose and debug the problem would be
> >>> quite hard.
> >>
> >> Would it be a simple matter of:
> >>
> >> CTL-ALT-ESC
> >> panic
> >>
> >> to get it to dump core?  Or would more be involved?  Would a core dump
> >> even work?
> > Core dumps are somewhat inconvenient in this situation. Better: when
> > sending a report to me, follow my advice in
> >
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy at hub.org                              MSN . scrappy at hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 27 Jun 2006 00:01:08 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry at atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Robert Watson <rwatson at freebsd.org>
> Cc: freebsd-acpi at freebsd.org, freebsd-stable at freebsd.org,
>         Pete French <petefrench at ticketswitch.com>
> Message-ID: <20060626235355.Q95667 at atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> Hello!
>
> On Mon, 26 Jun 2006, Robert Watson wrote:
> > I think this is a useful activity, especially if you've already run extensive
> > memory testing on the box.  If you haven't yet done that, I encourage you to
> > take a break from buildworld's and make sure the memory tests pass. I spent
> > several months on and off trying to track down a bug a few years ago, which
> > turned out to be a one bit error in memory on the box.  It would appear and
>
>   This is precisely the task which hardware ECC solves: to correct any
> single-bit memory error and to detect 2-bit and most of several-bit errors. I
> prefer ECC-capable hardware even for home PC; for server it's a must IMHO.
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail:  dmitry at atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 10
> Date: Mon, 26 Jun 2006 22:44:18 +0200
> From: Max Laier <max at love2party.net>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: freebsd-stable at freebsd.org
> Cc: Kostik Belousov <kostikbel at gmail.com>, Dmitry Morozovsky
>         <marck at rinet.ru>
> Message-ID: <200606262244.25505.max at love2party.net>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
> > I think I might have found *at least* one of the problems, and that being
> > the excessively high blocked states while ps isn't finding anything ...
> >
> > MySQL
> >
> > We just recently started allowing clients to run a MySQL server *within*
> > their vServer ... in a drastic move, I just shut them all down on pluto,
> > and blocked drop'd from ~86 down to 5 in a matter of moments ...
> > restarting them all has it climbing once more, being up around 22 already
> > ...
> >
> > I'm going to go with that theory for now, and keep an eye on things ...
> >
> > Just curious as to why, even with -H, it's not showing any blocked states
> > within ps though ... ?
>
> The "blocked" column also shows processes that have objects "paging".  Most
> likely you are *short* on memory.  In order to relieve the pressure
> program .text pages are free'ed and need to be refetched from disc whenever
> the respective code is being executed.
>
> If you allow every vServer to run its own mySQL with all the libraries etc
> it's clear what is killing you!  Add more memory or make sure that .text pages
> can be reused by several processes.  As far as I understand vServer will all
> see a different source and thus not share buffers or the like.
>
> > Thx
> >
> > On Mon, 26 Jun 2006, Kostik Belousov wrote:
> > > On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:
> > >> On Mon, 26 Jun 2006, Kostik Belousov wrote:
> > >>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> > >>
> > >> Yes, kernel sources, it seems, from May 25th, according to my /usr/src
> > >> tree ...
> > >>
> > >>> BTW, do you use snapshots ?
> > >>
> > >> Not that I've explicitly enabled ...
> > >>
> > >>> I think that without ddb access, diagnose and debug the problem would
> > >>> be quite hard.
> > >>
> > >> Would it be a simple matter of:
> > >>
> > >> CTL-ALT-ESC
> > >> panic
> > >>
> > >> to get it to dump core?  Or would more be involved?  Would a core dump
> > >> even work?
> > >
> > > Core dumps are somewhat inconvenient in this situation. Better: when
> > > sending a report to me, follow my advice in
> > >
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
> > ----
> > Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> > Email . scrappy at hub.org                              MSN . scrappy at hub.org
> > Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
> --
> /"\  Best regards,                      | mlaier at freebsd.org
> \ /  Max Laier                          | ICQ #67774661
>  X   http://pf4freebsd.love2party.net/  | mlaier at EFnet
> / \  ASCII Ribbon Campaign              | Against HTML Mail and News
>
> ------------------------------
>
> Message: 11
> Date: Mon, 26 Jun 2006 17:18:59 -0400
> From: Michael Proto <mike at jellydonut.org>
> Subject: Re: kernel can't find root filesystem
> To: freebsd-stable at freebsd.org
> Message-ID: <44A04F43.2090400 at jellydonut.org>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Robert Ames wrote:
> >> From: "M.Hirsch" <M.Hirsch at hirsch.it>
> >>
> >> I had the same problem with 6.1. But only on some occasions, not
> >> always (iirc).
> >> The installations I made over the last weeks had all very different
> >> environments and deployment methods.
> >> I can't tell anymore when it happens and when not because I simply
> >> added the below loader.conf setting to my postinstall-script.
> >>
> >> Add "vfs.root.mountfrom=ufs:da0s1" to /boot/loader.conf to fix it.
> >
> > Thank you.  That solves my problem even though it seems more like
> > a workaround than an actual solution.  But I'll take it.  :-)
> >
> > Also, someone responded asking if I had a valid entry in /etc/fstab
> > for the root filesystem.
> >
> > foo# cat /etc/fstab
> > # Device                Mountpoint      FStype  Options         Dump    Pass#
> > /dev/da0s1a             /               ufs     rw              1       1
> > /dev/da0s1b             none            swap    sw              0       0
> > /dev/da1s1d             /local          ufs     rw              2       2
> > /dev/cd0                /cdrom          cd9660  ro,noauto       0       0
> >
>
> If I'm not mistaken, you could also try to (re)install the boot0 loader:
>
> boot0cfg /dev/da0
>
>
> -Proto
>
>
> ------------------------------
>
> Message: 12
> Date: Mon, 26 Jun 2006 23:21:22 +0200
> From: "M.Hirsch" <M.Hirsch at hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Dmitry Pryanishnikov <dmitry at atlantis.dp.ua>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A04FD2.1030001 at hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> ECC is a way to mask broken hardware. I rather have my hardware fail
> directly when it does first, so I can replace it _immediately_
> What's your hardware good for if it passes a "test", but fails in
> production?
>
> ECC is totally overrated.
>
> (sorry, couldn't resist...)
>
> M.
>
>
> ------------------------------
>
> Message: 13
> Date: Mon, 26 Jun 2006 14:32:26 -0500
> From: "Matthew D. Fuller" <fullermd at over-yonder.net>
> Subject: Re: Gigabit ethernet very slow.
> To: Michael Vince <mv at thebeastie.org>
> Cc: freebsd-stable at freebsd.org, performance at freebsd.org,
>         Nikolas Britton <nikolas.britton at gmail.com>,
>         Sean Bryant <bryants at gmail.com>
> Message-ID: <20060626193226.GF74292 at over-yonder.net>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 05:05:26PM +1000 I heard the voice of
> Michael Vince, and lo! it spake thus:
> >
> > According to pftop (with modulate state rules) I am able to get
> > about 85megs/sec when I don't have dd running. dd does indeed eat a
> > fair amount of cpu (40%) on the AMD64 6-stable machine.
>
> dd does ridiculously small (512 byte?) read/writes, so it's gotta do a
> LOT of system calls and a lot of context switching when you don't give
> it a bigger blocksize.
>
>
> --
> Matthew Fuller     (MF4839)   |  fullermd at over-yonder.net
> Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
>            On the Internet, nobody can hear you scream.
>
>
> ------------------------------
>
> Message: 14
> Date: Mon, 26 Jun 2006 23:26:54 +0200
> From: Wilko Bulte <wb at freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>
> Cc: Dmitry Pryanishnikov <dmitry at atlantis.dp.ua>,
>         freebsd-stable at FreeBSD.ORG
> Message-ID: <20060626212654.GB93703 at freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:21:22PM +0200, M.Hirsch wrote..
> > ECC is a way to mask broken hardware. I rather have my hardware fail
> > directly when it does first, so I can replace it _immediately_
> > What's your hardware good for if it passes a "test", but fails in
> > production?
> >
> > ECC is totally overrated.
>
> Balderdash.
>
> Following your rationale you want your bank account data
> silently be corrupted by hardware with bit errors?  Be my guest, give
> me ECC any day.
>
> Proper hardware will log the ECC errors, a proper OS tailored to that
> hardware will log and notify the sysadmins.
>
> That is how it should be done.
>
> Wilko
>
> --
> Wilko Bulte                             wilko at FreeBSD.org
>
>
> ------------------------------
>
> Message: 15
> Date: Mon, 26 Jun 2006 17:28:54 -0400
> From: Michael Proto <mike at jellydonut.org>
> Subject: Re: wi0 down when print a lot of data to screen over ssh
> To: freebsd-stable at freebsd.org
> Message-ID: <44A05196.1070708 at jellydonut.org>
> Content-Type: text/plain; charset=UTF-8
>
> Ren Zhen wrote:
> > wi0 goes down when I run a program that prints a lot of data to
> > stdout; it also goes down when I use zmrx-zmtx.
> >
> > kernel says:
> > kernel: wi0: timeout in wi_seek to 152/0
> > last message repeated 7 times
> > kernel: wi0: device timeout
> > kernel: wi0: timeout in wi_seek to 152/0
> > kernel: wi0: link state changed to DOWN
> >
> > another time kernel says:
> > kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
> > kernel: wi0: xmit failed
> > kernel: wi0: timeout in wi_seek to 128/0
> > last message repeated 3 times
> >
>
> I used to see similar behavior with wi0 on my ThinkPad A30p (IBM High
> Rate Wireless, PRISM 2.5) when powersave was enabled via ifconfig (I
> believe it may be on by default, not sure about that). If you disable
> powersave via 'ifconfig wi0 -powersave' do you still see the problem?
>
>
> -Proto
>
>
> ------------------------------
>
> Message: 16
> Date: Mon, 26 Jun 2006 23:31:58 +0200
> From: "M.Hirsch" <M.Hirsch at gmx.de>
> Subject: Re: kernel can't find root filesystem
> To: Michael Proto <mike at jellydonut.org>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A0524E.900 at gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sorry, doesn't help.
>
> There is some kind of bug hiding somewhere in 6.1 where it does not
> auto-detect the root partition under certain circumstances. Can't tell
> when it worked last, as the last distro I consider "stable" was 4.X...
> (sorry for the rant...)
>
> I am not using (and don't want to use...) boot0 at all.
> Well, I tried, but it didn't help the situation anyways...
>
> It should work with the standard MBR and boot code ("/boot/mbr" and
> "/boot/boot"), right?
> i.e. fdisk -B and bsdlabel -B without further params should do the job
> to get the system bootstrapped.
> But it does not.
>
> M.
>
> >If I'm not mistaken, you could also try to (re)install the boot0 loader:
> >
> >boot0cfg /dev/da0
> >
> >
> >-Proto
> >_______________________________________________
> >freebsd-stable at freebsd.org mailing list
> >http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
> >
> >
> >
>
>
>
> ------------------------------
>
> Message: 17
> Date: Mon, 26 Jun 2006 23:37:18 +0200
> From: "M.Hirsch" <M.Hirsch at gmx.de>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Wilko Bulte <wb at freebie.xs4all.nl>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A0538E.6090906 at gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Nope,
>
> I'd like my bank data to be stored on a system that does ECC, no question.
> But please, on hard disk level (RAID; that is _permanent_), not in the
> RAM of a single node.
>
> If memory gets corrupted, please, raise a kernel panic... Even if
> there's ECC in place.
>
> Counter question:
> Would you like your bank account data to be stored on a medium where one
> failure can be corrected, two can be detected, but three go unnoticed?
> How unlikely is that, if you've got some hardware that is really /broken/?
>
> I know this is a rather random thing to happen.
> Still, I think ECC memory is overrated. Better have it fail immediately.
> _With a kernel panic, please_
>
> M.
>
> Wilko Bulte schrieb:
>
> >Balderdash.
> >
> >Following your rationale you want your bank account data
> >silently be corrupted by hardware with bit errors?  Be my guest, give
> >me ECC any day.
> >
> >Proper hardware will log the ECC errors, a proper OS tailored to that
> >hardware will log and notify the sysadmins.
> >
> >That is how it should be done.
> >
> >Wilko
> >
> >
> >
>
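
The "one corrected, two detected, three unnoticed" behaviour described above is what a SECDED (single error correct, double error detect) code gives you. Here is a toy sketch in Python using an extended Hamming code on 4 data bits. Real ECC DIMMs typically protect 64 data bits with 8 check bits; the function names and the bit layout below are my own choices for illustration, not any particular chipset's scheme.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit SECDED word (extended Hamming)."""
    word = [0] * 8
    word[3], word[5], word[6], word[7] = data
    word[1] = word[3] ^ word[5] ^ word[7]   # covers positions 1,3,5,7
    word[2] = word[3] ^ word[6] ^ word[7]   # covers positions 2,3,6,7
    word[4] = word[5] ^ word[6] ^ word[7]   # covers positions 4,5,6,7
    word[0] = sum(word[1:]) % 2             # overall parity bit
    return word


def decode(word):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos
    parity_bad = sum(word) % 2 == 1
    if syndrome == 0 and not parity_bad:
        status = "ok"
    elif parity_bad:
        # Looks like a single-bit error; syndrome 0 means the overall
        # parity bit itself flipped.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with good overall parity: two bits flipped.
        return "uncorrectable", None
    return status, [word[3], word[5], word[6], word[7]]


def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


data = [1, 0, 1, 1]
word = encode(data)
print(decode(flip(word, 5)))        # one flipped bit
print(decode(flip(word, 3, 6)))     # two flipped bits
print(decode(flip(word, 1, 2, 4)))  # three flipped bits
```

With three flipped bits the decoder believes it saw a single-bit error and "corrects" its way to the wrong data, which is the case the counter question is worried about. Note that it takes three simultaneous flips within one protected word to get there.
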
>
>
> ------------------------------
>
> Message: 18
> Date: Mon, 26 Jun 2006 23:45:35 +0200
> From: Wilko Bulte <wb at freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at gmx.de>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060626214535.GA94015 at freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:37:18PM +0200, M.Hirsch wrote..
> > Nope,
> >
> > I'd like my bank data to be stored on a system that does ECC, no question.
> > But please, on hard disk level (RAID; that is _permanent_), not in the
> > RAM of a single node.
> >
> > If memory gets corrupted, please, raise a kernel panic... Even if
>
> You *can't* panic if it is just a single bit error in a user page. You
> will never know there was a corruption..  If that was a page holding your
> account data you are toast.
>
> > there's ECC in place.
>
> Of course not.  You only panic once you have no other options left.
> Proper hardware with ECC gives you these options.  I am not talking
> consumer grade crap here of course.
>
> > Counter question:
> > Would you like your bank account data to be stored on a medium where one
> > failure can be corrected, two can be detected, but three go unnoticed?
> > How unlikely is that, if you've got some hardware that is really /broken/?
>
> Very unlikely.  There is enough hardware design done after all these
> years that this kind of problem can be prevented.
>
> > I know this is a rather random thing to happen.
> > Still, I think ECC memory is overrated. Better have it fail immediately.
> > _With a kernel panic, please_
>
> As said, you can't
>
> >
> > M.
> >
> > Wilko Bulte schrieb:
> >
> > >Balderdash.
> > >
> > >Following your rationale you want your bank account data
> > >silently be corrupted by hardware with bit errors?  Be my guest, give
> > >me ECC any day.
> > >
> > >Proper hardware will log the ECC errors, a proper OS tailored to that
> > >hardware will log and notify the sysadmins.
> > >
> > >That is how it should be done.
> > >
> > >Wilko
> > >
> > >
> > >
> --- end of quoted text ---
>
> --
> Wilko Bulte                             wilko at FreeBSD.org
>
>
> ------------------------------
>
> Message: 19
> Date: Tue, 27 Jun 2006 00:11:03 +0200
> From: "M.Hirsch" <M.Hirsch at gmx.de>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Michael Butler <imb at protected-networks.net>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A05B77.1030200 at gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> <snip>
>
> > .. So the logs are there, all that's required is a utility to read them
> >and, optionally, alert the administrator to the event,
> >
> >
> >
> No, I think a panic _should_ occur, even if there was a correctable
> error. Not "when there's no other option left".
> Maybe make it optional via a kernel option.
> There are much less-significant problems that can cause a panic.
>
> Sure, you may be one of the few people out there who knows how to
> correctly run a _BSD_ system...
> There's few of yous out there, ;)
>
> M.
>
>
> ------------------------------
>
> Message: 20
> Date: Tue, 27 Jun 2006 00:18:04 +0200
> From: Wilko Bulte <wb at freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at gmx.de>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060626221804.GA94278 at freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Jun 27, 2006 at 12:11:03AM +0200, M.Hirsch wrote..
> > <snip>
> >
> > >.. So the logs are there, all that's required is a utility to read them
> > >and, optionally, alert the administrator to the event,
> > >
> > >
> > >
> > No, I think a panic _should_ occur, even if there was a correctable
> > error. Not "when there's no other option left".
>
> You really have never seen a machine used for serious business apparently.
>
> > Maybe make it optional via a kernel option.
> > There are much less-significant problems that can cause a panic.
>
> panics like that should be eradicated, adding more nonsensical panics
> is not what we need.
>
> > Sure, you may be one of the few people out there who knows how to
> > correctly run a _BSD_ system...
> > There's few of yous out there, ;)
> >
> > M.
> --- end of quoted text ---
>
> --
> Wilko Bulte                             wilko at FreeBSD.org
>
>
> ------------------------------
>
> Message: 21
> Date: Tue, 27 Jun 2006 01:22:47 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry at atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060627011512.N95667 at atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> Hello!
>
> On Mon, 26 Jun 2006, M.Hirsch wrote:
> > ECC is a way to mask broken hardware. I rather have my hardware fail
> > directly when it does first, so I can replace it _immediately_
>
>   You got it backwards. If your data has any value to you, then you don't
> want to miss any single-bit error in it, do you? If you're running hardware
> w/o ECC, a single-bit error in your data will go to the disk unnoticed, and
> you'll lose your data. With ECC, the hardware will correct it. In the (rare)
> case of a multiple-bit error, the ECC logic will generate an NMI for you, so
> you'll notice and "replace it _immediately_" instead of two weeks later,
> when your archive won't extract.
>
> > What's your hardware good for if it passes a "test", but fails in
> > production?
>
>   That's how RAM manifests single-bit errors: you run a memory test and it
> won't catch them; later, in production, you'll miss the error because
> nothing provides an extra sanity check of your data.
>
> > ECC is totally overrated.
>
>   Only by the people who don't understand its point!
>
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail:  dmitry at atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 22
> Date: Mon, 26 Jun 2006 18:55:17 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy at hub.org>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: Max Laier <max at love2party.net>
> Cc: Kostik Belousov <kostikbel at gmail.com>, freebsd-stable at freebsd.org,
>         Dmitry Morozovsky <marck at rinet.ru>
> Message-ID: <20060626185437.I1114 at ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Max Laier wrote:
>
> > On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
> >> I think I might have found *at least* one of the problems, and that
> >> being the excessively high blocked states while ps isn't finding
> >> anything ...
> >>
> >> MySQL
> >>
> >> We just recently started allowing clients to run a MySQL server
> >> *within* their vServer ... in a drastic move, I just shut them all down
> >> on pluto, and blocked drop'd from ~86 down to 5 in a matter of moments
> >> ... restarting them all has it climbing once more, being up around 22
> >> already ...
> >>
> >> I'm going to go with that theory for now, and keep an eye on things ...
> >>
> >> Just curious as to why, even with -H, it's not showing any blocked
> >> states within ps though ... ?
> >
> > The "blocked" column shows also processes that have objects "paging".
> > Most likely you are *short* on memory.  In order to relieve the pressure
> > program .text pages are free'ed and need to be refetched from disc
> > whenever the respective code is being executed.
>
> 'k, but shouldn't the OS be doing any swapping, if this was the case?  I'm
> getting <1M of swappage when the blocked pages are really high ...
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy at hub.org                              MSN . scrappy at hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
>
> ------------------------------
>
> Message: 23
> Date: Mon, 26 Jun 2006 18:54:08 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy at hub.org>
> Subject: Re: What denotes a 'blocked' process?
> To: Kostik Belousov <kostikbel at gmail.com>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060626185338.D1114 at ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>
> > Dumb unmotivated question: do you have nfs exports on this machine ?
>
> neither nfs nor mountd are currently running ...
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy at hub.org                              MSN . scrappy at hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
>
> ------------------------------
>
> Message: 24
> Date: Mon, 26 Jun 2006 18:02:38 -0400
> From: "Michael Butler" <imb at protected-networks.net>
> Subject: RE: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "'Wilko Bulte'" <wb at freebie.xs4all.nl>,     "'M.Hirsch'"
>         <M.Hirsch at gmx.de>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <000001c6996c$3eab9df0$ad0d510a at toshi>
> Content-Type: text/plain;       charset="us-ascii"
>
> > Of course not.  You only panic once you have no other options left.
> > Proper hardware with ECC gives you these options.  I am not talking
> > consumer grade crap here of course.
>
> I agree that no panic should occur if the error was correctable and it
> should when it isn't.
>
> However, *real* equipment will log a corrected error .. from an aging Dell
> 1-U server ..
>
> Handle 0x0024, DMI type 15, 33 bytes
> System Event Log
>         Area Length: 4096 bytes
>         Header Start Offset: 0x0000
>         Header Length: 16 bytes
>         Data Start Offset: 0x0010
>         Access Method: Memory-mapped physical 32-bit address
>         Access Address: 0xFFF33000
>         Status: Valid, Not Full
>         Change Token: 0x00000000
>         Header Format: Type 1
>         Supported Log Type Descriptors: 5
>         Descriptor 1: POST error
>         Data Format 1: POST results bitmap
>         Descriptor 2: Parity memory error
>         Data Format 2: Multiple-event
>         Descriptor 3: I/O channel block
>         Data Format 3: Multiple-event
>         Descriptor 4: Single-bit ECC memory error
>         Data Format 4: Multiple-event
>         Descriptor 5: Multi-bit ECC memory error
>         Data Format 5: Multiple-event
>
> .. So the logs are there, all that's required is a utility to read them
> and, optionally, alert the administrator to the event,
>
> Michael Butler, CISSP
> Security Architect
> Protected Networks
> http://www.protected-networks.net
>
>
>
> ------------------------------
>
> Message: 25
> Date: Mon, 26 Jun 2006 23:54:53 +0200
> From: "M.Hirsch" <M.Hirsch at hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Wilko Bulte <wb at freebie.xs4all.nl>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A057AD.7050700 at hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Ok, sorry. Misunderstanding here.
> My point was, along what has been posted here in this thread:
> "An ECC error should raise a kernel panic immediately, not only a
> message in the log files."
> Any hardware showing ECC errors should be replaced asap..
> Make them lazy admins do what they're getting paid for...
>
> Correct, you can't (quickly) detect this without ECC hardware, of course.
> But I keep reading about "ECC" being the solution to broken RAM sticks...
>
> Since FreeBSD panics on creating simple malloc() vnodes, it should do so
> on ECC errors first.
> Different mission, I guess ;)
> (And different problems with the recent fricking code...)
>
> M.
>
>
> ------------------------------
>
> Message: 26
> Date: Tue, 27 Jun 2006 00:02:06 +0200
> From: Wilko Bulte <wb at freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060626220206.GA94183 at freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:54:53PM +0200, M.Hirsch wrote..
> > Ok, sorry. Misunderstanding here.
> > My point was, along what has been posted here in this thread:
> > "An ECC error should raise a kernel panic immediately, not only a
> > message in the log files."
> > Any hardware showing ECC errors should be replaced asap..
>
> Yes, but keep in mind that ASAP often means "during a scheduled
> maintenance window".  Which can be months away in some cases.
>
> > Make them lazy admins do what they're getting paid for...
> >
> > Correct, you can't (quickly) detect this without ECC hardware, of course.
>
> Skip the 'quickly', you need ECC, full stop.  Otherwise you will not
> detect it until it is way too late.  I can tell you from personal
> experience that customers hate nothing more than undetected data
> corruption.  ECC RAM is only part of the fix of course.  ECC had better be
> end to end, but it rarely is..
>
> > But I keep reading about "ECC" being the solution to broken RAM sticks...
>
> Not really of course.  But there are OS-es that simply map pages with
> known problems into a "do not use" list.
>
> > Since FreeBSD panics on creating simple malloc() vnodes, it should do so
> > on ECC errors first.
> > Different mission, I guess ;)
> > (And different problems with the recent fricking code...)
> >
> > M.
> --- end of quoted text ---
>
> --
> Wilko Bulte                             wilko at FreeBSD.org
>
>
> ------------------------------
>
> Message: 27
> Date: Tue, 27 Jun 2006 00:33:39 +0200
> From: "M.Hirsch" <M.Hirsch at hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Wilko Bulte <wb at freebie.xs4all.nl>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A060C3.8090008 at hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Wilko Bulte schrieb:
>
> >You really have never seen a machine used for serious business apparently.
> >
> >
> >
> Depends on what you define "serious business"...
> Yes, I am rather new to FreeBSD (2y+)
> I am just trying to set up a /stable/ cluster of six machines right now.
> For over a week straight.
> 4.11 works perfectly. But support is going to be dropped very soon, so
> that's a bad option for me right now.
>
> Over all, the system is /only/ supposed to handle a few hundred hits per
> second. (but including dynamic stuff like php...)
>
> Dunno if that (or what else) is "serious business" for you.
> Which version would you suggest for "serious business"?
>
> Anyways, my point stands: I rather have any of my nodes panic than
> carrying the risk of creating invalid data...
> One in a billion can be high probability, soon... (just planning for the
> future...)
>
> >panics like that should be eradicated, adding more nonsensical panics
> >is not what we need.
> >
> >
> uh, I would not call hardware failure "nonsensical panics". I guess I
> must have misunderstood you...
>
> M.
>
>
> ------------------------------
>
> Message: 28
> Date: Tue, 27 Jun 2006 00:39:47 +0200
> From: "M.Hirsch" <M.Hirsch at hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Dmitry Pryanishnikov <dmitry at atlantis.dp.ua>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A06233.1090704 at hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dmitry Pryanishnikov schrieb:
>
> >
> > Hello!
> >
> > On Mon, 26 Jun 2006, M.Hirsch wrote:
> >
> >> ECC is a way to mask broken hardware. I rather have my hardware fail
> >> directly when it does first, so I can replace it _immediately_
> >
> >
> >  You got it backwards. If your data has any value to you, then you
> > don't want to miss any single-bit error in it, do you? If you're
> > running hardware w/o ECC, your single-bit error in your data will go
> > to the disk unnoticed, and you'll lose your data. With ECC, hardware
> > will correct it. In (rare) case of multiple-bit error ECC logic will
> > generate NMI for you, so you'll notice and "replace it _immediately_"
> > instead of two weeks later when your archive won't extract.
> >
> Nope, I am right on track.
> I do not want to lose any data. So I'd prefer an ECC error to raise a
> panic so I can replace the hardware ASAP.
> Don't get me wrong, but tracking bugs in FreeBSD is quite a bit more of an
> effort than "just" acquiring a new box...
>
> >> What's your hardware good for if it passes a "test", but fails in
> >> production?
> >
> >
> >  It's the way in what RAM will manifest single-bit errors: you run
> > memory test - it won't catch them, later in production you'll miss
> > this error because nothing will provide extra sanity check of your
> > data.
>
> Ok...
> Does the standard fs, UFS2, do "extra sanity checks", then?
>
> M.
>
>
> ------------------------------
>
> Message: 29
> Date: Tue, 27 Jun 2006 00:51:56 +0200
> From: "M.Hirsch" <M.Hirsch at gmx.de>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A0650C.7020806 at gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> > Ok...
> > Does the standard fs, UFS2, do "extra sanity checks", then?
> >
> Sorry, replying to myself...
> No, this does not matter.
> If the OS thinks the data is ok, UFS will write OK data...
>
> So, let me rephrase this:
> How can I make sure there is no broken hardware in my cluster?
> I am not looking for workarounds, like ECC. I want the box to break
> immediately once any single component goes wrong...
>
>
>
> ------------------------------
>
> Message: 30
> Date: Tue, 27 Jun 2006 01:57:17 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry at atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <20060627014335.E87535 at atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Tue, 27 Jun 2006, M.Hirsch wrote:
> >> On Mon, 26 Jun 2006, M.Hirsch wrote:
> >>> ECC is a way to mask broken hardware. I rather have my hardware fail
> >>> directly when it does first, so I can replace it _immediately_
> >>
> >>
> >>  You got it backwards. If your data has any value to you, then you don't
> >>
> > Nope, I am right on track.
> > I do not want to lose any data. So I'd prefer an ECC error to raise a
> > panic so I can replace the hardware ASAP.
>
>   When you wrote "ECC is a way to mask broken hardware", you were plain
> wrong. If you're using hardware w/o ECC, it just can't tell whether an
> error is present or absent. So ECC _is_ the way to detect (not mask)
> broken hardware.
>
>   If you want the ECC corrector to raise an NMI on a corrected error (as
> well as an uncorrectable one), just set the appropriate bit in the control
> register - every ECC-capable Intel chipset allows it. But if we're speaking
> about a production environment, such behaviour (abnormal termination on a
> _corrected_ error) is unacceptable.
>
> > Don't get me wrong, but tracking bugs in FreeBSD is quite a bit more of
> > an effort than "just" acquiring a new box...
>
>   I don't see the connection between this sentence and ECC (which is a
> hardware option).
>
> > Does the standard fs, UFS2, do "extra sanity checks", then?
>
>   Ditto. And don't forget that _every_ data sector on HDD _is_ checked
> with CRC. As well as ATA data transfers in UDMA modes. As well as data
> in CPU cache. Extra check gives extra reliability.
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail:  dmitry at atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 31
> Date: Mon, 26 Jun 2006 23:59:02 +0100
> From: "Steven Hartland" <killing at multiplay.co.uk>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>,    "Dmitry Pryanishnikov"
>         <dmitry at atlantis.dp.ua>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <005401c69974$217f8860$b3db87d4 at multiplay.co.uk>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>         reply-type=response
>
> M.Hirsch wrote:
> > Ok...
> > Does the standard fs, UFS2, do "extra sanity checks", then?
>
> My advice would be: don't feed the troll.
>
>     Steve
>
>
>
>
>
> ------------------------------
>
> Message: 32
> Date: Tue, 27 Jun 2006 01:09:03 +0200
> From: Thomas Nyström <thn at saeab.se>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch at hirsch.it>
> Cc: freebsd-stable at freebsd.org
> Message-ID: <44A0690F.8040005 at saeab.se>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> M.Hirsch wrote:
> > Any hardware showing ECC errors should be replaced asap..
>
> No. ALL memory will sooner or later show single-bit errors.
>
> Several years ago I was checking this during my work at Ericsson.
> There was a discussion about whether ECC should be present in the
> GSM base stations or not. I had special test software running in several
> units looking for soft errors. Soft errors are bits that are flipped
> spontaneously in the memory. When the bit is rewritten it will work OK
> again: no permanent damage to the memory and no need to replace the memory.
>
> During my test period (I think it was 6-8 months) I saw four occasions
> when this occurred (total amount of memory 96 MB).
>
> ECC is intended to fix this: It will correct a single-bit fault and
> allow the system to continue uninterrupted.
>
> Of course this event should be logged and if it occurs several times
> at the same place then it is time to replace the memory.
>
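
That policy (log every corrected error, replace a module only when corrected errors keep recurring at the same place, and act at once on an uncorrectable one) can be sketched in a few lines of Python. The event format, the location strings and the threshold of 3 are assumptions of mine for illustration; nothing here reads a real system event log.

```python
from collections import Counter


def modules_to_replace(events, threshold=3):
    """Return the locations that warrant replacement, sorted by name.

    events: iterable of (location, kind) pairs, where kind is either
    "corrected" or "uncorrectable" (a made-up format for this sketch).
    """
    corrected = Counter()
    flagged = set()
    for location, kind in events:
        if kind == "uncorrectable":
            flagged.add(location)        # multi-bit error: act at once
        elif kind == "corrected":
            corrected[location] += 1     # soft errors are only counted
            if corrected[location] >= threshold:
                flagged.add(location)    # recurring at the same place
    return sorted(flagged)


# Hypothetical event list; the DIMM labels are invented.
log = [
    ("DIMM_A1", "corrected"),
    ("DIMM_B2", "corrected"),
    ("DIMM_A1", "corrected"),
    ("DIMM_A1", "corrected"),
    ("DIMM_C3", "uncorrectable"),
]
print(modules_to_replace(log))
```

A single corrected error on DIMM_B2 stays in the log and nothing more happens, which matches the soft-error observation above: an isolated flip is not a reason to pull the module.
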
> Of course memory should be better these days but.... knock on wood....
>
> /thn [20 years as HW-designer, FreeBSD since 3.0]
>
> --
> ---------------------------------------------------------------
> Svensk Aktuell Elektronik AB                     Thomas Nyström
> Box 10                                    Phone: +46 8 35 92 85
> S-191 21  Sollentuna                        Fax: +46 8 35 92 86
> Sweden                                      Email: thn at saeab.se
> ---------------------------------------------------------------
>
>
> ------------------------------
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>
> End of freebsd-stable Digest, Vol 164, Issue 4
> **********************************************
>

