random FreeBSD panics
freebsd at jdc.parodius.com
Mon Mar 29 20:30:50 UTC 2010
On Mon, Mar 29, 2010 at 02:27:34PM -0400, John Baldwin wrote:
> On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> > On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivoras at freebsd.org> wrote:
> > > > On 28 March 2010 16:42, Masoom Shaikh <masoom.shaikh at gmail.com> wrote:
> > > >
> > > >> Let's assume this is a h/w problem; then how do other OSes overcome
> > > >> it? Is there a way to make FreeBSD ignore it as well, even if that
> > > >> results in a reasonable performance penalty?
> > > >
> > > > Very probably, if only we could detect where the problem is.
> > > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> > >
> > > This option is already there.
> > The key word in Ivan's phrase is "less mangled". Neither using nor
> > increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> > output. I've been ranting/raving about this problem for years now; it
> > truly looks like a mutex lock issue (or lack of such lock), but I've
> > been told numerous times that isn't the case.
> > To developers: what incentives would help get this issue the attention
> > it needs? This problem makes kernel debugging, panic analysis, and
> > other console-oriented viewing basically impossible.
> I was recently going to look at it. The somewhat drastic approach I was going
> to take was to add a simple serializing lock around trap_fatal() and a few
> other places that do similar block prints (e.g. mca_log()). One of the issues
> with fixing this in printf itself is that you'd probably want to
> serialize complete lines of text on a per-thread basis. You would want to be
> able to accumulate this line of text across multiple calls to printf (think of
> it as line-buffering ala stdio). However, some folks may be nervous about
> printf not printing things immediately.
> The other issue is that lots of code assumes it can call printf from anywhere
> and everywhere. Mostly this just means that if you add locking and line-
> buffering to printf(9) you have to be very careful to make sure it works in
> odd places. Probably a lot of this could be solved by deferring things like
> trap_fatal() until panic() has already been called (which is bde's preferred
> solution I think).
Thanks for the insights, they're greatly appreciated.
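To make sure I follow the line-buffering idea, here is a rough userland
sketch of it, using pthreads in place of kernel locks. Every name in it
(lb_printf, lb_flush, lb_console_lock, LB_LINE_MAX) is made up for
illustration and is not an existing FreeBSD interface. Each thread
accumulates text across calls and only takes the shared lock to emit one
complete line. It deliberately ignores the hard part John raises, namely
being callable from odd contexts:

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LB_LINE_MAX 256		/* hypothetical per-thread line limit */

/* One lock for the shared "console" (stdout in this sketch). */
static pthread_mutex_t lb_console_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread partial line, accumulated across lb_printf() calls. */
static _Thread_local char lb_line[LB_LINE_MAX];
static _Thread_local size_t lb_len;

/* Write the calling thread's pending text as one unit, under the lock. */
static void
lb_flush(void)
{
	if (lb_len == 0)
		return;
	pthread_mutex_lock(&lb_console_lock);
	fwrite(lb_line, 1, lb_len, stdout);
	pthread_mutex_unlock(&lb_console_lock);
	lb_len = 0;
}

/*
 * Format into the calling thread's buffer.  Output reaches the console
 * only when a newline is seen or the buffer fills.  Returns the number
 * of flushes performed, or -1 on a formatting error.
 */
static int
lb_printf(const char *fmt, ...)
{
	char tmp[LB_LINE_MAX];
	va_list ap;
	int n, flushed = 0;

	va_start(ap, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);
	if (n < 0)
		return (-1);
	if ((size_t)n >= sizeof(tmp))
		n = (int)sizeof(tmp) - 1;	/* output was truncated */

	for (int i = 0; i < n; i++) {
		assert(lb_len < LB_LINE_MAX);
		lb_line[lb_len++] = tmp[i];
		if (tmp[i] == '\n' || lb_len == LB_LINE_MAX) {
			lb_flush();
			flushed++;
		}
	}
	return (flushed);
}
```

The downside is visible right away: a fragment without a trailing newline
sits in the buffer until a later call completes the line, which is exactly
the "not printing things immediately" concern.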
I went looking this morning to see how Linux addressed this issue (if at
all), and it's been discussed a few times in the past. The longest lkml
thread I could find that mentioned the problem was circa 2002. Probably
not worth reading as there was work done in 2009 to solve the issue.
Work done by Red Hat in 2009 describes how they implemented a lockless
version of their kernel ring buffer (similar to our system message
buffer, but probably a lot more complex):
Supposedly having multiple writers to the ring is 100% safe; no
interspersed output. Same goes for interrupt-generated stuff. There are
some comments in the technical document (2nd link) that imply there's an
individual ring buffer for each CPU; possibly per-CPU kernel message
buffers would solve our issue?
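To illustrate what I mean by per-CPU buffers, here is a small userland
sketch in C11. It is only my guess at the general shape, not the Linux
design and not anything in our tree; the names (msg_log, msg_next,
cpubufs, NCPU, NRECORDS, RECLEN) are invented. Each CPU writes whole
messages into its own ring, a shared atomic counter stamps the order, and
a reader merges the rings by sequence number. It assumes only the owning
CPU writes to its ring and it does not handle a reader racing a writer or
an interrupt arriving mid-write:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define NCPU		4	/* hypothetical CPU count */
#define NRECORDS	8	/* records per CPU ring */
#define RECLEN		80	/* max message length, including NUL */

struct msgrec {
	unsigned long seq;		/* global order; 0 means unused */
	char text[RECLEN];
};

struct cpubuf {
	struct msgrec rec[NRECORDS];
	unsigned int head;		/* records written by this CPU */
};

static struct cpubuf cpubufs[NCPU];
static atomic_ulong next_seq = 1;

/*
 * Store one complete message in the ring owned by `cpu`, overwriting
 * the oldest slot once the ring is full.  Returns the sequence number.
 */
static unsigned long
msg_log(int cpu, const char *text)
{
	struct cpubuf *cb;
	struct msgrec *r;

	assert(cpu >= 0 && cpu < NCPU);
	cb = &cpubufs[cpu];
	r = &cb->rec[cb->head % NRECORDS];
	r->seq = atomic_fetch_add(&next_seq, 1);
	snprintf(r->text, sizeof(r->text), "%s", text);
	cb->head++;
	return (r->seq);
}

/*
 * Copy out the oldest message, across all CPUs, whose sequence number
 * is greater than `after`.  Returns that number, or 0 if there is none.
 */
static unsigned long
msg_next(unsigned long after, char *out, size_t outlen)
{
	const struct msgrec *best = NULL;

	for (int c = 0; c < NCPU; c++) {
		for (int i = 0; i < NRECORDS; i++) {
			const struct msgrec *r = &cpubufs[c].rec[i];

			if (r->seq > after &&
			    (best == NULL || r->seq < best->seq))
				best = r;
		}
	}
	if (best == NULL)
		return (0);
	snprintf(out, outlen, "%s", best->text);
	return (best->seq);
}
```

Since a message is stored whole before anything reads it, two CPUs can log
at the same time without their text being mixed; the cost is that output
is no longer immediate and something has to drain the rings.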
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
More information about the freebsd-questions