random FreeBSD panics
Jeremy Chadwick
freebsd at jdc.parodius.com
Mon Mar 29 20:30:50 UTC 2010
On Mon, Mar 29, 2010 at 02:27:34PM -0400, John Baldwin wrote:
> On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> > On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivoras at freebsd.org> wrote:
> > > > On 28 March 2010 16:42, Masoom Shaikh <masoom.shaikh at gmail.com> wrote:
> > > >
> > > >> lets assume if this is h/w problem, then how can other OSes overcome
> > > >> this ? is there a way to make FreeBSD ignore this as well, let it
> > > >> result in reasonable performance penalty.
> > > >
> > > > Very probably, if only we could detect where the problem is.
> > > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> > >
> > > this option is already there
> >
> > The key word in Ivan's phrase is "less mangled". Neither using nor
> > increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> > output. I've been ranting/raving about this problem for years now; it
> > truly looks like a mutex lock issue (or lack of such lock), but I've
> > been told numerous times that isn't the case.
> >
> > To developers: what incentives would help get this issue well-needed
> > attention? This problem makes kernel debugging, panic analysis, and
> > other console-oriented viewing basically impossible.
>
> I was recently going to look at it. The somewhat drastic approach I was going
> to take was to add a simple serializing lock around trap_fatal() and a few
> other places that do similar block prints (e.g. mca_log()). One of the issues
> with fixing this in printf itself is that you'd probably want to
> serialize complete lines of text on a per-thread basis. You would want to be
> able to accumulate this line of text across multiple calls to printf (think of
> it as line-buffering ala stdio). However, some folks may be nervous about
> printf not printing things immediately.
>
> The other issue is that lots of code assumes it can call printf from anywhere
> and everywhere. Mostly this just means that if you add locking and line-
> buffering to printf(9) you have to be very careful to make sure it works in
> odd places. Probably a lot of this could be solved by deferring things like
> trap_fatal() until panic() has already been called (which is bde's preferred
> solution I think).

John,

Thanks for the insights, they're greatly appreciated.

I went looking this morning to see how Linux addressed this issue (if at
all), and it's been discussed a few times in the past. The longest lkml
thread I could find that mentioned the problem was circa 2002. Probably
not worth reading as there was work done in 2009 to solve the issue.

http://lkml.indiana.edu/hypermail/linux/kernel/0204.1/index.html#161

Work done by Red Hat in 2009 details how they implemented a lockless
version of their kernel ring buffer (similar to our system message
buffer, but probably a lot more complex):

http://lwn.net/Articles/340400/
http://lwn.net/Articles/340443/

Supposedly having multiple writers to the ring is 100% safe; no
interspersed output. Same goes for interrupt-generated stuff. There are
some comments in the technical document (2nd link) that imply there's an
individual ring buffer for each CPU; possibly per-CPU kernel message
buffers would solve our issue?
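
For what it's worth, here is a rough userland sketch of the per-thread
line-buffering idea John describes above. It is not kernel code and not
how printf(9) works today; every name in it (lb_printf, lb_flush,
LINE_MAX_LEN, the fake "console" array) is made up for illustration, and
it uses pthreads and C11 thread-local storage purely as stand-ins for
whatever the kernel would really use. Each thread accumulates text across
calls and only takes the serializing lock when it has a complete line (or
its buffer fills up, in which case a partial line still goes out).

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 128        /* hypothetical; echoes PRINTF_BUFR_SIZE=128 */

/* Per-thread accumulation buffer (illustrative, not a FreeBSD structure). */
struct linebuf {
    char   data[LINE_MAX_LEN];
    size_t len;
};

static _Thread_local struct linebuf tl_buf;
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the console so the output can be inspected. */
static char   console[4096];
static size_t console_len;

/* Append n bytes to the fake console while holding the serializing lock. */
static void
console_write_locked(const char *s, size_t n)
{
    pthread_mutex_lock(&console_lock);
    if (n > sizeof(console) - 1 - console_len)
        n = sizeof(console) - 1 - console_len;  /* drop what does not fit */
    memcpy(console + console_len, s, n);
    console_len += n;
    console[console_len] = '\0';
    pthread_mutex_unlock(&console_lock);
}

/* Emit whatever the calling thread has accumulated so far. */
static void
lb_flush(void)
{
    assert(tl_buf.len <= sizeof(tl_buf.data));
    if (tl_buf.len > 0) {
        console_write_locked(tl_buf.data, tl_buf.len);
        tl_buf.len = 0;
    }
}

/* printf-like call: buffer per thread, emit on newline or when full. */
static void
lb_printf(const char *fmt, ...)
{
    char tmp[LINE_MAX_LEN];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof(tmp))
        n = (int)sizeof(tmp) - 1;               /* output was truncated */

    for (int i = 0; i < n; i++) {
        tl_buf.data[tl_buf.len++] = tmp[i];
        if (tmp[i] == '\n' || tl_buf.len == sizeof(tl_buf.data))
            lb_flush();
    }
}
```

With this, a line built from lb_printf("fault "); followed by
lb_printf("addr = 0x%x\n", 16); only reaches the console once the
newline arrives. That is exactly the "not printing things immediately"
behaviour John mentions people may be nervous about, so a real version
would at least need an explicit flush on the panic path.
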
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |