Hard lockups using 5.3-RELEASE..

Robert Watson rwatson at FreeBSD.org
Sat Feb 19 12:45:11 GMT 2005

On Sat, 19 Feb 2005, Peter Losher wrote:

> We have a Celestica dual-Opteron system w/ 4GB RAM running
> 5.3-RELEASE/i386 (32-bit), and a SMP-aware kernel, which is experiencing
> hard lockups.  Debugging results below. 

Hmm.  So just to summarize:

- The system appears to wedge
- Serial break can get into the debugger

Have you tried updating to the latest RELENG_5_3 patch level?  That
includes at least one significant SMP stability fix.  You can rebuild
along the RELENG_5_3 branch, or just use freebsd-update to pull it in.

> It looks like it's trying to lock Giant while it already has Giant.  In
> any case, we have rebuilt a uniprocessor kernel for now.  If this is
> already fixed in 5-STABLE, then let me know. ;) 

Generally speaking, recursing Giant is fine, as Giant is a recursible
mutex; however, an ithread shouldn't already hold Giant at that point.
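
As a userland analogy only (this is not the kernel mutex code, and the
helper name try_recursive_acquire is made up for illustration), the
difference between a recursible and a non-recursible lock can be sketched
with Python's threading module.  The sketch assumes nothing about how Giant
itself is implemented:

```python
import threading


def try_recursive_acquire(lock):
    """Acquire `lock`, then try to take it again from the same thread.

    Returns True if the second (non-blocking) acquire succeeded, which is
    what a recursible lock allows, and False otherwise.
    """
    lock.acquire()
    try:
        got_second = lock.acquire(blocking=False)
        if got_second:
            # Undo the recursive acquisition before dropping the first one.
            lock.release()
        return got_second
    finally:
        lock.release()


if __name__ == "__main__":
    print("RLock recursion allowed:", try_recursive_acquire(threading.RLock()))
    print("Lock recursion allowed: ", try_recursive_acquire(threading.Lock()))
```

With a recursible lock the second acquire by the owning thread succeeds; with
a plain lock it does not.  The question in this thread is not the recursion
itself but why the ithread already held Giant at that point.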

This may be fixed in 5-STABLE, but it's hard to say.  I think the order of
operations here is:

- First, slide to RELENG_5_3 head (p5?) to make sure you have the IPI
  stability fix.  See if the problem goes away.

- Generate the following information: when the box is wedged, does it...

  (1) Respond to pings
  (2) Toggle the num lock light when the num lock key is hit
  (3) If it responds to pings, what happens when you build a new TCP
      connection to an open TCP port (a) once (b) twice (c) the 100th
      (or so) time.

- Generate the following DDB output using your serial console:

  show pcpu
  show pcpu 0
  show pcpu 1
  show lockedvnods

  I may then ask you to generate stack traces of the processes that appear
  "interesting".  The definition of interesting is a little bit
  context-specific, so it's hard to say what it is just now.  If there are a
  lot of processes wedged in VM and VFS, then I'll ask you to trace each
  process that appears in the lockedvnods output. 

- Next, recompile with INVARIANTS and see if the problem triggers an
  assertion failure when it occurs.

- Next, recompile with WITNESS and see if WITNESS creates a warning or
  assertion failure when it occurs.

  Break to the debugger and generate the above DDB output, but also "show
  alllocks" (5-STABLE only), or "show locks" for interesting processes if
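
For the repeated TCP connection check in the list above (once, twice, the
100th or so time), a rough sketch of a probe that could be run from another
machine follows.  The function names probe_tcp and summarize, the default of
100 attempts, and the 3 second timeout are all assumptions for illustration,
and the host and port are whatever open TCP port the wedged box normally
serves:

```python
import socket
import sys
import time


def probe_tcp(host, port, attempts=100, timeout=3.0):
    """Open and close `attempts` TCP connections, one after another.

    Returns a list of (ok, detail) tuples: detail is the connect time in
    seconds on success, or the error text on failure.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:
            results.append((False, str(exc)))
    return results


def summarize(results):
    """Count successes and report the 1-based index of the first failure."""
    succeeded = sum(1 for ok, _ in results if ok)
    first_failure = next(
        (i + 1 for i, (ok, _) in enumerate(results) if not ok), None
    )
    return {
        "attempts": len(results),
        "succeeded": succeeded,
        "first_failure": first_failure,
    }


if __name__ == "__main__":
    # Usage: python probe.py HOST PORT [ATTEMPTS]
    host, port = sys.argv[1], int(sys.argv[2])
    attempts = int(sys.argv[3]) if len(sys.argv) > 3 else 100
    print(summarize(probe_tcp(host, port, attempts)))
```

Whether the first connection completes and a later one stalls, or none of
them complete at all, is the useful part to report back; the timings are
secondary.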

Also, I don't think you mentioned what sort of workload is present on the


Robert N M Watson

More information about the freebsd-stable mailing list