System Freezes When Mbuf Cluster Usage Rises

Robert Watson rwatson at FreeBSD.org
Mon Nov 12 00:42:43 PST 2007


On Sat, 10 Nov 2007, Ed Mandy wrote:

> If kern.ipc.nmbclusters is set to 25600, the system will hard freeze when 
> "vmstat -z" shows that the number of clusters has reached 25600.  If 
> kern.ipc.nmbclusters is set to 0 (or 102400), the system will hard freeze 
> when "vmstat -z" shows the number of clusters at around 66000.  When it 
> freezes, the number of Kbytes allocated to network (as shown by "netstat 
> -m") is roughly 160,000 (160MB).
>
> For a while, we thought that there might be a limit of 65536 mbuf clusters, 
> so we tested building the kernel with MCLSHIFT=12, which makes each mbuf 
> cluster 4096 bytes.  With this configuration, nmbclusters only reached about 33000 
> before the system froze.  The number of Kbytes allocated to network (as 
> shown by "netstat -m") still maxed out at around 160,000.
>
> Now, it seems that we are running into some other memory limitation that 
> occurs when our network allocation gets close to 160MB.  We have tried 
> tuning parameters such as KVA_PAGES, vm.kmem_size, vm.kmem_size_max, etc., 
> though we are unsure whether the changes we made there helped in any way.
>
> This is all being done on Celeron 2.8GHz machines with 3+ GB of RAM running 
> FreeBSD 5.3.  We are very much tied to this platform at the moment, and 
> upgrading is not a realistic option for us.  We would like to tune the 
> systems to not lock up.  We can currently work around the problem (by using 
> smaller buffers and such), but it is at the expense of network throughput, 
> which is less than ideal.
>
> Are there any other parameters that would help us to allocate more memory to 
> the kernel networking?  What other options should we look into?

I'd like to diagnose the "hard freeze" a little more to understand what's 
going on.  Hopefully this won't be too disruptive for your environment while 
you're doing it.

First off, can you tell me how you're accessing the system to run diagnostic 
tools, monitor it, etc.?  Remember that if you run out of clusters, you may 
experience network deadlocks that prevent SSH sessions from operating (since 
there may be no memory for them to use), so direct console access may be 
required to monitor the system effectively when the network stack is in an 
extreme low-memory state.  Could you tell me whether you are using a serial 
console or the video console?  (Or FireWire, I suppose?)
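In case it's useful, here's a minimal sketch of a serial console setup on 
5.x, assuming the first serial port (sio0/ttyd0) at the default 9600 bps; 
adjust for your hardware:

    # /boot.config -- have the boot blocks use the serial port
    -h

    # /boot/loader.conf -- direct the kernel console to the serial port
    console="comconsole"

    # /etc/ttys -- enable a login prompt on the serial line
    ttyd0   "/usr/libexec/getty std.9600"   dialup  on secure

With that in place, a second machine on a null-modem cable running tip(1) or 
cu(1) can capture everything the console prints.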

FreeBSD 5.3 was the first release to include an MPSAFE network stack, but a 
number of optional compile-time features could disable MPSAFE networking, 
resulting in the Giant lock being held across network operations.  Could you 
tell me what the value of the sysctl debug.mpsafenet is?
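Checking it is a one-liner; on a stock 5.3 kernel you should see 1 
(Giant-free networking), while 0 means Giant is acquired around network 
processing:

    % sysctl debug.mpsafenet
    debug.mpsafenet: 1

(The value is fixed at boot, so changing it means setting 
debug.mpsafenet="0" or ="1" in /boot/loader.conf and rebooting.)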

When the system appears to hard hang, does it recover if, say, left alone for 
five minutes?  What if you unplug the network cable and leave it for five 
minutes?

Does the Num Lock key on the console work?  If you leave the console logged 
in and running an application (such as "sleep 100000") and the system hangs, 
what do you see if you hit Ctrl-T?
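For illustration, the exchange would look something like this -- the status 
line is SIGINFO output, so the exact numbers, PID, and wait channel will 
differ on your system:

    % sleep 100000
    ^T
    load: 0.15  cmd: sleep 724 [nanslp] 0.00u 0.00s 0% 108k

If you still get a status line during the hang, the console and scheduler are 
alive and the problem is more likely confined to the network stack; if Ctrl-T 
produces nothing, the hang runs deeper.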

If you compile options BREAK_TO_DEBUGGER into the kernel and generate a serial 
break / hit Ctrl-Alt-Esc, are you able to get into the debugger?  If you type 
in "trace", what do you get?  (There is a chapter of the developers' handbook 
that talks about using the kernel debugger, FYI.)  With 5.3, we found that 
using a serial console to get to the debugger was a lot more reliable than the 
video console -- this is in part because a significant amount of the kernel 
(especially file systems and the video console) still runs under Giant, so a 
thread hanging while holding Giant can prevent a console break from getting to 
the debugger.  My advice would be to use a serial console anyway, if possible, 
when debugging, as it means you can use a second machine to copy and paste DDB 
output into a file to e-mail out later.  After about the third line of a 
kernel stack trace, copying addresses out by hand becomes pretty painful :-).
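For 5.3, the kernel configuration lines would be something like the following 
(KDB is the debugger framework, DDB the interactive debugger that sits on top 
of it):

    # kernel configuration: enable the kernel debugger and allow a
    # serial break / console hotkey to drop into it
    options KDB
    options DDB
    options BREAK_TO_DEBUGGER

Once at the "db>" prompt, "trace" prints the stack of the current thread; 
"ps", and "show locks" if your kernel has WITNESS, are also worth capturing.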

Unfortunately, I have to say that my first advice would be to upgrade -- not 
just because a lot of work has been done relating to network stack performance 
and stability since 5.3, but also because the debugging tools have gotten a 
lot better since then.  For example, in more recent versions the kernel 
debugger includes memory-monitoring tools, commands to more readily extract 
debugging information, and so on.  5.3 is a solid and functional release, but when 
it comes to debugging problems of this sort, being on a more recent release 
means you're more likely to see the problem already fixed, and even if not, it 
will be easier for us to fix it.  I understand that may simply not be 
possible, but if you have that flexibility, it's good advice.

A general comment on configuration: increasing the maximum memory allocated to 
the network stack can indeed increase your KVA usage significantly.  You might 
well find that increasing KVA is required to run with very large network-stack 
memory configurations, so your intuition about tuning it up isn't bad. 
However, when you run out of KVA, the result is usually a panic (since the 
kernel basically has no choice but to halt), so if you're not seeing a panic, 
you're probably not yet hitting that limit.
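As a sketch of what that tuning looks like on i386 -- the specific numbers 
here are illustrative, not recommendations:

    # kernel configuration: KVA_PAGES is in 4MB units on i386, so 512
    # grows the kernel address space from the 1GB default to 2GB
    options KVA_PAGES=512

    # /boot/loader.conf: raise the kmem map ceiling and the cluster limit
    vm.kmem_size="419430400"        # 400MB
    vm.kmem_size_max="419430400"
    kern.ipc.nmbclusters="102400"

Keep in mind that growing KVA_PAGES shrinks the user portion of the 4GB 
address space by the same amount, so 32-bit processes lose headroom.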

Robert N M Watson
Computer Laboratory
University of Cambridge

