Or it could be ZFS memory starvation and 9k packets (was Re: istgt causes massive jumbo nmbclusters loss)

Zaphod Beeblebrox zbeeble at gmail.com
Sat Oct 26 05:16:37 UTC 2013


At first I thought this was entirely the interaction of istgt and 9k
packets, but after some observation (and a few more hangs) I'm reasonably
positive it's a form of resource starvation related to ZFS and 9k packets.

To reliably trigger the hang, I need to do something that generates demand
for 9k packets (like istgt traffic, but also BitTorrent traffic --- as you can
see, the MTU is 9014), and some time needs to have passed since the system
booted.  ZFS is fairly busy (serving both NFS and SMB clients), so it
generally takes quite a bit of the 8G of memory for itself.
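
For anyone who wants to look at the same thing, the ARC's current footprint
can be read with a sysctl, and it can be capped from /boot/loader.conf if the
ARC turns out to be the culprit (the 4G below is only an example value, not a
recommendation):

# current ARC size, in bytes
sysctl kstat.zfs.misc.arcstats.size
# optional cap, set in /boot/loader.conf and picked up at the next boot
vfs.zfs.arc_max="4G"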

Now... the netstat -m output below shows 1399 9k clusters in total, with 376
in cache.  When the network gets busy, I've seen 4,000 or even 5,000 clusters
in total... never anywhere near the 77k max.  After some time of lower
activity, the number of 9k clusters returns to this level.
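
For the curious, the raw UMA zone numbers behind that 9k pool can be pulled
directly with something like the following (the zone name is what vmstat -z
reports on this box, and the sysctl is the zone's limit):

vmstat -z | grep mbuf_jumbo_9k
sysctl kern.ipc.nmbjumbo9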

When the problem occurs, the number of denied buffer requests shoots up at a
rate of several hundred or even several thousand per second, but the system
is not "out" of memory: top will often show around 800 MB in the free column
while this is happening.  When I'm logged into the console during an episode,
none of these stats look out of place, except that the denied count for 9k
buffer allocations keeps climbing and the "cache" count of 9k buffers drops
below 10 (though I've never seen it reach 0).
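
To get a feel for the rate, a simple sh loop sampling the denied counters
once a second is enough (nothing fancy, just re-running netstat -m):

while true; do
    netstat -m | grep 'jumbo clusters denied'
    sleep 1
done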


On Tue, Oct 22, 2013 at 3:42 PM, Zaphod Beeblebrox <zbeeble at gmail.com> wrote:

> I have a server
>
> FreeBSD virtual.accountingreality.com 9.2-STABLE FreeBSD 9.2-STABLE #13
> r256549M: Tue Oct 15 16:29:48 EDT 2013
> root at virtual.accountingreality.com:/usr/obj/usr/src/sys/VRA  amd64
>
> That has an em0 with jumbo packets enabled:
>
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9014
>
> It has (among other things): ZFS, NFS, iSCSI (via istgt) and Samba.
>
> Every day or two, it loses its ability to talk to the network.  ifconfig
> down/up on em0 gives the message about not being able to allocate the
> receive buffers...
>
> With everything running, but with specifically iSCSI not used, everything
> seems good.  When I start hitting istgt, I see the denied stat for 9k mbufs
> rise very rapidly (this amount only took a few seconds):
>
> [1:47:347]root at virtual:/usr/local/etc/iet> netstat -m
> 1313/877/2190 mbufs in use (current/cache/total)
> 20/584/604/523514 mbuf clusters in use (current/cache/total/max)
> 20/364 mbuf+clusters out of packet secondary zone in use (current/cache)
> 239/359/598/261756 4k (page size) jumbo clusters in use
> (current/cache/total/max)
> 1023/376/1399/77557 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/43626 16k jumbo clusters in use (current/cache/total/max)
> 10531K/6207K/16738K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/50199/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> ... the denied number rises... and somewhere in the millions or more the
> machine stops --- but even with the large number of denied 9k clusters, the
> "9k jumbo clusters in use" line will always indicate some available.
>
> ... so is this a tuning issue or a bug?  I've also tried ietd --- basically
> it doesn't seem to want to work with a ZFS zvol (it refuses to use it).
>
>

