Server gets a high load, but no CPU use, and then later stops responding on the network
lokadamus at gmx.de
Fri Sep 23 20:56:02 UTC 2016
On 09/21/16 11:38, Ståle Kristoffersen wrote:
> On 2016-09-20 at 16:57, Anton Yuzhaninov wrote:
>> On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
>>> about once a day, but not in any pattern, it starts getting a load of 5-10
>>> and usually stops responding over the network before I notice it.
>> Does it stop responding completely (including ping), or do only some
>> services, such as ssh, stop responding?
> It just starts getting more and more lagged. It usually responds to ping,
> but ssh can start to time out. Already opened ssh sessions can live quite
> long, but running stuff can be a problem after a while.
>>> From googling a bit, I have tried to disable msix on the igb network
>>> interface, and increased the nmbclusters with no apparent change in behaviour.
>>> (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)
>> kern.ipc.nmbclusters on modern FreeBSD versions is autotuned to a very
>> large value, and increasing it manually is rarely needed.
>> Disabling msix on igb is also unlikely to be needed.
> This was more of a "grasping at straws"-move, and only included that for
>>> All I see is that the igb0 taskq pid is almost always in the RUN state when
>>> the machine is having trouble.
>> There is no igb0 taskq in top output below.
>> To see how the top output looks when the machine stops responding, it
>> is useful to run top from cron and log the output.
>> Example script for top logging:
>> In the top output you should look at WCPU and STATE for kernel threads
>> and for unresponsive network daemons.
> I've now configured that script to run, and I'll share the results the next
> time the server has issues.
>> Also, do you have a network load graph (bytes and packets per second)
>> for this host (I saw munin in the process list)? Maybe the load is too
>> high at the moments when the host is not responding.
> When this happens network traffic crawls to a stop. I've also checked that
> there isn't any other traffic on the network port causing problems. I also
> tried doing 'ifconfig igb0 down' on the interface just to see if the server
> would unclog itself.
>> Do you use firewalls or netgraph?
> No, nothing configured.
>> Which is the primary function of this server?
> It's a file server, sharing files via samba and FTP.
I have no idea. Can you tell me what dmesg shows? It looks like
there is a system overrun, but it is difficult to understand why.
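
The script Anton referred to is not included in the quoted text above. A minimal sketch of logging top from cron could look like the following. This is not the original script. The function name log_top_snapshot, the log path and the top flags are assumptions. The flags are for FreeBSD top(1): -b batch mode, -S include system processes, -H show threads, -d 1 a single display, and 40 to limit the number of processes shown.

```shell
#!/bin/sh
# Sketch: append one timestamped top(1) snapshot to a log file.
# Usage: top-log.sh LOGFILE [COMMAND [ARGS...]]
# All names here are illustrative; adjust paths and flags to your system.

log_top_snapshot() {
    logfile=$1
    shift
    # With no command given, fall back to the assumed FreeBSD top invocation.
    if [ "$#" -eq 0 ]; then
        set -- top -b -S -H -d 1 40
    fi
    {
        # Header line so that each snapshot can be matched to a time.
        printf '=== %s ===\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
        "$@"
        printf '\n'
    } >> "$logfile" 2>&1
}

# Only take a snapshot when a log file path is passed on the command line.
if [ "$#" -gt 0 ]; then
    log_top_snapshot "$@"
fi
```

Saved as, say, /root/top-log.sh (a hypothetical path), it could be run every minute with a crontab entry such as "* * * * * /root/top-log.sh /var/log/top-snapshots.log". After the next hang, the snapshots just before it should show which kernel threads or daemons were in the RUN state.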