Server gets a high load, but no CPU use, and then later stops responding on the network

lokadamus at gmx.de
Fri Sep 23 20:56:02 UTC 2016


On 09/21/16 11:38, Ståle Kristoffersen wrote:
> On 2016-09-20 at 16:57, Anton Yuzhaninov wrote:
>> On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
>>> about once a day, but not in any pattern, it starts getting a load of 5-10
>>> and usually stops responding over the network before I notice it.
>>
>> Does it stop responding completely (including ping) or only some 
>> services and ssh doesn't respond?
> 
> It just starts getting more and more lagged. It usually responds to ping,
> but ssh can start to time out. Already opened ssh sessions can live quite
> long, but running stuff can be a problem after a while.
> 
>>
>>> From googling a bit, I have tried to disable msix on the igb network
>>> interface, and increased the nmbclusters with no apparent change in behaviour.
>>> (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)
>>
>> On modern FreeBSD versions kern.ipc.nmbclusters is autotuned to a very
>> large value, and increasing it manually is rarely needed.
>>
>> Disabling msix on igb is also unlikely to be needed.
> 
> This was more of a "grasping at straws" move, and I only included it for
> completeness.
> 
>>> All I see is that the igb0 taskq pid is almost always in the RUN state when
>>> the machine is having trouble.
>>
>> There is no igb0 taskq in top output below.
>>
>> To see what the top output looks like when the machine stops responding,
>> it is useful to run top from cron and log the output.
>>
>> Example script for top logging:
>> https://bitbucket.org/snippets/citrin/BpeXb
>>
>> In the top output you should look at WCPU and STATE for kernel threads
>> and for unresponsive network daemons.
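
A minimal sketch of that kind of cron-driven top logging, for readers who cannot reach the link. This is not the linked snippet: the log directory, the script name toplog.sh, and the variable names LOGDIR and SNAPSHOT_CMD are assumptions, and the top flags are those of FreeBSD's top(1).

```sh
#!/bin/sh
# Sketch of a cron-driven top logger (not the linked snippet).
# LOGDIR and SNAPSHOT_CMD are assumed names; override them from the environment.
LOGDIR="${LOGDIR:-/var/log/toplog}"

# FreeBSD top: -b batch mode, -S include system (kernel) processes,
# -H one line per thread, -d 1 a single display, 30 = number of lines shown.
SNAPSHOT_CMD="${SNAPSHOT_CMD:-top -b -S -H -d 1 30}"

# Append one timestamped snapshot to a per-day log file.
log_snapshot() {
    mkdir -p "$LOGDIR" || return 1
    logfile="$LOGDIR/top-$(date +%Y%m%d).log"
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        $SNAPSHOT_CMD
        echo
    } >> "$logfile" 2>&1
}

# Only take a snapshot when called as "toplog.sh run", so the file can
# also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    log_snapshot
fi
```

Installed at a path of your choice and called with the argument run from a once-a-minute crontab entry, this leaves a timestamped history in which the WCPU and STATE columns can be compared before and during an incident.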
> 
> I've now configured that script to run, and I'll share the results the next
> time the server has issues.
> 
>> Also, do you have a network load graph (bytes and packets per second) for
>> this host (I saw munin in the process list)? Maybe the load is too high
>> at the moments when the host is not responding.
> 
> When this happens network traffic crawls to a stop. I've also checked that
> there isn't any other traffic on the network port causing problems. I also
> tried doing 'ifconfig igb0 down' on the interface just to see if the server
> would unclog itself.
> 
>> Do you use firewalls or netgraph?
> 
> No, nothing configured.
> 
>> What is the primary function of this server?
> 
> It's a fileserver, sharing files via Samba and FTP.
> 
I have no idea. Can you tell us what dmesg shows? It looks like there is
a system overrun, but it is difficult to understand why.

Greetings

