FreeBSD 11.x grinds to a halt after about 48h of uptime
Kevin Oberman
rkoberman at gmail.com
Sat Oct 15 16:36:29 UTC 2016
On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky <hps at selasky.org>
wrote:
> On 10/15/16 18:18, Ulrich Spörlein wrote:
>
>> Hey all, while 11.x is -STABLE now, this happens to my machine ever
>> since I upgraded it to 11-CURRENT years ago. I have no idea when this
>> started, actually, but what always happens is this:
>>
>> - System and X11 are up and running, I keep it running overnight as I'm
>> too lazy to reboot and restart everything.
>> - There's a bunch of xterms, Chrome, Clementine-Player and some other
>> programs running
>> - Coming back to the machine the next day (or the day after) it will
>> exit the screensaver just fine and then either I can use it for a couple
>> of seconds before it freezes, or it's pretty much dead already. The
>> mouse cursor still moves for a bit, but then also freezes (so is this a
>> GPU problem??)
>>
>> Now what I currently see on the screen is a clock widget stuck at 18:04
>> but conky itself has last updated at 18:00:18 ...
>>
>> This time I had some SSH sessions from another machine to see some more
>> useful things. There was nothing in various logs under /var/log (I also
>> can't run dmesg anymore ...)
>> I had top(1) running in a loop, this is the last output:
>>
>> last pid: 25633;  load averages: 0.27, 0.39, 0.36  up 1+23:03:28  18:00:12
>> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
>>
>> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
>> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
>> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
>>
>>
>>   PID USERNAME THR PRI NICE   SIZE    RES STATE   C    TIME    WCPU COMMAND
>>    11 root       8 155 ki31     0K   128K CPU0    0  364.6H 772.95% idle
>>  3122 uqs       15  28    0  7113M  5861M uwait   0   94:44  13.96% chrome
>>  2887 uqs       28  22    0  1394M   237M select  2  172:53   6.98% chrome
>>  2890 uqs       11  21    0  1034M   178M select  5  231:21   1.95% chrome
>>  1062 root       9  21    0   440M 47220K select  0   67:09   0.98% Xorg
>>  3002 uqs       15  25    5  1159M   172M uwait   2   19:09   0.00% chrome
>>  3139 uqs       17  25    5  1163M   156M uwait   2   16:15   0.00% chrome
>>  3001 uqs       18  25    5  1639M   575M uwait   0   16:05   0.00% chrome
>>    12 root      24 -64    -     0K   384K WAIT   -1   10:53   0.00% intr
>>  3129 uqs       12  20    0  2820M  1746M uwait   6    8:36   0.00% chrome
>>  2822 uqs        9  20    0   217M 81300K select  0    5:10   0.00% conky
>>  3174 root       1  20    0 21532K  3188K select  0    4:20   0.00% systat
>>  3130 uqs       16  20    0  1058M   131M uwait   4    3:03   0.00% chrome
>>  2998 uqs       16  20    0  1110M   123M uwait   2    2:53   0.00% chrome
>>  3165 uqs       10  20    0  1209M   215M uwait   6    2:52   0.00% chrome
>>  3142 uqs       11  25    5  1344M   195M uwait   2    2:46   0.00% chrome
>>  2876 uqs       19  20    0   580M 37164K select  3    2:42   0.00% clementine-player
>>    20 root       2 -16    -     0K    32K psleep  6    2:25   0.00% pagedaemon
>>
>> I also had systat -vm running and it continued to update its screen ...
>> for a short while, this is the last update before SSH died:
>>
>>
>> Mem usage: 0k%Phy 5%Kmem
>> Mem: KB REAL VIRTUAL VN PAGER SWAP
>> PAGER
>> Tot Share Tot Share Free in out in
>> out
>> Act 11051k 67868 71051992 255448 61840 count
>> All 11051k 67924 71058776 262100 pages
>> Proc:
>> Interrupts
>> r p d s w Csw Trp Sys Int Sof Flt ioflt 224
>> total
>> 25 730 11 724 109 404 101 13 cow 2
>> ehci0 16
>> zfod 3
>> ehci1 23
>> 0.0%Sys 0.1%Intr 0.0%User 0.0%Nice 99.9%Idle ozfod 16
>> cpu0:timer
>> | | | | | | | | | | %ozfod
>> xhci0 264
>> daefr 3 em0
>> 265
>> 50 dtbuf prcfr 94
>> hdac1 266
>> Namei Name-cache Dir-cache 349167 desvn totfr
>> ahci0 270
>> Calls hits % hits % 349155 numvn react 5
>> cpu1:timer
>> 121 121 100 253501 frevn pdwak 1
>> cpu2:timer
>> pdpgs 29
>> cpu7:timer
>> Disks md0 ada0 ada1 pass0 pass1 pass2 intrn 12
>> cpu3:timer
>> KB/t 0.00 0.00 0.00 0.00 0.00 0.00 5318892 wire 41
>> cpu6:timer
>> tps 0 0 0 0 0 0 9261404 act 12
>> cpu5:timer
>> MB/s 0.00 0.00 0.00 0.00 0.00 0.00 1598184 inact 6
>> cpu4:timer
>> %busy 0 0 0 0 0 0 cache
>> vgapci0
>> 61840 free
>> 712304 buf
>>
>>
>> Why do I have a Chrome tab using about 6G? What other sort of debugging
>> output can be helpful to get to the bottom of this? The machine still
>> responds to pings just fine, TCP connections get set up but the SSH
>> handshake never completes.
>>
>> This always happens between 30-50h and is super annoying and has been
>> going on for >1year. Help?
>>
>> Note, I cut the power to the monitor overnight to save electricity, can
>> this mess up something in the Radeon card or X server? What combinations
>> would be most useful to try next?
>>
>>
> Hi,
>
> Sounds like a memory leak. Can you track the memory use over time?
>
> Did you look at the output from:
>
> vmstat -m ?
>
> --HPS
I have noted significant memory leakage in chromium for some time. If I
leave it running overnight, my system is essentially frozen. If I terminate
the chromium process, it slowly comes back to life. I always keep a gkrellm
session on-screen where the memory and swap utilization is continuously
displayed and that clearly shows resources declining.
Try closing your chromium at night and see if that fixes the problem.
If you have never tried gkrellm (sysutils/gkrellm2), it is the best
system monitor I have found, though it pulls in a lot of dependencies. It
can also run as a server with remote systems displaying the data, which is
handy for monitoring servers.
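
Without a GUI monitor, the same trend can be pulled out of the top(1)
summary lines already being logged over SSH. What follows is only a minimal
sketch in Python, assuming the "Mem:" and "Swap:" lines look like the ones
quoted above. The names parse_top_line and growth_per_hour are made up for
illustration, and any sample readings passed to growth_per_hour are
hypothetical, not measurements from this machine.

```python
import re

# Matches fields such as "8873M Active" or "16K Anon" in a top(1) summary
# line. Percentage fields like "58% Inuse" are deliberately not matched.
_FIELD = re.compile(r"(\d+)([KMG])\s+(\w+)")
_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_top_line(line):
    """Return {label: megabytes} for a top(1) 'Mem:' or 'Swap:' line."""
    return {
        label: int(num) * _TO_MB[unit]
        for num, unit, label in _FIELD.findall(line)
    }


def growth_per_hour(samples):
    """Average change in MB per hour between the first and last sample.

    samples is a list of (hours_since_start, megabytes) pairs, oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    mem = parse_top_line(
        "Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free"
    )
    swap = parse_top_line(
        "Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse"
    )
    print("free MB:", mem["Free"], "swap used MB:", swap["Used"])
```

Collecting (hours, swap["Used"]) pairs from successive top outputs and
passing them to growth_per_hour gives a rough leak rate. A steadily
positive number for swap alongside shrinking Free memory would be
consistent with a leak, and comparing runs with and without chromium open
would show whether it is the culprit.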
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman at gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
More information about the freebsd-current mailing list