FreeBSD 11.x grinds to a halt after about 48h of uptime
Ulrich Spörlein
uqs at FreeBSD.org
Mon Oct 24 17:43:31 UTC 2016
On Sat, 2016-10-15 at 09:36:27 -0700, Kevin Oberman wrote:
> On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky <hps at selasky.org>
> wrote:
>
> > On 10/15/16 18:18, Ulrich Spörlein wrote:
> >
> >> Hey all, while 11.x is -STABLE now, this happens to my machine ever
> >> since I upgraded it to 11-CURRENT years ago. I have no idea when this
> >> started, actually, but what always happens is this:
> >>
> >> - System and X11 are up and running; I keep them running overnight as
> >> I'm too lazy to reboot and restart everything.
> >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> >> programs running
> >> - Coming back to the machine the next day (or the day after) it will
> >> exit the screensaver just fine and then either I can use it for a couple
> >> of seconds before it freezes, or it's pretty much dead already. The
> >> mouse cursor still moves for a bit, but then also freezes (so is this a
> >> GPU problem??)
> >>
> >> Now what I currently see on the screen is a clock widget stuck at 18:04
> >> but conky itself has last updated at 18:00:18 ...
> >>
> >> This time I had some SSH sessions from another machine to see some more
> >> useful things. There was nothing in various logs under /var/log (I also
> >> can't run dmesg anymore ...)
> >> I had top(1) running in a loop, this is the last output:
> >>
> >> last pid: 25633; load averages: 0.27, 0.39, 0.36 up 1+23:03:28 18:00:12
> >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> >>
> >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> >>
> >>
> >>   PID USERNAME THR PRI NICE   SIZE    RES STATE   C    TIME    WCPU COMMAND
> >>    11 root       8 155 ki31     0K   128K CPU0    0  364.6H 772.95% idle
> >>  3122 uqs       15  28    0  7113M  5861M uwait   0   94:44  13.96% chrome
> >>  2887 uqs       28  22    0  1394M   237M select  2  172:53   6.98% chrome
> >>  2890 uqs       11  21    0  1034M   178M select  5  231:21   1.95% chrome
> >>  1062 root       9  21    0   440M 47220K select  0   67:09   0.98% Xorg
> >>  3002 uqs       15  25    5  1159M   172M uwait   2   19:09   0.00% chrome
> >>  3139 uqs       17  25    5  1163M   156M uwait   2   16:15   0.00% chrome
> >>  3001 uqs       18  25    5  1639M   575M uwait   0   16:05   0.00% chrome
> >>    12 root      24 -64    -     0K   384K WAIT   -1   10:53   0.00% intr
> >>  3129 uqs       12  20    0  2820M  1746M uwait   6    8:36   0.00% chrome
> >>  2822 uqs        9  20    0   217M 81300K select  0    5:10   0.00% conky
> >>  3174 root       1  20    0 21532K  3188K select  0    4:20   0.00% systat
> >>  3130 uqs       16  20    0  1058M   131M uwait   4    3:03   0.00% chrome
> >>  2998 uqs       16  20    0  1110M   123M uwait   2    2:53   0.00% chrome
> >>  3165 uqs       10  20    0  1209M   215M uwait   6    2:52   0.00% chrome
> >>  3142 uqs       11  25    5  1344M   195M uwait   2    2:46   0.00% chrome
> >>  2876 uqs       19  20    0   580M 37164K select  3    2:42   0.00% clementine-player
> >>    20 root       2 -16    -     0K    32K psleep  6    2:25   0.00% pagedaemon
> >>
> >> I also had systat -vm running and it continued to update its screen ...
> >> for a short while, this is the last update before SSH died:
> >>
> >>
> >> Mem usage: 0k%Phy 5%Kmem
> >> Mem: KB REAL VIRTUAL    VN PAGER    SWAP PAGER
> >> Tot Share Tot Share Free    in out    in out
> >> Act 11051k 67868 71051992 255448 61840    count
> >> All 11051k 67924 71058776 262100    pages
> >> Proc:    Interrupts
> >> r p d s w Csw Trp Sys Int Sof Flt    ioflt    224 total
> >> 25 730 11 724 109 404 101    13 cow    2 ehci0 16
> >> zfod    3 ehci1 23
> >> 0.0%Sys 0.1%Intr 0.0%User 0.0%Nice 99.9%Idle    ozfod    16 cpu0:timer
> >> | | | | | | | | | |    %ozfod    xhci0 264
> >> daefr    3 em0 265
> >> 50 dtbuf    prcfr    94 hdac1 266
> >> Namei Name-cache Dir-cache    349167 desvn    totfr    ahci0 270
> >> Calls hits % hits %    349155 numvn    react    5 cpu1:timer
> >> 121 121 100    253501 frevn    pdwak    1 cpu2:timer
> >> pdpgs    29 cpu7:timer
> >> Disks md0 ada0 ada1 pass0 pass1 pass2    intrn    12 cpu3:timer
> >> KB/t 0.00 0.00 0.00 0.00 0.00 0.00    5318892 wire    41 cpu6:timer
> >> tps 0 0 0 0 0 0    9261404 act    12 cpu5:timer
> >> MB/s 0.00 0.00 0.00 0.00 0.00 0.00    1598184 inact    6 cpu4:timer
> >> %busy 0 0 0 0 0 0    cache    vgapci0
> >> 61840 free
> >> 712304 buf
> >>
> >>
> >> Why do I have a Chrome tab using about 6G? What other sort of debugging
> >> output can be helpful to get to the bottom of this? The machine still
> >> responds to pings just fine, TCP connections get set up but the SSH
> >> handshake never completes.
> >>
> >> This always happens between 30-50h and is super annoying and has been
> >> going on for >1year. Help?
> >>
> >> Note, I cut the power to the monitor overnight to save electricity, can
> >> this mess up something in the Radeon card or X server? What combinations
> >> would be most useful to try next?
> >>
> >>
> > Hi,
> >
> > Sounds like a memory leak. Can you track the memory use over time?
> >
> > Did you look at the output from:
> >
> > vmstat -m ?
> >
> > --HPS
>
>
> I have noted significant memory leakage in chromium for some time. If I
> leave it running overnight, my system is essentially frozen. If I terminate
> the chromium process, it slowly comes back to life. I always keep a gkrellm
> session on-screen where the memory and swap utilization is continuously
> displayed and that clearly shows resources declining.
That is not what is happening to my system, though; it actually
deadlocks. There's no way to recover from it, it seems.
So I have been killing Chromium overnight each day, and I'm at this:
% top -Sbores
last pid: 44526; load averages: 0.10, 0.11, 0.56 up 7+09:53:30 19:33:25
156 processes: 2 running, 153 sleeping, 1 waiting
Mem: 315M Active, 550M Inact, 5671M Wired, 515M Buf, 9324M Free
ARC: 1852M Total, 541M MFU, 196M MRU, 16K Anon, 93M Header, 1022M Other
Swap: 4096M Total, 2186M Used, 1910M Free, 53% Inuse
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
2755 uqs 10 20 0 1697M 311M select 1 47:23 0.00% conky
2736 uqs 32 20 0 699M 116M select 7 94:29 0.00% clementine-player
3000 uqs 12 20 0 1126M 69380K select 5 9:48 0.00% digikam
960 root 9 20 0 448M 59076K select 0 250:22 0.00% Xorg
72608 uqs 8 20 0 939M 55432K uwait 5 0:01 0.00% chrome
72599 uqs 9 52 0 929M 55116K uwait 0 0:00 0.00% chrome
2567 root 1 20 0 89948K 42964K select 1 1:51 0.00% bsnmpd
70476 uqs 1 20 0 93656K 25712K select 2 0:05 0.00% xterm
2730 uqs 5 20 0 208M 14988K select 1 0:22 0.00% clock-applet
880 root 1 20 0 22628K 12500K select 3 0:20 0.00% ntpd
2726 uqs 4 20 0 206M 12456K select 6 0:09 0.00% mateweather-applet
44352 uqs 1 20 0 75224K 12348K select 4 0:00 0.00% xterm
43049 uqs 1 20 0 75224K 11792K select 5 0:00 0.00% xterm
3074 uqs 2 20 0 308M 9692K select 1 0:02 0.00% kdeinit4
2671 uqs 1 20 0 144M 9488K select 1 0:13 0.00% openbox
3072 uqs 1 20 0 210M 8284K select 3 0:00 0.00% kdeinit4
2724 uqs 4 20 0 154M 8256K select 2 0:19 0.00% wnck-applet
2701 uqs 5 20 0 177M 8144K select 2 0:01 0.00% mate-panel
7d of uptime, pretty good. But look closer: the system is doing pretty
much nothing but did swap out 2G. What?
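To put numbers on that, the memory lines from the top(1) header above can be broken down with a few lines of Python. This is only a sketch: the helper name parse_top_line is made up for illustration, and the "wired minus ARC" figure assumes that the ZFS ARC is accounted as wired memory.

```python
import re

# Conversion factors to MiB for the unit suffixes top(1) prints.
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_top_line(line):
    """Parse e.g. 'Mem: 315M Active, 550M Inact' into {'Active': 315, ...} (MiB).

    Hypothetical helper; fields without a K/M/G suffix (such as '53% Inuse')
    are skipped.
    """
    _, _, rest = line.partition(":")
    fields = {}
    for match in re.finditer(r"(\d+)([KMG])\s+(\w+)", rest):
        value, unit, name = match.groups()
        fields[name] = int(value) * UNITS[unit]
    return fields


# Header lines copied from the top output above.
mem = parse_top_line("Mem: 315M Active, 550M Inact, 5671M Wired, 515M Buf, 9324M Free")
arc = parse_top_line("ARC: 1852M Total, 541M MFU, 196M MRU, 16K Anon, 93M Header, 1022M Other")
swap = parse_top_line("Swap: 4096M Total, 2186M Used, 1910M Free, 53% Inuse")

# Assumption: the ARC is part of Wired, so the remainder is other kernel memory.
wired_non_arc = mem["Wired"] - arc["Total"]

print(f"wired outside ARC: {wired_non_arc:.0f}M")
print(f"free: {mem['Free']:.0f}M, swap used: {swap['Used']:.0f}M")
```

If the assumption holds, a large share of the wired memory sits outside the ARC, which is the kind of thing vmstat -m might help narrow down.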
> Try closing your chromium at night and see if that fixes the problem.
It's better, but I'm not sure it's a real fix. I've now turned off
"hardware acceleration" in Chromium, though chrome://gpu didn't really
inspire confidence that it was actually using any h/w accel at all.
> If you have never tried gkrellm (sysutils/gkrellm2), it is the best
> system monitor I have found, though it pulls in a lot of dependencies. It
> can also run as a server with remote systems displaying the data. Handy to
> monitor servers.
I had a cacti setup that would also monitor my workstation (through an
OpenVPN tunnel), but that has bit-rotted and Apache only gives me 500s
on that cacti URL and nothing in the logs, oh well ...
Hooking up a serial console and testing whether DDB works is probably
the next best step to take ...
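To track memory use over time, as suggested earlier in the thread, a small logger along these lines might do in place of a full monitoring setup. It is a sketch only: the sysctl names are the ones I would expect on FreeBSD and should be checked against sysctl -a, and the function names and the memlog.txt path are made up for illustration.

```python
import os
import subprocess
import time

# Assumed FreeBSD sysctl names (page counters); verify with `sysctl -a`.
COUNTERS = {
    "free": "vm.stats.vm.v_free_count",
    "active": "vm.stats.vm.v_active_count",
    "inactive": "vm.stats.vm.v_inactive_count",
    "wired": "vm.stats.vm.v_wire_count",
}
PAGESIZE = "hw.pagesize"
MIB = 1024 * 1024


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of ints, skipping anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def to_mib(values):
    """Convert the page counters to MiB, keyed by their short names."""
    page = values[PAGESIZE]
    return {short: values[name] * page // MIB for short, name in COUNTERS.items()}


def sample():
    """Read the counters once via sysctl(8)."""
    out = subprocess.run(
        ["sysctl", PAGESIZE, *COUNTERS.values()],
        capture_output=True, text=True, check=True,
    ).stdout
    return to_mib(parse_sysctl(out))


def log_forever(path="memlog.txt", interval=60):
    """Append one timestamped line per interval (hypothetical path and period)."""
    with open(path, "a") as log:
        while True:
            fields = " ".join(f"{k}={v}M" for k, v in sample().items())
            log.write(time.strftime("%Y-%m-%d %H:%M:%S ") + fields + "\n")
            # Flush and sync so the last samples survive a hard freeze.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling log_forever() from a terminal or an rc script would leave a trail to read after the next freeze; the explicit fsync is there because the machine tends to die without warning.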
Cheers,
Uli