FreeBSD 11.x grinds to a halt after about 48h of uptime
Ulrich Spörlein
uspoerlein at gmail.com
Sat Oct 15 19:17:21 UTC 2016
2016-10-15 18:36 GMT+02:00 Kevin Oberman <rkoberman at gmail.com>:
>
> On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky <hps at selasky.org>
> wrote:
>
> > On 10/15/16 18:18, Ulrich Spörlein wrote:
> >
> >> Hey all, while 11.x is -STABLE now, this has been happening to my machine
> >> ever since I upgraded it to 11-CURRENT years ago. I have no idea when
> >> exactly it started, but what always happens is this:
> >>
> >> - System and X11 are up and running; I keep them running overnight as
> >> I'm too lazy to reboot and restart everything.
> >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> >> programs running
> >> - Coming back to the machine the next day (or the day after), it will
> >> exit the screensaver just fine, and then either I can use it for a couple
> >> of seconds before it freezes, or it's pretty much dead already. The
> >> mouse cursor still moves for a bit, but then it also freezes (so is this
> >> a GPU problem?)
> >>
> >> Now what I currently see on the screen is a clock widget stuck at 18:04
> >> but conky itself has last updated at 18:00:18 ...
> >>
> >> This time I had some SSH sessions open from another machine to capture
> >> something more useful. There was nothing in the various logs under
> >> /var/log (I also can't run dmesg anymore ...)
> >> I had top(1) running in a loop; this is the last output:
> >>
> >> last pid: 25633; load averages: 0.27, 0.39, 0.36 up 1+23:03:28 18:00:12
> >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> >>
> >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> >>
> >>
> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >>    11 root        8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
> >>  3122 uqs        15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
> >>  2887 uqs        28  22    0  1394M   237M select  2 172:53   6.98% chrome
> >>  2890 uqs        11  21    0  1034M   178M select  5 231:21   1.95% chrome
> >>  1062 root        9  21    0   440M 47220K select  0  67:09   0.98% Xorg
> >>  3002 uqs        15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
> >>  3139 uqs        17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
> >>  3001 uqs        18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
> >>    12 root       24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
> >>  3129 uqs        12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
> >>  2822 uqs         9  20    0   217M 81300K select  0   5:10   0.00% conky
> >>  3174 root        1  20    0 21532K  3188K select  0   4:20   0.00% systat
> >>  3130 uqs        16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
> >>  2998 uqs        16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
> >>  3165 uqs        10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
> >>  3142 uqs        11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
> >>  2876 uqs        19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
> >>    20 root        2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
> >>
> >> I also had systat -vm running and it continued to update its screen ...
> >> for a short while, this is the last update before SSH died:
> >>
> >>
> >>     Mem usage:  0k%Phy  5%Kmem
> >> Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
> >>         Tot   Share      Tot    Share    Free           in   out     in   out
> >> Act  11051k   67868 71051992   255448   61840  count
> >> All  11051k   67924 71058776   262100          pages
> >> Proc:                                                            Interrupts
> >>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt   224 total
> >>  25 730  11 724 109 404 101                            13 cow       2 ehci0 16
> >>                                                           zfod      3 ehci1 23
> >>   0.0%Sys   0.1%Intr  0.0%User  0.0%Nice 99.9%Idle        ozfod    16 cpu0:timer
> >> |    |    |    |    |    |    |    |    |    |            %ozfod      xhci0 264
> >>                                                           daefr     3 em0 265
> >>                                         50 dtbuf          prcfr    94 hdac1 266
> >> Namei     Name-cache   Dir-cache    349167 desvn          totfr       ahci0 270
> >>    Calls    hits   %    hits   %    349155 numvn          react     5 cpu1:timer
> >>      121     121 100                253501 frevn          pdwak     1 cpu2:timer
> >>                                                           pdpgs    29 cpu7:timer
> >> Disks   md0  ada0  ada1 pass0 pass1 pass2                 intrn    12 cpu3:timer
> >> KB/t   0.00  0.00  0.00  0.00  0.00  0.00         5318892 wire     41 cpu6:timer
> >> tps       0     0     0     0     0     0         9261404 act      12 cpu5:timer
> >> MB/s   0.00  0.00  0.00  0.00  0.00  0.00         1598184 inact     6 cpu4:timer
> >> %busy     0     0     0     0     0     0                 cache       vgapci0
> >>                                                     61840 free
> >>                                                    712304 buf
> >>
> >>
> >> Why do I have a Chrome tab using about 6G? What other sort of debugging
> >> output can be helpful to get to the bottom of this? The machine still
> >> responds to pings just fine, TCP connections get set up but the SSH
> >> handshake never completes.
> >>
> >> This always happens after 30-50h of uptime, is super annoying, and has
> >> been going on for more than a year. Help?
> >>
> >> Note: I cut the power to the monitor overnight to save electricity. Can
> >> this mess up something in the Radeon card or X server? What combinations
> >> would be most useful to try next?
> >>
> >>
> > Hi,
> >
> > Sounds like a memory leak. Can you track the memory use over time?
Memory leak or not, it shouldn't lock up the whole system the very
minute (or second) I start using it again.
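
For tracking memory use over time, one low-effort option is to turn the
summary lines that top(1) already prints into numbers that can be logged
or plotted. A minimal sketch in Python, assuming the "Mem:" and "Swap:"
line format shown in the top output quoted above; parse_top_summary is a
hypothetical helper name, not part of any existing tool:

```python
import re

# Multipliers for the size suffixes top(1) prints, expressed in MiB.
_UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_top_summary(line):
    """Turn a top(1) 'Mem:' or 'Swap:' summary line into {label: MiB}."""
    fields = {}
    # Matches pairs such as "8873M Active"; percentages like "58% Inuse"
    # carry no size suffix and are deliberately skipped.
    for value, unit, label in re.findall(r"(\d+)([KMG]) (\w+)", line):
        fields[label] = int(value) * _UNITS[unit]
    return fields


mem = parse_top_summary(
    "Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free")
swap = parse_top_summary(
    "Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse")
print(mem["Free"], swap["Used"])
```

Logging these values once a minute should make it visible whether Free
shrinks and swap Used grows steadily before the freeze.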
>
> >
> > Did you look at the output from:
> >
> > vmstat -m ?
No, but I'll capture it for the next cycle :)
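
Since the box cannot run anything once it freezes, the capture has to
happen beforehand and reach the disk before the hang. A rough sketch of
how that could look in Python; the function names, the log file name and
the 60-second interval are my own assumptions, and only `vmstat -m` comes
from the suggestion above:

```python
import os
import subprocess
import time


def snapshot(cmd, log_path):
    """Append one timestamped run of cmd to log_path and force it to disk."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write(time.strftime("=== %Y-%m-%d %H:%M:%S ===\n"))
        log.write(result.stdout)
        log.flush()
        os.fsync(log.fileno())  # so the last sample survives a hard freeze
    return result.returncode


def capture_forever(cmd=("vmstat", "-m"), log_path="vmstat-m.log", interval=60):
    """Take a snapshot every `interval` seconds until interrupted."""
    while True:
        snapshot(list(cmd), log_path)
        time.sleep(interval)
```

Calling capture_forever() from a terminal or a tmux session right after
boot should leave a log whose last entries show the state shortly before
the lockup.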
>
> >
> > --HPS
>
>
> I have noted significant memory leakage in chromium for some time. If I
> leave it running overnight, my system is essentially frozen. If I terminate
> the chromium process, it slowly comes back to life. I always keep a gkrellm
> session on-screen where the memory and swap utilization is continuously
> displayed and that clearly shows resources declining.
>
> Try closing your chromium at night and see if that fixes the problem.
>
> If you have never tried gkrellm (sysutils/gkrellm2), it is the best
> system monitor I have found, though it pulls in a lot of dependencies. It
> can also run as a server, with remote systems displaying the data. Handy
> for monitoring servers.
I'll try w/o Chrome, it's easy to stop and restart anyway.
I'll be back in a week or so :)
Uli