FreeBSD 11.x grinds to a halt after about 48h of uptime

Sat Oct 15 19:17:21 UTC 2016

2016-10-15 18:36 GMT+02:00 Kevin Oberman <rkoberman at gmail.com>:
>
> On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky <hps at selasky.org>
> wrote:
>
> > On 10/15/16 18:18, Ulrich Spörlein wrote:
> >
> >> Hey all, while 11.x is -STABLE now, this happens to my machine ever
> >> since I upgraded it to 11-CURRENT years ago. I have no idea when this
> >> started, actually, but what always happens is this:
> >>
> > >> - System and X11 are up and running, I keep them running overnight as I'm
> > >> too lazy to reboot and restart everything.
> >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> >> programs running
> >> - Coming back to the machine the next day (or the day after) it will
> >> exit the screensaver just fine and then either I can use it for a couple
> >> of seconds before it freezes, or it's pretty much dead already. The
> > >> mouse cursor still moves for a bit, but then also freezes (so is this a
> > >> GPU problem??)
> >>
> >> Now what I currently see on the screen is a clock widget stuck at 18:04
> >> but conky itself has last updated at 18:00:18 ...
> >>
> >> This time I had some SSH sessions from another machine to see some more
> >> useful things. There was nothing in various logs under /var/log (I also
> >> can't run dmesg anymore ...)
> >> I had top(1) running in a loop, this is the last output:
> >>
> > >> last pid: 25633;  load averages:  0.27,  0.39,  0.36  up 1+23:03:28  18:00:12
> >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> >>
> >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> >>
> >>
> > >>   PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > >>    11 root            8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
> > >>  3122 uqs            15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
> > >>  2887 uqs            28  22    0  1394M   237M select  2 172:53   6.98% chrome
> > >>  2890 uqs            11  21    0  1034M   178M select  5 231:21   1.95% chrome
> > >>  1062 root            9  21    0   440M 47220K select  0  67:09   0.98% Xorg
> > >>  3002 uqs            15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
> > >>  3139 uqs            17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
> > >>  3001 uqs            18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
> > >>    12 root           24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
> > >>  3129 uqs            12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
> > >>  2822 uqs             9  20    0   217M 81300K select  0   5:10   0.00% conky
> > >>  3174 root            1  20    0 21532K  3188K select  0   4:20   0.00% systat
> > >>  3130 uqs            16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
> > >>  2998 uqs            16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
> > >>  3165 uqs            10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
> > >>  3142 uqs            11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
> > >>  2876 uqs            19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
> > >>    20 root            2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
> >>
> >> I also had systat -vm running and it continued to update its screen ...
> >> for a short while, this is the last update before SSH died:
> >>
> >>
> > >>    Mem usage:  0k%Phy  5%Kmem
> > >> Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
> > >>         Tot   Share      Tot    Share    Free           in   out     in   out
> > >> Act  11051k   67868 71051992   255448   61840  count
> > >> All  11051k   67924 71058776   262100          pages
> > >> Proc:                                                            Interrupts
> > >>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt   224 total
> > >>      25     730  11   724  109  404  101   13             cow       2 ehci0 16
> > >>                                                           zfod      3 ehci1 23
> > >>  0.0%Sys   0.1%Intr  0.0%User  0.0%Nice 99.9%Idle         ozfod    16 cpu0:timer
> > >> |    |    |    |    |    |    |    |    |    |           %ozfod        xhci0 264
> > >>                                                           daefr     3 em0 265
> > >>                                         50 dtbuf          prcfr    94 hdac1 266
> > >> Namei     Name-cache   Dir-cache    349167 desvn          totfr        ahci0 270
> > >>    Calls    hits   %    hits   %    349155 numvn          react     5 cpu1:timer
> > >>      121     121 100                253501 frevn          pdwak     1 cpu2:timer
> > >>                                                           pdpgs    29 cpu7:timer
> > >> Disks   md0  ada0  ada1 pass0 pass1 pass2                 intrn    12 cpu3:timer
> > >> KB/t   0.00  0.00  0.00  0.00  0.00  0.00         5318892 wire     41 cpu6:timer
> > >> tps       0     0     0     0     0     0         9261404 act      12 cpu5:timer
> > >> MB/s   0.00  0.00  0.00  0.00  0.00  0.00         1598184 inact     6 cpu4:timer
> > >> %busy     0     0     0     0     0     0                 cache        vgapci0
> > >>                                                     61840 free
> > >>                                                    712304 buf
> >>
> >>
> >> Why do I have a Chrome tab using about 6G? What other sort of debugging
> >> output can be helpful to get to the bottom of this? The machine still
> >> responds to pings just fine, TCP connections get set up but the SSH
> >> handshake never completes.
> >>
> > >> This always happens after 30 to 50h of uptime, is super annoying, and has
> > >> been going on for more than a year. Help?
> >>
> >> Note, I cut the power to the monitor overnight to save electricity, can
> >> this mess up something in the Radeon card or X server? What combinations
> >> would be most useful to try next?
> >>
> >>
> > Hi,
> >
> > Sounds like a memory leak. Can you track the memory use over time?

Memory leak or not, it shouldn't lock up the whole system just the
minute/second that I start using it again.
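
For tracking memory use over time, the summary lines that top(1) prints can
be reduced to plain numbers and logged. A minimal Python sketch, using the
Mem: and Swap: lines from the top output quoted above as sample input;
parse_sizes is a hypothetical helper name, not part of any existing tool:

```python
import re

# Multipliers to MiB for the K/M/G suffixes that top(1) prints.
UNIT_TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_sizes(line):
    """Turn 'Mem: 8873M Active, 1783M Inact' into {'Active': 8873.0, ...} in MiB."""
    _, _, rest = line.partition(":")
    sizes = {}
    # Only "<number><unit> <label>" pairs match, so "58% Inuse" is skipped.
    for value, unit, label in re.findall(r"(\d+)([KMG])\s+(\w+)", rest):
        sizes[label] = int(value) * UNIT_TO_MIB[unit]
    return sizes


mem = parse_sizes("Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free")
swap = parse_sizes("Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse")

print(mem["Free"], swap["Used"])  # 132.0 2395.0
```

Logging these values once a minute would show whether free memory drops and
swap use climbs steadily, or whether something changes abruptly shortly
before the freeze.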

>
> >
> > Did you look at the output from:
> >
> > vmstat -m ?

No, but I'll capture it for the next cycle :)
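
A minimal sketch of how such a capture could look, assuming Python 3.7 or
later is installed. The function names, the log path and the interval are
arbitrary choices for illustration, not anything from this thread. Each
snapshot is fsync'ed so that the last entries are still on disk after a
hard freeze:

```python
import os
import subprocess
import time


def append_snapshot(cmd, log_path):
    """Run cmd once and append its timestamped output to log_path."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} {' '.join(cmd)} (exit {result.returncode})\n")
        log.write(result.stdout)
        log.flush()
        os.fsync(log.fileno())  # push the data to disk before a possible lockup


def capture_forever(cmd, log_path, interval=60):
    """Take one snapshot every `interval` seconds until interrupted."""
    while True:
        append_snapshot(cmd, log_path)
        time.sleep(interval)
```

Calling capture_forever(["vmstat", "-m"], "/var/tmp/vmstat-m.log") from a
terminal that stays open would then leave a series of snapshots to compare
after the next reboot.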

>
> >
> > --HPS
>
>
> I have noted significant memory leakage in chromium for some time. If I
> leave it running overnight, my system is essentially frozen. If I terminate
> the chromium process, it slowly comes back to life. I always keep a gkrellm
> session on-screen where the memory and swap utilization is continuously
> displayed and that clearly shows resources declining.
>
> Try closing your chromium at night and see if that fixes the problem.
>
> If you have never tried gkrellm (sysutils/gkrellm2), it is the best
> system monitor I have found, though it pulls in a lot of dependencies. It also
> can run as a server with remote systems displaying the data. Handy to
> monitor servers.

I'll try w/o Chrome, it's easy to stop and restart anyway.
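
To put a number on Chrome's share, the RES column of the top(1) output
quoted earlier in this thread can be summed per command. A rough Python
sketch; the helper names are made up for illustration, and summing RES
counts shared pages more than once, so the total is an upper bound rather
than an exact figure:

```python
from collections import defaultdict

# Rows copied from the top(1) output quoted earlier in this thread.
TOP_ROWS = """\
 3122 uqs            15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
 2887 uqs            28  22    0  1394M   237M select  2 172:53   6.98% chrome
 2890 uqs            11  21    0  1034M   178M select  5 231:21   1.95% chrome
 1062 root            9  21    0   440M 47220K select  0  67:09   0.98% Xorg
 3002 uqs            15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
 3139 uqs            17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
 3001 uqs            18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
 3129 uqs            12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
 2822 uqs             9  20    0   217M 81300K select  0   5:10   0.00% conky
 3130 uqs            16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
 2998 uqs            16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
 3165 uqs            10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
 3142 uqs            11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
"""


def to_mib(size):
    """Convert a top(1) size such as '47220K' or '5861M' to MiB."""
    factor = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}[size[-1]]
    return float(size[:-1]) * factor


def res_by_command(rows):
    """Sum the RES column (7th field) per COMMAND (12th field)."""
    totals = defaultdict(float)
    for row in rows.splitlines():
        fields = row.split()
        if len(fields) < 12:
            continue  # skip blank or truncated rows
        totals[fields[11]] += to_mib(fields[6])
    return dict(totals)


totals = res_by_command(TOP_ROWS)
print(round(totals["chrome"]))  # 9589
```

That is roughly 9.4G resident across the eleven chrome processes shown, on a
machine where top reports 132M free and 2395M of swap in use.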

I'll be back in a week or so :)
Uli