High load and MySQL slow without apparent reason

Allan Jude allanjude at freebsd.org
Wed Oct 17 03:31:56 UTC 2018


On 2018-10-16 22:25, Darek Margas wrote:
> Hi Everyone,
> 
> I'm trying to refresh my old FreeBSD experience by moving a MySQL platform
> from Linux onto FreeBSD+ZFS.
> 
> Before I ask for your help I would like to give you some context.
> 
> The machine is Dell server 2x20 cores, Intel IXL NIC, 1TB of RAM and lots
> of SAS SSD drives.
> The kernel is slightly modified by removing some unused stuff, replacing
> ixl driver with latest from Intel website and enabling NUMA.
> The whole thing runs number of MySQL daemons packed in jails (bridged
> network) with settings optimized for ZFS ARC caching (O_DIRECT, small
> buffers, etc).
> 
> This is 11.2-RELEASE.
> 
> When I first tested it I ran into trouble with back pressure on the ARC
> when memory was short, leading the machine to death. I also found that
> disabling ARC compression solved the silent death, but decided to tune a
> few things to keep more memory free for sudden needs.
> 
> Ran some tests, used it for replication slaves, etc.
> 
> Here is the thing - how I crashed this machine without understanding what
> has happened.
> 
> First my tunes. I adjusted v_free_target and v_free_min aiming to 128G and
> 64G respectively. However, I overlooked fact that this is in pages not in
> 1k blocks. As result I set:
> 
> - 700G max ARC size
> - 512G v_free_target
> - 256G v_free_min

You likely want to tune 'vfs.zfs.arc_free_target' to a value very close
to v_free_target, or at least v_free_min, so that ZFS also gives back
memory at that level of memory shortage.
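
To make the unit mix-up concrete, here is a rough sketch of the arithmetic.
It assumes a 4 KiB page size (check 'sysctl hw.pagesize' on the actual
machine) and that the sysctl names are vm.v_free_target, vm.v_free_min and
vfs.zfs.arc_free_target, all counted in pages. The helper names are made up
for illustration only.

```python
# Sketch: convert intended free-memory targets from GiB to pages.
# Assumption: 4 KiB pages, as is usual on amd64 (verify with `sysctl hw.pagesize`).
PAGE_SIZE = 4096
GIB = 1024 ** 3


def gib_to_pages(gib, page_size=PAGE_SIZE):
    """Number of pages that make up `gib` GiB."""
    return gib * GIB // page_size


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Amount of memory, in GiB, covered by `pages` pages."""
    return pages * page_size / GIB


# Intended targets from the original message (GiB).
intended = {"vm.v_free_target": 128, "vm.v_free_min": 64}

for name, gib in intended.items():
    as_1k_blocks = gib * GIB // 1024   # value computed as if the unit were 1 KiB
    as_pages = gib_to_pages(gib)       # value in the unit the kernel expects
    print(f"{name}: wanted {gib}G")
    print(f"  1k-block value {as_1k_blocks} is read as "
          f"{pages_to_gib(as_1k_blocks):.0f}G")
    print(f"  page value     {as_pages}")

# Per the advice above, keep the ARC free target close to v_free_target
# (assumption: it is expressed in pages as well).
print(f"vfs.zfs.arc_free_target={gib_to_pages(128)}")
```

A value worked out in 1k blocks is four times too large when read as 4 KiB
pages, which is how 128G and 64G turned into 512G and 256G.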

> 
> Obviously this is nonsense; however, the machine ran calmly until the ARC
> took half of memory. Then shit happened. As I had built the machine with no
> swap at all, I got a number of zombies and problems reclaiming the console
> (say, open vi, which works, then exit, and vi stays on the console as a
> zombie). That was "fixed" by disabling swapping via sysctl. I also noticed
> 25% of CPU taken by "system", with nothing showing up in top except
> pagedaemon and zfs (on arc_reclaim).
> 
> I have added 40G of swap, rebooted machine but kept wrong settings.
> 
> It was again calm until ARC got half of memory. This is when I found what I
> did and fixed v_free stuff to be
> 
> - 128G v_free_target
> - 64G v_free_min
> 
> The machine started managing memory the right way, wiping inactive to
> laundry and laundering only when needed. I still observed 25% of
> unexplained load from "system" (floating 5-60%) but all seemed OK.
> 
> At this point I switched one replica to be master and put production
> queries on it.
> 
> Summarizing the above - the machine had issues and has not been rebooted
> but seemed OK with memory management while having unexplained system load.
> 
> Once I switched my SQLs from Linux master to FreeBSD I noticed slow
> performance. There is a stored proc called every 15 minutes. On the old
> machine and all others it takes around 30-40s to complete, and the previous
> master had a spike in ROW executions to 650kps (one minute sample), while
> the new one only got up to 350kps and ran for nearly 3 minutes.
> 
> I started looking deeper and found:
> - Made all MySQL settings the same (when possible as some follow platform)
> with no improvement
> - MySQL reload did not help
> - Stopping all replicas running around on the same machine (5 of them) to
> release resources made it worse (over 5 minutes to complete call). Starting
> replicas made it better again by one minute.
> 
> BTW - jail was limited to one NUMA zone and half cores. Not all replicas
> had the same NUMA and CPU group.
> 
> I copied ZFS content to test machine which is exactly the same and kicked
> the same MySQL in same jail and with same settings.
> - Test instance ran correctly within similar completion time to old Linux
> master
> - ARC on test machine was loaded up to 700G so I thought it would be good
> enough to compare but machine still had lots of memory
> 
> To make it closer I compiled "memory allocator" which simply allocates and
> fills memory until killed or system dies.
> 
> Run it on test machine first:
> - No effect until v_free_target passed
> - Once passed pagedaemon kicked in, memory got wiped and shifted, swap got
> full (paging only anyway)
> - Load around 20% appeared from system, similar to broken production machine
> - Got down to 50G passing v_free_min
> - Killed allocator
> - After 1-2s freezing all got back to normal, load from system was gone.
> - Swap was in use for some time after but finally got clean (that was only
> 4G swap on test machine)
> - After some time machine is still calm and MySQL fast
> 
> Repeated the same on production machine:
> - All as above, except:
> - after killing allocator machine got frozen for, say, 10-15s
> - memory was released but load did not change - neither got much higher
> while allocating memory nor lower after.
> - Machine remained slow
> 
> Finally I rebooted whole machine and now it is fast while building ARC. I
> believe it won't have the same issue soon as v_free stuff is set correctly,
> however, I need to understand why this MySQL process suffered and whether
> it was possible to recover it without reboot. I can imagine it was
> something running in a loop or contention on something otherwise unused or
> simply another clash in settings triggering something in unusual way but
> have no idea where to look to investigate it. Well, it's possible that
> there is a bug too.
> 
> Before reboot I collected various vmstats, tops, ran ktrace on MySQL and
> sysctl to dump settings. Not posting as don't know what would be useful.
> 
> Could you please point me in right direction?
> 
> Cheers,
> Darek
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
> 


-- 
Allan Jude
