ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4
Cy Schubert
Cy.Schubert at cschubert.com
Sat Sep 8 18:02:37 UTC 2018
In message <e5abddc5-17f0-bf5f-753b-1edbc9356385 at alvermark.net>, Jakob
Alvermark writes:
>
>             Total    MFU    MRU   Anon    Hdr  L2Hdr  Other
> ZFS ARC      667M   186M   168M    13M  3825K     0K   295M
>
>                             rate    hits  misses   total hits  total misses
> arcstats                  :  99%   65636     605    167338494       9317074
> arcstats.demand_data      :  57%     431     321     13414675       2117714
> arcstats.demand_metadata  :  99%   65175     193    152969480       5344919
> arcstats.prefetch_data    :   0%       0      30         3292        401344
> arcstats.prefetch_metadata:  32%      30      61       951047       1453097
> zfetchstats               :   9%     119    1077       612582      55041789
> arcstats.l2               :   0%       0       0            0             0
> vdev_cache_stats          :   0%       0       0            0             0
>
>
>
>
> This is while a 'make -j8 buildworld' (it has 8 cores) is going.
Overall you have a 94% hit ratio.
slippy$ bc
scale=4
167338494/(167338494+9317074)
.9472
slippy$
It could be better.
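The same arithmetic can be applied to every row, not just the overall one. This is a minimal Python sketch; the function name hit_ratio is made up for illustration, and the numbers are the cumulative "total hits" and "total misses" counters from the quoted statistics.

```python
def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


# (total hits, total misses) copied from the quoted statistics.
counters = {
    "arcstats": (167338494, 9317074),
    "arcstats.demand_data": (13414675, 2117714),
    "arcstats.demand_metadata": (152969480, 5344919),
    "arcstats.prefetch_data": (3292, 401344),
    "arcstats.prefetch_metadata": (951047, 1453097),
}

for name, (hits, misses) in counters.items():
    print(f"{name:<27} {hit_ratio(hits, misses):7.2%}")
```

The per-row figures show where the misses come from, which the single overall number hides.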
Why is your ZFS ARC so small? Before I get to that, I will describe my
own experience.
My machines are seeing something similar to this:
            Total    MFU    MRU   Anon    Hdr  L2Hdr  Other
ZFS ARC     4274M  2329M  1394M    17M    82M     0K   445M
                            rate    hits  misses   total hits  total misses
arcstats                  :  97%     614      13    866509066      51853442
arcstats.demand_data      : 100%      96       0    107658733       3101522
arcstats.demand_metadata  :  97%     516      13    755890353      48080146
arcstats.prefetch_data    :   0%       0       0       327613        225688
arcstats.prefetch_metadata: 100%       2       0      2632367        446086
zfetchstats               :   6%       6      80      2362709     294731645
arcstats.l2               :   0%       0       0            0             0
vdev_cache_stats          :   0%       0       0            0             0
This is what you should see. This is with -CURRENT built two days ago.
cwsys$ uname -a
FreeBSD cwsys 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #51 r338520M: Thu Sep 6 17:44:35 PDT 2018 root at cwsys:/export/obj/opt/src/svn-current/amd64.amd64/sys/BREAK amd64
cwsys$
Top reports:
CPU: 0.3% user, 89.9% nice, 9.5% system, 0.3% interrupt, 0.0% idle
Mem: 678M Active, 344M Inact, 175M Laundry, 6136M Wired, 168M Buf, 598M Free
ARC: 4247M Total, 2309M MFU, 1386M MRU, 21M Anon, 86M Header, 446M Other
3079M Compressed, 5123M Uncompressed, 1.66:1 Ratio
Swap: 20G Total, 11M Used, 20G Free
This is healthy. It's running a poudriere build.
My laptop:
            Total    MFU    MRU   Anon    Hdr  L2Hdr  Other
ZFS ARC     3175M  1791M   872M    69M   165M     0K   277M
                            rate    hits  misses   total hits  total misses
arcstats                  :  99%    3851      26     89082984       5101207
arcstats.demand_data      :  99%     345       2      6197930        340186
arcstats.demand_metadata  :  99%    3506      24     81391265       4367755
arcstats.prefetch_data    :   0%       0       0        11507         30945
arcstats.prefetch_metadata:   0%       0       0      1482282        362321
zfetchstats               :   2%      12     576       113185      38564546
arcstats.l2               :   0%       0       0            0             0
vdev_cache_stats          :   0%       0       0            0             0
These are similar results, taken after working on a bunch of ports in
four VMs last night, testing various combinations of options while
Heimdal in base is private. Hence the large ARC remaining this morning.
Currently on the laptop top reports:
CPU: 0.2% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.8% idle
Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free
ARC: 3175M Total, 1863M MFU, 803M MRU, 69M Anon, 160M Header, 280M Other
2330M Compressed, 7881M Uncompressed, 3.38:1 Ratio
Swap: 22G Total, 22G Free
This is also healthy.
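As a sanity check, the Ratio that top prints can be recomputed from its Compressed and Uncompressed figures. A minimal Python sketch follows; the function name and the machine labels are made up for illustration, and the sizes are the ones from the two top outputs above.

```python
def compression_ratio(compressed_mib, uncompressed_mib):
    """Uncompressed size divided by compressed size, as in top's 'N:1 Ratio'."""
    return uncompressed_mib / compressed_mib


# (Compressed, Uncompressed) in MiB, copied from the two top outputs.
readings = {
    "build machine": (3079, 5123),
    "laptop": (2330, 7881),
}

for host, (compressed, uncompressed) in readings.items():
    ratio = compression_ratio(compressed, uncompressed)
    print(f"{host}: {ratio:.2f}:1")
```

Both values agree with what top reported (1.66:1 and 3.38:1).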
Now for questions:
Do you have any UFS filesystems? Top will report Buf. What is it at?
Some background: my /, /usr, and /var are UFS. These are old
installations: when I install a new machine I dump | rsh
new-machine restore, change a couple of entries in rc.conf and fstab,
rsync ports (/usr/local, /var/db...) and boot (I'm terribly impatient).
Hence the legacy.
I have noticed that when writing a lot to UFS, which increases the size
of the UFS buffer cache, my ARC shrinks to 1 GB or even less. But this
is during a -j8 installworld to /, a test partition, an i386 partition
and a number of VMs on UFS on a zpool and other VMs using ZFS on the
same zpool. My ARC drops rapidly when the UFS filesystems are actively
being written to. UFS and ZFS on the same server will impact
performance unless one or the other is sparsely used.
To repeat, do you have any UFS on the system? Do you write to UFS? Is
it actively being written to at the time? How many MB is used by UFS
buffers?
How much RAM is installed on this machine?
What is the scan rate?
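One way to read the Buf figure out of a saved top header is sketched below in Python. The helper name mem_fields and the machine labels are made up for illustration; the Mem: lines are the ones from my two machines earlier in this message.

```python
import re


def mem_fields(mem_line):
    """Return {'Active': MiB, ...} from a top 'Mem:' line (sizes in K, M or G)."""
    scale = {"K": 1 / 1024, "M": 1, "G": 1024}
    return {name: int(num) * scale[unit]
            for num, unit, name in re.findall(r"(\d+)([KMG]) (\w+)", mem_line)}


# Mem lines copied from the two top outputs earlier in this message.
samples = {
    "build machine": "Mem: 678M Active, 344M Inact, 175M Laundry, "
                     "6136M Wired, 168M Buf, 598M Free",
    "laptop": "Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free",
}

for host, line in samples.items():
    mem = mem_fields(line)
    print(f"{host}: Buf {mem['Buf']} MiB, Wired {mem['Wired']} MiB")
```

Running the same thing against your own Mem: line while the buildworld is going would show how much is in UFS buffers.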
>
> SSH'ing to the machine while the buildworld is going it takes 40-60
> seconds to get to the shell!
Then your iostat or systat -v should show that you're hammering your
disks. Or, you may be using a lot of swap.
>
> Hitting ^T while waiting: load: 1.06 cmd: zsh 45334
> [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k
Load might be low because processes are waiting for disk I/O. Processes
waiting on I/O are not in the run queue and therefore don't affect the
load average. Disk I/O will kill performance worse than CPU load. Back
in the days when I was an MVS systems programmer (IBM mainframe), I did
a fair bit of tuning of MVS (z/OS today). The rule of thumb then was
that machine instructions took nanoseconds whereas disk I/O took
milliseconds. Interrupting a process so the scheduler could regain
control of the CPU took nanoseconds, because that's how long
instructions took. You cannot interrupt I/O. You have to wait for the
current I/O operation to complete before inserting a new I/O into the
queue, and with tagged queuing you have to wait for the disk to complete
its work before scheduling new work. Now you're waiting multiples of
milliseconds instead of a few nanoseconds. I/O kills performance.
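The ^T line you quoted already shows this: almost all of the elapsed time is spent waiting, not on the CPU. A minimal Python sketch of reading such a line follows; the regular expression and the function name parse_siginfo are my own illustration, written against the status lines pasted in this thread, not against any documented format.

```python
import re

# Matches lines such as:
# load: 1.06 cmd: zsh 45334 [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k
SIGINFO_RE = re.compile(
    r"load: (?P<load>[\d.]+) cmd: (?P<cmd>\S+) (?P<pid>\d+) "
    r"\[(?P<wchan>[^\]]+)\] (?P<real>[\d.]+)r (?P<user>[\d.]+)u (?P<sys>[\d.]+)s"
)


def parse_siginfo(line):
    """Extract the wait channel plus real and CPU seconds from a ^T status line."""
    match = SIGINFO_RE.search(line)
    if match is None:
        raise ValueError(f"not a ^T status line: {line!r}")
    fields = match.groupdict()
    return {
        "cmd": fields["cmd"],
        "wchan": fields["wchan"],
        "real": float(fields["real"]),
        "cpu": float(fields["user"]) + float(fields["sys"]),
    }


status = parse_siginfo(
    "load: 1.06 cmd: zsh 45334 [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k"
)
waiting = status["real"] - status["cpu"]
print(f"{status['cmd']}: {waiting:.2f}s of {status['real']:.2f}s "
      f"off-CPU, last seen in {status['wchan']}")
```

A process that has used a tenth of a second of CPU in 56 seconds of wall time contributes next to nothing to the load average, which is why the load looks low.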
Look at iostat or systat -v. I think your answer lies there and since
your ARC is small we need to find out why.
>
> I will test the patch below and report back.
Agreed, though IMO your workload and your environment need to be
understood first. What concerns me about the patch is what impact it
will have on other workloads. Evicting only metadata, and not data,
could impact buildworld -DNO_CLEAN, for example. I do -DNO_CLEAN
buildworlds, sometimes -DWORLDFAST. Adjusting vfs.zfs.arc_meta_limit to
the same value as vfs.zfs.arc_max improved my buildworld/installworld
performance. In addition, disabling atime for the ZFS dataset containing
/usr/obj also improved buildworld/installworld performance by reducing
unnecessary (IMO) metadata writes. I think evicting metadata only might
cause a new set of problems for different workloads. (Maybe this should
be a sysctl?)
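For reference, the two adjustments I mentioned could be spelled out roughly as follows. This Python sketch only prints the commands, it does not run them. The dataset name zroot/usr/obj is hypothetical, the 2 GiB figure is the arc_max you reported, and I am assuming vfs.zfs.arc_meta_limit can be set at runtime on your kernel (otherwise it would go in /boot/loader.conf).

```python
def tuning_commands(arc_max_bytes, obj_dataset):
    """Build (but do not run) the commands for the two adjustments."""
    return [
        # Let metadata use as much of the ARC as arc_max allows.
        ["sysctl", f"vfs.zfs.arc_meta_limit={arc_max_bytes}"],
        # Stop recording access times on the dataset holding /usr/obj.
        ["zfs", "set", "atime=off", obj_dataset],
    ]


# 2 GiB matches the reported vfs.zfs.arc_max=2G; the dataset name is hypothetical.
commands = tuning_commands(2 * 1024**3, "zroot/usr/obj")
for argv in commands:
    print(" ".join(argv))
```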
--
Cheers,
Cy Schubert <Cy.Schubert at cschubert.com>
FreeBSD UNIX: <cy at FreeBSD.org> Web: http://www.FreeBSD.org
The need of the many outweighs the greed of the few.
>
>
> Jakob
>
> On 9/7/18 7:27 PM, Cy Schubert wrote:
> > I'd be interested in seeing systat -z output.
> >
> > ---
> > Sent using a tiny phone keyboard.
> > Apologies for any typos and autocorrect.
> > Also, this old phone only supports top post. Apologies.
> >
> > Cy Schubert
> > <Cy.Schubert at cschubert.com> or <cy at freebsd.org>
> > The need of the many outweighs the greed of the few.
> > ---
> > ------------------------------------------------------------------------
> > From: Mark Johnston
> > Sent: 07/09/2018 09:09
> > To: Jakob Alvermark
> > Cc: Subbsd; allanjude at freebsd.org; freebsd-current Current
> > Subject: Re: ZFS perfomance regression in FreeBSD 12 APLHA3->ALPHA4
> >
> > On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:
> > > On 9/6/18 2:28 AM, Mark Johnston wrote:
> > > > On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
> > > >> On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude at freebsd.org> wrote:
> > > >>> On 2018-09-05 10:04, Subbsd wrote:
> > > >>>> Hi,
> > > >>>>
> > > >>>> I'm seeing a huge loss in performance ZFS after upgrading FreeBSD 12
> > > >>>> to latest revision (r338466 the moment) and related to ARC.
> > > >>>>
> > > >>>> I can not say which revision was before except that the newver.sh
> > > >>>> pointed to ALPHA3.
> > > >>>>
> > > >>>> Problems are observed if you try to limit ARC. In my case:
> > > >>>>
> > > >>>> vfs.zfs.arc_max="128M"
> > > >>>>
> > > >>>> I know that this is very small. However, for two years with this
> > > >>>> there were no problems.
> > > >>>>
> > > >>>> When i send SIGINFO to process which is currently working with ZFS,
> > > >>>> i see "arc_reclaim_waiters_cv":
> > > >>>>
> > > >>>> e.g when i type:
> > > >>>>
> > > >>>> /bin/csh
> > > >>>>
> > > >>>> I have time (~5 seconds) to press several times 'ctrl+t' before
> > > >>>> csh is executed:
> > > >>>>
> > > >>>> load: 0.70 cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
> > > >>>> load: 0.70 cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
> > > >>>> load: 0.70 cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
> > > >>>> load: 0.73 cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
> > > >>>>
> > > >>>> same story with find or any other commands:
> > > >>>>
> > > >>>> load: 0.34 cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
> > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
> > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
> > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
> > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
> > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
> > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
> > > >>>>
> > > >>>> this problem goes away after increasing vfs.zfs.arc_max
> > > >>>> _______________________________________________
> > > >>>> freebsd-current at freebsd.org mailing list
> > > >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > >>>> To unsubscribe, send any mail to
> > "freebsd-current-unsubscribe at freebsd.org"
> > > >>>>
> > > >>> Previously, ZFS was not actually able to evict enough dnodes to keep
> > > >>> your arc_max under 128MB, it would have been much higher based on the
> > > >>> number of open files you had. A recent improvement from upstream ZFS
> > > >>> (r337653 and r337660) was pulled in that fixed this, so setting an
> > > >>> arc_max of 128MB is much more effective now, and that is causing the
> > > >>> side effect of "actually doing what you asked it to do", in this case,
> > > >>> what you are asking is a bit silly. If you have a working set that is
> > > >>> greater than 128MB, and you ask ZFS to use less than that, it'll have to
> > > >>> constantly try to reclaim memory to keep under that very low bar.
> > > >>>
> > > >> Thanks for comments. Mark was right when he pointed to r338416 (
> > > >> https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416
> > > >> ). Commenting aggsum_value returns normal speed regardless of the rest
> > > >> of the new code from upstream.
> > > >> I would like to repeat that the speed with these two lines is not just
> > > >> slow, but _INCREDIBLY_ slow! Probably, this should be written in the
> > > >> relevant documentation for FreeBSD 12+
> > >
> > > Hi,
> > >
> > > I am experiencing the same slowness when there is a bit of load on the
> > > system (buildworld for example) which I haven't seen before.
> >
> > Is it a regression following a recent kernel update?
> >
> > > I have vfs.zfs.arc_max=2G.
> > >
> > > Top is reporting
> > >
> > > ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, 4592K Header, 217M Other
> > >      105M Compressed, 281M Uncompressed, 2.67:1 Ratio
> > >
> > > Should I test the patch?
> >
> > I would be interested in the results, assuming it is indeed a
> > regression.
> >