ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

Sat Sep 8 18:02:37 UTC 2018

In message <e5abddc5-17f0-bf5f-753b-1edbc9356385 at alvermark.net>,
Jakob Alvermark writes:
>
>                         Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
>       ZFS ARC            667M    186M    168M     13M   3825K      0K    295M
>
>                                  rate    hits  misses  total hits  total misses
>       arcstats                  : 99%   65636     605   167338494       9317074
>       arcstats.demand_data      : 57%     431     321    13414675       2117714
>       arcstats.demand_metadata  : 99%   65175     193   152969480       5344919
>       arcstats.prefetch_data    :  0%       0      30        3292        401344
>       arcstats.prefetch_metadata: 32%      30      61      951047       1453097
>       zfetchstats               :  9%     119    1077      612582      55041789
>       arcstats.l2               :  0%       0       0           0             0
>       vdev_cache_stats          :  0%       0       0           0             0
>
> This is while a 'make -j8 buildworld' (it has 8 cores) is going.

Overall you have a 94% hit ratio.

slippy$ bc
scale=4
167338494/(167338494+9317074)
.9472
slippy$ 

It could be better.
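If you want to repeat that arithmetic over several samples, here is the same calculation as a small sh function. It is only a sketch: arc_hit_ratio is a name I made up, it takes the "total hits" and "total misses" columns as its two arguments, and it uses awk rather than bc, so the result is rounded (94.73) instead of truncated (.9472).

```sh
#!/bin/sh
# arc_hit_ratio HITS MISSES
# Hypothetical helper: prints hits / (hits + misses) as a percentage,
# rounded to two decimals. awk does the floating-point division.
arc_hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        if (h + m == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * h / (h + m)
    }'
}

# Example, using the arcstats totals above:
#   arc_hit_ratio 167338494 9317074
```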

Why is your ZFS ARC so small? Before I try to answer that, let me
describe my own experience.

My machines are seeing something similar to this:

                       Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
     ZFS ARC           4274M   2329M   1394M     17M     82M      0K    445M

                                rate    hits  misses  total hits  total misses
     arcstats                  : 97%     614      13   866509066      51853442
     arcstats.demand_data      :100%      96       0   107658733       3101522
     arcstats.demand_metadata  : 97%     516      13   755890353      48080146
     arcstats.prefetch_data    :  0%       0       0      327613        225688
     arcstats.prefetch_metadata:100%       2       0     2632367        446086
     zfetchstats               :  6%       6      80     2362709     294731645
     arcstats.l2               :  0%       0       0           0             0
     vdev_cache_stats          :  0%       0       0           0             0

This is what you should see. This is with -CURRENT built two days ago.

cwsys$ uname -a
FreeBSD cwsys 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #51 r338520M: Thu Sep  6 17:44:35 PDT 2018     root at cwsys:/export/obj/opt/src/svn-current/amd64.amd64/sys/BREAK  amd64
cwsys$ 

Top reports:

CPU:  0.3% user, 89.9% nice,  9.5% system,  0.3% interrupt,  0.0% idle
Mem: 678M Active, 344M Inact, 175M Laundry, 6136M Wired, 168M Buf, 598M Free
ARC: 4247M Total, 2309M MFU, 1386M MRU, 21M Anon, 86M Header, 446M Other
     3079M Compressed, 5123M Uncompressed, 1.66:1 Ratio
Swap: 20G Total, 11M Used, 20G Free

This is healthy. It's running a poudriere build.

My laptop:

                       Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
     ZFS ARC           3175M   1791M    872M     69M    165M      0K    277M

                                rate    hits  misses  total hits  total misses
     arcstats                  : 99%    3851      26    89082984       5101207
     arcstats.demand_data      : 99%     345       2     6197930        340186
     arcstats.demand_metadata  : 99%    3506      24    81391265       4367755
     arcstats.prefetch_data    :  0%       0       0       11507         30945
     arcstats.prefetch_metadata:  0%       0       0     1482282        362321
     zfetchstats               :  2%      12     576      113185      38564546
     arcstats.l2               :  0%       0       0           0             0
     vdev_cache_stats          :  0%       0       0           0             0

The laptop shows similar results after I spent last night working on a
bunch of ports in four VMs, testing various combinations of options while
Heimdal in base is private; hence the large ARC remaining this morning.

Currently, top on the laptop reports:

CPU:  0.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.8% idle
Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free
ARC: 3175M Total, 1863M MFU, 803M MRU, 69M Anon, 160M Header, 280M Other
     2330M Compressed, 7881M Uncompressed, 3.38:1 Ratio
Swap: 22G Total, 22G Free

This is also healthy.

Now for questions:

Do you have any UFS filesystems? Top reports the UFS buffer cache as
Buf. What is that at?

Some background: my /, /usr, and /var are UFS. These are old
installations: when I set up a new machine I dump | rsh new-machine
restore, change a couple of entries in rc.conf and fstab, rsync ports
(/usr/local, /var/db...), and boot (I'm terribly impatient). Hence the
legacy.

I have noticed that when I write a lot to UFS, which grows the UFS buffer
cache, my ARC shrinks to 1 GB or even less. But that is during a -j8
installworld to /, a test partition, an i386 partition, a number of VMs
on UFS on a zpool, and other VMs using ZFS on the same zpool. My ARC
drops rapidly while the UFS filesystems are being actively written to.
Running UFS and ZFS on the same server will hurt performance unless one
or the other is only lightly used.

To repeat: do you have any UFS on the system? Do you write to UFS? Is it
being actively written to at the time? How many MB are used by UFS
buffers?

How much RAM is installed on this machine?
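To answer the buffer and RAM questions in one go, a sketch like the one below may help. mem_summary is a hypothetical helper; I'm assuming the byte counts come from sysctl -n hw.physmem, vfs.bufspace (what top shows as Buf) and kstat.zfs.misc.arcstats.size. The function itself only does the MB conversion and percentages, so it can be checked without a FreeBSD box.

```sh
#!/bin/sh
# mem_summary PHYSMEM_BYTES BUFSPACE_BYTES ARC_BYTES
# Hypothetical helper: prints RAM, UFS buffer space and ARC size in MB,
# with the buffer and ARC sizes also shown as a percentage of RAM.
mem_summary() {
    awk -v p="$1" -v b="$2" -v a="$3" 'BEGIN {
        if (p + 0 <= 0) { print "RAM size must be > 0"; exit 1 }
        mb = 1024 * 1024
        printf "RAM:  %d MB\n", p / mb
        printf "Buf:  %d MB (%.1f%% of RAM)\n", b / mb, 100 * b / p
        printf "ARC:  %d MB (%.1f%% of RAM)\n", a / mb, 100 * a / p
    }'
}

# Example on FreeBSD:
#   mem_summary "$(sysctl -n hw.physmem)" "$(sysctl -n vfs.bufspace)" \
#       "$(sysctl -n kstat.zfs.misc.arcstats.size)"
```

Watching the Buf and ARC lines while UFS is being written to should show whether the buffer cache is crowding out the ARC.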

What is the scan rate?
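vmstat 5 shows the scan rate in its sr column. If you'd rather sample it yourself, the sketch below turns two readings of a cumulative pagedaemon counter into pages per second. I'm assuming vm.stats.vm.v_pdpages is a suitable counter to read; scan_rate is a made-up name.

```sh
#!/bin/sh
# scan_rate BEFORE AFTER SECONDS
# Hypothetical helper: integer per-second rate from two samples of a
# cumulative counter (assumed here to be vm.stats.vm.v_pdpages).
scan_rate() {
    [ "$3" -gt 0 ] || { echo "interval must be > 0" >&2; return 1; }
    echo $(( ($2 - $1) / $3 ))
}

# Example on FreeBSD, sampling over 10 seconds:
#   a=$(sysctl -n vm.stats.vm.v_pdpages); sleep 10
#   b=$(sysctl -n vm.stats.vm.v_pdpages)
#   scan_rate "$a" "$b" 10
```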

>
> SSH'ing to the machine while the buildworld is going it takes 40-60 
> seconds to get to the shell!

Then your iostat or systat -v should show that you're hammering your 
disks. Or, you may be using a lot of swap.
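To spot a hammered disk quickly, the extended iostat output can be filtered on its busy column. This sketch assumes %b (percent busy) is the last field on each device line, as I recall it is in iostat -x on recent FreeBSD; busy_disks and the 80% default threshold are my own choices.

```sh
#!/bin/sh
# busy_disks [THRESHOLD] < iostat-output
# Hypothetical filter: prints "device busy%" for lines whose last field
# (%b) is numeric and above THRESHOLD percent (default 80). Header lines
# are skipped because their last field is not numeric.
busy_disks() {
    awk -v t="${1:-80}" '$NF ~ /^[0-9.]+$/ && $NF + 0 > t { print $1, $NF "%" }'
}

# Example on FreeBSD (five one-second samples):
#   iostat -x -w 1 -c 5 | busy_disks 80
```

Disks that stay near 100% busy during the buildworld would point at I/O rather than CPU as the bottleneck.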

>
> Hitting ^T while waiting: load: 1.06  cmd: zsh 45334 [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k

Load might be low because processes are waiting for disk I/O. Processes
waiting on I/O are not in the run queue and therefore don't contribute to
the load average. Disk I/O hurts performance more than CPU load does.
Back when I was an MVS systems programmer (IBM mainframe; MVS is z/OS
today), I did a fair bit of MVS tuning. The rule of thumb then was that
machine instructions took nanoseconds whereas disk I/O took milliseconds,
and the scheduler could interrupt a process to regain control of the CPU
in nanoseconds, because that's how long instructions took. You cannot
interrupt I/O. You have to wait for the current I/O operation to complete
before inserting a new I/O into the queue, and with tagged queuing you
have to wait for the disk to complete its queued work before new work is
scheduled. Now you're waiting multiple milliseconds instead of a few
nanoseconds. I/O kills performance.
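To put rough numbers on that rule of thumb, assume, purely for illustration, 1 ns per instruction and 10 ms per disk I/O (neither figure is a measurement from this system):

```latex
\frac{t_{\mathrm{I/O}}}{t_{\mathrm{instr}}}
  = \frac{10\ \mathrm{ms}}{1\ \mathrm{ns}}
  = \frac{10 \times 10^{-3}\ \mathrm{s}}{10^{-9}\ \mathrm{s}}
  = 10^{7}
```

In other words, one I/O wait of that length costs about as much wall-clock time as ten million instructions.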

Look at iostat or systat -v. I think your answer lies there, and since
your ARC is small we need to find out why.

>
> I will test the patch below and report back.

Agreed, though IMO your workload and your environment need to be
understood first. What concerns me about the patch is the impact it will
have on other workloads. Evicting only metadata and not data could hurt
buildworld -DNO_CLEAN, for example. I do -DNO_CLEAN buildworlds, and
sometimes -DWORLDFAST. Setting vfs.zfs.arc_meta_limit to the same value
as vfs.zfs.arc_max improved my buildworld/installworld performance. In
addition, disabling atime on the ZFS dataset containing /usr/obj also
improved buildworld/installworld performance by reducing unnecessary
(IMO) metadata writes. I think evicting only metadata might cause a new
set of problems for other workloads. (Maybe this should be a sysctl?)
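For reference, the two tweaks I described amount to roughly the following. This is a sketch, not a recommendation for every workload: buildworld_tuning is a made-up wrapper, zroot/usr/obj is a hypothetical dataset name (use whichever dataset holds your /usr/obj), and setting vfs.zfs.arc_meta_limit at runtime assumes your kernel exposes it as a writable sysctl. With DRY_RUN=1 it only prints the commands.

```sh
#!/bin/sh
# buildworld_tuning DATASET ARC_MAX_BYTES
# Hypothetical wrapper around the two tweaks described above.
# Set DRY_RUN=1 to print the commands instead of running them.
buildworld_tuning() {
    dataset=$1   # e.g. zroot/usr/obj (hypothetical name)
    arc_max=$2   # e.g. the output of: sysctl -n vfs.zfs.arc_max
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    # Let metadata use as much of the ARC as the ARC itself may use.
    $run sysctl "vfs.zfs.arc_meta_limit=$arc_max"
    # Stop atime updates on the dataset holding /usr/obj.
    $run zfs set atime=off "$dataset"
}

# Example:
#   DRY_RUN=1 buildworld_tuning zroot/usr/obj "$(sysctl -n vfs.zfs.arc_max)"
```

The zfs property persists on its own; the sysctl setting does not survive a reboot.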

-- 
Cheers,
Cy Schubert <Cy.Schubert at cschubert.com>
FreeBSD UNIX:  <cy at FreeBSD.org>   Web:  http://www.FreeBSD.org

	The need of the many outweighs the greed of the few.

>
>
> Jakob
>
> On 9/7/18 7:27 PM, Cy Schubert wrote:
> > I'd be interested in seeing systat -z output.
> >
> > ------------------------------------------------------------------------
> > From: Mark Johnston
> > Sent: 07/09/2018 09:09
> > To: Jakob Alvermark
> > Cc: Subbsd; allanjude at freebsd.org; freebsd-current Current
> > Subject: Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4
> >
> > On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:
> > > On 9/6/18 2:28 AM, Mark Johnston wrote:
> > > > On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
> > > >> On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude at freebsd.org> wrote:
> > > >>> On 2018-09-05 10:04, Subbsd wrote:
> > > >>>> Hi,
> > > >>>>
> > > >>>> I'm seeing a huge loss in ZFS performance after upgrading FreeBSD 12
> > > >>>> to the latest revision (r338466 at the moment), related to the ARC.
> > > >>>>
> > > >>>> I cannot say which revision I had before, except that newvers.sh
> > > >>>> pointed to ALPHA3.
> > > >>>>
> > > >>>> Problems are observed if you try to limit the ARC. In my case:
> > > >>>>
> > > >>>> vfs.zfs.arc_max="128M"
> > > >>>>
> > > >>>> I know that this is very small. However, I ran with this setting
> > > >>>> for two years without any problems.
> > > >>>>
> > > >>>> When I send SIGINFO to a process that is currently working with ZFS,
> > > >>>> I see "arc_reclaim_waiters_cv":
> > > >>>>
> > > >>>> e.g. when I type:
> > > >>>>
> > > >>>> /bin/csh
> > > >>>>
> > > >>>> I have time (~5 seconds) to press 'ctrl+t' several times before csh is executed:
> > > >>>>
> > > >>>> load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
> > > >>>> load: 0.70  cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
> > > >>>> load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
> > > >>>> load: 0.73  cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
> > > >>>>
> > > >>>> Same story with find or any other command:
> > > >>>>
> > > >>>> load: 0.34  cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
> > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
> > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
> > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
> > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
> > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
> > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
> > > >>>>
> > > >>>> This problem goes away after increasing vfs.zfs.arc_max.
> > > >>>> _______________________________________________
> > > >>>> freebsd-current at freebsd.org mailing list
> > > >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > >>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
> > > >>>>
> > > >>> Previously, ZFS was not actually able to evict enough dnodes to keep
> > > >>> your ARC under the 128MB arc_max; it would have been much higher,
> > > >>> depending on the number of open files you had. A recent improvement
> > > >>> from upstream ZFS (r337653 and r337660) that fixed this was pulled in,
> > > >>> so setting an arc_max of 128MB is much more effective now. That causes
> > > >>> the side effect of "actually doing what you asked it to do", and in
> > > >>> this case what you are asking is a bit silly. If you have a working
> > > >>> set larger than 128MB and you ask ZFS to use less than that, it will
> > > >>> have to constantly try to reclaim memory to stay under that very low
> > > >>> bar.
> > > >>>
> > > >> Thanks for the comments. Mark was right when he pointed to r338416
> > > >> (https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416).
> > > >> Commenting out aggsum_value restores normal speed regardless of the
> > > >> rest of the new code from upstream.
> > > >> I would like to repeat that the speed with these two lines is not just
> > > >> slow, but _INCREDIBLY_ slow! This should probably be mentioned in the
> > > >> relevant documentation for FreeBSD 12+.
> > >
> > > Hi,
> > >
> > > I am experiencing the same slowness when there is a bit of load on the
> > > system (buildworld for example) which I haven't seen before.
> >
> > Is it a regression following a recent kernel update?
> >
> > > I have vfs.zfs.arc_max=2G.
> > >
> > > Top is reporting
> > >
> > > ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, 4592K Header, 217M Other
> > >       105M Compressed, 281M Uncompressed, 2.67:1 Ratio
> > >
> > > Should I test the patch?
> >
> > I would be interested in the results, assuming it is indeed a
> > regression.