FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs

Caza, Aaron Aaron.Caza at ca.weatherford.com
Tue Jun 20 17:16:36 UTC 2017


> -----Original Message-----

> From: Steven Hartland [mailto:killing at multiplay.co.uk]

> Sent: Monday, June 19, 2017 7:32 PM

> To: freebsd-fs at freebsd.org

> Subject: Re: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs

>

> On 20/06/2017 01:57, Caza, Aaron wrote:

> >> vfs.zfs.min_auto_ashift is a sysctl only, it's not a tunable, so setting it in /boot/loader.conf won't have any effect.

> >>

> >> There's no need for it to be a tunable as it only affects vdevs when they are created, which can only be done once the system is running.

> >>

> > The bsdinstall script itself sets vfs.zfs.min_auto_ashift=12 in /boot/loader.conf yet, as you say, this doesn't do anything.  As a user, it's a bit confusing to see it in /boot/loader.conf but then run 'sysctl -a | grep min_auto_ashift' and see 'vfs.zfs.min_auto_ashift: 9', so I felt it was worth mentioning.

> Absolutely, patch is in review here:

> https://reviews.freebsd.org/D11278



Thanks for taking care of this, Steve - appreciated.
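
For anyone else hitting this in the meantime, the workaround that stuck for me is roughly the following - a sketch only, but since min_auto_ashift is a plain sysctl rather than a loader tunable, /etc/sysctl.conf is the place for it:

    # check the current value (it was reporting 9 here)
    sysctl vfs.zfs.min_auto_ashift

    # set it for the running system and persist it across reboots
    # (note it only affects vdevs created after this point)
    sysctl vfs.zfs.min_auto_ashift=12
    echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf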



> >

> >> You don't explain why you believe performance is degrading?

> > As I related in my post, with my previous FreeBSD 11-Stable setup on this same hardware I was seeing 950MB/s after bootup.  I've been posting to the freebsd-hackers list, but have moved to the freebsd-fs list as this seemingly has something to do with FreeBSD+ZFS behavior, and user Jov had previously cross-posted to this list for me:

> > https://docs.freebsd.org/cgi/getmsg.cgi?fetch=2905+0+archive/2017/freebsd-fs/20170618.freebsd-fs

> >

> > I've been using FreeBSD+ZFS ever since FreeBSD 9.0, admittedly with a different zpool layout, which is essentially as follows:

> >      adaXp1 - gptboot loader

> >      adaXp2 - 1GB UFS partition

> >      adaXp3 - UFS with UUID-labeled partition hosting a GEOM ELI layer using NULL encryption to emulate 4k sectors (done before ashift was an option)

> >

> > So, adaXp3 would show up as something like the following:

> >

> >    /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8

> >    /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli

> >

> > Then, the zpool mirrored pair would be something like the following:

> >

> >    pool: wwbase

> >   state: ONLINE

> >    scan: none requested

> > config:

> >

> >          NAME                                              STATE     READ WRITE CKSUM

> >          wwbase                                            ONLINE       0     0     0

> >            mirror-0                                        ONLINE       0     0     0

> >              gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli  ONLINE       0     0     0

> >              gpt/4c596d40-554c-11e7-beb1-002590766b41.eli  ONLINE       0     0     0

> >
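
(For reference, the GEOM ELI NULL-encryption layer under those .eli providers gets created roughly as follows - a sketch only; the keyfile path is purely illustrative, and geli still wants a key source even with NULL encryption:)

    # one-time keyfile, since geli init needs a key component even for NULL encryption
    dd if=/dev/random of=/root/geli-null.key bs=64 count=1

    # initialise the provider with no real encryption and a 4k sector size, then attach it
    geli init -e NULL -s 4096 -P -K /root/geli-null.key /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8
    geli attach -p -k /root/geli-null.key /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8

    # the resulting .eli providers are what go into the mirror
    zpool create wwbase mirror gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli gpt/4c596d40-554c-11e7-beb1-002590766b41.eli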

> > Using the above zpool configuration on this same hardware on FreeBSD 11-Stable, I was seeing read speeds of 950MB/s using dd (dd if=/testdb/test of=/dev/null bs=1m).  However, after anywhere from 5 to 24 hours, performance would degrade to less than 100MB/s for unknown reasons - the server was essentially idle, so it's a mystery to me why this occurs.  I'm seeing this behavior on FreeBSD 10.3R amd64 up through FreeBSD 11.0-Stable.

> > As I wasn't making any headway in resolving this, I opted today to use the FreeBSD 11.1 Beta 2 memstick image to create a basic FreeBSD 11.1 Beta 2 amd64 Auto(ZFS) installation to see if this would resolve the original issue, as I would be using ZFS-on-root and vfs.zfs.min_auto_ashift=12 instead of my own emulation as described above.  However, instead of seeing the 950MB/s that I expected - which is what I see with my alternative emulation - I'm seeing 450MB/s.  I've yet to determine if the zpool setup as done by the bsdinstall script will suffer from the original performance degradation I observed.

> >
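
(One thing worth checking on the new install, since the whole question is whether the installer-created pool really ended up with ashift=12 - a quick sketch, assuming the Auto(ZFS) pool is named zroot:)

    # dump the cached pool configuration and look at the ashift of each vdev
    zdb -C zroot | grep ashift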

> >> What is the exact dd command you're running, as that can have a huge impact on performance?

> > dd if=/testdb/test of=/dev/null bs=1m

> >

> > Note that the file /testdb/test is 16GB, twice the size of the RAM available in this system.  The /testdb directory is a ZFS file system with recordsize=8k, chosen as it's ultimately intended to host a PostgreSQL database, which uses an 8k page size.
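
(Roughly how the test dataset and file were created - a sketch only; the dataset name and mountpoint are illustrative:)

    # 8k records to match PostgreSQL's page size
    zfs create -o recordsize=8k -o mountpoint=/testdb wwbase/testdb

    # 16GB test file, twice the RAM in this box, so reads can't be served from ARC
    dd if=/dev/random of=/testdb/test bs=1m count=16000

    # the read benchmark itself
    dd if=/testdb/test of=/dev/null bs=1m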

> >

> > My understanding is that a ZFS mirrored pool with two drives can read from both drives at the same time, hence double the speed.  This is what I've actually observed ever since I first started using this in FreeBSD 9.0 with the GEOM ELI 4k sector emulation.  This is actually my first time using FreeBSD's native installer's Auto(ZFS) setup with 4k sectors emulated using vfs.zfs.min_auto_ashift=12.  As it's a ZFS mirrored pool, I still expected it to be able to read at double speed as it does with the GEOM ELI 4k sector emulation; however, it does not.

> >

>> On 19/06/2017 23:14, Caza, Aaron wrote:

>>> I've been having a problem with FreeBSD ZFS SSD performance inexplicably degrading after < 24 hours uptime, as described in a separate e-mail thread.  In an effort to get down to basics, I've now performed a ZFS-on-Root install of FreeBSD 11.1 Beta 2 amd64 using the default Auto(ZFS) install with the default 4k sector emulation setting (vfs.zfs.min_auto_ashift=12), no swap, not encrypted.

>>>

>>> Firstly, vfs.zfs.min_auto_ashift=12 is set correctly in the /boot/loader.conf file, but doesn't appear to work, because when I log in and do "sysctl -a | grep min_auto_ashift" it's set to 9 and not 12 as expected.  I tried setting it to vfs.zfs.min_auto_ashift="12" in /boot/loader.conf but that didn't make any difference, so I finally just added it to /etc/sysctl.conf, where it seems to work.  So, something needs to be changed for this to function correctly.

>>>

>>> Next, after reboot I was expecting somewhere in the neighborhood of 950MB/s from the ZFS mirrored zpool of 2 Samsung 850 Pro 256GB SSDs that I'm using, as I was previously seeing this with my FreeBSD 11-Stable setup which, admittedly, is set up differently from the way the bsdinstall script does it.  However, I'm seeing half that on bootup.

>>>

>>> Performance result:

>>> Starting 'dd' test of large file...please wait

>>> 16000+0 records in

>>> 16000+0 records out

>>> 16777216000 bytes transferred in 37.407043 secs (448504207 bytes/sec)

> Can you show the output from gstat -pd during this dd, please?



dT: 1.001s  w: 1.000s

L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name

    0   4318   4318  34865    0.0      0      0    0.0      0      0    0.0   14.2| ada0

    0   4402   4402  35213    0.0      0      0    0.0      0      0    0.0   14.4| ada1



dT: 1.002s  w: 1.000s

L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name

    1   4249   4249  34136    0.0      0      0    0.0      0      0    0.0   14.1| ada0

    0   4393   4393  35287    0.0      0      0    0.0      0      0    0.0   14.5| ada1



Every now and again I was seeing d/s activity, which I understand to be TRIM operations - it would briefly show 16, then go back to 0.
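
For completeness, this is how I'm watching the disks and the TRIM-related counters - a sketch; the sysctl OIDs are what I'd expect on 11.x:

    # per-disk stats, including delete (TRIM) operations, refreshed every second
    gstat -pd -I 1s

    # TRIM tunables and counters
    sysctl vfs.zfs.trim
    sysctl kstat.zfs.misc.zio_trim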



test at f111beta2:~ # dd if=/testdb/test of=/dev/null bs=1m

16000+0 records in

16000+0 records out

16777216000 bytes transferred in 43.447343 secs (386150561 bytes/sec)
test at f111beta2:~ # uptime
 9:54AM  up 19:38, 2 users, load averages: 2.92, 1.01, 0.44
root at f111beta2:~ # dd if=/testdb/test of=/dev/null bs=1m

16000+0 records in

16000+0 records out

16777216000 bytes transferred in 236.097011 secs (71060688 bytes/sec)
test at f111beta2:~ # uptime
10:36AM  up 20:20, 2 users, load averages: 0.90, 0.62, 0.36



As can be seen in the above 'dd' test results, I'm back to seeing the original issue I reported on freebsd-hackers - performance inexplicably degrading after < 24 hours of uptime, going from ~386MB/sec down to ~71MB/sec - even though this server isn't doing anything other than running this test hourly.



Please note that the gstat -pd output above was captured after the performance degradation hit.  Prior to this, I was seeing %busy of ~60%.  In this particular instance the degradation hit ~20hrs into the test, but I've seen it hit as soon as ~5hrs.



Previously, Allan Jude had advised setting vfs.zfs.trim.enabled=0 to see if this changed the behavior.  I did this; however, it had no impact - but that was when I was using the GEOM ELI 4k sector emulation and not the ashift 4k sector emulation.  The GEOM ELI 4k sector emulation does not appear to work with TRIM operations, as gstat -d in that case always stayed at 0 ops/s.  I can try disabling TRIM again, but did not want to reboot the server to restart the test in case there was some additional info worth capturing.
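
If I do end up rebooting to test that, the plan is roughly the following - a sketch; as far as I know vfs.zfs.trim.enabled is a boot-time tunable, so it goes in loader.conf rather than sysctl.conf:

    # disable ZFS TRIM at boot, then reboot and re-run the hourly dd test
    echo 'vfs.zfs.trim.enabled=0' >> /boot/loader.conf
    shutdown -r now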



I have captured an hourly log, which I can provide, containing the initial dmesg, zpool status, zfs list, and zfs get all, along with hourly results of running the above 'dd' test and the associated zfs-stats -a and sysctl -a output; it's currently 2.8MB, hence too large to post to this list.
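
The capture itself is just a small script run hourly from cron, roughly along these lines (a sketch; the log path is illustrative):

    #!/bin/sh
    # hourly ZFS performance capture (sketch)
    LOG=/var/log/zfs-hourly.log
    {
        date
        uptime
        zpool status
        zfs list
        zfs get all
        dd if=/testdb/test of=/dev/null bs=1m
        zfs-stats -a
        sysctl -a
    } >> "$LOG" 2>&1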



Also, there seems to be a problem with my freebsd-fs subscription, as I'm not getting e-mail notifications despite having submitted a subscription request, so apologies for my slow responses.



--

Aaron


Aaron Caza
Senior Server Developer
Weatherford SLS Canada R&D Group
Weatherford | 1620 27 Ave NE | #124B | Calgary | AB | T2E 8W4
Direct +1 (403) 693-7773
Aaron.Caza at ca.weatherford.com | www.weatherford.com




