Strange slowdown when cache devices enabled in ZFS
Freddie Cash
fjwcash at gmail.com
Wed May 8 22:22:53 UTC 2013
On Wed, May 8, 2013 at 3:02 PM, Brendan Gregg <brendan.gregg at joyent.com> wrote:
> On Wed, May 8, 2013 at 2:45 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>
>> On Wed, May 8, 2013 at 2:35 PM, Brendan Gregg <brendan.gregg at joyent.com> wrote:
>>
>>> Freddie Cash wrote (Mon Apr 29 16:01:55 UTC 2013):
>>> |
>>> | The following settings in /etc/sysctl.conf prevent the "stalls"
>>> completely,
>>> [...]
>>>
>>> To feed at 160 Mbytes/sec, with an 8 Kbyte recsize, you'll need at least
>>> 20,000 random read disk IOPS. How many spindles does that take? A lot. Do
>>> you have a lot?
>>>
>>>
>> 45x 2 TB SATA harddrives, configured in raidz2 vdevs of 6 disks each for
>> a total of 7 vdevs (with a few spare disks). With 2x SSD for log+OS and 2x
>> SSD for cache.
>>
>
> What's the max random read rate? I'd expect (7 vdevs, modern disks) it to
> be something like 1,000. What is your recsize? (or if it is tiny files,
> then average size?).
>
> On the other hand, if it's caching streaming workloads, then do those 2
> SSDs outperform 45 spindles?
>
> If you are getting 120 Mbytes/sec warmup, then I'm guessing it's either a
> 128 Kbyte recsize random reads, or sequential.
>
>
There's 128 GB of RAM in the box, with arc_max set to 124 GB and
arc_meta_max set to 120 GB, plus 16 CPU cores (2x 8-core CPUs at 2.0 GHz).
Recordsize property for the pool is left at default (128 KB).
LZJB compression is enabled.
Dedupe is enabled.
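For reference, that setup boils down to roughly the following (the pool
name "backup" below is just a placeholder, and the exact tunable names
vary a bit between FreeBSD releases):

    # /boot/loader.conf -- ARC sizing (value as described above; the
    # metadata limit is raised similarly, but its tunable name varies)
    vfs.zfs.arc_max="124G"

    # Pool/dataset properties ("backup" is a placeholder pool name);
    # recordsize is left at the 128K default.
    zfs set compression=lzjb backup
    zfs set dedup=on backup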
"zpool list" shows 76 TB total storage space in the pool, with 29 TB
available (61% cap).
"zfs list" shows just over 18 TB of actual usable space left in the pool.
"zdb -DD" shows the following:
DDT-sha256-zap-duplicate: 110879014 entries, size 557 on disk, 170 in core
DDT-sha256-zap-unique: 259870524 entries, size 571 on disk, 181 in core
DDT histogram (aggregated over all DDTs):
  bucket             allocated                      referenced
  ______  ______________________________  ______________________________
  refcnt  blocks   LSIZE   PSIZE   DSIZE  blocks   LSIZE   PSIZE   DSIZE
  ------  ------   -----   -----   -----  ------   -----   -----   -----
       1    248M   27.2T   18.6T   19.3T    248M   27.2T   18.6T   19.3T
       2   80.0M   9.07T   7.56T   7.72T    175M   19.8T   16.5T   16.9T
       4   16.0M   1.80T   1.40T   1.44T   77.2M   8.67T   6.72T   6.91T
       8   4.51M    498G    345G    358G   47.6M   5.13T   3.51T   3.65T
      16   2.53M    293G    137G    146G   53.8M   6.09T   2.84T   3.05T
      32   1.55M    119G   63.8G   71.4G   72.6M   5.07T   2.77T   3.13T
      64    762K   78.7G   45.6G   49.0G   71.5M   7.45T   4.25T   4.57T
     128    264K   26.3G   18.3G   19.3G   44.8M   4.49T   3.25T   3.41T
     256   57.5K   4.21G   2.28G   2.58G   18.2M   1.30T    704G    805G
     512   9.25K    436M    216M    277M   6.38M    299G    144G    186G
      1K   2.96K    116M   56.8M   76.5M   4.10M    166G   81.4G    109G
      2K   1.15K   56.9M   27.1M   34.7M   3.26M    163G   76.0G   97.6G
      4K     618   16.6M   3.10M   7.65M   3.27M   85.0G   17.0G   41.5G
      8K     169   7.36M   3.11M   4.25M   1.89M   81.4G   33.2G   46.4G
     16K     156   3.54M    948K   2.07M   3.42M   79.9G   20.2G   45.8G
     32K     317   2.11M    763K   3.05M   13.8M   91.7G   32.1G    135G
     64K      15    712K     32K    160K   1.26M   53.2G   2.44G   13.0G
    128K      10   13.5K   8.50K   79.9K   1.60M   2.18G   1.37G   12.8G
    256K       3   1.50K   1.50K   24.0K    926K    463M    463M   7.23G
   Total    354M   39.0T   28.2T   29.1T    848M   86.2T   59.5T   62.3T
dedup = 2.14, compress = 1.45, copies = 1.05, dedup * compress / copies = 2.96
Not sure which zdb command to use to show the average block sizes in use,
though.
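My best guess is that something along the lines of "zdb -bb <poolname>"
would traverse the pool and print per-type block statistics, including
average block sizes, but I haven't verified that here, and it would
probably take a very long time on a pool this size:

    # Traverse the pool and print block statistics (avg block size per type).
    # "backup" is a placeholder pool name; expect this to run for hours
    # on a 76 TB pool.
    zdb -bb backup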
This is the off-site replication storage server for our backup systems,
aggregating data from the three main backup servers (schools, non-schools,
groupware). Each of those backup servers rsyncs a set of remote Linux or
FreeBSD servers (65, 73, and 1, respectively) overnight, and then does a
"zfs send" to push the data to this off-site server.
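The replication step is essentially just an incremental send piped over
SSH, something like this (snapshot names, pool names, and the ssh target
are placeholders, not the actual script):

    # On each backup server: send last night's snapshot incrementally
    # to the off-site box (all names below are placeholders).
    zfs send -R -i backup@2013-05-07 backup@2013-05-08 | \
        ssh offsite zfs recv -d -F offsitepool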
The issue I noticed was during the zfs recv from the other 3 boxes. It
would run fine without L2ARC devices, saturating the gigabit link between
them. It would also run fine with the L2ARC devices enabled ... until
L2ARC usage neared 100%; at that point the l2arc_feed_thread would hit
100% CPU usage and there would be 0 I/O to the pool. If I limited the ARC
to 64 GB, it just took longer to reach the "l2arc_feed_thread @ 100%; no
I/O" state.
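The symptoms are easy to spot with stock tools, roughly like this:

    # Kernel threads: during a stall, l2arc_feed_thread sits at ~100% CPU.
    top -SH

    # Pool I/O: during a stall, throughput drops to essentially zero.
    zpool iostat 1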
With l2arc_norw turned off, everything works. I've been running with the
sysctl.conf settings shown earlier for over a week now without any issues:
a full 124 GB ARC, 2x 64 GB cache devices, L2ARC sitting at near 100%
usage, and l2arc_feed_thread never going above 50% CPU, usually staying
around 20%.
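The key knob for me was l2arc_norw; in /etc/sysctl.conf that looks like
this (the rest of the tunables from my earlier message aren't repeated
here):

    # /etc/sysctl.conf -- allow reads from the cache devices while the
    # L2ARC feed thread is writing to them
    vfs.zfs.l2arc_norw=0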
--
Freddie Cash
fjwcash at gmail.com