Scrub incredibly slow with 13.0-RC3 (as well as RC1 & 2)
Mathieu Chouquet-Stringer
me+freebsd at mathieu.digital
Sat Mar 27 16:06:48 UTC 2021
Hello,
TL;DR: read_max makes no difference; don't use hpet with KVM; use
scan_legacy.
On Sat, Mar 27, 2021 at 11:59:43AM +0000, tech-lists wrote:
> Have you tried the following sysctl:
>
> vfs.read_max=128
It's something I hadn't tried, so I gave it a shot. Note that I first ran my
tests on zroot, which sits behind a virtio controller (the image file lives
on a mirror made of two SSDs).
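Changing it is just a runtime sysctl, for the record:

  # apply and read back the suggested value (as root)
  sysctl vfs.read_max=128
  sysctl vfs.read_max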
This pool is small, with only 12.1G allocated out of 21.9G:
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 21.9G 12.1G 9.88G - - 37% 54% 1.00x ONLINE -
Scrubbing is hard to bench because the status seems stuck forever:
scan: scrub in progress since Sat Mar 27 15:16:27 2021
11.8G scanned at 146M/s, 1.23M issued at 15.2K/s, 12.1G total
0B repaired, 0.01% done, no estimated completion time
scan: scrub in progress since Sat Mar 27 15:16:27 2021
12.0G scanned at 145M/s, 3.58M issued at 43.2K/s, 12.1G total
0B repaired, 0.03% done, no estimated completion time
scan: scrub in progress since Sat Mar 27 15:16:27 2021
12.1G scanned at 142M/s, 409M issued at 4.70M/s, 12.1G total
0B repaired, 3.31% done, no estimated completion time
Look how it has already read 12.0G, pretty much 100% of the allocated space
in the pool, and it still shows 0.03% done...
And with a bs of 128k (the ZFS recordsize), dd with iflag=direct tells me
I can read this thing at:
23266197504 bytes (23 GB, 22 GiB) transferred 23.042s, 1010 MB/s
With a bs of 32k I still get 310MB/s.
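For reference, the command was roughly the following (iflag=direct is GNU
coreutils dd syntax; substitute your own device node, /dev/vtbd0 being the
first virtio disk on FreeBSD):

  # raw sequential read of the backing disk, bypassing the page cache
  dd if=/dev/vtbd0 of=/dev/null bs=128k iflag=direct status=progress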
So please find below the scrub durations with vfs.read_max set to 512, 256,
128 and 64: it makes no difference.
Please note it seems to be completely CPU bound again, as I see this process
using 100% of a CPU the whole time the scrub is running:
31 root -8 - 0B 3952K CPU0 0 20:05 100.00% zfskern{txg_thread_enter}
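For completeness, every data point below came out of a little loop along
these lines (a sketch of the idea, not my literal script):

  #!/bin/sh
  # sweep vfs.read_max and run three scrubs per setting, recording each result
  for rm in 512 256 128 64; do
      sysctl vfs.read_max=$rm
      for i in 1 2 3; do
          zpool scrub zroot
          # zpool wait is in OpenZFS 2.0 (FreeBSD 13); otherwise poll zpool status
          zpool wait -t scrub zroot
          zpool status zroot | grep 'scrub repaired'
      done
  done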
- vfs.read_max=512
vfs.read_max: 64 -> 512
- Scrub 1 scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:10:41 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:13:33 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:54 with 0 errors on Sat Mar 27 15:16:27 2021
- vfs.read_max=256
vfs.read_max: 512 -> 256
- Scrub 1 scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:19:19 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:22:12 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:25:04 2021
- vfs.read_max=128
vfs.read_max: 256 -> 128
- Scrub 1 scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:27:56 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:30:49 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:33:41 2021
- vfs.read_max=64
vfs.read_max: 128 -> 64
- Scrub 1 scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:36:34 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:39:26 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:42:19 2021
I am attaching a flamegraph for that busy process: again a lot of
hpet_get_timecount... So I went and disabled hpet in KVM: I gained 30
seconds and, again, playing with vfs.read_max made no difference, but then
again it's an SSD. And given I'm still CPU bound, I'm not sure I'll gain
anything by playing with block sizes... But the first lesson is not to use
hpet with KVM.
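For anyone wanting to check their own setup: inside the guest, FreeBSD tells
you which timecounter it picked and which ones are on offer, and you can
switch at runtime; on the host side the fix is to stop exposing an HPET to
the guest at all (with libvirt that would be the hpet <timer> element in the
domain XML, with plain qemu the -no-hpet switch), the exact knob depending
on how the VM is defined.

  # inside the guest: current timecounter and the available choices
  sysctl kern.timecounter.hardware kern.timecounter.choice

  # optionally switch at runtime, picking one that shows up in the choice list
  sysctl kern.timecounter.hardware=ACPI-fast

Anyway, with the HPET gone, the same read_max sweep now looks like this: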
- vfs.read_max=512
vfs.read_max: 64 -> 512
- Scrub 1 scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:14:25 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:16:44 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:19:02 2021
- vfs.read_max=256
vfs.read_max: 512 -> 256
- Scrub 1 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:21:20 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:23:39 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:25:57 2021
- vfs.read_max=128
vfs.read_max: 256 -> 128
- Scrub 1 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:28:15 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:30:33 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:32:51 2021
- vfs.read_max=64
vfs.read_max: 128 -> 64
- Scrub 1 scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:35:10 2021
- Scrub 2 scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:37:28 2021
- Scrub 3 scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:39:47 2021
An interesting thing is that vfs.zfs.scan_legacy=1 makes the progress
counter move right away. See how I have read almost 50% of the disk and am
already 49% done, whereas with the non-legacy scan, progress only moves
after all the allocated space has been read.
scan: scrub in progress since Sat Mar 27 16:41:33 2021
5.85G scanned at 133M/s, 5.85G issued at 133M/s, 12.1G total
0B repaired, 48.50% done, 00:00:47 to go
It's altogether way quicker (from ~2:19 down to 1:35, so some 44 seconds
faster than the no-hpet runs!!!) and totally reproducible:
scan: scrub repaired 0B in 00:01:35 with 0 errors on Sat Mar 27 16:43:08 2021
scan: scrub repaired 0B in 00:01:35 with 0 errors on Sat Mar 27 16:46:10 2021
I will start a scrub of my raidz2 zpool and tweak read_max to see if it
makes a difference, but at this point I doubt it.
And scan_legacy could be quicker because I see as many busy threads as there
are vdevs, whereas if I start a scrub with the non-legacy scan, I only see
one from the get-go.
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -16 - 0B 13M CPU0 0 8:40 99.17% kernel{z_rd_iss_4}
0 root -16 - 0B 13M CPU9 9 8:40 99.15% kernel{z_rd_iss_6}
0 root -16 - 0B 13M CPU7 7 8:40 98.97% kernel{z_rd_iss_5}
0 root -16 - 0B 13M CPU4 4 8:40 98.96% kernel{z_rd_iss_7}
0 root -16 - 0B 13M CPU6 6 8:40 98.59% kernel{z_rd_iss_1}
0 root -16 - 0B 13M CPU5 5 8:40 98.56% kernel{z_rd_iss_2}
0 root -16 - 0B 13M CPU14 14 8:40 98.33% kernel{z_rd_iss_3}
0 root -16 - 0B 13M CPU11 11 8:40 97.89% kernel{z_rd_iss_0}
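For the record, toggling it is just another sysctl; a minimal recipe, with
the /etc/sysctl.conf line only if you want it to survive reboots:

  # switch scrubs back to the old, non-sorted behaviour
  sysctl vfs.zfs.scan_legacy=1

  # optional: make it persistent across reboots
  echo 'vfs.zfs.scan_legacy=1' >> /etc/sysctl.conf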
--
Mathieu Chouquet-Stringer
The sun itself sees not till heaven clears.
-- William Shakespeare --