Scrub incredibly slow with 13.0-RC3 (as well as RC1 & 2)
Mathieu Chouquet-Stringer
me+freebsd at mathieu.digital
Mon Mar 22 10:17:47 UTC 2021
Hello,
I'm running vanilla 13.0-RC3 at the moment, but I experienced the same
issue with RC1 and RC2.
This instance of FreeBSD is running under Linux KVM and has 16 threads
and 32GB of RAM:
hw.model: Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS)
hw.machine: amd64
hw.ncpu: 16
hw.physmem: 34318278656
I have 2 zpools: one is for the root filesystems, the other is just my
storage array.
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data     29T  24.5T  4.54T        -         -    11%    84%  1.00x  ONLINE  -
zroot  11.9G  8.76G  3.18G        -         -    62%    73%  1.00x  ONLINE  -
zroot is on the vtbd0 device, backed by an SSD, and I used to be able to
scrub it at SATA2 bandwidth limits, close to 300MB/s. Now when I scrub
it, I get half that at best.
But that's not the worst part, as it's really small so it's still quick
enough. The worst part is my data zpool. It's a raidz2 array with 8
devices, a log mirror on SSDs and 2 cache devices on SSD. The SAS/SATA
controllers (2 x SAS3008 in IT mode) are dedicated to FreeBSD, so it has
full control of them (through VFIO PCI).
  pool: data
 state: ONLINE
  scan: scrub in progress since Sun Mar 21 23:47:40 2021
        2.82T scanned at 77.8M/s, 2.24T issued at 62.0M/s, 24.5T total
        0B repaired, 9.17% done, 4 days 08:23:32 to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da10    ONLINE       0     0     0
        logs
          mirror-1  ONLINE       0     0     0
            da0p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
        cache
          da0p2     ONLINE       0     0     0
          da5p2     ONLINE       0     0     0
        spares
          da8       AVAIL
I used to be able to scrub the whole pool in a day or so, but not
anymore. I started scrubbing it almost 11 hours ago and it's only about
10% done.
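As a sanity check, the ETA zpool reports is consistent with the issue
rate, so the pool really is crawling (all numbers taken from the status
output above):

```shell
# Remaining scrub time = (total - issued) / issue rate,
# using the figures from the zpool status output.
awk 'BEGIN {
    total  = 24.5 * 1024 * 1024   # TiB -> MiB
    issued = 2.24 * 1024 * 1024   # TiB -> MiB
    rate   = 62.0                 # MiB/s
    secs   = (total - issued) / rate
    printf "%.1f days remaining\n", secs / 86400
}'
```

which prints about 4.4 days, matching the "4 days 08:23:32 to go" above.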
iostat shows the devices aren't that busy:
                         extended device statistics
device     r/s   w/s    kr/s    kw/s  ms/r  ms/w  ms/o  ms/t  qlen  %b
vtbd0        9     2   264.1    14.3     0     0     2     0     0   0
da0          0    71    10.9  9106.4     0     0     0     0     0   3
da1       1443    29  8632.4  1095.7     0     0    79     0     0  34
da2       1442    29  8631.3  1095.7     0     0    82     0     0  31
da3       1434    28  8637.7  1096.1     0     0    84     0     1  36
da4       1434    29  8631.8  1095.7     0     0    79     0     0  33
da5          0    71    11.7  9130.4     0     0     0     0     0   3
da6       1438    29  8634.1  1095.6     0     0    82     0     0  31
da7       1441    29  8636.2  1095.6     0     0    84     0     1  33
da8          0     0     0.0     0.0     1     2     0     1     0   0
da9       1441    29  8630.8  1095.8     0     0    81     0     0  32
da10      1438    29  8633.7  1095.5     0     0    82     0     0  32
pass1        0     0     0.0     0.0     0     0     0     0     0   0
pass2        0     0     0.0     0.0    43    17    36    41     0   0
pass3        0     0     0.0     0.0    43    15    35    40     0   0
pass4        0     0     0.0     0.0    44    12    42    43     0   0
pass5        0     0     0.0     0.0    45    13    32    41     0   0
pass6        0     0     0.0     0.0     0     0     0     0     0   0
pass7        0     0     0.0     0.0    47    23    44    46     0   0
pass8        0     0     0.0     0.0    44    15    35    42     0   0
pass9        0     0     0.0     0.0    33     0   198    71     0   0
pass10       0     0     0.0     0.0    43    16    33    40     0   0
pass11       0     0     0.0     0.0    41     9    32    39     0   0
Here's what I have in my sysctl.conf:
vfs.zfs.l2arc_write_max=314572800
vfs.zfs.min_auto_ashift=12
vfs.zfs.vdev.async_read_max_active=10
I tried playing with scan_mem_lim_fact (I went through the whole OpenZFS
documentation yesterday) but it didn't make any difference:
vfs.zfs.scan_mem_lim_fact=5
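For reference, these are the scan-related knobs I've been looking at
(FreeBSD exposes the OpenZFS module parameters under vfs.zfs.*; whether
any of them is actually relevant here is exactly what I'm unsure about):

```shell
# Inspect the current scan/scrub tunables:
sysctl vfs.zfs.scan_mem_lim_fact vfs.zfs.scan_vdev_limit vfs.zfs.scan_legacy

# What I tried (reverted afterwards, since it made no difference):
sysctl vfs.zfs.scan_mem_lim_fact=5
```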
With or without vfs.zfs.scan_mem_lim_fact, I often see one, and
sometimes two, threads pegged at 100% CPU in kernel mode...
1[|| 6.8%] 5[|| 8.8%] 9[| 4.7%] 13[|| 9.4%]
2[|| 8.3%] 6[| 4.2%] 10[|| 8.9%] 14[| 5.7%]
3[|| 8.4%] 7[|||||||||100.0%] 11[||| 14.1%] 15[| 5.2%]
4[| 5.7%] 8[|| 9.4%] 12[|| 13.0%] 16[| 4.2%]
Mem[||||||||||||||||||||||||||9.33G/32.0G] Tasks: 45, 0 thr; 2 running
Swp[| 130M/8.00G] Load average: 2.51 2.63 2.46
Uptime: 10:47:36
PID USER PRI NI VIRT RES S CPU% MEM% TIME+ Command
0 root -16 0 0 12976 S 22.5 0.0 9h45:36 kernel
1 root 24 0 11824 32 S 0.0 0.0 0:00.36 /sbin/init
3453 root 20 0 17668 2224 S 0.0 0.0 0:00.48 /usr/local/sbin/smartd -A /usr/local/
3974 root 52 0 12892 48 S 0.0 0.0 0:00.00 /usr/libexec/getty Pc ttyv0
4973 root 52 0 12892 48 S 0.0 0.0 0:00.00 /usr/libexec/getty Pc ttyv1
5618 root 52 0 12892 48 S 0.0 0.0 0:00.00 /usr/libexec/getty Pc ttyv2
5974 root 52 0 12892 48 S 0.0 0.0 0:00.00 /usr/libexec/getty Pc ttyv3
6320 root 52 0 12892 48 S 0.0 0.0 0:00.00 /usr/libexec/getty Pc ttyv4
6386 root 20 0 39188 2932 S 0.0 0.0 0:00.65 /usr/local/sbin/nmbd --daemon --confi
Any idea how I could help debug that?
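In case it helps, this is what I was planning to run next to see where
that pegged kernel thread is spinning (the dtrace one-liner is just my
guess at a way to sample it, not something I've confirmed catches the
scan thread):

```shell
# Show per-thread CPU use, including kernel threads:
top -SH

# Dump the kernel stacks of the kernel process (pid 0) to see what the
# busy thread is doing right now:
procstat -kk 0

# Sample on-CPU kernel stacks for 10 seconds (arg0 != 0 means the
# sampled PC was in the kernel):
dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-10s { exit(0); }'
```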
Cheers,
--
Mathieu Chouquet-Stringer me+freebsd at mathieu.digital
The sun itself sees not till heaven clears.
-- William Shakespeare --
More information about the freebsd-current mailing list