Interesting: ZFS scrub prefetch hurting sequential scrub performance?

Borja Marcos borjam at sarenet.es
Thu Jan 3 10:34:40 UTC 2019


Hi,

I have noticed that my scrubs have become painfully slow. I am wondering whether I’ve just hit some worst case or maybe
there is some interaction between the ZFS sequential scrub and scrub prefetch. I don’t recall seeing this behavior
before the sequential scrub code was committed.

Did I hit some worst case or should scrub prefetch be disabled with the new sequential scrub code?


# zpool status
  pool: pool
 state: ONLINE
  scan: scrub in progress since Sat Dec 29 03:56:02 2018
	133G scanned at 309K/s, 129G issued at 300K/s, 619G total
	0 repaired, 20.80% done, no estimated completion time

When this happened last month I tried rebooting the server and restarting the scrub, and things improved.

The first graph shows the disk I/O bandwidth history for the last week. When the scrub started, disk I/O “busy percent”
reached almost 100%. Curiously, the transfer rates looked rather healthy, at around 10 MB/s of read activity.

At first I suspected a misbehaving disk slowing down the whole process with retries, but all the disks show a similar
service time pattern. One is attached for reference.
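
(For anyone who wants to look at the same numbers without graphs, gstat and iostat show the per-disk service times and
busy percent live; the disk names below are just examples from this pool:)

# gstat -p -I 1s
# iostat -x -w 1 da3 da9 da12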

Looking at the rest of the stats for hints of misbehavior, I saw arcstats_prefetch_metadata misses rising to
about 2000 per second, with arcstats_l2_misses following the same pattern.
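
(Those counters are the cumulative kstats under kstat.zfs.misc.arcstats, so the per-second figures are just the deltas
between samples; a quick and dirty loop like this is enough to watch them:)

# while :; do sysctl -n kstat.zfs.misc.arcstats.prefetch_metadata_misses kstat.zfs.misc.arcstats.l2_misses; sleep 1; done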

Could it be prefetch spending a lot of time writing to the L2ARC, only to have the data evicted before it is ever read, hence the misses?

I tried disabling scrub prefetch (vfs.zfs.no_scrub_prefetch=1) and, voilà, everything picked up speed. Now
with zpool iostat I see bursts of 100+ MB/s of read activity and proper scrub progress.
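
(For reference, vfs.zfs.no_scrub_prefetch is a plain sysctl on this system, so it can be flipped at runtime and made
persistent in /etc/sysctl.conf; the zpool iostat invocation is the obvious one:)

# sysctl vfs.zfs.no_scrub_prefetch=1
# echo 'vfs.zfs.no_scrub_prefetch=1' >> /etc/sysctl.conf
# zpool iostat -v pool 5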

Disk busy percent has gone down to around 50% and the cache stats have become much better. It turns out that
most of the I/O activity was just pointless writes to the L2ARC.

Now, the hardware configuration.

The server has only 8 GB of memory with a maximum configured ARC size of 4 GB. 
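
(The ARC cap is just the usual loader tunable; the value is given in bytes, so 4 GB looks like this in /boot/loader.conf:)

# grep arc_max /boot/loader.conf
vfs.zfs.arc_max="4294967296"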

It has an LSI2008 card with IR firmware. I didn't bother to cross-flash it, but I am not using the RAID facilities anyway;
it's just configured as a plain HBA.

mps0: <Avago Technologies (LSI) SAS2008> port 0x9000-0x90ff mem 0xdfff0000-0xdfffffff,0xdff80000-0xdffbffff irq 17 at device 0.0 numa-domain 0 on pci4
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>

# zpool status
  pool: pool
 state: ONLINE
  scan: scrub in progress since Sat Dec 29 03:56:02 2018
	323G scanned at 742K/s, 274G issued at 632K/s, 619G total
	0 repaired, 44.32% done, no estimated completion time
config:

	NAME        STATE     READ WRITE CKSUM
	pool        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    da12    ONLINE       0     0     0
	    da13    ONLINE       0     0     0
	    da14    ONLINE       0     0     0
	    da9     ONLINE       0     0     0
	    da15    ONLINE       0     0     0
	    da3     ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0     0
	    da10    ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	    da8     ONLINE       0     0     0
	logs
	  da11p2    ONLINE       0     0     0
	cache
	  da11p3    ONLINE       0     0     0

errors: No known data errors


Yes, both ZIL and L2ARC are on the same disk (an SSD). I know it's not optimal, but I guess it's better
than the high latency of conventional disks.
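
(In case anyone wants to replicate the layout: the two SSD partitions are simply added as separate log and cache vdevs,
along the lines of:)

# zpool add pool log da11p2
# zpool add pool cache da11p3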

# camcontrol devlist
<SEAGATE ST914603SSUN146G 0868>    at scbus6 target 11 lun 0 (pass0,da0)
<SEAGATE ST914603SSUN146G 0868>    at scbus6 target 15 lun 0 (pass1,da1)
<SEAGATE ST9146803SS FS03>         at scbus6 target 17 lun 0 (pass2,da2)
<SEAGATE ST914603SSUN146G 0868>    at scbus6 target 18 lun 0 (pass3,da3)
<SEAGATE ST9146803SS FS03>         at scbus6 target 20 lun 0 (pass4,da4)
<SEAGATE ST914603SSUN146G 0868>    at scbus6 target 21 lun 0 (pass5,da5)
<SEAGATE ST9146803SS FS03>         at scbus6 target 22 lun 0 (pass6,da6)
<SEAGATE ST914603SSUN146G 0868>    at scbus6 target 23 lun 0 (pass7,da7)
<SEAGATE ST914603SSUN146G 0868>    at scbus6 target 24 lun 0 (pass8,da8)
<SEAGATE ST9146803SS FS03>         at scbus6 target 25 lun 0 (pass9,da9)
<SEAGATE ST9146803SS FS03>         at scbus6 target 26 lun 0 (pass10,da10)
<LSILOGIC SASX28 A.0 5021>         at scbus6 target 27 lun 0 (ses0,pass11)
<ATA Samsung SSD 850 2B6Q>         at scbus6 target 28 lun 0 (pass12,da11)
<SEAGATE ST9146803SS FS03>         at scbus6 target 29 lun 0 (pass13,da12)
<SEAGATE ST9146802SS S229>         at scbus6 target 30 lun 0 (pass14,da13)
<SEAGATE ST9146803SS FS03>         at scbus6 target 32 lun 0 (pass15,da14)
<SEAGATE ST9146802SS S22B>         at scbus6 target 33 lun 0 (pass16,da15)
<TSSTcorp CD/DVDW TS-T632A SR03>   at scbus13 target 0 lun 0 (pass17,cd0)


I hope the attachments reach the list; otherwise I will mail them to anyone interested.


Cheers,

Borja.