ZFS slow reads for unallocated blocks

Adam Nowacki nowakpl at platinum.linux.pl
Sat Apr 13 12:04:47 UTC 2013


Temporary dbufs are created for each missing (unallocated on disk) 
record, including indirects if the hole is large enough. Those dbufs 
never find their way into the ARC and are freed at the end of dmu_read_uio.

A small read from a hole would, in the best case, bzero 128KiB 
(one recordsize; more if indirects are missing too) ... and I'm running 
a modified ZFS with record sizes up to 8MiB.
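
To put rough numbers on that, here is a minimal user-space sketch of the
cost model, not kernel code. The function names are made up for
illustration, and it ignores the extra zeroing for missing indirects. It
assumes every record touched by a read of a hole gets a full zero-filled
temporary buffer, and compares that with zeroing only the requested bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: bytes zeroed when a read of `len` bytes at offset
 * `off` lands in a hole and every touched record is materialized as a
 * full zero-filled buffer of `recordsize` bytes.
 */
uint64_t zeroed_full_records(uint64_t off, uint64_t len, uint64_t recordsize)
{
    uint64_t first, last;

    assert(recordsize > 0);
    if (len == 0)
        return 0;
    first = off / recordsize;             /* first record touched */
    last = (off + len - 1) / recordsize;  /* last record touched */
    return (last - first + 1) * recordsize;
}

/*
 * Hypothetical model: bytes zeroed when only the requested range is
 * zeroed, i.e. zeroing at byte granularity.
 */
uint64_t zeroed_byte_granular(uint64_t len)
{
    return len;
}
```

With recordsize=8M, a hypothetical 4096-byte read zeroes 8388608 bytes in
this model, against 4096 bytes when only the requested range is zeroed.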

# zfs create -o atime=off -o recordsize=8M -o compression=off \
    -o mountpoint=/home/testfs home/testfs
# truncate -s 8m /home/testfs/trunc8m
# dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
1+0 records in
1+0 records out
8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)

# time cat /home/testfs/trunc8m > /dev/null
0.000u 6.111s 0:06.11 100.0%    15+2753k 0+0io 0pf+0w

# time cat /home/testfs/zero8m > /dev/null
0.000u 0.010s 0:00.01 100.0%    12+2168k 0+0io 0pf+0w

Roughly a 600x increase in system time (6.111s vs 0.010s) and just over 
1MB/s for an 8MiB file - insanity.

The fix - a lot of the code to efficiently handle this was already there.

dbuf_hold_impl already has an int fail_sparse argument that makes it 
return ENOENT for holes. The patch just had to get that flag down there 
and the result somehow back up to dmu_read_uio, where zeroing can happen 
at byte granularity.

... didn't have time to actually test it yet.
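
The idea, as a minimal user-space sketch rather than the patch itself:
struct model_file, model_hold_sparse and model_read are hypothetical
stand-ins, not ZFS functions. The hold step reports ENOENT for a hole
instead of building a zero-filled buffer, and the read loop then zeroes
only the bytes the caller asked for.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy file: records[i] == NULL marks a hole. */
struct model_file {
    size_t recordsize;
    size_t nrecords;
    const uint8_t **records;
};

/*
 * Hypothetical stand-in for a hold that always behaves as if fail_sparse
 * were set: ENOENT for a hole, EINVAL past the end, 0 with *out pointing
 * at the record data otherwise.
 */
static int model_hold_sparse(const struct model_file *f, size_t blkid,
    const uint8_t **out)
{
    if (blkid >= f->nrecords)
        return EINVAL;
    if (f->records[blkid] == NULL)
        return ENOENT;
    *out = f->records[blkid];
    return 0;
}

/*
 * Read `len` bytes at `off` into `dst`. For holes, zero just the part of
 * `dst` that the hole covers and count those bytes in *zeroed.
 */
int model_read(const struct model_file *f, size_t off, size_t len,
    uint8_t *dst, size_t *zeroed)
{
    *zeroed = 0;
    while (len > 0) {
        size_t blkid = off / f->recordsize;
        size_t boff = off % f->recordsize;
        size_t chunk = f->recordsize - boff;
        const uint8_t *rec = NULL;
        int err;

        if (chunk > len)
            chunk = len;
        err = model_hold_sparse(f, blkid, &rec);
        if (err == ENOENT) {
            /* Hole: zero only the requested bytes, no temporary buffer. */
            memset(dst, 0, chunk);
            *zeroed += chunk;
        } else if (err != 0) {
            return err;
        } else {
            memcpy(dst, rec + boff, chunk);
        }
        dst += chunk;
        off += chunk;
        len -= chunk;
    }
    return 0;
}
```

In this model a 4-byte read that straddles a hole and an allocated record
zeroes only the 2 bytes that fall in the hole, whatever the record size.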

On 2013-04-13 12:24, Andriy Gapon wrote:
> on 13/04/2013 02:35 Adam Nowacki said the following:
>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>
>> Does it look sane?
>
> It's hard to tell from a quick look since the change is not small.
> What is your idea of the problem and the fix?
>
>> On 2013-04-12 09:03, Andriy Gapon wrote:
>>>
>>> ENOTIME to really investigate, but here is a basic profile result for those
>>> interested:
>>>                 kernel`bzero+0xa
>>>                 kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>                 kernel`dmu_read_uio+0x66
>>>                 kernel`zfs_freebsd_read+0x3c0
>>>                 kernel`VOP_READ_APV+0x92
>>>                 kernel`vn_read+0x1a3
>>>                 kernel`vn_io_fault+0x23a
>>>                 kernel`dofileread+0x7b
>>>                 kernel`sys_read+0x9e
>>>                 kernel`amd64_syscall+0x238
>>>                 kernel`0xffffffff80747e4b
>>>
>>> That's where > 99% of time is spent.
>>>
>>
>
>



More information about the freebsd-fs mailing list