Strange IO performance with UFS
Roger Pau Monné
roger.pau at citrix.com
Sat Jul 5 10:35:18 UTC 2014
On 05/07/14 11:58, Konstantin Belousov wrote:
> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monné wrote:
>> On 04/07/14 23:19, Stefan Parvu wrote:
>>> Hi,
>>>
>>>>> I'm doing some tests on IO performance using fio, and I've
>>>>> found something weird when using UFS and large files. I
>>>>> have the following very simple sequential fio workload:
>>>
>>> System: FreeBSD ox 10.0-RELEASE-p6 FreeBSD 10.0-RELEASE-p6 #0:
>>> Tue Jun 24 07:47:37 UTC 2014
>>> root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>>> amd64
>>>
>>>
>>> 1. Seq Write to 1 file, 10GB size, single writer, block 4k,
>>> UFS2:
>>>
>>> I tried to write seq using a single writer using an IOSIZE
>>> similar to your example, 10 GB to a 14TB Hdw RAID 10 LSI device
>>> using fio 2.1.9 under FreeBSD 10.0.
>>>
>>> Result: Run status group 0 (all jobs): WRITE: io=10240MB,
>>> aggrb=460993KB/s, minb=460993KB/s, maxb=460993KB/s,
>>> mint=22746msec, maxt=22746msec
>>
>> This looks much better than what I've seen in my benchmarks; how
>> much memory does the system have?
>>
>> In my case I've seen the read issue when trying to write to
>> files that were larger than the memory the system has. My box
>> has 6GB of RAM and I was using a 10GB file.
>>
>>>
>>>
>>> 2. Seq Write to 2500 files, each file 5MB size, multiple
>>> writers, UFS2:
>>>
>>> Result: Run status group 0 (all jobs): WRITE: io=12500MB,
>>> aggrb=167429KB/s, minb=334KB/s, maxb=9968KB/s, mint=2568msec,
>>> maxt=76450msec
>>>
>>> Questions:
>>>
>>> - where are you writing, what storage: hdw / sfw RAID ?
>>
>> The storage is a simple SATA disk, no RAID:
>>
>> pass0 at ahcich0 bus 0 scbus0 target 0 lun 0
>> pass0: <ST500DM002-1BD142 KC45> ATA-8 SATA 3.x device
>> pass0: Serial Number Z3T3FJXL
>> pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>> pass0: Command Queueing enabled
>>
>>> - are you using time based fio tests ?
>>
>> I'm using the following fio workload, as stated in the first
>> email:
>>
>> [global]
>> rw=write
>> size=4g
>> bs=4k
>>
>> [job1]
>>
>> The problem doesn't seem to be related to the hardware (I've also
>> seen this when running inside of a VM), but to UFS itself that at
>> some point (or maybe under certain conditions) starts making a
>> lot of reads when doing a simple write:
>>
>> kernel`g_io_request+0x384
>> kernel`g_part_start+0x2c3
>> kernel`g_io_request+0x384
>> kernel`g_part_start+0x2c3
>> kernel`g_io_request+0x384
>> kernel`ufs_strategy+0x8a
>> kernel`VOP_STRATEGY_APV+0xf5
>> kernel`bufstrategy+0x46
>> kernel`cluster_read+0x5e6
>> kernel`ffs_balloc_ufs2+0x1be2
>> kernel`ffs_write+0x310
>> kernel`VOP_WRITE_APV+0x166
>> kernel`vn_write+0x2eb
>> kernel`vn_io_fault_doio+0x22
>> kernel`vn_io_fault1+0x78
>> kernel`vn_io_fault+0x173
>> kernel`dofilewrite+0x85
>> kernel`kern_writev+0x65
>> kernel`sys_write+0x63
>>
>> This can also be seen by running iostat in parallel with the fio
>> workload:
>>
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     243.3 233.7 31053.3 29919.1   31  57.4 100
>>
>> This clearly shows that even though I was doing a sequential write
>> (the fio workload shown above), the disk was actually reading
>> more data than it was writing, which makes no sense at first
>> glance, and all the reads come from the code path traced above.
>
> The backtrace above means that BA_CLRBUF was specified for
> UFS_BALLOC(). In turn, this occurs when the write size is less
> than the UFS block size. UFS has to read the block to ensure that
> a partial write does not corrupt the rest of the buffer.
Thanks for the clarification, that makes sense. I'm not opening the
file with O_DIRECT, so shouldn't the write be cached in memory and
flushed to disk when we have the full block? It's a sequential write,
so the whole block is going to be rewritten very soon.
>
> You can get the block size for file with stat(2), st_blksize field
> of the struct stat, or using statfs(2), field f_iosize of struct
> statfs, or just looking at the dumpfs output for your filesystem,
> the bsize value. For modern UFS typical value is 32KB.
Yes, block size is 32KB, checked with dumpfs. I've changed the block
size in fio to 32k and then I get the expected results in iostat and fio:
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 1.0 658.2 31.1 84245.1 58 108.4 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 689.8 0.0 88291.4 54 112.1 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 1.0 593.3 30.6 75936.9 80 111.7 97
write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec
Roger.
More information about the freebsd-hackers mailing list