Strange IO performance with UFS
Roger Pau Monné
roger.pau at citrix.com
Sat Jul 5 16:18:21 UTC 2014
On 05/07/14 13:24, Konstantin Belousov wrote:
> On Sat, Jul 05, 2014 at 12:35:11PM +0200, Roger Pau Monn? wrote:
>> On 05/07/14 11:58, Konstantin Belousov wrote:
>>> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monn?
>>> wrote:
>>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3
>>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3
>>>> kernel`g_io_request+0x384 kernel`ufs_strategy+0x8a
>>>> kernel`VOP_STRATEGY_APV+0xf5 kernel`bufstrategy+0x46
>>>> kernel`cluster_read+0x5e6 kernel`ffs_balloc_ufs2+0x1be2
>>>> kernel`ffs_write+0x310 kernel`VOP_WRITE_APV+0x166
>>>> kernel`vn_write+0x2eb kernel`vn_io_fault_doio+0x22
>>>> kernel`vn_io_fault1+0x78 kernel`vn_io_fault+0x173
>>>> kernel`dofilewrite+0x85 kernel`kern_writev+0x65
>>>> kernel`sys_write+0x63
>>>>
>>>> This can also be seen by running iostat in parallel with the
>>>> fio workload:
>>>>
>>>> device r/s w/s kr/s kw/s qlen svc_t %b ada0
>>>> 243.3 233.7 31053.3 29919.1 31 57.4 100
>>>>
>>>> This clearly shows that even when I was doing a sequential
>>>> write (the fio workload shown above), the disk was actually
>>>> reading more data than writing it, which makes no sense, and
>>>> all the reads come from the path trace shown above.
>>>
>>> The backtrace above means that the BA_CLRBUF was specified for
>>> UFS_BALLOC(). In turns, this occurs when the write size is
>>> less than the UFS block size. UFS has to read the block to
>>> ensure that partial write does not corrupt the rest of the
>>> buffer.
>>
>> Thanks for the clarification, that makes sense. I'm not opening
>> the file with O_DIRECT, so shouldn't the write be cached in
>> memory and flushed to disk when we have the full block? It's a
>> sequential write, so the whole block is going to be rewritten
>> very soon.
>>
>>>
>>> You can get the block size for file with stat(2), st_blksize
>>> field of the struct stat, or using statfs(2), field f_iosize of
>>> struct statfs, or just looking at the dumpfs output for your
>>> filesystem, the bsize value. For modern UFS typical value is
>>> 32KB.
>>
>> Yes, block size is 32KB, checked with dumpfs. I've changed the
>> block size in fio to 32k and then I get the expected results in
>> iostat and fio:
>>
>> extended device statistics device r/s w/s kr/s kw/s
>> qlen svc_t %b ada0 1.0 658.2 31.1 84245.1 58 108.4
>> 101 extended device statistics device r/s w/s kr/s
>> kw/s qlen svc_t %b ada0 0.0 689.8 0.0 88291.4 54
>> 112.1 99 extended device statistics device r/s w/s kr/s
>> kw/s qlen svc_t %b ada0 1.0 593.3 30.6 75936.9 80
>> 111.7 97
>>
>> write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec
>
> The current code in ffs_write() only avoids read before write when
> write covers complete block. I think we can somewhat loose the
> test to also avoid read when we are at EOF and write covers
> completely the valid portion of the last block.
>
> This leaves the unwritten portion of the block with the garbage. I
> believe that it is not harmful, since the only way for usermode to
> access that garbage is through the mmap(2). The
> vnode_generic_getpages() zeroes out parts of the page which are
> after EOF.
>
> Try this, almost completely untested:
Doesn't seem to help much, I'm still seeing the same issue. I'm
sampling iostat every 1s, and here's the output form the start of the
4k block fio workload:
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 349.5 0.0 44612.3 48 88.0 52
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 655.4 0.0 83773.6 76 99.8 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 699.2 0.0 89493.1 59 109.4 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 628.1 0.0 80392.6 55 114.8 98
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 655.7 0.0 83799.6 79 98.4 102
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 701.4 0.0 89782.0 80 105.5 97
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 697.9 0.0 89331.6 78 112.0 103
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 714.1 0.0 91408.7 77 110.3 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 724.0 0.0 92675.0 67 112.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 700.4 0.0 89646.6 49 102.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 686.4 0.0 87857.2 78 110.0 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 702.0 0.0 89851.6 80 112.9 97
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 736.3 0.0 94246.4 67 110.1 103
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 624.6 0.0 79950.0 48 115.7 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 704.0 0.0 90118.4 77 106.1 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 714.6 0.0 91470.0 80 103.6 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 710.4 0.0 90926.1 80 111.1 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 655.3 0.0 83882.1 70 115.8 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 539.8 0.0 69094.5 80 121.2 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 711.6 0.0 91087.6 79 107.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 705.5 0.0 90304.5 81 111.3 97
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 727.3 0.0 93092.8 81 108.9 102
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 699.5 0.0 89296.4 55 109.0 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 689.0 0.0 88066.1 78 96.6 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 738.3 0.0 94496.1 56 109.1 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 615.4 0.0 78770.0 80 112.3 98
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 707.3 0.0 90529.8 86 105.7 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 704.3 0.0 89333.9 67 98.3 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 641.3 0.0 82081.5 80 112.3 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 701.6 0.0 89747.9 51 101.1 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 693.0 0.0 88702.1 80 103.6 97
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 632.7 0.0 80991.8 80 112.0 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 709.0 0.0 90748.2 80 107.5 102
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 715.0 0.0 91523.0 80 104.7 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 650.1 0.0 83210.5 56 110.9 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 682.2 0.0 87319.1 57 107.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 719.0 0.0 92032.6 80 103.6 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 624.3 0.0 79905.8 80 110.5 97
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 696.5 0.0 89151.7 80 109.9 103
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 664.2 0.0 85017.6 77 109.9 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 681.7 0.0 87254.0 80 107.5 98
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 668.5 0.0 85569.3 57 109.9 99
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 682.3 0.0 87329.0 53 110.8 102
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 0.0 643.9 0.0 82420.9 77 104.8 101
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 107.5 457.1 13701.7 58471.3 57 106.0 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 220.9 253.9 28281.4 32498.9 54 108.8 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 290.6 277.9 37198.8 35576.1 65 94.3 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 309.3 267.9 39590.7 34295.9 80 89.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 213.6 302.0 27212.7 38562.0 24 93.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 232.1 224.3 29712.5 28339.8 31 117.4 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 262.9 249.4 33654.0 31928.1 47 81.4 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 232.2 229.2 29721.6 29340.5 50 78.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 222.8 229.4 28430.0 29362.7 42 85.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 231.5 246.5 29628.8 31555.9 6 72.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 261.7 256.8 33498.7 32769.1 33 83.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 262.7 260.7 33628.3 33279.4 35 85.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 234.0 249.1 29867.9 31883.1 18 90.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 252.1 239.8 32263.0 30581.4 32 91.2 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 241.5 257.5 30917.0 32961.1 16 69.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 257.9 243.5 33011.9 31164.2 32 86.8 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 237.5 235.6 30311.2 30046.9 31 67.4 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 290.4 213.1 37172.8 27277.0 79 65.3 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 216.4 284.3 27703.7 36392.5 42 95.4 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 223.8 248.2 28645.1 31774.4 16 69.4 89
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 294.0 217.7 37544.4 27864.2 64 68.0 110
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 210.7 245.6 26966.6 31439.8 59 107.4 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 228.5 265.2 29246.6 33940.5 10 99.2 98
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 279.1 218.4 35727.2 27955.0 52 71.9 102
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 232.3 293.4 29607.9 37521.4 14 93.2 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 299.5 236.6 38340.2 30288.8 79 69.7 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 216.3 268.9 27686.3 34417.3 4 90.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 285.8 261.0 36585.3 33409.5 53 84.6 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 228.5 232.5 29059.7 29661.1 48 74.3 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 242.7 262.4 31060.0 33588.2 27 69.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 248.2 252.2 31766.1 32149.3 8 78.9 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 267.9 230.2 34288.6 29462.8 62 68.5 100
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
ada0 238.0 266.2 30375.8 34075.6 0 95.4 100
As can be seen from the log above, at first the workload runs fine,
and the disk is only performing writes, but at some point (in this
case around 40% of completion) it starts performing this
read-before-write dance that completely screws up performance.
Roger.
More information about the freebsd-fs
mailing list