Strange IO performance with UFS

Konstantin Belousov kostikbel at gmail.com
Sun Jul 6 07:17:46 UTC 2014


On Sat, Jul 05, 2014 at 05:34:48PM -0700, Adrian Chadd wrote:
> Hm, wait a sec. So if the IO size is a multiple of the underlying FS
> block size, it should be okay?
It was already suggested in the thread, and confirmed later.
Did you read the mails ?

> 
> 
> -a
> 
> 
> On 5 July 2014 12:58, Konstantin Belousov <kostikbel at gmail.com> wrote:
> > On Sat, Jul 05, 2014 at 06:18:07PM +0200, Roger Pau Monn? wrote:
> >> On 05/07/14 13:24, Konstantin Belousov wrote:
> >> > On Sat, Jul 05, 2014 at 12:35:11PM +0200, Roger Pau Monn? wrote:
> >> >> On 05/07/14 11:58, Konstantin Belousov wrote:
> >> >>> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monn?
> >> >>> wrote:
> >> >>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3
> >> >>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3
> >> >>>> kernel`g_io_request+0x384 kernel`ufs_strategy+0x8a
> >> >>>> kernel`VOP_STRATEGY_APV+0xf5 kernel`bufstrategy+0x46
> >> >>>> kernel`cluster_read+0x5e6 kernel`ffs_balloc_ufs2+0x1be2
> >> >>>> kernel`ffs_write+0x310 kernel`VOP_WRITE_APV+0x166
> >> >>>> kernel`vn_write+0x2eb kernel`vn_io_fault_doio+0x22
> >> >>>> kernel`vn_io_fault1+0x78 kernel`vn_io_fault+0x173
> >> >>>> kernel`dofilewrite+0x85 kernel`kern_writev+0x65
> >> >>>> kernel`sys_write+0x63
> >> >>>>
> >> >>>> This can also be seen by running iostat in parallel with the
> >> >>>> fio workload:
> >> >>>>
> >> >>>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b ada0
> >> >>>> 243.3 233.7 31053.3 29919.1   31  57.4 100
> >> >>>>
> >> >>>> This clearly shows that even when I was doing a sequential
> >> >>>> write (the fio workload shown above), the disk was actually
> >> >>>> reading more data than writing it, which makes no sense, and
> >> >>>> all the reads come from the path trace shown above.
> >> >>>
> >> >>> The backtrace above means that the BA_CLRBUF was specified for
> >> >>> UFS_BALLOC(). In turns, this occurs when the write size is
> >> >>> less than the UFS block size. UFS has to read the block to
> >> >>> ensure that partial write does not corrupt the rest of the
> >> >>> buffer.
> >> >>
> >> >> Thanks for the clarification, that makes sense. I'm not opening
> >> >> the file with O_DIRECT, so shouldn't the write be cached in
> >> >> memory and flushed to disk when we have the full block? It's a
> >> >> sequential write, so the whole block is going to be rewritten
> >> >> very soon.
> >> >>
> >> >>>
> >> >>> You can get the block size for file with stat(2), st_blksize
> >> >>> field of the struct stat, or using statfs(2), field f_iosize of
> >> >>> struct statfs, or just looking at the dumpfs output for your
> >> >>> filesystem, the bsize value.  For modern UFS typical value is
> >> >>> 32KB.
> >> >>
> >> >> Yes, block size is 32KB, checked with dumpfs. I've changed the
> >> >> block size in fio to 32k and then I get the expected results in
> >> >> iostat and fio:
> >> >>
> >> >> extended device statistics device     r/s   w/s    kr/s    kw/s
> >> >> qlen svc_t  %b ada0       1.0 658.2    31.1 84245.1   58 108.4
> >> >> 101 extended device statistics device     r/s   w/s    kr/s
> >> >> kw/s qlen svc_t  %b ada0       0.0 689.8     0.0 88291.4   54
> >> >> 112.1  99 extended device statistics device     r/s   w/s    kr/s
> >> >> kw/s qlen svc_t  %b ada0       1.0 593.3    30.6 75936.9   80
> >> >> 111.7  97
> >> >>
> >> >> write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec
> >> >
> >> > The current code in ffs_write() only avoids read before write when
> >> > write covers complete block.  I think we can somewhat loose the
> >> > test to also avoid read when we are at EOF and write covers
> >> > completely the valid portion of the last block.
> >> >
> >> > This leaves the unwritten portion of the block with the garbage. I
> >> > believe that it is not harmful, since the only way for usermode to
> >> > access that garbage is through the mmap(2). The
> >> > vnode_generic_getpages() zeroes out parts of the page which are
> >> > after EOF.
> >> >
> >> > Try this, almost completely untested:
> >>
> >> Doesn't seem to help much, I'm still seeing the same issue. I'm
> >> sampling iostat every 1s, and here's the output form the start of the
> >> 4k block fio workload:
> >>
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 349.5     0.0 44612.3   48  88.0  52
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 655.4     0.0 83773.6   76  99.8 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 699.2     0.0 89493.1   59 109.4 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 628.1     0.0 80392.6   55 114.8  98
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 655.7     0.0 83799.6   79  98.4 102
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 701.4     0.0 89782.0   80 105.5  97
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 697.9     0.0 89331.6   78 112.0 103
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 714.1     0.0 91408.7   77 110.3 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 724.0     0.0 92675.0   67 112.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 700.4     0.0 89646.6   49 102.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 686.4     0.0 87857.2   78 110.0 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 702.0     0.0 89851.6   80 112.9  97
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 736.3     0.0 94246.4   67 110.1 103
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 624.6     0.0 79950.0   48 115.7 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 704.0     0.0 90118.4   77 106.1 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 714.6     0.0 91470.0   80 103.6  99
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 710.4     0.0 90926.1   80 111.1 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 655.3     0.0 83882.1   70 115.8  99
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 539.8     0.0 69094.5   80 121.2 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 711.6     0.0 91087.6   79 107.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 705.5     0.0 90304.5   81 111.3  97
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 727.3     0.0 93092.8   81 108.9 102
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 699.5     0.0 89296.4   55 109.0 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 689.0     0.0 88066.1   78  96.6 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 738.3     0.0 94496.1   56 109.1 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 615.4     0.0 78770.0   80 112.3  98
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 707.3     0.0 90529.8   86 105.7 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 704.3     0.0 89333.9   67  98.3  99
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 641.3     0.0 82081.5   80 112.3 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 701.6     0.0 89747.9   51 101.1 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 693.0     0.0 88702.1   80 103.6  97
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 632.7     0.0 80991.8   80 112.0  99
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 709.0     0.0 90748.2   80 107.5 102
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 715.0     0.0 91523.0   80 104.7 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 650.1     0.0 83210.5   56 110.9 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 682.2     0.0 87319.1   57 107.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 719.0     0.0 92032.6   80 103.6  99
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 624.3     0.0 79905.8   80 110.5  97
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 696.5     0.0 89151.7   80 109.9 103
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 664.2     0.0 85017.6   77 109.9 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 681.7     0.0 87254.0   80 107.5  98
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 668.5     0.0 85569.3   57 109.9  99
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 682.3     0.0 87329.0   53 110.8 102
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0       0.0 643.9     0.0 82420.9   77 104.8 101
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     107.5 457.1 13701.7 58471.3   57 106.0 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     220.9 253.9 28281.4 32498.9   54 108.8 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     290.6 277.9 37198.8 35576.1   65  94.3 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     309.3 267.9 39590.7 34295.9   80  89.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     213.6 302.0 27212.7 38562.0   24  93.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     232.1 224.3 29712.5 28339.8   31 117.4 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     262.9 249.4 33654.0 31928.1   47  81.4 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     232.2 229.2 29721.6 29340.5   50  78.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     222.8 229.4 28430.0 29362.7   42  85.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     231.5 246.5 29628.8 31555.9    6  72.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     261.7 256.8 33498.7 32769.1   33  83.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     262.7 260.7 33628.3 33279.4   35  85.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     234.0 249.1 29867.9 31883.1   18  90.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     252.1 239.8 32263.0 30581.4   32  91.2 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     241.5 257.5 30917.0 32961.1   16  69.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     257.9 243.5 33011.9 31164.2   32  86.8 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     237.5 235.6 30311.2 30046.9   31  67.4 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     290.4 213.1 37172.8 27277.0   79  65.3 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     216.4 284.3 27703.7 36392.5   42  95.4 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     223.8 248.2 28645.1 31774.4   16  69.4  89
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     294.0 217.7 37544.4 27864.2   64  68.0 110
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     210.7 245.6 26966.6 31439.8   59 107.4 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     228.5 265.2 29246.6 33940.5   10  99.2  98
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     279.1 218.4 35727.2 27955.0   52  71.9 102
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     232.3 293.4 29607.9 37521.4   14  93.2 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     299.5 236.6 38340.2 30288.8   79  69.7 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     216.3 268.9 27686.3 34417.3    4  90.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     285.8 261.0 36585.3 33409.5   53  84.6 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     228.5 232.5 29059.7 29661.1   48  74.3 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     242.7 262.4 31060.0 33588.2   27  69.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     248.2 252.2 31766.1 32149.3    8  78.9 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     267.9 230.2 34288.6 29462.8   62  68.5 100
> >>                         extended device statistics
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >> ada0     238.0 266.2 30375.8 34075.6    0  95.4 100
> >>
> >> As can be seen from the log above, at first the workload runs fine,
> >> and the disk is only performing writes, but at some point (in this
> >> case around 40% of completion) it starts performing this
> >> read-before-write dance that completely screws up performance.
> >
> > I reproduced this locally.  I think my patch is useless for the fio/4k write
> > situation.
> >
> > What happens is indeed related to the amount of the available memory.
> > When the size of the file written by fio is larger than the memory,
> > system has to recycle the cached pages.  So after some moment, doing
> > a write has to do read-before-write, and this occurs not at the EOF
> > (since fio pre-allocated the job file).
> >
> > In fact, I used 10G file on 8G machine, but I interrupted the fio
> > before it finish the job.  The longer the previous job runs, the longer
> > is time for which new job does not issue reads.  If I allow the job to
> > completely fill the cache, then the reads starts immediately on the next
> > job run.
> >
> > I do not see how could anything be changed there, if we want to keep
> > user file content on partial block writes, and we do.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20140706/9c4c5ed5/attachment.sig>


More information about the freebsd-fs mailing list