Strange IO performance with UFS

Konstantin Belousov kostikbel at gmail.com
Sat Jul 5 09:58:44 UTC 2014


On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monné wrote:
> On 04/07/14 23:19, Stefan Parvu wrote:
> > Hi,
> > 
> >>> I'm doing some tests on IO performance using fio, and I've found
> >>> something weird when using UFS and large files. I have the following
> >>> very simple sequential fio workload:
> > 
> > System:
> > FreeBSD ox 10.0-RELEASE-p6 FreeBSD 10.0-RELEASE-p6 #0: Tue Jun 24 07:47:37 UTC 2014     
> > root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
> > 
> > 
> > 1. Seq Write to 1 file, 10GB size, single writer, block 4k, UFS2:
> > 
> > I tried a sequential write with a single writer, using an IOSIZE similar to your example (10
> > GB), to a 14TB hardware RAID 10 LSI device with fio 2.1.9 under FreeBSD 10.0.
> > 
> > Result:
> > Run status group 0 (all jobs):
> >   WRITE: io=10240MB, aggrb=460993KB/s, minb=460993KB/s, maxb=460993KB/s, 
> >   mint=22746msec, maxt=22746msec
> 
> This looks much better than what I've seen in my benchmarks; how much
> memory does the system have?
> 
> In my case I've seen the read issue when trying to write to files that
> were larger than the memory the system has. My box has 6GB of RAM and
> I was using a 10GB file.
> 
> > 
> > 
> > 2. Seq Write to 2500 files, each file 5MB size, multiple writers, UFS2:
> > 
> > Result:
> > Run status group 0 (all jobs):
> >   WRITE: io=12500MB, aggrb=167429KB/s, minb=334KB/s, maxb=9968KB/s, 
> >   mint=2568msec, maxt=76450msec
> > 
> > Questions:
> > 
> >  - where are you writing to, what storage: hardware or software RAID?
> 
> The storage is a simple SATA disk, no RAID:
> 
> pass0 at ahcich0 bus 0 scbus0 target 0 lun 0
> pass0: <ST500DM002-1BD142 KC45> ATA-8 SATA 3.x device
> pass0: Serial Number Z3T3FJXL
> pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> pass0: Command Queueing enabled
> 
> >  - are you using time-based fio tests?
> 
> I'm using the following fio workload, as stated in the first email:
> 
> [global]
> rw=write
> size=4g
> bs=4k
> 
> [job1]
> 
> The problem doesn't seem to be related to the hardware (I've also seen
> this when running inside a VM), but to UFS itself, which at some point
> (or maybe under certain conditions) starts issuing a lot of reads while
> doing a simple write:
> 
>               kernel`g_io_request+0x384
>               kernel`g_part_start+0x2c3
>               kernel`g_io_request+0x384
>               kernel`g_part_start+0x2c3
>               kernel`g_io_request+0x384
>               kernel`ufs_strategy+0x8a
>               kernel`VOP_STRATEGY_APV+0xf5
>               kernel`bufstrategy+0x46
>               kernel`cluster_read+0x5e6
>               kernel`ffs_balloc_ufs2+0x1be2
>               kernel`ffs_write+0x310
>               kernel`VOP_WRITE_APV+0x166
>               kernel`vn_write+0x2eb
>               kernel`vn_io_fault_doio+0x22
>               kernel`vn_io_fault1+0x78
>               kernel`vn_io_fault+0x173
>               kernel`dofilewrite+0x85
>               kernel`kern_writev+0x65
>               kernel`sys_write+0x63
> 
> This can also be seen by running iostat in parallel with the fio workload:
> 
> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> ada0     243.3 233.7 31053.3 29919.1   31  57.4 100
> 
> This clearly shows that even though I was doing a sequential write (the
> fio workload shown above), the disk was actually reading more data than
> it was writing, which makes no sense, and all the reads come from the
> stack trace shown above.

The backtrace above means that BA_CLRBUF was specified for UFS_BALLOC().
In turn, this occurs when the write size is less than the UFS block size.
UFS has to read the block to ensure that a partial write does not corrupt
the rest of the buffer.
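
As a rough illustration (a simplified sketch in plain C, not the actual
ffs_balloc_ufs2()/ffs_write() logic), the condition for a single block
boils down to whether the write covers the whole filesystem block:

#include <sys/types.h>

/*
 * Simplified sketch: a write into one filesystem block needs a prior
 * read unless it starts on the block boundary and covers the entire
 * block, because the untouched bytes of the buffer must be preserved
 * when the block is written back.
 */
static int
partial_write_needs_read(off_t offset, size_t len, long bsize)
{
	off_t blkoff = offset % bsize;	/* offset within the block */

	return (blkoff != 0 || len < (size_t)bsize);
}

With the 4KB writes from the fio job above and a 32KB UFS block this
condition is always true, which is why every write is preceded by a read.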

You can get the block size for a file with stat(2) (the st_blksize field
of struct stat), with statfs(2) (the f_iosize field of struct statfs), or
by looking at the dumpfs output for your filesystem (the bsize value).
For modern UFS the typical value is 32KB.
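
For example, a trivial userland program (a hypothetical helper, only to
illustrate the two system calls) could print both values for a given path:

#include <sys/param.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <err.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	struct stat st;
	struct statfs fs;

	if (argc != 2)
		errx(1, "usage: %s path", argv[0]);
	if (stat(argv[1], &st) == -1)
		err(1, "stat");
	if (statfs(argv[1], &fs) == -1)
		err(1, "statfs");

	printf("st_blksize: %ld\n", (long)st.st_blksize);
	printf("f_iosize:   %ld\n", (long)fs.f_iosize);
	return (0);
}

On a typical modern UFS2 filesystem both values are 32768, so issuing
writes in multiples of 32KB (e.g. bs=32k in the fio job above) should
avoid the read-before-write behaviour.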