sysbench / fileio - Linux vs. FreeBSD

Sun Jun 6 17:20:22 UTC 2010

:All of these tests have been apples vs. oranges for years.
:
:The following seems to be true, though:
:
:a) FreeBSD sequential write performance in UFS has always been less than 
:optimal.

    If there's no read activity sequential write performance should be
    maximal with UFS.  The keyphrase here is "no read activity".

    UFS's main problem, easily demonstrated by running something like
    blogbench --iterations=100, is that read I/O is given such a huge
    precedence over write I/O that the writes can come to a complete
    grinding halt once the system caches are blown out and the reads
    start having to go to disk.

    Another big issue with filesystem benchmarks is the data footprint
    size of the benchmark.  Many benchmarks do not have a sufficiently large
    data footprint and wind up simply testing how much memory the kernel
    is willing to give over to cache the benchmark's tests, instead of
    testing disk performance.  Bonnie++ is a really good example of the
    latter problem.
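
    As a rough sanity check for the footprint problem, something like
    the sketch below can flag a benchmark that is likely to measure the
    cache rather than the disk.  The function name and the 2x margin
    are assumptions made for illustration only; they do not come from
    any benchmark tool.

```python
GIB = 1024 ** 3

def footprint_is_cache_bound(footprint_bytes, ram_bytes, margin=2.0):
    """Return True if the data set is small enough that the kernel could
    plausibly cache most of it.

    margin is an assumed rule-of-thumb factor, not a measured value: the
    footprint must be at least margin * RAM before we trust the run to
    be hitting the disk.
    """
    return footprint_bytes < margin * ram_bytes

# A 1 GiB data set on a 4 GiB machine mostly tests the cache.
small_run = footprint_is_cache_bound(1 * GIB, 4 * GIB)

# A 16 GiB data set on the same machine has to go to disk.
large_run = footprint_is_cache_bound(16 * GIB, 4 * GIB)
```

    Here small_run is True and large_run is False.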

    That said, all the BSDs have stall issues with parallel read & write
    activity on the same file.  It essentially comes down to the vnode
    lock held during writes which can cause reads on the same file to
    stall even when those reads could be satisfied from the VM/BUF cache.

    Linux might appear to work better in such benchmarks because Linux
    essentially allows infinite write buffering, up to the point where
    system memory is exhausted, and the BSDs do not.  Infinite write
    buffering might make a benchmark look good but it creates horrible
    stalls and inconsistencies on production systems.
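
    The trade-off can be seen in a toy model.  This is only a sketch:
    the rates, the limits and the simulate() function are invented for
    illustration and do not describe any real kernel's write-back
    policy.  Each tick the disk drains some dirty data and the
    application tries to write more than the disk can drain.

```python
def simulate(write_rate, drain_rate, dirty_limit, ticks):
    """Toy write-buffering model (hypothetical, not kernel behaviour).

    Returns (accepted, dirty): the amount of data accepted from the
    writer on each tick, and the dirty backlog left at the end.
    """
    dirty = 0
    accepted = []
    for _ in range(ticks):
        dirty = max(0, dirty - drain_rate)    # disk flushes what it can
        room = dirty_limit - dirty            # buffer space still free
        took = min(write_rate, room)          # writer is throttled here
        dirty += took
        accepted.append(took)
    return accepted, dirty

# Bounded buffering: the writer is slowed to disk speed fairly quickly.
bounded = simulate(write_rate=100, drain_rate=40, dirty_limit=200, ticks=6)

# Effectively unbounded buffering: every write is accepted at full speed,
# but a large backlog builds up behind it.
unbounded = simulate(write_rate=100, drain_rate=40, dirty_limit=1000, ticks=6)
```

    With the small limit the writer is throttled to the drain rate from
    the fourth tick on and the backlog never exceeds 200 units.  With
    the large limit the benchmark sees full speed on every tick, but
    400 units are still queued at the end, which is 10 more ticks of
    disk work that something has to wait for later.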

    I noticed that FreeBSD's ZFS implementation issues VOP_WRITEs
    with a shared lock instead of an exclusive lock, thus avoiding
    this particular problem.  It would be possible to do this with UFS
    too with some work to prevent file size changes from colliding during
    concurrent writes, or even using a separate serializer for
    modifying/write operations so read operations can continue to run
    concurrently.
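
    The difference between the two locking schemes can be sketched in
    user space.  This is a toy model only, not kernel code: ToyVnode,
    try_read() and the exclusive_reads flag are hypothetical names, and
    a plain mutex stands in for both the exclusive vnode lock and the
    separate write serializer.

```python
import threading

class ToyVnode:
    """Toy stand-in for a file's vnode (illustrative only)."""

    def __init__(self, exclusive_reads):
        # True:  readers must take the same lock the writer holds.
        # False: the lock only serializes writers; cached reads skip it.
        self.exclusive_reads = exclusive_reads
        self.lock = threading.Lock()
        self.cache = {0: b"cached"}     # block number -> cached data

    def try_read(self, block):
        """Return cached data, or None if the read would have stalled."""
        if self.exclusive_reads:
            if not self.lock.acquire(blocking=False):
                return None             # stuck behind the writer
            try:
                return self.cache.get(block)
            finally:
                self.lock.release()
        return self.cache.get(block)    # no lock needed for a cache hit

# Simulate a write in progress by holding the lock on each vnode.
ufs_like = ToyVnode(exclusive_reads=True)
ufs_like.lock.acquire()
stalled = ufs_like.try_read(0)

serialized = ToyVnode(exclusive_reads=False)
serialized.lock.acquire()
served = serialized.try_read(0)
```

    Here stalled is None, because the read could not get past the
    writer even though the data was in cache, while served is
    b"cached".  A real implementation would also have to deal with
    file size changes during concurrent writes, as noted above.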

    blogbench is a good way to test read/write interference during the
    system-cache phase of blogbench's operation (that would be the first
    500-800 or so blogs on a 4G system).  If working properly both read and
    write operations should be maximal during this phase.  That is, the
    disk should be 100% saturated with writes while all reads are still
    fully satisfiable from the buffer cache / VM system, and at the same
    time the read rate should not suffer (not be seen to stall).

    It would be interesting to see a blogbench comparison between UFS
    and ZFS on the same hw/disk.

						-Matt