Weak disk I/O performance on daX compared to adaX, was: Re: dd performance [was Re: Call for Foundation-supported Project Ideas]

From: Stefan Blachmann <sblachmann_at_gmail.com>
Date: Tue, 14 Dec 2021 03:06:50 UTC
I am wondering what could be causing the weak disk I/O performance on
FreeBSD when drives attach as daX devices instead of adaX devices.

Explanation:
The HP Z420 has 6 SATA ports.
SATA drives connected to ports 1 through 4 show up as daX devices on
FreeBSD.
Drives connected to ports 5 and 6 appear as adaX devices.

> On 12/2/21, Alan Somers <asomers@freebsd.org> wrote:
>> That is your problem then.  The default value for dd is 512B.  If it
>> took 3 days to erase a 2 TB HDD, that means you were writing 15,000
>> IOPs.  Frankly, I'm impressed that the SATA bus could handle that

This shows that with the ada driver the disk I/O performance is
acceptable.  However, after 14 days dd is still working on a drive of
the same type on connector 4 (da3).
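
Rough arithmetic behind those numbers, assuming the same 2 TB capacity
and dd's 512-byte default block size (my figures, not measured):

  2 TB / 512 B               ~  3.9e9 writes
  3 days  = 259,200 s    ->  ~15,000 writes/s (~7.7 MB/s) on the adaX drive
  14 days = 1,209,600 s  ->   <3,300 writes/s (<1.7 MB/s) on da3, still unfinished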

So my questions:
- Why does FreeBSD use the da driver instead of the ada driver for
drives on SATA ports 1-4? (A way to check which controller those ports
sit behind is sketched after this list.)
- And why is the da driver so slow? (For example, on an HP Z800 running
FreeBSD, 15k SAS drives seem as slow as ordinary consumer drives, while
on Linux disk I/O is just snappy.)
- Is there a way to configure FreeBSD to use the ada driver instead of
the da driver, so that FreeBSD remains an alternative to Linux when
disk speed matters?
- Or is it impossible to use the ada driver on SATA connectors 1-4,
perhaps for HP Z420 hardware-related reasons?
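
For reference, here is how one might check which controller each disk
actually hangs off.  This is only a sketch: both tools are in the base
system, but the driver and device names in the output will differ per
machine.

  # list CAM devices together with the bus/driver they are attached to
  camcontrol devlist -v

  # list the PCI storage controllers the kernel found
  pciconf -lv | grep -E -B3 'SATA|SAS|RAID'

If ports 1-4 turn out to sit behind a separate SAS/SCU-style controller
rather than the AHCI controller, that would explain why CAM attaches
those disks as daX instead of adaX - but that is just a guess which the
output above would confirm or refute.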

Cheers,
Stefan


On 12/2/21, Stefan Blachmann <sblachmann@gmail.com> wrote:
> Ah, the buffer cache! I didn't think of that.
> Top shows the weighted CPU load is about 4%, so your guess that it was
> the SATA scheduler might be correct.
> Will try this on Linux in the next few days using oflag=direct with a
> pair of identical HDDs.
> Already curious about the results.
>
>
>
> On 12/2/21, Alan Somers <asomers@freebsd.org> wrote:
>> That is your problem then.  The default value for dd is 512B.  If it
>> took 3 days to erase a 2 TB HDD, that means you were writing 15,000
>> IOPs.  Frankly, I'm impressed that the SATA bus could handle that
>> many.  By using such a small block size, you were doing an excellent
>> job of exercising the SATA bus and the HDD's host interface, but its
>> servo and write head were mostly just idle.
>>
>> The reason Linux behaves differently is that, unlike FreeBSD, it has a
>> buffer cache.  Even though dd was writing with 512B blocks, those
>> writes probably got combined by the buffer cache before going to SATA.
>> However, if you use dd's direct I/O option (oflag=direct on Linux),
>> they probably won't be combined.  I haven't tested this; it's just a
>> guess.  You can probably verify using iostat.
>>
>> When you were trying to erase two HDDs concurrently but only one was
>> getting all of the IOPs and CPU time, was your CPU saturated?  I'm
>> guessing not.  On my machine, with a similar HDD, dd only consumes 10%
>> of the CPU when I write zeros with a 512B block size.  I need to use a
>> 16k block size or larger to get the IOPs under 10,000.  So I'm
>> guessing that in your case the CPU scheduler was working just fine,
>> but the SATA bus was saturated, and the SATA scheduler was the source
>> of the unfairness.
>> -Alan
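
A minimal sketch of the Linux-side test being discussed here.  The
device name sdX is a placeholder - double-check it before pointing dd
at a disk; GNU dd's oflag=direct requests direct I/O, bypassing the
page cache:

  # keep dd's default 512-byte block size, mirroring the original test,
  # but write with direct I/O so the page cache cannot merge the writes
  dd if=/dev/zero of=/dev/sdX oflag=direct status=progress

  # in another terminal, watch the request sizes and write rate that
  # actually hit the disk
  iostat -x 1

With oflag=direct the writes should stay at 512 bytes each; dropping
the flag again should show them being merged into much larger requests.
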
>>
>> On Thu, Dec 2, 2021 at 10:37 AM Stefan Blachmann <sblachmann@gmail.com>
>> wrote:
>>>
>>> I intentionally used dd without the bs parameter, as I care less
>>> about "maximum speed" than about clearing the drives completely while
>>> also generating a lot of I/O transactions.
>>> The latter because drives that are becoming unreliable tend to throw
>>> occasional errors, and the more I/O transactions one does, the better
>>> the chance of spotting such drives.
>>>
>>> The system is an HP Z420; the mainboard/chipset/controller specs can
>>> be found on the web.
>>> The drives in question are (quite old) 2TB WD Black enterprise-grade
>>> 3.5" SATA drives. Their SMART data is good and does not hint at any
>>> problems.
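
For completeness, the usual way to read that SMART data on FreeBSD is
smartmontools (sysutils/smartmontools).  A sketch - the device name is
an example, and SATA disks behind a SAS-style HBA may need the SAT
pass-through forced:

  smartctl -a /dev/da3            # full SMART report
  smartctl -d sat -a /dev/da3     # if the HBA hides the SATA identity
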
>>>
>>> On Linux, erasing them both concurrently finished at almost the same
>>> time.
>>> Thus I do not really understand why this is so different on
>>> FreeBSD.
>>>
>>> On 12/2/21, Alan Somers <asomers@freebsd.org> wrote:
>>> > This is very surprising to me.  I never see dd consume significant
>>> > CPU until the speed gets up into the GB/s range.  What are you
>>> > using for the bs= option?  If you set that too low, or use the
>>> > default, it will needlessly consume extra CPU and IOPs.  I usually set
>>> > it to 1m for this kind of usage.  And what kind of HDDs are these,
>>> > connected to what kind of controller?
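
For the record, the kind of invocation being suggested would look
roughly like this (a sketch only - da3 is taken from the example
further up; FreeBSD's dd accepts the lowercase m suffix):

  # overwrite the whole disk with zeros using 1 MiB writes
  dd if=/dev/zero of=/dev/da3 bs=1m
  # press Ctrl+T while it runs (SIGINFO) to see bytes written and the
  # current transfer rate

The larger block size keeps the IOPs count low while the drive streams
at its sequential write speed.
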
>>> >
>>> > On Thu, Dec 2, 2021 at 9:54 AM Stefan Blachmann <sblachmann@gmail.com>
>>> > wrote:
>>> >>
>>> >> Regarding the suggestions to either improve or replace the ULE
>>> >> scheduler, I would like to share another observation.
>>> >>
>>> >> Usually when I need to zero out HDDs using dd, I use a live Linux.
>>> >> This time I did that on FreeBSD (13).
>>> >> My observations:
>>> >> - On the same hardware, the data transfer rate is a small fraction
>>> >> (about a quarter) of what is achieved by Linux.
>>> >> - The first dd process, which erases the first HDD, gets almost all
>>> >> the CPU and I/O time. The second process, which does the second HDD,
>>> >> gets starved; it really only gets going after the first one has
>>> >> finished.
>>> >>
>>> >> To me it was *very* surprising to find that, while erasing two
>>> >> similar HDDs concurrently takes about one day on Linux, on FreeBSD
>>> >> the first HDD was finished after three days, and only then did the
>>> >> remaining dd process get the same CPU time, making it proceed
>>> >> quickly instead of crawling along.
>>> >>
>>> >> So I guess this might be a scheduler issue.
>>> >> I certainly will do some tests using the old scheduler when I get
>>> >> time.
>>> >> And, I ask myself:
>>> >> Could it be a good idea to sponsor porting the Dragonfly scheduler to
>>> >> FreeBSD?
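
Before pointing at the scheduler, it might be worth watching the disks
themselves while both dd processes run.  A sketch using tools from the
base system (the da2/da3 names are examples):

  # live per-disk view of operations/s, throughput and queue length
  gstat -p

  # or a numeric view, refreshed every second, for the two disks
  iostat -x -w 1 da2 da3

If one disk shows near-zero activity while the other is busy, the
starvation is happening below the CPU scheduler, in the I/O path.
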
>>> >>
>>> >> On 12/2/21, Johannes Totz <jo@bruelltuete.com> wrote:
>>> >> > On 29/11/2021 03:17, Ed Maste wrote:
>>> >> >> On Sun, 28 Nov 2021 at 19:37, Steve Kargl
>>> >> >> <sgk@troutmask.apl.washington.edu> wrote:
>>> >> >>>
>>> >> >>> It's certainly not the latest and greatest,
>>> >> >>> CPU: Intel(R) Core(TM)2 Duo CPU     T7250  @ 2.00GHz (1995.04-MHz
>>> >> >>> K8-class CPU)
>>> >> >>
>>> >> >> If you're content to use a compiler from a package you can save a
>>> >> >> lot
>>> >> >> of time by building with `CROSS_TOOLCHAIN=llvm13` and
>>> >> >> `WITHOUT_TOOLCHAIN=yes`. Or, instead of WITHOUT_TOOLCHAIN perhaps
>>> >> >> `WITHOUT_CLANG=yes`, `WITHOUT_LLD=yes` and `WITHOUT_LLDB=yes`.
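
As a sketch, the suggestion above would translate into something like
the following (assuming the devel/llvm13 package is installed; the knob
names are documented in src.conf(5)):

  # /etc/src.conf - skip building the in-tree toolchain
  WITHOUT_CLANG=yes
  WITHOUT_LLD=yes
  WITHOUT_LLDB=yes

  # then build world/kernel with the packaged clang
  make CROSS_TOOLCHAIN=llvm13 buildworld buildkernel
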
>>> >> >
>>> >> > (re-send to list, sorry)
>>> >> > Can we disconnect the compiler optimisation flag for base and
>>> >> > clang?  I don't need the compiler to be built with -O2, but I want
>>> >> > the resulting base system to have optimisations enabled.
>>> >> > Right now it looks like both get -O2, and a lot of time is spent
>>> >> > on optimising the compiler (for no good reason).
>>> >> >
>>> >> >
>>> >>
>>> >
>>
>