NVMe performance 4x slower than expected

Jim Harris jim.harris at gmail.com
Wed Apr 1 23:24:53 UTC 2015


On Wed, Apr 1, 2015 at 3:04 PM, Tobias Oberstein
<tobias.oberstein at gmail.com> wrote:

>> Is this vmstat after the test ?
>
> No, it wasn't (I ran vmstat hours after the test).
>
> Here is right after test (shortened test duration, otherwise exactly the
> same FIO config):
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd7
>
>>> Somewhat funny is that nvme does not use MSI(X).
>>
>> Yes - this is exactly the problem.
>>
>> nvme does use MSI-X if it can allocate the vectors (one per core).  With
>> 48 cores, I suspect we are quickly running out of vectors, so NVMe is
>> reverting to INTx.
>>
>> Could you actually send vmstat -ia (I left off the 'a' previously) -
>> just so we can see all allocated interrupt vectors.
>>
>> As an experiment, can you try disabling hyperthreading - this will
>> reduce the
>>
>
> The CPUs in this box
>
> root@s4l-zfs:~/src/sys/amd64/conf # sysctl hw.model
> hw.model: Intel(R) Xeon(R) CPU E7-8857 v2 @ 3.00GHz
>
> don't have hyperthreading (we deliberately selected this CPU model for
> max. clock rate rather than HT):
>
> http://ark.intel.com/products/75254/Intel-Xeon-Processor-E7-8857-v2-30M-Cache-3_00-GHz
>
>> number of cores and should let you get MSI-X vectors allocated for at
>> least the first couple of NVMe controllers.  Then please re-run your
>> performance test on one of those controllers.
>>
>>
> You mean I should run against nvdN where N is a controller that still got
> MSI-X while other controllers did not?
>
> How would I find out which controller N? I don't know which nvdN sits in
> a PCIe slot attached to which CPU socket, and I don't know which ones
> still got MSI-X and which did not.
>

vmstat -ia should show you which controllers were assigned per-core vectors
- you'll see all of them in the irq256+ range instead of the single vector
per controller you see now in the lower irq index range.
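
For example, filtering on the nvme lines makes the difference easy to spot
(the output shape here is illustrative, not from your box):

  # Keep only the nvme interrupt lines.  Per-core MSI-X vectors show up
  # as irq256 and above (several per controller); a controller that fell
  # back to INTx keeps a single low-numbered irq, like the irq56/irq106
  # entries you pasted.
  vmstat -ia | grep nvme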


>
> I could arrange for disabling all but 1 CPU and retest. Would that help?
>

Yes - that would help.  Depending on how your system is configured, and
which CPU socket the NVMe controllers are attached to, you may need to keep
2 CPU sockets enabled.
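
To work out which socket each controller is behind, pciconf can at least
show the PCI selector for each nvme device (the bus-to-socket mapping
itself is platform-specific, so treat this as a rough sketch):

  # List attached nvme controllers with their selectors
  # (domain:bus:slot:function).  On multi-socket systems the bus number
  # hints at which root complex, and hence which socket, owns the device.
  pciconf -l | grep nvme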

You can also try a debug tunable that is in the nvme driver.

hw.nvme.per_cpu_io_queues=0

This would just try to allocate a single MSI-X vector per controller - so
all cores would still share a single I/O queue pair, but it would be MSI-X
instead of INTx.  (This should actually be the first fallback if we cannot
allocate per-core vectors.)  It would at least show that we are able to
allocate some number of MSI-X vectors for NVMe.
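
Since this is a loader tunable it is read at boot, so the usual way to try
it is via /boot/loader.conf and a reboot (a minimal sketch, assuming the
stock loader.conf mechanism):

  # /boot/loader.conf
  hw.nvme.per_cpu_io_queues="0"

  # after reboot, each controller should show a single MSI-X vector:
  vmstat -ia | grep nvme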


>
> ===
>
> Right after running against nvd7
>
> irq56: nvme0                        6440          0
> ...
> irq106: nvme7                     145056          3
>
>
> Then, immediately thereafter, running against nvd0
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd0
>
> irq56: nvme0                        9233          0
> ...
> irq106: nvme7                     145056          3
>
> ===
>
> Earlier today, I ran multiple longer tests .. all against nvd7. So if
> these are cumulative numbers since the last boot, that would make sense.
>
>
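
Yes - they are cumulative since boot.  If you want to attribute interrupts
to a single run, one simple approach is to snapshot the counters around the
test ("job.fio" below stands in for your actual FIO config):

  # Diff the interrupt counters across one FIO run.
  vmstat -ia > /tmp/irq.before
  fio job.fio
  vmstat -ia > /tmp/irq.after
  diff /tmp/irq.before /tmp/irq.after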

