NVMe performance 4x slower than expected
Jim Harris
jim.harris at gmail.com
Wed Apr 1 23:24:53 UTC 2015
On Wed, Apr 1, 2015 at 3:04 PM, Tobias Oberstein <tobias.oberstein at gmail.com> wrote:
>> Is this vmstat after the test ?
>
> No, it wasn't (I ran vmstat hours after the test).
>
> Here it is right after the test (shortened test duration, otherwise
> exactly the same FIO config):
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd7
>
>>> Somewhat funny is that nvme does not use MSI(X).
>>
>> Yes - this is exactly the problem.
>>
>> nvme does use MSI-X if it can allocate the vectors (one per core). With
>> 48 cores, I suspect we are quickly running out of vectors, so NVMe is
>> reverting to INTx.
>>
>> Could you actually send vmstat -ia (I left off the 'a' previously) -
>> just so we can see all allocated interrupt vectors.
>>
>> As an experiment, can you try disabling hyperthreading - this will
>> reduce the
>
> The CPUs in this box
>
> root at s4l-zfs:~/src/sys/amd64/conf # sysctl hw.model
> hw.model: Intel(R) Xeon(R) CPU E7-8857 v2 @ 3.00GHz
>
> don't have hyperthreading (we deliberately selected this CPU model for
> max. clock rather than HT):
>
> http://ark.intel.com/products/75254/Intel-Xeon-Processor-E7-8857-v2-30M-Cache-3_00-GHz
>
>> number of cores and should let you get MSI-X vectors allocated for at
>> least the first couple of NVMe controllers. Then please re-run your
>> performance test on one of those controllers.
>
> You mean I should run against nvdN, where N is a controller that still
> got MSI-X while the other controllers did not?
>
> How would I find out which controller that is? I don't know which nvdN
> is mounted in a PCIe slot attached to which CPU socket, and I don't know
> which ones still got MSI-X and which did not.
>
vmstat -ia should show you which controllers were assigned per-core vectors
- you'll see all of them in the irq256+ range instead of the single vector
per controller you see now in the lower irq index range.
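
For example, you could filter on the driver name (a sketch - the vector
names and counts below are illustrative, not taken from your box):

  vmstat -ia | grep nvme
  # per-core MSI-X case: many high-numbered vectors per controller, e.g.
  #   irq256: nvme0 ...
  #   irq257: nvme0 ...
  # INTx fallback case: a single low-numbered vector per controller, e.g.
  #   irq56: nvme0 6440 0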
>
> I could arrange to disable all but 1 CPU and retest. Would that help?
>
Yes - that would help. Depending on how your system is configured, and
which CPU socket the NVMe controllers are attached to, you may need to keep
2 CPU sockets enabled.
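
To see which socket a controller hangs off, one option is to start from its
PCI selector (a sketch - the selector below is made up, and mapping bus
numbers to sockets requires your board manual or acpidump):

  pciconf -l | grep nvme
  #   nvme0@pci0:4:0:0: class=0x010802 ...
  # the pciN:bus:slot:function selector, cross-referenced with the board
  # documentation, tells you which socket's root complex the slot is on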
You can also try a debug tunable in the nvme driver:

hw.nvme.per_cpu_io_queues=0

This would try to allocate just a single MSI-X vector per controller - all
cores would still share a single I/O queue pair, but it would be MSI-X
instead of INTx. (This actually should be the first fallback when we cannot
allocate per-core vectors.) It would at least show whether we are able to
allocate any MSI-X vectors for NVMe.
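
Since it is a loader tunable, it needs to be set before the driver attaches
- a minimal sketch, assuming your FreeBSD version honors it:

  # in /boot/loader.conf, then reboot:
  hw.nvme.per_cpu_io_queues="0"

  # or, for a one-off test, from the loader prompt:
  set hw.nvme.per_cpu_io_queues=0
  boot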
>
> ===
>
> Right after running against nvd7
>
> irq56: nvme0 6440 0
> ...
> irq106: nvme7 145056 3
>
>
> Then, immediately thereafter, running against nvd0
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd0
>
> irq56: nvme0 9233 0
> ...
> irq106: nvme7 145056 3
>
> ===
>
> Earlier today, I ran multiple longer tests, all against nvd7. So if
> these are cumulative numbers since the last boot, that would make sense.
>
>