Kernel modules

Jason Bacon bacon4000 at gmail.com
Sat Apr 13 18:41:17 UTC 2019


On 2019-04-13 13:29, Justin Clift wrote:
> On 2019-04-13 23:52, Jason Bacon wrote:
> <snip>
>> Stability will take a long time to test properly.  I'm going to start
>> by rerunning some of our most I/O-intensive jobs on it - jobs that
>> actually broke our CentOS RAID servers until I switched them to NFS
>> over RDMA.
>
> That's got to be the first time anyone's ever mentioned "NFS over RDMA"
> as increasing a system's stability. :)
>
> + Justin

Believe it or not...  ;-)

After my upgrade from CentOS 6 to CentOS 7, NFS over TCP started falling 
apart under heavy load: servers and compute nodes became unresponsive 
and required a reboot to restore stability.

If it's due to problems in the CentOS TCP stack, NFS over RDMA would 
help by eliminating the TCP stack from the pathway.

On one cluster (old QLogic HCAs), setting net.core.netdev_budget=2000 
seems to have solved the issue.  On the other (newer Mellanox FDR HCAs), 
it did not seem to help, so I tried RDMA and it's been stable ever 
since.  The downside is that we can no longer monitor traffic with 
iftop...
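
For anyone wanting to check whether netdev_budget is a plausible culprit 
on their own Linux nodes, here is a rough Python sketch.  It assumes the 
first three hex columns of /proc/net/softnet_stat are packets processed, 
packets dropped, and "time_squeeze" (the number of times the softirq 
receive loop stopped with work still pending because it ran out of 
budget or time).  The later columns vary between kernel versions and are 
ignored.  The function name and the dict keys are made up for 
illustration; this is not something from the original thread.

```python
import os

SOFTNET_STAT = "/proc/net/softnet_stat"  # Linux only


def parse_softnet_stat(text):
    """Return one dict per row of softnet_stat-formatted text.

    Assumption: columns 1-3 are processed, dropped, time_squeeze,
    all in hexadecimal. Rows with fewer than three fields are skipped.
    """
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": index,  # usually, but not always, the CPU number
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


if __name__ == "__main__" and os.path.exists(SOFTNET_STAT):
    with open(SOFTNET_STAT) as f:
        for row in parse_softnet_stat(f.read()):
            print(row)
```

If time_squeeze keeps climbing under load, raising netdev_budget may be 
worth trying; if it stays flat, the problem is probably elsewhere.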

-- 
Earth is a beta site.



More information about the freebsd-infiniband mailing list