Rodney W. Grimes
freebsd-rwg at gndrsh.dnsmgr.net
Sun Apr 14 17:55:32 UTC 2019
> On 2019-04-13 13:29, Justin Clift wrote:
> > On 2019-04-13 23:52, Jason Bacon wrote:
> > <snip>
> >> Stability will take a long time to test properly.  I'm going to start
> >> by rerunning some of our most I/O-intensive jobs on it - jobs that
> >> actually broke our CentOS RAID servers until I switched them to NFS
> >> over RDMA.
> > That's got to be the first time anyone's ever mentioned "NFS over RDMA"
> > as increasing a system's stability. :)
> > + Justin
> Believe it or not... ;-)
> After my upgrade from CentOS 6 to CentOS 7, NFS over TCP started falling
> apart under heavy load; servers and compute nodes becoming unresponsive
> and requiring a reboot to restore stability.
> If it's due to problems in the CentOS TCP stack, NFS over RDMA would
> help by eliminating the TCP stack from the pathway.
Any idea what happened in the CentOS TCP stack between 6 and 7?
> On one cluster (old QLogic HCAs), setting net.core.netdev_budget=2000
> seems to have solved the issue.  On the other (newer Mellanox FDR HCAs),
> it did not seem to help, so I tried RDMA and it's been stable ever
> since.  The downside is we can no longer monitor traffic with iftop...
Rod Grimes rgrimes at freebsd.org