bacon4000 at gmail.com
Sun Apr 14 18:47:52 UTC 2019
On 2019-04-14 12:54, Rodney W. Grimes wrote:
>> On 2019-04-13 13:29, Justin Clift wrote:
>>> On 2019-04-13 23:52, Jason Bacon wrote:
>>>> Stability will take a long time to test properly. I'm going to start
>>>> by rerunning some of our most I/O-intensive jobs on it - jobs that
>>>> actually broke our CentOS RAID servers until I switched them to NFS
>>>> over RDMA.
>>> That's got to be the first time anyone's ever mentioned "NFS over
>>> RDMA" as increasing a system's stability. :)
>>> + Justin
>> Believe it or not... ;-)
>> After my upgrade from CentOS 6 to CentOS 7, NFS over TCP started falling
>> apart under heavy load, with servers and compute nodes becoming
>> unresponsive and requiring a reboot to restore stability.
>> If it's due to problems in the CentOS TCP stack, NFS over RDMA would
>> help by eliminating the TCP stack from the pathway.
> Any idea what happened in the CentOS TCP stack between 6 and 7?
Not really - I don't have time to do a deep dive into such specifics given
the number of hats I have to wear. I can only say that for our
particular use case, CentOS 7 is generally more complicated, slower and
slightly less reliable than 6 (which actually served us well for
years). I hit a few pitfalls following the upgrades, but I found my way
around them and our clusters are pretty stable now.
Earth is a beta site.