when has a pNFS data server failed?
Rick Macklem
rmacklem at uoguelph.ca
Fri Aug 18 21:52:15 UTC 2017
This is kind of a "big picture" question that I thought I'd throw out.
As a brief background, I now have the code for running mirrored pNFS Data Servers
working for normal operation. You can look at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
if you are interested in details related to the pNFS server code/testing.
So, now I am facing the interesting part:
1 - The Metadata Server (MDS) needs to decide that a mirrored DS has failed at some
point. Once that happens, it stops using the DS, etc.
--> This brings me to the question of "when should the MDS decide that the DS has
failed and should be taken offline?".
- I'm not up to date w.r.t. the TCP stack, so I'm not sure how long it will take for the
TCP stack to decide that a DS is no longer working and fail the
connection. I think it takes a fair amount of time, so I'm not sure whether TCP connection
loss is a good indicator of DS failure.
- It seems to me that the MDS should wait a fairly long time before failing the DS,
since this will have a major impact on the pNFS server, requiring repair/resilvering
by a sysadmin once it happens.
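To make the trade-off concrete, here is a minimal sketch of one possible policy: the MDS only declares a DS failed after it has been silent for a long grace period and several RPC attempts in a row have failed. This is not the actual pNFS server code. The struct names, function names and the two tunables (grace_sec, min_failures) are all hypothetical, and the values an admin would pick for them are exactly the open question.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-DS health record kept by the MDS (illustrative only). */
struct ds_health {
	time_t		last_ok;	/* time of the last successful RPC reply */
	unsigned int	failures;	/* consecutive failed/timed-out RPCs */
};

/* Hypothetical tunables; the right values are the open question. */
struct ds_policy {
	time_t		grace_sec;	/* how long the DS may stay silent */
	unsigned int	min_failures;	/* failed attempts also required */
};

/* A reply arrived: the DS is alive, so reset the failure count. */
void
ds_record_success(struct ds_health *h, time_t now)
{
	h->last_ok = now;
	h->failures = 0;
}

/* An RPC failed or timed out (for example, the TCP connection was lost). */
void
ds_record_failure(struct ds_health *h)
{
	if (h->failures < UINT_MAX)
		h->failures++;
}

/*
 * Take the DS offline only when BOTH conditions hold, so that a single
 * dropped connection or a short outage does not force a resilver.
 */
bool
ds_should_fail(const struct ds_health *h, const struct ds_policy *p,
    time_t now)
{
	assert(p->grace_sec > 0);
	return (h->failures >= p->min_failures &&
	    now - h->last_ok >= p->grace_sec);
}
```

With grace_sec set to 300 and min_failures set to 3, three failed RPCs within 100 seconds of the last good reply would not fail the DS, but the same three failures with no reply for 300 seconds would. Requiring both conditions errs on the side of waiting, since failing a DS is expensive to undo.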
So, any comments or thoughts on this?

rick
More information about the freebsd-fs mailing list