Major issues with nfsv4

J David j.david.lists at gmail.com
Mon Dec 14 15:21:43 UTC 2020


TLDR: The values of OpenOwner and Opens have a statistically
significant correlation with the passage of time and are statistically
independent of the number of currently running jobs (jails),
processes, or threads.

3,173 samples were collected over approximately twelve hours. Each
sample contains the following values (five-number summary in
parentheses: min, 1Q, median, 3Q, max):

- nfsstat -E -c OpenOwner (137 1405 2380 3541 4693)
- nfsstat -E -c Opens (49 10479 18229 27732 36589)
- # of active Jobs (1 50 50 50 51)
- # of Job processes (1 117 117 117 121)
- # of Job threads (1 519 521 525 533)
- # of nfscl Threads (48 53 53 53 55)
- Total # of processes on system (149 260 261 264 280)
- Total # of threads on system (481 996 1001 1005 1023)
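
For anyone wanting to reproduce the summaries, here is a minimal sketch of a five-number summary in Python. The function name and the quartile rule (linear interpolation over the sorted data, min and max included) are assumptions for illustration; other tools use slightly different quartile rules and can give slightly different 1Q/3Q values.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between sorted data points,
    # treating the smallest and largest values as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

if __name__ == "__main__":
    # Made-up values for illustration only (not data from the test run).
    print(five_number_summary([3, 7, 8, 12, 20]))
```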

OpenOwner and Opens are the dependent variables. The remaining values
and the sample sequence number (N) are independent variables.

The following table shows the adjusted R-squared values of linear
regressions using each combination of the independent and dependent
variables. While R-squared is not always the best measure of goodness
of fit, it is easy to understand, and given the type of data and the
relationship sought, its use here is both accurate and illustrative.

                 OpenOwner        Opens
N                0.9369           0.9310
NTestEnd*        0.9962           0.9979
Jobs             0.2461           0.0324
JobProcs         0.0225           0.0285
JobThreads       0.0921           0.1060
NfsclThreads     0.0072           0.0000
SysProcs         0.0325           0.0376
SysThreads       0.1003           0.1145

*Because the test ended at sample 3156, NTestEnd reflects the
regressions of OpenOwner and Opens vs. sample sequence number for
samples 1 through 3156 only.
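
Each cell in the table is the adjusted R-squared of a one-predictor linear regression. A minimal sketch of that calculation follows, assuming ordinary least squares with an intercept; the function name is hypothetical and this is not necessarily the tool used to produce the numbers above.

```python
def adjusted_r_squared(x, y):
    """Adjusted R-squared of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined R-squared")
    # With a single predictor, R^2 is the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    # One predictor (p = 1): adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - 2)

if __name__ == "__main__":
    # Synthetic series for illustration only (not data from the test run).
    seq = list(range(1, 11))
    counter = [10 * i + (i % 3) for i in seq]
    print(adjusted_r_squared(seq, counter))
```

The NTestEnd row corresponds to truncating both series before the call, e.g. passing only the first 3156 entries of each list.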

The results strongly indicate that both OpenOwner and Opens are highly
correlated with time. No other regression demonstrates a statistically
significant correlation. Opens and OpenOwner are also highly
correlated with each other (adjusted R-squared = 0.9957).

The high correlation and strong linear relationship with time suggest
this is caused by something that is both roughly constant over time
and largely independent of system activity measures based on process
counts.

It may be worth re-doing this test, capturing the rest of the
"nfsstat -E -c" stats about operations as well as counts of open
files.  Finding a strong correlation might help narrow down the causal
action, which would hopefully make it possible to independently
reproduce and/or fix this.
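
A rough sketch of how every counter could be captured per sample in a re-run. It assumes the "nfsstat -E -c" output is laid out as rows of counter names, each followed directly by a row of the same number of integer values; that layout is an assumption and should be checked against the real output first. Both function names are hypothetical.

```python
import subprocess

def parse_nfsstat(text):
    """Map counter names to integer values from nfsstat-style output.

    Assumes each row of names is followed directly by a row holding the same
    number of integers. Rows that do not fit that pattern (section titles,
    names containing spaces) are skipped. A name that appears in more than
    one section keeps only its last value.
    """
    counters = {}
    lines = text.splitlines()
    for names_line, values_line in zip(lines, lines[1:]):
        names = names_line.split()
        values = values_line.split()
        if not names or len(names) != len(values):
            continue
        if any(name.isdigit() for name in names):
            continue  # this row holds values, not names
        if not all(value.isdigit() for value in values):
            continue  # the next row is not a row of counters
        counters.update(zip(names, map(int, values)))
    return counters

def sample_nfsstat():
    """Run nfsstat on the client and return all parsed counters."""
    result = subprocess.run(["nfsstat", "-E", "-c"], capture_output=True,
                            text=True, check=True)
    return parse_nfsstat(result.stdout)
```

Calling sample_nfsstat() once per sampling interval and appending the resulting dict to a log would give one column per counter to regress against OpenOwner and Opens.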

A couple of questions about that:

1) Is there a way to get the total number of currently-open files more
efficiently than enumerating them?  (E.g., "fstat | wc -l" and "fstat
-m | wc -l" are slow and resource-intensive.)

2) If so, is there a way to do that on a per-process basis?

Thanks!


More information about the freebsd-fs mailing list