releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Rick Macklem
rmacklem at uoguelph.ca
Sat May 22 00:56:10 UTC 2021
Mark Millard wrote:
[stuff snipped]
>Well, why is it that ls -R, find, and diff -r all get file
>name problems via genet0 but diff -r gets no problems
>comparing the content of files that it does match up (the
>vast majority)? Any clue how could the problems possibly
>be unique to the handling of file names/paths? Does it
>suggest anything else to look into for getting some more
>potentially useful evidence?
Well, all I can do is describe the most common TSO-related
failure:
- When a read RPC reply (including NFS/RPC/TCP/IP headers)
is slightly less than 64K bytes (many TSO implementations are
limited to 64K or 32 discontiguous segments, think 32 2K
mbuf clusters), the driver decides it is ok, but when the MAC
header is added it exceeds what the hardware can handle correctly...
--> This will happen when reading a regular file that is slightly less
than a multiple of 64K in size.
or
--> This will happen when reading just about any large directory,
since the directory reply for a 64K request is converted to Sun XDR
format and clipped at the last full directory entry that will fit within 64K.
For ports, where most files are small, I think you can tell which is more
likely to happen.
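
The size arithmetic behind this failure mode can be sketched roughly as
follows. This is only an illustration, not driver code: the 65535 byte
limit, the 14 byte Ethernet header, and the function names are all
assumptions made for the example, and real hardware limits vary by NIC.

```python
# Hypothetical model of the TSO size overflow described above.
# TSO_MAX_BYTES and ETHER_HDR_LEN are assumed values for illustration.
TSO_MAX_BYTES = 65535   # assumed largest frame the hardware handles for TSO
ETHER_HDR_LEN = 14      # Ethernet MAC header, no VLAN tag


def fits_before_mac(ip_packet_len, limit=TSO_MAX_BYTES):
    """The check a driver might make before the MAC header is added."""
    return ip_packet_len <= limit


def fits_after_mac(ip_packet_len, limit=TSO_MAX_BYTES, mac_len=ETHER_HDR_LEN):
    """The check that matters: the length once the MAC header is prepended."""
    return ip_packet_len + mac_len <= limit


def risky_lengths(limit=TSO_MAX_BYTES, mac_len=ETHER_HDR_LEN):
    """IP packet lengths that pass the first check but fail the second."""
    return range(limit - mac_len + 1, limit + 1)


if __name__ == "__main__":
    window = risky_lengths()
    print("risky IP packet lengths:", window[0], "to", window[-1])
```

Under these assumptions only a narrow window of lengths is affected, which
is why a reply has to be slightly less than 64K to trigger the problem.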
--> If TSO is disabled, I have no idea how this might matter, but??
>I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>do not report changes in Ierrs or Idrop in a before vs.
>after failures comparison. (There may be better figures
>to look at for all I know.)
>
>I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>and got no obvious change in behavior.
All we know is that the data is getting corrupted somehow.
NFS traffic looks very different than typical TCP traffic. It is
mostly small messages travelling in both directions concurrently,
with some large messages thrown in the mix.
All I'm saying is that testing a net interface with something like a
bulk data transfer in one direction doesn't verify that it works for
NFS traffic.
Also, the large RPC messages are a chain of about 33 mbufs of
various lengths, including a mix of partial clusters and regular
data mbufs, whereas a bulk send on a socket will typically
result in an mbuf chain of a lot of full 2K clusters.
--> As such, NFS can be good at tickling subtle bugs in the
net driver related to mbuf handling.
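
To illustrate the difference in chain shape, here is a rough model. It is
a sketch only: the 32 segment limit is taken from the TSO description
above, and the "NFS-like" segment lengths are made-up values picked for
the example, not taken from a real packet trace.

```python
# Hypothetical model of mbuf chain shapes. The segment lengths in
# nfs_like_chain() are invented for illustration only.
MCLBYTES = 2048      # size of a standard mbuf cluster
MAX_TSO_SEGS = 32    # assumed scatter/gather segment limit of the hardware


def bulk_chain(total_bytes):
    """Chain of full 2K clusters, as a bulk socket send typically produces."""
    full, rest = divmod(total_bytes, MCLBYTES)
    return [MCLBYTES] * full + ([rest] if rest else [])


def nfs_like_chain():
    """A small header mbuf, 31 full clusters, and a partial cluster."""
    return [100] + [MCLBYTES] * 31 + [1500]


def exceeds_segment_limit(chain, max_segs=MAX_TSO_SEGS):
    """True if the chain has more segments than the hardware can take."""
    return len(chain) > max_segs


if __name__ == "__main__":
    nfs = nfs_like_chain()
    bulk = bulk_chain(sum(nfs))
    for name, chain in (("nfs-like", nfs), ("bulk", bulk)):
        print(name, sum(chain), "bytes in", len(chain), "segments,",
              "over limit" if exceeds_segment_limit(chain) else "ok")
```

In this model the same number of bytes fits in 32 segments when sent as
full clusters but needs 33 segments in the mixed chain, so only the
NFS-like case would hit the segment limit and depend on the driver
handling that correctly.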
rick
> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?
That update is reported to be causing "rack" related panics:
https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
reports (via links):
panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.
> Thanks for isolating this, rick
> ps: Coincidentally, I've been thinking of buying an RPi4 as a toy.
I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)