releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

Mark Millard marklmi at yahoo.com
Fri May 21 04:40:27 UTC 2021


[main test example and main/releng/13 mixed example]

On 2021-May-20, at 20:36, Mark Millard <marklmi at yahoo.com> wrote:

> [stable/13 test: example ends up being odder. That might
> allow eliminating some potential alternatives.]
> 
> On 2021-May-20, at 19:38, Mark Millard <marklmi at yahoo.com> wrote:
>> 
>> On 2021-May-20, at 18:09, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>>> 
>>> Oh, one additional thing that I'll dare to top post...
>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>> This has not yet been resolved in "main" etc and could explain
>>> why an RPC could time out for a soft mount.
>> 
>> See later notes that I added: soft mount is not required
>> to see the problem.
>> 
>>> You can revert the patch in r367492 to avoid the problem.
>> 
>> If I understand right, you are indicating that this would
>> not apply to the non-soft mount case that I got.
>> 
>>> Disabling TSO, LRO are also de-facto standard things to do when
>>> you observe weird NFS  behaviour, because they are often broken
>>> in various network device drivers.
>> 
>> I'll have to figure out how to experiment with such. Things
>> are at defaults rather generally on the systems. I'm not
>> literate in the subject areas.
>> 
>> I'm the only user of the machines and network. It is not
>> outward facing. It is a rather small EtherNet network.
>> 
>>> rick
>>> 
>>> ________________________________________
>>> From: owner-freebsd-stable at freebsd.org <owner-freebsd-stable at freebsd.org> on behalf of Rick Macklem <rmacklem at uoguelph.ca>
>>> Sent: Thursday, May 20, 2021 8:55 PM
>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
>>> 
>>> Mark Millard wrote:
>>>> [I warn that I'm a fairly minimal user of NFS
>>>> mounts, not knowing all that much. I'm mostly
>>>> reporting this in case it ends up as evidence
>>>> via eventually matching up with others observing
>>>> possibly related oddities.]
>>>> 
>>>> I got the following odd sequence (that I've
>>>> mixed notes into). It involved a diff -r over NFS
>>>> showing differences (files missing) and then a
>>>> later diff finding matches for the same files,
>>>> no file system changes made on either machine.
>>>> I'm unable to reproduce the oddity on demand.
>>>> 
>>>> Note: A larger scope diff -r originally returned the
>>>> below as well, but doing the narrower diff -r did
>>>> repeat the result and that is what I show. (I
>>>> make no use of devel/ice .)
>>>> 
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> . . .
>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>> 
>>>> Note: The above was not expected. So I tried:
>>>> 
>>>> # ls -Tld /mnt/devel/ice/files/*
>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 /mnt/devel/ice/files/Make.rules.FreeBSD
>> . . .
>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>> 
>>>> Note: So that indicated that the files were there on the
>>>> machine that /mnt references. So attempting the original
>>>> diff -r again:
>>>> 
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> #
>>>> 
>>>> (Empty difference.)
>>>> 
>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>> the odd result of the diff -r no longer happened: no
>>>> differences reported.
>>>> 
>>>> 
>>>> 
>>>> For reference (both machines reported):
>>>> 
>>>> . . .
>>>> The original mount command was on CA72_16Gp_ZFS:
>>>> 
>>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>>> The likely explanation for this is your use of a "soft" mount.
>>> - If the NFS server is slow to respond or there is a temporary network issue,
>>> the RPC request can time out and then the
>>> syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the
>>>  readdir(3) libc functions expect syscalls to fail this way...
>>>  Then the cached directory is messed up.
>>>  Doing the "ls" read the directory again and fixed the problem.
>>> 
>>> Try to reproduce it for a mount without the "soft" option.
>>> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
>>> can usually get rid of it.)
>>> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 1985
>>> and I still feel that way.
>>> --> If you can reproduce it without "soft" then I can't explain it.
>>>    To be honest, the directory reading/caching code in the NFSv3 client
>>>    hasn't changed significantly in literally decades, as far as I can remember.
>> 
>> Well . . . trying an even wider scope diff than
>> the original . . .
>> 
>> # umount /mnt/
>> # mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
>> # diff -r /usr/ports/ /mnt/ | more
>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
>> Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
>> . . .
>> Only in /usr/ports/devel/ice/files: patch-python-test-Slice-unicodePaths-run.py
>> Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
>> Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>> 
>> So the devel/ice files showed up again.
>> 
>> But 2 other lines show up, one finding a file supposedly only
>> on /mnt/. . .
>> 
>> QUOTE
>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>> END QUOTE
>> 
>> That seems to be a truncated file name. Looking directly on the machine that
>> /mnt/ references (hitting tab at the end of the partial name to show a
>> list):
>> 
>> # ls -Tld /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_gen-config.sh                                              
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_js-confdefs.h              
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src0.cpp  
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src1.cpp  
>> . . .  
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src9.cpp  
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_include_js-config.h            
>> 
>> The other machine agrees (machine-local usage).
>> 
>> The other of the 2 new names is one of the matches to the prefix:
>> 
>> QUOTE
>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>> END QUOTE
>> 
>> For reference: I've not gotten any console messages about
>> anything during these.
>> 
>>> One additional thing to note is that cached directory contents are invalidated
>>> when the directory's ctime changes.
>> 
>> I'm not aware of anything that should have been touching the
>> /usr/ports file systems on either machine any time near my
>> diff activities. (I'm the only system user.)
>> 
>>> I am not sure how/if/when ZFS changes a
>>> directory's ctime. However, if it was badly broken, I'd hear about this a lot.
>>> (If the ZFS change to ZoL has changed its ctime handling, that might also explain it
>>> and I'll be hearing a lot more soon as FreeBSD13 becomes adopted. I never use ZFS and,
>>> as such, never test with it.)
>> 
>> I recently decided to try using bectl, which lead to my recent
>> ZFS-based system experiments.
>> 
>> This means I can boot the stable/13 or main [so: 14] that
>> I last built and try the same experiments with the same
>> /usr/ports file sysystems. releng/13 's release/13.0.0 ,
>> stable/13 , and main are all non-debug builds as stands. I
>> could add debug builds of any or all, but it would take
>> a while. (aarch64 4-core Cortex-A72 contexts.)
>> 
>>> --> For UFS, if you use mtime, directory caching does not work as well, which is
>>>     why the client directory caching code uses ctime and not mtime to detect that
>>>     a directory has changed and cached directory blocks need to be invalidated.
>>> 
>>> Jason Bacon did report a directory reading issue some months ago that never
>>> quite got resolved, although I recall he said he couldn't reproduce it after a
>>> system update, so he thought it was related to some local change he had made.
>>> (I can't remember his email or I'd add him to the cc so he could remind me what
>>> his case was. I do recall it being somewhat reproducible and happened for both
>>> UFS and ZFS.)
>>>> The network is just a local EtherNet.
>>> 
>> 
> 
> 
> stable/13 got similar "diff -r /usr/ports/ /mnt/ | more" results but
> /mnt/devel/electron12/files indications of the /usr/ports/devel/ice/files
> ones. It did again start with:
> 
> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
> 
> for this rather wide range diff -r . It continued with:
> 
> Only in /mnt/devel/electron12/files: 
> Only in /mnt/devel/electron12/files: package.json
> Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
> Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
> Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
> . . .
> 
> It finished with:
> 
> Only in /mnt/devel/electron12/files: yarn.lock
> Only in /mnt/devel/electron12/files: <A0><CE><C8>֏<DC>ܦ2<B2><E2><AA>^H
> Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
> Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc
> 
> 
> That last is the only /usr/ports/ prefixed path this time: the
> only one where it was under /mnt/ that something appeared to
> be missing.
> 
> It appears that the file name on the line after the yarn.lock
> line is garbage with no matching file present when using ls
> on the system that /mnt/ references.
> 
> Locally on each machine the devel/electron12/files/* files
> are shown by ls as present ( through yarn.lock ).
> 
> NOTE:
> I find it odd that the local /usr/ports/ ended up being where
> most of the files were reported as missing, instead of under
> /mnt/ : Wrong side for a network/network-protocol issue?
> 
> 
> For reference (David W. indicated I should look at ifconfig
> for figuring out controlling TSO and such so I figured I'd
> show the default ifconfig output):
> 
> # ifconfig
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>        inet6 ::1 prefixlen 128
>        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>        inet 127.0.0.1 netmask 0xff000000
>        groups: lo
>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>        options=68009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>        ether REPLACED
>        inet 192.168.1.148 netmask 0xffffff00 broadcast 192.168.1.255
>        media: Ethernet autoselect (1000baseT <full-duplex>)
>        status: active
>        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> 
> # ifconfig
> genet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>        options=68000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>        ether REPLACED
>        inet6 REPLACED%genet0 prefixlen 64 scopeid 0x1
>        inet6 REPLACED prefixlen 64 autoconf
>        inet 192.168.1.170 netmask 0xffffff00 broadcast 192.168.1.255
>        media: Ethernet autoselect (1000baseT <full-duplex>)
>        status: active
>        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>        inet6 ::1 prefixlen 128
>        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
>        inet 127.0.0.1 netmask 0xff000000
>        groups: lo
>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> 
> 
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May  1 02:27:02 PDT 2021     root at CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1300504 1300504
> 
> # ~/fbsd-based-on-what-commit.sh 
> branch: stable/13
> merge-base: fb34817c686cc130449325499870e36979899801
> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
> n245474 (--first-parent --count for merge-base)
> 
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May  1 02:27:02 PDT 2021     root at CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1300504 1300504
> 
> # ~/fbsd-based-on-what-commit.sh 
> branch: stable/13
> merge-base: fb34817c686cc130449325499870e36979899801
> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
> n245474 (--first-parent --count for merge-base)

Both systems running main:

# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
Only in /mnt/devel/electron12/files: 
Only in /mnt/devel/electron12/files: 
Only in /mnt/devel/electron12/files: patch-chrome2
Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
Only in /usr/ports/devel/electron12/files: patch-chrome_browser_ui_webui_settings_appearance__handler.h
Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
Only in /usr/ports/devel/electron12/files: patch-ui_compositor_compositor.cc
Only in /mnt/devel/electron12/files: <A0><CE><C8>֏<DC>ܦ2<B2><E2><AA>^H

(That was all that was listed.)

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May  1 19:07:50 PDT 2021     root at CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400013 1400013

# ~/fbsd-based-on-what-commit.sh 
branch: main
merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
merge-base: CommitDate: 2021-05-01 00:58:11 +0000
a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
n246411 (--first-parent --count for merge-base)

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May  1 19:07:50 PDT 2021     root at CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400013 1400013

# ~/fbsd-based-on-what-commit.sh 
branch: main
merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
merge-base: CommitDate: 2021-05-01 00:58:11 +0000
a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
n246411 (--first-parent --count for merge-base)



I tried main on the /usr/ side with releng/13 's release/13.0.0
where /mnt/ references and got:

# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/devel/electron12/files: package.json
Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
. . .
Only in /mnt/devel/electron12/files: patch-weblayer_browser_system__network__context__manager.cc
Only in /mnt/devel/electron12/files: patch-weblayer_common_weblayer__paths.cc
Only in /mnt/devel/electron12/files: yarn.lock
Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
. . .
Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
Only in /mnt/games: 0ad
Only in /mnt/games: 0verkill
Only in /mnt/games: 2048
. . .
Only in /mnt/games: zaz
Only in /mnt/games: zhlt
Only in /mnt/games: ztrack

No obvious garbage or truncated names. Another mix of
/mnt/ vs. /usr/ being the "missing" side.

NOTE:
So far I do not see an obvious reason to prefer any
specific one of releng/13 vs. stable/13 vs. main
at either end of the connection for the vintages that
I happen to have in place for them.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



More information about the freebsd-stable mailing list