Rick Macklem
2021-May-21 05:19 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Ok, so it isn't related to "soft".
I am wondering if it is something specific to what
"diff -r" does?

Could you try:
# cd /usr/ports
# ls -R > /tmp/x
# cd /mnt
# ls -R > /tmp/y
# cd /tmp
# diff -u -p x y
--> To see if "ls -R" finds any difference?

rick
ps: I do not think that r367492 could cause this, but it would be
nice if you try a kernel with the r367492 patch reverted.
It is currently in all of releng13, stable13 and main, although
the patch to fix this was just reviewed and may hit main soon.

________________________________________
From: Mark Millard <marklmi at yahoo.com>
Sent: Friday, May 21, 2021 12:40 AM
To: Rick Macklem
Cc: FreeBSD-STABLE Mailing List
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

[main test example and main/releng/13 mixed example]

On 2021-May-20, at 20:36, Mark Millard <marklmi at yahoo.com> wrote:

> [stable/13 test: example ends up being odder. That might
> allow eliminating some potential alternatives.]
>
> On 2021-May-20, at 19:38, Mark Millard <marklmi at yahoo.com> wrote:
>>
>> On 2021-May-20, at 18:09, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>>>
>>> Oh, one additional thing that I'll dare to top post...
>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>> This has not yet been resolved in "main" etc and could explain
>>> why an RPC could time out for a soft mount.
>>
>> See later notes that I added: soft mount is not required
>> to see the problem.
>>
>>> You can revert the patch in r367492 to avoid the problem.
>>
>> If I understand right, you are indicating that this would
>> not apply to the non-soft mount case that I got.
>>
>>> Disabling TSO, LRO are also de-facto standard things to do when
>>> you observe weird NFS behaviour, because they are often broken
>>> in various network device drivers.
>>
>> I'll have to figure out how to experiment with such. Things
>> are at defaults rather generally on the systems. I'm not
>> literate in the subject areas.
>>
>> I'm the only user of the machines and network. It is not
>> outward facing. It is a rather small EtherNet network.
>>
>>> rick
>>>
>>> ________________________________________
>>> From: owner-freebsd-stable at freebsd.org <owner-freebsd-stable at freebsd.org> on behalf of Rick Macklem <rmacklem at uoguelph.ca>
>>> Sent: Thursday, May 20, 2021 8:55 PM
>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
>>>
>>> Mark Millard wrote:
>>>> [I warn that I'm a fairly minimal user of NFS
>>>> mounts, not knowing all that much. I'm mostly
>>>> reporting this in case it ends up as evidence
>>>> via eventually matching up with others observing
>>>> possibly related oddities.]
>>>>
>>>> I got the following odd sequence (that I've
>>>> mixed notes into). It involved a diff -r over NFS
>>>> showing differences (files missing) and then a
>>>> later diff finding matches for the same files,
>>>> no file system changes made on either machine.
>>>> I'm unable to reproduce the oddity on demand.
>>>>
>>>> Note: A larger scope diff -r originally returned the
>>>> below as well, but doing the narrower diff -r did
>>>> repeat the result and that is what I show. (I
>>>> make no use of devel/ice .)
>>>>
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> . . .
>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>>
>>>> Note: The above was not expected.
>>>> So I tried:
>>>>
>>>> # ls -Tld /mnt/devel/ice/files/*
>>>> -rw-r--r-- 1 root wheel 755 Apr 21 21:07:54 2021 /mnt/devel/ice/files/Make.rules.FreeBSD
>> . . .
>>>> -rw-r--r-- 1 root wheel 2588 Apr 21 21:07:54 2021 /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>>
>>>> Note: So that indicated that the files were there on the
>>>> machine that /mnt references. So attempting the original
>>>> diff -r again:
>>>>
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> #
>>>>
>>>> (Empty difference.)
>>>>
>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>> the odd result of the diff -r no longer happened: no
>>>> differences reported.
>>>>
>>>>
>>>>
>>>> For reference (both machines reported):
>>>>
>>>> . . .
>>>> The original mount command was on CA72_16Gp_ZFS:
>>>>
>>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>>> The likely explanation for this is your use of a "soft" mount.
>>> - If the NFS server is slow to respond or there is a temporary network issue,
>>> the RPC request can time out and then the
>>> syscall can fail with EINTR/ETIMEDOUT. Since almost nothing, including the
>>> readdir(3) libc functions, expects syscalls to fail this way...
>>> Then the cached directory is messed up.
>>> Doing the "ls" read the directory again and fixed the problem.
>>>
>>> Try to reproduce it for a mount without the "soft" option.
>>> (If a mount point is hung due to an unresponsive server, "umount -N /mnt"
>>> can usually get rid of it.)
>>> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 1985
>>> and I still feel that way.
>>> --> If you can reproduce it without "soft" then I can't explain it.
>>> To be honest, the directory reading/caching code in the NFSv3 client
>>> hasn't changed significantly in literally decades, as far as I can remember.
>>
>> Well . . . trying an even wider scope diff than
>> the original . . .
>>
>> # umount /mnt/
>> # mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
>> # diff -r /usr/ports/ /mnt/ | more
>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
>> Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
>> . . .
>> Only in /usr/ports/devel/ice/files: patch-python-test-Slice-unicodePaths-run.py
>> Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
>> Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>
>> So the devel/ice files showed up again.
>>
>> But 2 other lines show up, one finding a file supposedly only
>> on /mnt/. . .
>>
>> QUOTE
>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>> END QUOTE
>>
>> That seems to be a truncated file name. Looking directly on the machine that
>> /mnt/ references (hitting tab at the end of the partial name to show a
>> list):
>>
>> # ls -Tld /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_gen-config.sh
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_js-confdefs.h
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src0.cpp
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src1.cpp
>> . . .
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src9.cpp
>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_include_js-config.h
>>
>> The other machine agrees (machine-local usage).
>>
>> The other of the 2 new names is one of the matches to the prefix:
>>
>> QUOTE
>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>> END QUOTE
>>
>> For reference: I've not gotten any console messages about
>> anything during these.
>>
>>> One additional thing to note is that cached directory contents are invalidated
>>> when the directory's ctime changes.
>>
>> I'm not aware of anything that should have been touching the
>> /usr/ports file systems on either machine any time near my
>> diff activities. (I'm the only system user.)
>>
>>> I am not sure how/if/when ZFS changes a
>>> directory's ctime. However, if it was badly broken, I'd hear about this a lot.
>>> (If the ZFS change to ZoL has changed its ctime handling, that might also explain it
>>> and I'll be hearing a lot more soon as FreeBSD13 becomes adopted. I never use ZFS and,
>>> as such, never test with it.)
>>
>> I recently decided to try using bectl, which led to my recent
>> ZFS-based system experiments.
>>
>> This means I can boot the stable/13 or main [so: 14] that
>> I last built and try the same experiments with the same
>> /usr/ports file systems. releng/13 's release/13.0.0 ,
>> stable/13 , and main are all non-debug builds as it stands. I
>> could add debug builds of any or all, but it would take
>> a while. (aarch64 4-core Cortex-A72 contexts.)
>>
>>> --> For UFS, if you use mtime, directory caching does not work as well, which is
>>> why the client directory caching code uses ctime and not mtime to detect that
>>> a directory has changed and cached directory blocks need to be invalidated.
>>>
>>> Jason Bacon did report a directory reading issue some months ago that never
>>> quite got resolved, although I recall he said he couldn't reproduce it after a
>>> system update, so he thought it was related to some local change he had made.
>>> (I can't remember his email or I'd add him to the cc so he could remind me what
>>> his case was. I do recall it being somewhat reproducible and happened for both
>>> UFS and ZFS.)
>>>> The network is just a local EtherNet.
>>>
>>
>
>
> stable/13 got similar "diff -r /usr/ports/ /mnt/ | more" results but
> with /mnt/devel/electron12/files indications instead of the
> /usr/ports/devel/ice/files ones. It did again start with:
>
> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>
> for this rather wide range diff -r . It continued with:
>
> Only in /mnt/devel/electron12/files:
> Only in /mnt/devel/electron12/files: package.json
> Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
> Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
> Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
> . . .
>
> It finished with:
>
> Only in /mnt/devel/electron12/files: yarn.lock
> Only in /mnt/devel/electron12/files: <A0><CE><C8>?<DC>?2<B2><E2><AA>^H
> Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
> Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc
>
>
> That last is the only /usr/ports/ prefixed path this time: the
> only one where it was under /mnt/ that something appeared to
> be missing.
>
> It appears that the file name on the line after the yarn.lock
> line is garbage with no matching file present when using ls
> on the system that /mnt/ references.
>
> Locally on each machine the devel/electron12/files/* files
> are shown by ls as present ( through yarn.lock ).
>
> NOTE:
> I find it odd that the local /usr/ports/ ended up being where
> most of the files were reported as missing, instead of under
> /mnt/ : Wrong side for a network/network-protocol issue?
>
>
> For reference (David W. indicated I should look at ifconfig
> for figuring out controlling TSO and such so I figured I'd
> show the default ifconfig output):
>
> # ifconfig
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> 	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
> 	inet6 ::1 prefixlen 128
> 	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
> 	inet 127.0.0.1 netmask 0xff000000
> 	groups: lo
> 	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=68009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
> 	ether REPLACED
> 	inet 192.168.1.148 netmask 0xffffff00 broadcast 192.168.1.255
> 	media: Ethernet autoselect (1000baseT <full-duplex>)
> 	status: active
> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>
> # ifconfig
> genet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=68000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
> 	ether REPLACED
> 	inet6 REPLACED%genet0 prefixlen 64 scopeid 0x1
> 	inet6 REPLACED prefixlen 64 autoconf
> 	inet 192.168.1.170 netmask 0xffffff00 broadcast 192.168.1.255
> 	media: Ethernet autoselect (1000baseT <full-duplex>)
> 	status: active
> 	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> 	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
> 	inet6 ::1 prefixlen 128
> 	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
> 	inet 127.0.0.1 netmask 0xff000000
> 	groups: lo
> 	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>
>
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May 1 02:27:02 PDT 2021 root at CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300504 1300504
>
> # ~/fbsd-based-on-what-commit.sh
> branch: stable/13
> merge-base: fb34817c686cc130449325499870e36979899801
> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
> n245474 (--first-parent --count for merge-base)
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May 1 02:27:02 PDT 2021 root at CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300504 1300504
>
> # ~/fbsd-based-on-what-commit.sh
> branch: stable/13
> merge-base: fb34817c686cc130449325499870e36979899801
> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
> n245474 (--first-parent --count for merge-base)

Both systems running main:

# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
Only in /mnt/devel/electron12/files:
Only in /mnt/devel/electron12/files:
Only in /mnt/devel/electron12/files: patch-chrome2
Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
Only in /usr/ports/devel/electron12/files: patch-chrome_browser_ui_webui_settings_appearance__handler.h
Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
Only in /usr/ports/devel/electron12/files: patch-ui_compositor_compositor.cc
Only in /mnt/devel/electron12/files: <A0><CE><C8>?<DC>?2<B2><E2><AA>^H

(That was all that was listed.)

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May 1 19:07:50 PDT 2021 root at CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400013 1400013

# ~/fbsd-based-on-what-commit.sh
branch: main
merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
merge-base: CommitDate: 2021-05-01 00:58:11 +0000
a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
n246411 (--first-parent --count for merge-base)

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May 1 19:07:50 PDT 2021 root at CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400013 1400013

# ~/fbsd-based-on-what-commit.sh
branch: main
merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
merge-base: CommitDate: 2021-05-01 00:58:11 +0000
a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
n246411 (--first-parent --count for merge-base)

I tried main on the /usr/ side with releng/13 's release/13.0.0 on the machine that /mnt/ references and got:

# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/devel/electron12/files: package.json
Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
. . .
Only in /mnt/devel/electron12/files: patch-weblayer_browser_system__network__context__manager.cc
Only in /mnt/devel/electron12/files: patch-weblayer_common_weblayer__paths.cc
Only in /mnt/devel/electron12/files: yarn.lock
Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
. . .
Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
Only in /mnt/games: 0ad
Only in /mnt/games: 0verkill
Only in /mnt/games: 2048
. . .
Only in /mnt/games: zaz
Only in /mnt/games: zhlt
Only in /mnt/games: ztrack

No obvious garbage or truncated names. Another mix of /mnt/ vs. /usr/ being the "missing" side.

NOTE: So far I do not see an obvious reason to prefer any specific one of releng/13 vs. stable/13 vs. main at either end of the connection for the vintages that I happen to have in place for them.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
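The "Only in" lines in this thread can be screened mechanically for the two oddities Mark points out: lines with no name at all, and a name on one side that looks like a truncated copy of a name on the other side in the same directory. What follows is a minimal Python sketch, assuming the "Only in DIR: NAME" layout seen in the output above and directory paths that contain no colons; find_suspects and split_root are hypothetical helper names, not part of any FreeBSD tool. It only flags strict prefixes, so a mangled name such as patch-chrome2 would not be reported.

```python
import re
from collections import defaultdict

# Hypothetical helpers for screening "diff -r" output; not a FreeBSD tool.
# "Only in DIR: NAME" as printed by diff -r. NAME may be empty, as in the
# "Only in /mnt/devel/electron12/files:" lines. Assumes DIR has no colons.
ONLY_IN = re.compile(r"^Only in (?P<dir>[^:]+):(?: (?P<name>.*))?$")


def split_root(path, roots):
    """Return (root, relative_dir) for the first root containing path."""
    for root in roots:
        base = root.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return base, path[len(base):].lstrip("/")
    return None, path


def find_suspects(diff_lines, roots=("/usr/ports", "/mnt")):
    """Return (truncated, empty) findings from diff -r output lines.

    truncated: (reldir, short_root, short_name, full_name) tuples where a
    name reported only under short_root is a strict prefix of a name
    reported only under another root in the same relative directory.
    empty: (root, reldir) pairs for "Only in" lines with no name at all.
    """
    only = defaultdict(lambda: defaultdict(set))  # reldir -> root -> names
    empty = []
    for line in diff_lines:
        m = ONLY_IN.match(line.rstrip("\n"))
        if m is None:
            continue  # not an "Only in" line (e.g. a content difference)
        root, rel = split_root(m.group("dir"), roots)
        if root is None:
            continue
        name = m.group("name") or ""
        if name:
            only[rel][root].add(name)
        else:
            empty.append((root, rel))

    truncated = []
    for rel, by_root in only.items():
        for short_root, shorts in by_root.items():
            for full_root, fulls in by_root.items():
                if full_root == short_root:
                    continue
                for short in shorts:
                    for full in fulls:
                        if full != short and full.startswith(short):
                            truncated.append((rel, short_root, short, full))
    return sorted(truncated), empty
```

To run it on a saved log (the file name here is hypothetical), something like find_suspects(open("/tmp/diff.out", encoding="utf-8", errors="surrogateescape")) should work; errors="surrogateescape" keeps non-UTF-8 bytes, such as those in the garbage electron12 name, from aborting the read.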
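Rick's description of the client-side directory cache (cached directory contents are reused until the directory's ctime changes) can be modeled in a few lines. This is a toy Python sketch built on that single rule, not the FreeBSD NFS client code; CtimeDirCache and its get_ctime/read_dir parameters are made-up names, and both functions are injectable so the behavior can be exercised without a server. It stores a listing only after a read completes without raising; Rick's soft-mount explanation is about what can go wrong when a read instead fails part way.

```python
import os


class CtimeDirCache:
    """Toy model (hypothetical): reuse a cached listing while ctime is unchanged.

    Not the FreeBSD NFS client; it only mirrors the invalidation rule
    described in the thread. A listing is stored only after a read that
    completed without raising.
    """

    def __init__(self, get_ctime=None, read_dir=None):
        # Defaults use the local file system; callers may inject fakes.
        self._get_ctime = get_ctime or (lambda path: os.stat(path).st_ctime_ns)
        self._read_dir = read_dir or (lambda path: sorted(os.listdir(path)))
        self._cache = {}  # path -> (ctime seen before the read, names)
        self.reads = 0    # count of successful directory reads

    def listdir(self, path):
        # Sample ctime before reading: if the directory changes during the
        # read, the stored ctime is older and the next call re-reads.
        ctime = self._get_ctime(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == ctime:
            return list(cached[1])
        names = list(self._read_dir(path))  # an exception here caches nothing
        self.reads += 1
        self._cache[path] = (ctime, names)
        return list(names)
```

The ordering is the main design choice: if ctime were sampled after the read instead, a change landing between the read and the stat would pair an old listing with a new ctime, and the stale listing would then be served until the next change.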
Mark Millard
2021-May-21 05:56 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-20, at 22:19, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Ok, so it isn't related to "soft".
> I am wondering if it is something specific to what
> "diff -r" does?
>
> Could you try:
> # cd /usr/ports
> # ls -R > /tmp/x
> # cd /mnt
> # ls -R > /tmp/y
> # cd /tmp
> # diff -u -p x y
> --> To see if "ls -R" finds any difference?

# diff -u -p x y
--- x	2021-05-20 22:35:48.021663000 -0700
+++ y	2021-05-20 22:39:03.691936000 -0700
@@ -227209,10 +227209,10 @@ patch-chrome_browser_background_background__mode__mana
 patch-chrome_browser_background_background__mode__optimizer.cc
 patch-chrome_browser_browser__resources.grd
 patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc
+patch-chrome_browser_chrome__browser
 patch-chrome_browser_chrome__browser__interface__binders.cc
 patch-chrome_browser_chrome__browser__main.cc
 patch-chrome_browser_chrome__browser__main__linux.cc
-patch-chrome_browser_chrome__browser__main__posix.cc
 patch-chrome_browser_chrome__content__browser__client.cc
 patch-chrome_browser_chrome__content__browser__client.h
 patch-chrome_browser_crash__upload__list_crash__upload__list.cc

# find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

# find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

So patch-chrome_browser_chrome__browser appears to be a truncated form of the
patch-chrome_browser_chrome__browser__main__posix.cc file name, and find
also gets the same oddity.

(Note: This had /usr/ports in a main context and /mnt/ referring to a release/13.0.0 context.)

> ps: I do not think that r367492 could cause this, but it would be
> nice if you try a kernel with the r367492 patch reverted.
> It is currently in all of releng13, stable13 and main, although
> the patch to fix this was just reviewed and may hit main soon.

Do you want a debug kernel to be used? Do you have a preference for main vs. stable/13 vs. release/13.0.0 based? Is it okay to stick to the base version things are now based on, or do you want me to update to something more recent? (That last only applies if main or stable/13 is to be put to use.)

> . . . old history deleted . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
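Rick's "ls -R on both sides, then diff" check can also be expressed in Python, walking both trees from the NFS client and keeping paths relative to each root so they line up. This is a minimal sketch; compare_trees, tree_names, and the short/full pairing rule are illustrative assumptions, not an existing tool. One deliberate choice: os.walk skips directories it cannot read unless an onerror callback is given, so the sketch re-raises, making a failed directory read show up as an error rather than as a shorter listing.

```python
import os


def _raise(error):
    # os.walk ignores directory-read errors unless onerror is given;
    # re-raising makes a failed read visible instead of a short listing.
    raise error


def tree_names(root):
    """Return every path under root, relative to root (like ls -R)."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root, onerror=_raise):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            names.add(os.path.normpath(os.path.join(rel, name)))
    return names


def compare_trees(root_a, root_b):
    """Return (only_in_a, only_in_b, suspects) for two directory trees.

    Hypothetical helper. suspects holds (side, short, full) where a path
    found only on one side is a strict prefix of a path found only on the
    other side, within the same directory: the truncated-name pattern.
    """
    a, b = tree_names(root_a), tree_names(root_b)
    only_a, only_b = sorted(a - b), sorted(b - a)
    suspects = []
    for side, shorts, fulls in (("a", only_a, only_b), ("b", only_b, only_a)):
        for short in shorts:
            for full in fulls:
                if (full != short and full.startswith(short)
                        and os.path.dirname(full) == os.path.dirname(short)):
                    suspects.append((side, short, full))
    return only_a, only_b, suspects
```

On the NFS client, compare_trees("/usr/ports", "/mnt") would read every directory on both sides, much as the two ls -R runs do; if the oddity recurs, a pair like patch-chrome_browser_chrome__browser and patch-chrome_browser_chrome__browser__main__posix.cc should appear in suspects.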