Mark Millard
2021-May-23 07:44 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-21, at 17:56, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how the problems could possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
> is slightly less than 64K bytes (many TSO implementations are
> limited to 64K or 32 discontiguous segments, think 32 2K
> mbuf clusters), the driver decides it is ok, but when the MAC
> header is added it exceeds what the hardware can handle correctly...
> --> This will happen when reading a regular file that is slightly less
> than a multiple of 64K in size.
> or
> --> This will happen when reading just about any large directory,
> since the directory reply for a 64K request is converted to Sun XDR
> format and clipped at the last full directory entry that will fit within 64K.
> For ports, where most files are small, I think you can tell which is more
> likely to happen.
> --> If TSO is disabled, I have no idea how this might matter, but??
>
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>>
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
>
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that, testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
>
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
> net driver related to mbuf handling.
>
> rick
>
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>>
>> That update is reported to be causing "rack" related panics:
>>
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>
>> reports (via links):
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.
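
An illustrative aside on the TSO and offload discussion quoted above (these commands are not from the thread, and genet0 may not advertise TSO at all; substitute the interface under test): ifconfig shows which offload features an interface claims and can turn them off for a test run, and TSO can also be disabled globally for TCP:

ifconfig genet0                                                   # the options=... line lists TXCSUM/RXCSUM/TSO4/TSO6, etc.
ifconfig genet0 -tso4 -tso6 -txcsum -rxcsum -txcsum6 -rxcsum6     # disable TSO and checksum offload, where present
sysctl net.inet.tcp.tso=0                                         # or disable TCP TSO globally while testing

Re-running the diff -r comparison with the offloads off helps show whether the corruption follows the offload path or lies elsewhere.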

I got the following built and installed on the two
machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.

Using the context with the alternate Ethernet device that has not
yet had an associated diff -r, find, or ls -R failure, I got a
panic that looks likely to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at      kdb_enter+0x48: undefined       f904411f
db>

Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.

"db> dump" seems to have worked.

After reboot, zpool scrub found no errors.

So, trying again . . .

I got some "Expensive timeout(9) function" notices:

Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s

0xffff000000717b64 looks to be uma_timeout:

ffff000000717b60 <uma_startup3+0x118> b        ffff000000717b3c <uma_startup3+0xf4>
ffff000000717b64 <uma_timeout>        stp      x29, x30, [sp, #-32]!
ffff000000717b68 <uma_timeout+0x4>    stp      x20, x19, [sp, #16]
. . .

. . . Hmm. The debug kernel test context seems to take a
very long time. It has not failed so far but is still
going.

So I stopped it and switched to testing with the genet0 device
that was involved in the earlier failures. . . .

It did not fail. Nor did the debug kernel report anything
beyond:

if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s

on one machine and:

if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800

on the other.

So I may reboot into the also-updated non-debug builds on both
machines and try in that context.

>>> Thanks for isolating this, rick
>>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>>
>> I'll warn that the primary "small arm" development/support
>> folk(s) do not work on the RPi*'s these days, beyond
>> committing what others provide and the like.

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
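
A hedged note on the "db> dump" step above (commands not from the thread; they assume dumpdev is configured so savecore(8) recovers the dump at the next boot, and kgdb comes from the devel/gdb port): the saved crash dump can be located, summarized, and opened with:

ls /var/crash                                     # savecore(8) writes vmcore.N and info.N here at boot
crashinfo                                         # writes core.txt.N, a text summary of the newest dump
kgdb /boot/kernel/kernel /var/crash/vmcore.0      # open the dump against the kernel that panicked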
Mark Millard
2021-May-23 08:27 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-23, at 00:44, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-May-21, at 17:56, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
>> Mark Millard wrote:
>> [stuff snipped]
>>> Well, why is it that ls -R, find, and diff -r all get file
>>> name problems via genet0 but diff -r gets no problems
>>> comparing the content of files that it does match up (the
>>> vast majority)? Any clue how the problems could possibly
>>> be unique to the handling of file names/paths? Does it
>>> suggest anything else to look into for getting some more
>>> potentially useful evidence?
>> Well, all I can do is describe the most common TSO related
>> failure:
>> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>> is slightly less than 64K bytes (many TSO implementations are
>> limited to 64K or 32 discontiguous segments, think 32 2K
>> mbuf clusters), the driver decides it is ok, but when the MAC
>> header is added it exceeds what the hardware can handle correctly...
>> --> This will happen when reading a regular file that is slightly less
>> than a multiple of 64K in size.
>> or
>> --> This will happen when reading just about any large directory,
>> since the directory reply for a 64K request is converted to Sun XDR
>> format and clipped at the last full directory entry that will fit within 64K.
>> For ports, where most files are small, I think you can tell which is more
>> likely to happen.
>> --> If TSO is disabled, I have no idea how this might matter, but??
>>
>>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>>> do not report changes in Ierrs or Idrop in a before vs.
>>> after failures comparison. (There may be better figures
>>> to look at for all I know.)
>>>
>>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>>> and got no obvious change in behavior.
>> All we know is that the data is getting corrupted somehow.
>>
>> NFS traffic looks very different than typical TCP traffic. It is
>> mostly small messages travelling in both directions concurrently,
>> with some large messages thrown in the mix.
>> All I'm saying is that, testing a net interface with something like
>> bulk data transfer in one direction doesn't verify it works for NFS
>> traffic.
>>
>> Also, the large RPC messages are a chain of about 33 mbufs of
>> various lengths, including a mix of partial clusters and regular
>> data mbufs, whereas a bulk send on a socket will typically
>> result in an mbuf chain of a lot of full 2K clusters.
>> --> As such, NFS can be good at tickling subtle bugs in the
>> net driver related to mbuf handling.
>>
>> rick
>>
>>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>>> committed to "main" by rscheff@ with a two week MFC, so it
>>>> should be in stable/13 soon. Not sure if an errata can be done
>>>> for it for releng13.0?
>>>
>>> That update is reported to be causing "rack" related panics:
>>>
>>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>>
>>> reports (via links):
>>>
>>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>>
>>> Still, I have a non-debug update to main building and will
>>> likely do a debug build as well. llvm is rebuilding, so
>>> the builds will take a notable time.
>
> I got the following built and installed on the two
> machines:
>
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013
>
> Note that both are booted with debug builds of main.
>
> Using the context with the alternate Ethernet device that has not
> yet had an associated diff -r, find, or ls -R failure, I got a
> panic that looks likely to be unrelated:
>
> # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
> # diff -r /usr/ports/ /mnt/ | more
> nvme0: cpl does not map to outstanding cmd
> cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
> panic: received completion for unknown cmd
> cpuid = 3
> time = 1621743752
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x188
> panic() at panic+0x44
> nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
> nvme_timeout() at nvme_timeout+0x3c
> softclock_call_cc() at softclock_call_cc+0x124
> softclock() at softclock+0x60
> ithread_loop() at ithread_loop+0x2a8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 12 tid 100028 ]
> Stopped at      kdb_enter+0x48: undefined       f904411f
> db>
>
> Based on the "nvme" references, I expect this is tied to
> handling the Optane 480 GiByte that is in the PCIe slot
> and is the boot/only media for the machine doing the diff.
>
> "db> dump" seems to have worked.
>
> After reboot, zpool scrub found no errors.
>
> So, trying again . . .
>
> I got some "Expensive timeout(9) function" notices:
>
> Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
> Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s
>
> 0xffff000000717b64 looks to be uma_timeout:
>
> ffff000000717b60 <uma_startup3+0x118> b        ffff000000717b3c <uma_startup3+0xf4>
> ffff000000717b64 <uma_timeout>        stp      x29, x30, [sp, #-32]!
> ffff000000717b68 <uma_timeout+0x4>    stp      x20, x19, [sp, #16]
> . . .
>
> . . . Hmm. The debug kernel test context seems to take a
> very long time. It has not failed so far but is still
> going.
>
> So I stopped it and switched to testing with the genet0 device
> that was involved in the earlier failures. . . .
>
> It did not fail. Nor did the debug kernel report anything
> beyond:
>
> if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
> if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
> Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s
>
> on one machine and:
>
> if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800
>
> on the other.
>
> So I may reboot into the also-updated non-debug builds on both
> machines and try in that context.
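
An illustrative aside on the uma_timeout lookup quoted above (not necessarily the procedure that was used): a kernel text address such as 0xffff000000717b64 can be mapped to its containing function with tools in base, given the matching /boot/kernel/kernel and, for addr2line, a kernel with debug info (for example /usr/lib/debug/boot/kernel/kernel.debug when installed):

nm -n /boot/kernel/kernel | less                                      # numerically sorted symbols; the last entry at or below the address names the function
addr2line -f -e /usr/lib/debug/boot/kernel/kernel.debug 0xffff000000717b64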
The non-debug build pair of machines got the problem:

# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/devel/electron12/files:
Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
Only in /mnt/devel/electron12/files: <A0><CE><C8>?<DC>?2<B2><E2><AA>^H
Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc

I'll note that it turns out that the debug build had more
enabled than is typical: DIAGNOSTICS, BUF_TRACKING, and
FULL_BUF_TRACKING were also enabled. I'd forgotten that I'd
previously had a reason to add those to what my debug builds
included (for a prior problem investigation). I'd not done
debug builds in some time.

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
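
A hedged note on the debug-option remark above (it assumes the kernel config kept "options INCLUDE_CONFIG_FILE", which GENERIC does): the options a running kernel was actually built with can be listed directly from the running system:

sysctl -n kern.conftxt | grep -E 'DIAGNOSTIC|BUF_TRACKING|INVARIANTS|WITNESS'   # shows DIAGNOSTIC, BUF_TRACKING, FULL_BUF_TRACKING, etc., if compiled in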