Rick Macklem
2021-May-22 00:56 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Mark Millard wrote:
[stuff snipped]
> Well, why is it that ls -R, find, and diff -r all get file
> name problems via genet0 but diff -r gets no problems
> comparing the content of files that it does match up (the
> vast majority)? Any clue how the problems could possibly
> be unique to the handling of file names/paths? Does it
> suggest anything else to look into for getting some more
> potentially useful evidence?
Well, all I can do is describe the most common TSO related failure:
- When a read RPC reply (including NFS/RPC/TCP/IP headers)
  is slightly less than 64K bytes (many TSO implementations are
  limited to 64K or 32 discontiguous segments, think 32 2K
  mbuf clusters), the driver decides it is ok, but when the MAC
  header is added it exceeds what the hardware can handle correctly...
  --> This will happen when reading a regular file that is slightly less
      than a multiple of 64K in size.
  or
  --> This will happen when reading just about any large directory,
      since the directory reply for a 64K request is converted to Sun XDR
      format and clipped at the last full directory entry that will fit within 64K.
  For ports, where most files are small, I think you can tell which is more
  likely to happen.
  --> If TSO is disabled, I have no idea how this might matter, but??

> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
> do not report changes in Ierrs or Idrop in a before vs.
> after failures comparison. (There may be better figures
> to look at for all I know.)
>
> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
> and got no obvious change in behavior.
All we know is that the data is getting corrupted somehow.

NFS traffic looks very different than typical TCP traffic. It is
mostly small messages travelling in both directions concurrently,
with some large messages thrown in the mix.
All I'm saying is that testing a net interface with something like
bulk data transfer in one direction doesn't verify it works for NFS
traffic.

Also, the large RPC messages are a chain of about 33 mbufs of
various lengths, including a mix of partial clusters and regular
data mbufs, whereas a bulk send on a socket will typically
result in an mbuf chain of a lot of full 2K clusters.
--> As such, NFS can be good at tickling subtle bugs in the
    net driver related to mbuf handling.

rick

>> W.r.t. reverting r367492...the patch to replace r367492 was just
>> committed to "main" by rscheff@ with a two week MFC, so it
>> should be in stable/13 soon. Not sure if an errata can be done
>> for it for releng13.0?
>
> That update is reported to be causing "rack" related panics:
>
> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>
> reports (via links):
>
> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>
> Still, I have a non-debug update to main building and will
> likely do a debug build as well. llvm is rebuilding, so
> the builds will take a notable time.
>
>> Thanks for isolating this, rick
>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>
> I'll warn that the primary "small arm" development/support
> folk(s) do not work on the RPi*'s these days, beyond
> committing what others provide and the like.
>
> ==
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went away in early
> 2018-Mar)
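A minimal sketch of how to act on the TSO point above: check whether TSO is actually in play on the interface and turn it off for a retest. The genet0 name is taken from this thread; whether the genet driver advertises TSO4/TSO6 at all is an assumption to verify with the first command.

# ifconfig -m genet0            # "capabilities" lists what the driver supports,
                                # "options" lists what is currently enabled
# ifconfig genet0 -tso -tso6    # disable TCP segmentation offload, then re-run
                                # the diff -r / ls -R test over the NFS mount
# ifconfig genet0 tso tso6      # re-enable afterwards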
Mark Millard
2021-May-23 07:44 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-21, at 17:56, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how the problems could possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>   is slightly less than 64K bytes (many TSO implementations are
>   limited to 64K or 32 discontiguous segments, think 32 2K
>   mbuf clusters), the driver decides it is ok, but when the MAC
>   header is added it exceeds what the hardware can handle correctly...
>   --> This will happen when reading a regular file that is slightly less
>       than a multiple of 64K in size.
>   or
>   --> This will happen when reading just about any large directory,
>       since the directory reply for a 64K request is converted to Sun XDR
>       format and clipped at the last full directory entry that will fit within 64K.
>   For ports, where most files are small, I think you can tell which is more
>   likely to happen.
>   --> If TSO is disabled, I have no idea how this might matter, but??
>
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>>
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
>
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
>
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
>     net driver related to mbuf handling.
>
> rick
>
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>>
>> That update is reported to be causing "rack" related panics:
>>
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>
>> reports (via links):
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.
I got the following built and installed on the two machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021     root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72  arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021     root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72  arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.

Using the context with the alternate EtherNet device that has not yet had an
associated diff -r, find, or ls -R failure, I got a panic that looks likely
to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at      kdb_enter+0x48: undefined       f904411f
db>

Based on the "nvme" references, I expect this is tied to handling the
Optane 480 GiByte that is in the PCIe slot and is the boot/only media
for the machine doing the diff.

"db> dump" seems to have worked. After reboot, zpool scrub found no
errors. So, trying again . . .

I got some "Expensive timeout(9) function" notices:

Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s

0xffff000000717b64 looks to be uma_timeout:

ffff000000717b60 <uma_startup3+0x118> b    ffff000000717b3c <uma_startup3+0xf4>
ffff000000717b64 <uma_timeout>        stp  x29, x30, [sp, #-32]!
ffff000000717b68 <uma_timeout+0x4>    stp  x20, x19, [sp, #16]
. . .

Hmm. The debug kernel test context seems to take a very long time.
It has not failed so far but is still going.

So I stopped it and switched to testing with the genet0 device that was
involved in the earlier failures.
. . .
It did not fail. Nor did the debug kernel report anything beyond:

if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s

on one machine and:

if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800

on the other.

So I may reboot into the also-updated non-debug builds on both machines
and try in that context.

>>> Thanks for isolating this, rick
>>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>>
>> I'll warn that the primary "small arm" development/support
>> folk(s) do not work on the RPi*'s these days, beyond
>> committing what others provide and the like.

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early
2018-Mar)
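For reference, a minimal sketch of the address-to-symbol lookup behind the uma_timeout identification above. The /boot/kernel/kernel path is the usual default and is an assumption here, and the reported 0xffff... addresses only line up with the file's symbol addresses when the kernel is loaded at its linked address (no relocation).

# nm -n /boot/kernel/kernel | grep ffff000000717b            # address -> nearby symbols
# nm -n /boot/kernel/kernel | grep ' uma_timeout$'           # symbol -> address
# llvm-objdump -d --start-address=0xffff000000717b60 \
      --stop-address=0xffff000000717b80 /boot/kernel/kernel  # reproduce the listing above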