Rick Macklem
2021-May-22 00:56 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Mark Millard wrote:
[stuff snipped]
> Well, why is it that ls -R, find, and diff -r all get file
> name problems via genet0 but diff -r gets no problems
> comparing the content of files that it does match up (the
> vast majority)? Any clue how the problems could possibly
> be unique to the handling of file names/paths? Does it
> suggest anything else to look into for getting some more
> potentially useful evidence?
Well, all I can do is describe the most common TSO related failure:
- When a read RPC reply (including NFS/RPC/TCP/IP headers)
  is slightly less than 64K bytes (many TSO implementations are
  limited to 64K or 32 discontiguous segments, think 32 2K
  mbuf clusters), the driver decides it is ok, but when the MAC
  header is added it exceeds what the hardware can handle correctly...
  --> This will happen when reading a regular file that is slightly less
      than a multiple of 64K in size.
  or
  --> This will happen when reading just about any large directory,
      since the directory reply for a 64K request is converted to Sun XDR
      format and clipped at the last full directory entry that will fit within 64K.
  For ports, where most files are small, I think you can tell which is more
  likely to happen.
  --> If TSO is disabled, I have no idea how this might matter, but??

> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
> do not report changes in Ierrs or Idrop in a before vs.
> after failures comparison. (There may be better figures
> to look at for all I know.)
>
> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
> and got no obvious change in behavior.
All we know is that the data is getting corrupted somehow.

NFS traffic looks very different than typical TCP traffic. It is
mostly small messages travelling in both directions concurrently,
with some large messages thrown in the mix.
All I'm saying is that testing a net interface with something like
bulk data transfer in one direction doesn't verify it works for NFS
traffic.

Also, the large RPC messages are a chain of about 33 mbufs of
various lengths, including a mix of partial clusters and regular
data mbufs, whereas a bulk send on a socket will typically
result in an mbuf chain of a lot of full 2K clusters.
--> As such, NFS can be good at tickling subtle bugs in the
    net driver related to mbuf handling.

rick

>> W.r.t. reverting r367492...the patch to replace r367492 was just
>> committed to "main" by rscheff@ with a two week MFC, so it
>> should be in stable/13 soon. Not sure if an errata can be done
>> for it for releng13.0?
>
> That update is reported to be causing "rack" related panics:
>
> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>
> reports (via links):
>
> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>
> Still, I have a non-debug update to main building and will
> likely do a debug build as well. llvm is rebuilding, so
> the builds will take a notable time.
>
>> Thanks for isolating this, rick
>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>
> I'll warn that the primary "small arm" development/support
> folk(s) do not work on the RPi*'s these days, beyond
> committing what others provide and the like.
>
> ==
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went away in early
> 2018-Mar)
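A minimal sketch of how to act on the TSO point above: check whether TSO is actually in play on the interface and turn it off for a retest. The genet0 name is taken from this thread; whether the genet driver advertises TSO4/TSO6 at all is an assumption to verify with the first command.

# ifconfig -m genet0            # "capabilities" lists what the driver supports,
                                # "options" lists what is currently enabled
# ifconfig genet0 -tso -tso6    # disable TCP segmentation offload, then re-run
                                # the diff -r / ls -R test over the NFS mount
# ifconfig genet0 tso tso6      # re-enable afterwards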
Mark Millard
2021-May-23 07:44 UTC
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-21, at 17:56, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how the problems could possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>   is slightly less than 64K bytes (many TSO implementations are
>   limited to 64K or 32 discontiguous segments, think 32 2K
>   mbuf clusters), the driver decides it is ok, but when the MAC
>   header is added it exceeds what the hardware can handle correctly...
>   --> This will happen when reading a regular file that is slightly less
>       than a multiple of 64K in size.
>   or
>   --> This will happen when reading just about any large directory,
>       since the directory reply for a 64K request is converted to Sun XDR
>       format and clipped at the last full directory entry that will fit within 64K.
>   For ports, where most files are small, I think you can tell which is more
>   likely to happen.
>   --> If TSO is disabled, I have no idea how this might matter, but??
>
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>>
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
>
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
>
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
>     net driver related to mbuf handling.
>
> rick
>
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>>
>> That update is reported to be causing "rack" related panics:
>>
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>
>> reports (via links):
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.
I got the following built and installed on the two machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021     root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72  arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021     root at CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72  arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.

Using the context with the alternate EtherNet device that has not yet had an
associated diff -r, find, or ls -R failure, I got a panic that looks likely
to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at      kdb_enter+0x48: undefined       f904411f
db>

Based on the "nvme" references, I expect this is tied to handling the
Optane 480 GiByte that is in the PCIe slot and is the boot/only media
for the machine doing the diff.

"db> dump" seems to have worked. After reboot, zpool scrub found no
errors. So, trying again . . .

I got some "Expensive timeout(9) function" notices:

Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s

0xffff000000717b64 looks to be uma_timeout:

ffff000000717b60 <uma_startup3+0x118> b    ffff000000717b3c <uma_startup3+0xf4>
ffff000000717b64 <uma_timeout>        stp  x29, x30, [sp, #-32]!
ffff000000717b68 <uma_timeout+0x4>    stp  x20, x19, [sp, #16]
. . .

Hmm. The debug kernel test context seems to take a very long time.
It has not failed so far but is still going.

So I stopped it and switched to testing with the genet0 device that was
involved in the earlier failures.
. . .
It did not fail. Nor did the debug kernel report anything beyond:

if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s

on one machine and:

if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800

on the other.

So I may reboot into the also-updated non-debug builds on both machines
and try in that context.

>>> Thanks for isolating this, rick
>>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>>
>> I'll warn that the primary "small arm" development/support
>> folk(s) do not work on the RPi*'s these days, beyond
>> committing what others provide and the like.

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early
2018-Mar)
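For reference, a minimal sketch of the address-to-symbol lookup behind the uma_timeout identification above. The /boot/kernel/kernel path is the usual default and is an assumption here, and the reported 0xffff... addresses only line up with the file's symbol addresses when the kernel is loaded at its linked address (no relocation).

# nm -n /boot/kernel/kernel | grep ffff000000717b            # address -> nearby symbols
# nm -n /boot/kernel/kernel | grep ' uma_timeout$'           # symbol -> address
# llvm-objdump -d --start-address=0xffff000000717b60 \
      --stop-address=0xffff000000717b80 /boot/kernel/kernel  # reproduce the listing above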