NAGY Andreas
2018-Mar-07 17:32 UTC
RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
Hi,
attached the trace. If I see it correct it uses FORE_OR_BOTH. (bctsa_dir:
CDFC4_FORE_OR_BOTH (0x00000003))
The trace is only with the first patch, have not compiled the wantdeleg patches
so far.
I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the ESXi
cannot connect to the NFS also with this warning:
2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: BUG -
Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
Another thing I noticed today is that it is not possible to delete a folder with
the ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, but with
NFS3 it works.
Here the vmkernel.log with only one connection contains mounting, trying to
delete a folder and disconnect:
2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID
c55dbe59 maps to vmkernel opID 55bea165
2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423:
Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user:
, options: <none>
2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD
Handle Created with lock[StorageApd-0x43046e4c6d70]
2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873:
Reclaiming state, cluster 0x43046e4c7ee0 [7]
2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41:
NFS41FSCompleteMount:3791: Lease time: 120
2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41:
NFS41FSCompleteMount:3792: Max read xfer size: 0x20000
2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41:
NFS41FSCompleteMount:3793: Max write xfer size: 0x20000
2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41:
NFS41FSCompleteMount:3794: Max file size: 0x800000000000
2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41:
NFS41FSCompleteMount:3795: Max file name: 255
2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41:
NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger
than that of FSS (128)
2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960:
Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as
1a7893c8-eec764a7-0000-000000000000 ("/")
2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435:
nfsds1 mounted successfully
2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID
c55dbe91 maps to vmkernel opID e47706ec
2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.872Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.872Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.872Z cpu21:67981 opID=e47706ec)WARNING: UserFile: 2155:
hostd-worker: Directory changing too often to perform readdir operation (11
retries), returning busy
2018-03-07T16:47:19.874Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.877Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.877Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.877Z cpu21:67981 opID=e47706ec)WARNING: UserFile: 2155:
hostd-worker: Directory changing too often to perform readdir operation (11
retries), returning busy
2018-03-07T16:47:19.879Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.879Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.879Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41:
NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398:
Transient file system condition, suggest retry
2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: UserFile: 2155:
hostd-worker: Directory changing too often to perform readdir operation (11
retries), returning busy
2018-03-07T16:49:41.154Z cpu23:85446)WARNING: SunRPC: 3987: fail all pending
calls for client 0x4303cbe221f0 IP 10.0.0.225.8.1 (socket half closed)
2018-03-07T16:49:41.155Z cpu11:66486)WARNING: NFS41: NFS41FSAPDNotify:5908: Lost
connection to the server 10.0.0.225 mount point nfsds1, mounted as
1a7893c8-eec764a7-0000-000000000000 ("/")
-----Original Message-----
From: Rick Macklem [mailto:rmacklem at uoguelph.ca]
Sent: Mittwoch, 7. M?rz 2018 16:07
To: NAGY Andreas <Andreas.Nagy at frequentis.com>; 'freebsd-stable at
freebsd.org' <freebsd-stable at freebsd.org>
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS?failed error in combination with ESXi
client
NAGY Andreas wrote:>Okay, that was the main reason for using NFS 4.1.
>Is it planned to implement it, or is the focus on pNFS?
I took a quick look and implementing this for some cases will be pretty easy.
Binding a FORE channel is implied, so for that case all the server does is reply
OK to the BIND_CONN_TO_SESSION.
To know if the ESXi client case is a simple one, I need to see what the
BIND_CONN_TO_SESSION arguments look like.
If you can capture packets for when this second connection is done and email it
to me as an attachment, I can look at what the BIND_CONN_TO_SESSION args are.
# tcpdump -s 0 -w <file.pcap> host <client-host-for-this-connection>
run on the FreeBSD server should get the <file.pcap> I need.
Alternately, if you have wireshark handy, you can just use it to look for the
BIND_CONN_TO_SESSION request and see if it specifies (FORE, BACK, FORE_OR_BOTH
or BACK_OR_BOTH) in it.
FORE or FORE_OR_BOTH means it is easy to do and I can probably have a patch for
testing in a day or two.
rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfs41_trace.pcap
Type: application/octet-stream
Size: 4930 bytes
Desc: nfs41_trace.pcap
URL:
<http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20180307/7631a65b/attachment.obj>
Rick Macklem
2018-Mar-08 02:48 UTC
Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
NAGY Andreas wrote:>attached the trace. If I see it correct it uses FORE_OR_BOTH. (bctsa_dir: >CDFC4_FORE_OR_BOTH (0x00000003))Yes. The scary part is the ExchangeID before the BindConnectiontoSession. (Normally that is only done at the beginning of a new mount to get a ClientID, followed immediately by a CreateSession. I don't know why it would do this?) The attached patch might get BindConnectiontoSession to work. I have no way to test it beyond seeing it compile. Hopefully it will apply cleanly.>The trace is only with the first patch, have not compiled the wantdeleg patches so >far.That's fine. I don't think that matters much.>I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the ESXi >cannot connect to the NFS also with this warning: >2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: >BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPPIf the attached patch works, you'll find out what it fixes.>Another thing I noticed today is that it is not possible to delete a folder with the >ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, but with >NFS3 it works. > >Here the vmkernel.log with only one connection contains mounting, trying to >delete a folder and disconnect: > >2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID >c55dbe59 maps to vmkernel opID 55bea165 >2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: >NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: >nfsds1, security: 1 user: , options: <none> >2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: >977: APD Handle Created with lock[StorageApd-0x43046e4c6d70] >2018-03-07T16:46:04.544Z cpu11:66486)NFS41: >NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 >[7] >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3791: Lease time: 120 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3792: Max read xfer size: 0x20000 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3793: Max write xfer size: 0x20000 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3794: Max file size: 0x800000000000 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3795: Max file name: 255 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: >NFS41FSCompleteMount:3800: The max file name size (255) of file system is >larger than that of FSS (128) >2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount >point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/") >2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: >NFS41_VSIMountSet:435: nfsds1 mounted successfully >2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID >c55dbe91 maps to vmkernel opID e47706ec >2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: >NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6I have no idea if getting BindConnectiontoSession working will fix this or not? rick -------------- next part -------------- A non-text attachment was scrubbed... Name: bindconn.patch Type: application/octet-stream Size: 5343 bytes Desc: bindconn.patch URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20180308/320d504a/attachment.obj>