NAGY Andreas
2018-Mar-07 17:32 UTC
RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
Hi, attached the trace. If I see it correct it uses FORE_OR_BOTH. (bctsa_dir: CDFC4_FORE_OR_BOTH (0x00000003)) The trace is only with the first patch, have not compiled the wantdeleg patches so far. I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the ESXi cannot connect to the NFS also with this warning: 2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP Another thing I noticed today is that it is not possible to delete a folder with the ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, but with NFS3 it works. Here the vmkernel.log with only one connection contains mounting, trying to delete a folder and disconnect: 2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID c55dbe59 maps to vmkernel opID 55bea165 2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user: , options: <none> 2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD Handle Created with lock[StorageApd-0x43046e4c6d70] 2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 [7] 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3792: Max read xfer size: 0x20000 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3793: Max write xfer size: 0x20000 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3794: Max file size: 0x800000000000 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3795: Max file name: 255 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger than that of FSS (128) 2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/") 2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435: nfsds1 mounted successfully 2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID c55dbe91 maps to vmkernel opID e47706ec 2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.870Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.871Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.872Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.872Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.872Z cpu21:67981 opID=e47706ec)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy 2018-03-07T16:47:19.874Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.875Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.876Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.877Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.877Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.877Z cpu21:67981 opID=e47706ec)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy 2018-03-07T16:47:19.879Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.879Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.879Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.880Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6398: Transient file system condition, suggest retry 2018-03-07T16:47:19.881Z cpu21:67981 opID=e47706ec)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy 2018-03-07T16:49:41.154Z cpu23:85446)WARNING: SunRPC: 3987: fail all pending calls for client 0x4303cbe221f0 IP 10.0.0.225.8.1 (socket half closed) 2018-03-07T16:49:41.155Z cpu11:66486)WARNING: NFS41: NFS41FSAPDNotify:5908: Lost connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/") -----Original Message----- From: Rick Macklem [mailto:rmacklem at uoguelph.ca] Sent: Mittwoch, 7. M?rz 2018 16:07 To: NAGY Andreas <Andreas.Nagy at frequentis.com>; 'freebsd-stable at freebsd.org' <freebsd-stable at freebsd.org> Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS?failed error in combination with ESXi client NAGY Andreas wrote:>Okay, that was the main reason for using NFS 4.1. >Is it planned to implement it, or is the focus on pNFS?I took a quick look and implementing this for some cases will be pretty easy. Binding a FORE channel is implied, so for that case all the server does is reply OK to the BIND_CONN_TO_SESSION. To know if the ESXi client case is a simple one, I need to see what the BIND_CONN_TO_SESSION arguments look like. If you can capture packets for when this second connection is done and email it to me as an attachment, I can look at what the BIND_CONN_TO_SESSION args are. # tcpdump -s 0 -w <file.pcap> host <client-host-for-this-connection> run on the FreeBSD server should get the <file.pcap> I need. Alternately, if you have wireshark handy, you can just use it to look for the BIND_CONN_TO_SESSION request and see if it specifies (FORE, BACK, FORE_OR_BOTH or BACK_OR_BOTH) in it. FORE or FORE_OR_BOTH means it is easy to do and I can probably have a patch for testing in a day or two. rick -------------- next part -------------- A non-text attachment was scrubbed... Name: nfs41_trace.pcap Type: application/octet-stream Size: 4930 bytes Desc: nfs41_trace.pcap URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20180307/7631a65b/attachment.obj>
Rick Macklem
2018-Mar-08 02:48 UTC
Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
NAGY Andreas wrote:>attached the trace. If I see it correct it uses FORE_OR_BOTH. (bctsa_dir: >CDFC4_FORE_OR_BOTH (0x00000003))Yes. The scary part is the ExchangeID before the BindConnectiontoSession. (Normally that is only done at the beginning of a new mount to get a ClientID, followed immediately by a CreateSession. I don't know why it would do this?) The attached patch might get BindConnectiontoSession to work. I have no way to test it beyond seeing it compile. Hopefully it will apply cleanly.>The trace is only with the first patch, have not compiled the wantdeleg patches so >far.That's fine. I don't think that matters much.>I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the ESXi >cannot connect to the NFS also with this warning: >2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: >BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPPIf the attached patch works, you'll find out what it fixes.>Another thing I noticed today is that it is not possible to delete a folder with the >ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, but with >NFS3 it works. > >Here the vmkernel.log with only one connection contains mounting, trying to >delete a folder and disconnect: > >2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID >c55dbe59 maps to vmkernel opID 55bea165 >2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: >NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: >nfsds1, security: 1 user: , options: <none> >2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: >977: APD Handle Created with lock[StorageApd-0x43046e4c6d70] >2018-03-07T16:46:04.544Z cpu11:66486)NFS41: >NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 >[7] >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3791: Lease time: 120 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3792: Max read xfer size: 0x20000 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3793: Max write xfer size: 0x20000 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3794: Max file size: 0x800000000000 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSCompleteMount:3795: Max file name: 255 >2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: >NFS41FSCompleteMount:3800: The max file name size (255) of file system is >larger than that of FSS (128) >2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: >NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount >point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/") >2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: >NFS41_VSIMountSet:435: nfsds1 mounted successfully >2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID >c55dbe91 maps to vmkernel opID e47706ec >2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: >NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6I have no idea if getting BindConnectiontoSession working will fix this or not? rick -------------- next part -------------- A non-text attachment was scrubbed... Name: bindconn.patch Type: application/octet-stream Size: 5343 bytes Desc: bindconn.patch URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20180308/320d504a/attachment.obj>