Rick Macklem
2018-Mar-08 02:48 UTC
Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
NAGY Andreas wrote:
> attached the trace. If I see it correct it uses FORE_OR_BOTH.
> (bctsa_dir: CDFC4_FORE_OR_BOTH (0x00000003))

Yes. The scary part is the ExchangeID before the BindConnectiontoSession. (Normally that is only done at the beginning of a new mount to get a ClientID, followed immediately by a CreateSession. I don't know why it would do this?)

The attached patch might get BindConnectiontoSession to work. I have no way to test it beyond seeing it compile. Hopefully it will apply cleanly.

> The trace is only with the first patch, have not compiled the wantdeleg
> patches so far.

That's fine. I don't think that matters much.

> I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the
> ESXi cannot connect to the NFS also with this warning:
> 2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP

If the attached patch works, you'll find out what it fixes.

> Another thing I noticed today is that it is not possible to delete a folder
> with the ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug,
> but with NFS3 it works.
>
> Here the vmkernel.log with only one connection contains mounting, trying to
> delete a folder and disconnect:
>
> 2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID c55dbe59 maps to vmkernel opID 55bea165
> 2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user: , options: <none>
> 2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD Handle Created with lock[StorageApd-0x43046e4c6d70]
> 2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 [7]
> 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120
> 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3792: Max read xfer size: 0x20000
> 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3793: Max write xfer size: 0x20000
> 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3794: Max file size: 0x800000000000
> 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3795: Max file name: 255
> 2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger than that of FSS (128)
> 2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/")
> 2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435: nfsds1 mounted successfully
> 2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID c55dbe91 maps to vmkernel opID e47706ec
> 2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6

I have no idea if getting BindConnectiontoSession working will fix this or not?

rick

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bindconn.patch
Type: application/octet-stream
Size: 5343 bytes
Desc: bindconn.patch
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20180308/320d504a/attachment.obj>
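
For reference, the bctsa_dir value quoted in the message above can be decoded with a short Python sketch. The constant values are those defined for BIND_CONN_TO_SESSION in RFC 5661, section 18.34; the helper name describe_bctsa_dir and the ALLOWED_REPLIES table are illustrative assumptions and are not taken from the FreeBSD server, the patch, or the ESXi client.

```python
# Channel-direction constants from RFC 5661, section 18.34.
# Client request values (bctsa_dir):
CDFC4_FORE = 0x1
CDFC4_BACK = 0x2
CDFC4_FORE_OR_BOTH = 0x3
CDFC4_BACK_OR_BOTH = 0x7

# Server reply values (bctsr_dir):
CDFS4_FORE = 0x1
CDFS4_BACK = 0x2
CDFS4_BOTH = 0x3

CLIENT_DIR_NAMES = {
    CDFC4_FORE: "CDFC4_FORE",
    CDFC4_BACK: "CDFC4_BACK",
    CDFC4_FORE_OR_BOTH: "CDFC4_FORE_OR_BOTH",
    CDFC4_BACK_OR_BOTH: "CDFC4_BACK_OR_BOTH",
}

# Replies a server may give for each request, per a reading of the RFC text.
# Hypothetical helper table; verify against the RFC before relying on it.
ALLOWED_REPLIES = {
    CDFC4_FORE: {CDFS4_FORE},
    CDFC4_BACK: {CDFS4_BACK},
    CDFC4_FORE_OR_BOTH: {CDFS4_FORE, CDFS4_BOTH},
    CDFC4_BACK_OR_BOTH: {CDFS4_BACK, CDFS4_BOTH},
}


def describe_bctsa_dir(value: int) -> str:
    """Return the symbolic name of a bctsa_dir value, or a hex fallback."""
    return CLIENT_DIR_NAMES.get(value, f"unknown (0x{value:08x})")


if __name__ == "__main__":
    # The value seen in the trace discussed in this thread.
    print(describe_bctsa_dir(0x00000003))
```

With FORE_OR_BOTH the client accepts either a fore-channel-only binding or a binding for both channels, so a server that supports the operation has two valid replies to choose from.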
NAGY Andreas
2018-Mar-08 14:35 UTC
RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
Thank you, it is really great how quickly you adapt the source and make patches for this. I have seen so many posts where people did not get NFS41 working with ESXi and FreeBSD, and now I already have it running with your changes.

I have now compiled the kernel with all 4 patches, and it works. Some problems are still left:

- The "Server returned improper reason for no delegation: 2" warnings are still in the vmkernel.log:

  2018-03-08T11:41:20.290Z cpu0:68011 opID=488969b0)WARNING: NFS41: NFS41ValidateDelegation:608: Server returned improper reason for no delegation: 2

- I can't delete a folder with the VMware host client datastore browser:

  2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy

- After a reboot of the FreeBSD machine, ESXi does not restore the NFS datastore and logs the following warning (just disconnecting the links is fine):

  2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP

So far I have only run some quick benchmarks with ATTO in a Windows VM which has a vmdk on the NFS41 datastore, mounted over two 1GB links in different subnets. Read throughput is nearly double that of a single connection, and write is just a bit faster. I don't know if write speed could be improved; currently the share is UFS on a HW RAID controller which has local write speeds of about 500MB/s.

At the following link is the vmkernel.log covering mounting the NFS share, attaching a vmdk from the share to a Win VM, running the ATTO benchmark on it, disconnecting/reconnecting the network, and also the problem with the BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP after reboot. Up to the reboot I also captured a trace on one of the two links.
(nfs41_trace_before_reboot.pcap and nfs41_trace_after_reboot.pcap)
https://files.fm/u/wvybmdmc

andi
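
When a vmkernel.log contains long runs of the same warning, as with the READDIR retries reported in this message, it can help to collapse them into counts before posting. The following Python sketch does that. The line layout it matches is inferred only from the excerpts in this thread and is an assumption, not a documented ESXi log format; the names WARNING_RE and count_warnings are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this thread:
#   <timestamp> cpu<N>:<world> [opID=<id>])WARNING: <message>
WARNING_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+(?:\s+opID=\w+)?\)WARNING:\s+(?P<msg>.*)$"
)


def count_warnings(lines):
    """Count identical WARNING messages, ignoring timestamp, cpu and opID."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


# Sample lines copied from the log excerpts in this thread.
sample = [
    "2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: "
    "NFS41FileOpReaddir:4728: Failed to process READDIR result for fh "
    "0x43046e4cb158: Transient file system condition, suggest retry",
    "2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: "
    "NFS41FileOpReaddir:4728: Failed to process READDIR result for fh "
    "0x43046e4cb158: Transient file system condition, suggest retry",
    "2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: "
    "BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP",
    # Informational line without WARNING; it is skipped by the pattern.
    "2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: "
    "NFS41FSCompleteMount:3791: Lease time: 120",
]

if __name__ == "__main__":
    for message, count in count_warnings(sample).most_common():
        print(f"{count:3d}x {message}")
```

To run it against a real log, pass an open file object instead of the sample list, for example count_warnings(open("vmkernel.log")).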