Michael Weiser
2022-Sep-21 14:54 UTC
[Samba] Odd "File exists" behavior when copy-pasting many files to an SMB exported Lustre FS
Hello Jeremy, Hello Christian, Hi all,> >Resident wild-guy Christian here. In summary, I'm seeing an odd problem > >from both Windows Explorer and Mac Finder when copy-pasting a large batch > >of files to a Lustre-backed SMB share wherein the file manager appears to > >enumerate over all the files to determine if they can be made in the > >destination, makes 0-byte stubs of each file, and then proudly states that > >the files already exist in the destination. This behavior is > >less-than-desirable and I was hoping someone else here had run into it, or > >someone better than I at diagnosing SMB weirdness could give me a hand > >debugging. I'm inclined to think it's a Lustre idiosyncracy, since this > >does not occur on a ZFS share.> If it doesn't happen on a ZFS share, but does on a Lustre one then > it's certainly a Lustre oddity. Try looking at debug logs to figure > out what Lustre is doing that smbd doesn't like.A client has now also run into this behaviour and I think I've made some headway determining what's going on. To recap, we're seeing what Christian described in his original report with some additional details: - copying files from a local NTFS filesystem on Windows 10 to a Lustre share works fine - duplicating files *on* the Lustre share shows the problematic behaviour: for each file Windows complains that the file already exists and indeed, a zero-sized file of that name has been created where before there was none - copying a file off of the Lustre share onto the local NTFS works fine (obvs.) but copying it back onto the Lustre share again shows the problem - so it is travelling with the files and suggests that it's not related to sever-side-copying - switching option ea support to no works around the problem - so it seems to be related to extended attributes The relevant part of a level 10 debug log reads: [2022/09/21 15:50:57.127478, 10, pid=1967443, effective(200737, 130), real(200737, 0), class=locking] ../../source3/locking/share_mode_lock.c:171(share_mode_memcache_store) share_mode_memcache_store: stored entry for file blorf/blorf/blorf/blim.txt epoch d7ce6bcd04632a key 3188158982:144115340681419950:0 [2022/09/21 15:50:57.127601, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:335(get_ea_names_from_fsp) get_ea_names_from_fsp: ea_namelist size = 11 [2022/09/21 15:50:57.127612, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:262(get_ea_value_fsp) get_ea_value: EA lustre.lov is of length 56 [2022/09/21 15:50:57.127616, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../lib/util/util.c:570(dump_data) [0000] D0 0B D1 0B 01 00 00 00 AE 1C 00 00 00 00 00 00 ........ ........ [0010] 88 23 00 00 02 00 00 00 00 00 10 00 01 00 00 00 .#...... ........ [0020] 26 CB 2C 00 00 00 00 00 00 00 00 00 00 00 00 00 &.,..... ........ [0030] 00 00 00 00 08 00 00 00 ........ [2022/09/21 15:50:57.127641, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:508(get_ea_list_from_fsp) get_ea_list_from_file: total_len = 71, lustre.lov, val len = 56 [2022/09/21 15:50:57.127646, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:520(get_ea_list_from_fsp) get_ea_list_from_file: total_len = 75 [2022/09/21 15:50:57.127651, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:726(canonicalize_ea_name) canonicalize_ea_name: LUSTRE.LOV -> lustre.lov [2022/09/21 15:50:57.127655, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:786(set_ea) set_ea: ea_name user.lustre.lov ealen = 56 [2022/09/21 15:50:57.127660, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:810(set_ea) set_ea: setting ea name user.lustre.lov on file blorf/blorf/blorf/blim.txt by file descriptor. [2022/09/21 15:50:57.127668, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/open.c:6066(create_file_unixpath) create_file_unixpath: NT_STATUS_EAS_NOT_SUPPORTED So create_file_unixpath errors out with NT_STATUS_EAS_NOT_SUPPORTED because it cannot set extended attribute user.lustre.lov in Lustre. It's this codepath: https://github.com/samba-team/samba/blob/d9dda4b7af284ecbee4d04a89bd16fc0098e2931/source3/smbd/open.c#L6395 My guess (with somewhat fuzzy knowledge on Windows API behaviour) is that Windows Explorer does expect a CreateFile call with extended attributes to fail atomically and gets confused by the fact that samba leaves the file it just created but couldn't set the attributes on lying around after erroring out of the open. And indeed, patching the code path like so avoids the problem: --- samba-4.17.0.orig/source3/smbd/open.c 2022-09-06 16:22:03.302113000 +0200 +++ samba-4.17.0/source3/smbd/open.c 2022-09-21 12:38:22.724000000 +0200 @@ -6098,6 +6098,13 @@ ((info == FILE_WAS_CREATED) || (info == FILE_WAS_OVERWRITTEN))) { status = set_ea(conn, fsp, ea_list); if (!NT_STATUS_IS_OK(status)) { + DEBUG(10, ("set_ea failed: %s\n", nt_errstr(status))); + if (info == FILE_WAS_CREATED) { + DEBUG(10, ("file was created - unlinking\n")); + SMB_VFS_UNLINKAT(conn, dirfsp, + smb_fname_atname, 0); + } + goto fail; } } That leaves the question, where that extended attribute user.lustre.lov is coming from. It appears that Lustre itself exposes a number of extended attributes for every file which reflect internal housekeeping data: $ getfattr -m - blarf # file: blarf lustre.lov trusted.link trusted.lma trusted.lov $ getfattr -n lustre.lov blarf # file: blarf lustre.lov=0s0AvRCwEAAABgJAEAAAAAAIQjAAACAAAAAAAQAAEAAAAAqAAAAAAAAAAAAAAAAAAAAAAAAAYAAAA Unprivileged users only get to see the lustre.lov attribute:> getfattr -m - blarf# file: blarf lustre.lov> getfattr -n lustre.lov blarf# file: blarf lustre.lov=0s0AvRCwEAAABgJAEAAAAAAIQjAAACAAAAAAAQAAEAAAAAqAAAAAAAAAAAAAAAAAAAAAAAAAYAAAA That attribute is exposed to Windows by samba and even travels with the file to a local NTFS file system which explains my third observation from above. Verified with a small program I wrote leveraging BackupRead (attached): C:\Users\user>get_ea "blorf\blarf" LUSTRE.LOV=<binary garbage because I couldn't be bothered to base64 encode it> While I can also see a QueryEAFile call in ProcMon on Windows, its UI unfortunately is not able to show me any information about what extended attributes are returned from it or passed to the failing CreateFile. While exposing the lustre.lov attribute to unprivileged users, Lustre (at least by default) does not support user xattrs and throws ENOTSUP when trying anything of that sort, particularly setting them:> getfattr -n user.foo blarfblarf: user.foo: Operation not supported> setfattr -n user.foo -v dummy blarfsetfattr: blarf: Operation not supported This would explain why samba gets to see the attribute but runs into ENOTSUP when trying to set it, even after trying very hard to make it work by mapping it from that (non-existant) lustre.* namespace into the user namespace as user.lustre.lov. Finally, I was also able to avoid the problem by writing a small VFS module which hooks getxattr and listxattr and filters out the lustre.lov attribute so it's never exposed to the client (attached). What's your take on all this? What could a proper solution look like? Additional info: For debugging purposes I was able to replicate the exact same behaviour with fuse and sshfs: sshfs does not normally implement extended attributes. Adding getxattr and listxattr functions which synthesize a dummy lustre.lov attribute one can observe exactly the same behaviour without spinning up a whole Lustre. (attached as well) -- Thanks in advance, Michael ________________________________________ From: samba <samba-bounces at lists.samba.org> on behalf of Jeremy Allison via samba <samba at lists.samba.org> Sent: 05 April 2022 04:31:52 To: Christian Kuntz Cc: sambalist Subject: Re: [Samba] Odd "File exists" behavior when copy-pasting many files to an SMB exported Lustre FS Caution! External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe. On Mon, Apr 04, 2022 at 03:26:03PM -0700, Christian Kuntz via samba wrote:>Hi all, > >Resident wild-guy Christian here. In summary, I'm seeing an odd problem >from both Windows Explorer and Mac Finder when copy-pasting a large batch >of files to a Lustre-backed SMB share wherein the file manager appears to >enumerate over all the files to determine if they can be made in the >destination, makes 0-byte stubs of each file, and then proudly states that >the files already exist in the destination. This behavior is >less-than-desirable and I was hoping someone else here had run into it, or >someone better than I at diagnosing SMB weirdness could give me a hand >debugging. I'm inclined to think it's a Lustre idiosyncracy, since this >does not occur on a ZFS share.If it doesn't happen on a ZFS share, but does on a Lustre one then it's certainly a Lustre oddity. Try looking at debug logs to figure out what Lustre is doing that smbd doesn't like. -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba -------------- next part -------------- A non-text attachment was scrubbed... Name: samba-4.15.5-vfs_filter_lustre_private_xattr.patch Type: text/x-patch Size: 5412 bytes Desc: samba-4.15.5-vfs_filter_lustre_private_xattr.patch URL: <http://lists.samba.org/pipermail/samba/attachments/20220921/85d2ff2b/samba-4.15.5-vfs_filter_lustre_private_xattr.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: sshfs-lustre-lov.patch Type: text/x-patch Size: 1102 bytes Desc: sshfs-lustre-lov.patch URL: <http://lists.samba.org/pipermail/samba/attachments/20220921/85d2ff2b/sshfs-lustre-lov.bin>
Jeremy Allison
2022-Sep-21 16:35 UTC
[Samba] Odd "File exists" behavior when copy-pasting many files to an SMB exported Lustre FS
On Wed, Sep 21, 2022 at 02:54:50PM +0000, Michael Weiser via samba wrote:>Hello Jeremy, >Hello Christian, >Hi all, > >> >Resident wild-guy Christian here. In summary, I'm seeing an odd problem >> >from both Windows Explorer and Mac Finder when copy-pasting a large batch >> >of files to a Lustre-backed SMB share wherein the file manager appears to >> >enumerate over all the files to determine if they can be made in the >> >destination, makes 0-byte stubs of each file, and then proudly states that >> >the files already exist in the destination. This behavior is >> >less-than-desirable and I was hoping someone else here had run into it, or >> >someone better than I at diagnosing SMB weirdness could give me a hand >> >debugging. I'm inclined to think it's a Lustre idiosyncracy, since this >> >does not occur on a ZFS share. > >> If it doesn't happen on a ZFS share, but does on a Lustre one then >> it's certainly a Lustre oddity. Try looking at debug logs to figure >> out what Lustre is doing that smbd doesn't like. > >A client has now also run into this behaviour and I think I've made some headway determining what's going on. To recap, we're seeing what Christian described in his original report with some additional details: > >- copying files from a local NTFS filesystem on Windows 10 to a Lustre share works fine >- duplicating files *on* the Lustre share shows the problematic behaviour: for each file Windows complains that the file already exists and indeed, a zero-sized file of that name has been created where before there was none >- copying a file off of the Lustre share onto the local NTFS works fine (obvs.) but copying it back onto the Lustre share again shows the problem - so it is travelling with the files and suggests that it's not related to sever-side-copying >- switching option ea support to no works around the problem - so it seems to be related to extended attributes > >The relevant part of a level 10 debug log reads: > >[2022/09/21 15:50:57.127478, 10, pid=1967443, effective(200737, 130), real(200737, 0), class=locking] ../../source3/locking/share_mode_lock.c:171(share_mode_memcache_store) > share_mode_memcache_store: stored entry for file blorf/blorf/blorf/blim.txt epoch d7ce6bcd04632a key 3188158982:144115340681419950:0 >[2022/09/21 15:50:57.127601, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:335(get_ea_names_from_fsp) > get_ea_names_from_fsp: ea_namelist size = 11 >[2022/09/21 15:50:57.127612, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:262(get_ea_value_fsp) > get_ea_value: EA lustre.lov is of length 56 >[2022/09/21 15:50:57.127616, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../lib/util/util.c:570(dump_data) > [0000] D0 0B D1 0B 01 00 00 00 AE 1C 00 00 00 00 00 00 ........ ........ > [0010] 88 23 00 00 02 00 00 00 00 00 10 00 01 00 00 00 .#...... ........ > [0020] 26 CB 2C 00 00 00 00 00 00 00 00 00 00 00 00 00 &.,..... ........ > [0030] 00 00 00 00 08 00 00 00 ........ >[2022/09/21 15:50:57.127641, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:508(get_ea_list_from_fsp) > get_ea_list_from_file: total_len = 71, lustre.lov, val len = 56 >[2022/09/21 15:50:57.127646, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:520(get_ea_list_from_fsp) > get_ea_list_from_file: total_len = 75 >[2022/09/21 15:50:57.127651, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:726(canonicalize_ea_name) > canonicalize_ea_name: LUSTRE.LOV -> lustre.lov >[2022/09/21 15:50:57.127655, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:786(set_ea) > set_ea: ea_name user.lustre.lov ealen = 56 >[2022/09/21 15:50:57.127660, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/trans2.c:810(set_ea) > set_ea: setting ea name user.lustre.lov on file blorf/blorf/blorf/blim.txt by file descriptor. >[2022/09/21 15:50:57.127668, 10, pid=1967443, effective(200737, 130), real(200737, 0)] ../../source3/smbd/open.c:6066(create_file_unixpath) > create_file_unixpath: NT_STATUS_EAS_NOT_SUPPORTED > >So create_file_unixpath errors out with NT_STATUS_EAS_NOT_SUPPORTED because it cannot set extended attribute user.lustre.lov in Lustre. It's this codepath: > >https://github.com/samba-team/samba/blob/d9dda4b7af284ecbee4d04a89bd16fc0098e2931/source3/smbd/open.c#L6395 > >My guess (with somewhat fuzzy knowledge on Windows API behaviour) is that Windows Explorer does expect a CreateFile call with extended attributes to fail atomically and gets confused by the fact that samba leaves the file it just created but couldn't set the attributes on lying around after erroring out of the open. And indeed, patching the code path like so avoids the problem: > >--- samba-4.17.0.orig/source3/smbd/open.c 2022-09-06 16:22:03.302113000 +0200 >+++ samba-4.17.0/source3/smbd/open.c 2022-09-21 12:38:22.724000000 +0200 >@@ -6098,6 +6098,13 @@ > ((info == FILE_WAS_CREATED) || (info == FILE_WAS_OVERWRITTEN))) { > status = set_ea(conn, fsp, ea_list); > if (!NT_STATUS_IS_OK(status)) { >+ DEBUG(10, ("set_ea failed: %s\n", nt_errstr(status))); >+ if (info == FILE_WAS_CREATED) { >+ DEBUG(10, ("file was created - unlinking\n")); >+ SMB_VFS_UNLINKAT(conn, dirfsp, >+ smb_fname_atname, 0); >+ } >+ > goto fail; > } > } > >That leaves the question, where that extended attribute user.lustre.lov is coming from. It appears that Lustre itself exposes a number of extended attributes for every file which reflect internal housekeeping data: > >$ getfattr -m - blarf ># file: blarf >lustre.lov >trusted.link >trusted.lma >trusted.lov > >$ getfattr -n lustre.lov blarf ># file: blarf >lustre.lov=0s0AvRCwEAAABgJAEAAAAAAIQjAAACAAAAAAAQAAEAAAAAqAAAAAAAAAAAAAAAAAAAAAAAAAYAAAA> >Unprivileged users only get to see the lustre.lov attribute: > >> getfattr -m - blarf ># file: blarf >lustre.lov > >> getfattr -n lustre.lov blarf ># file: blarf >lustre.lov=0s0AvRCwEAAABgJAEAAAAAAIQjAAACAAAAAAAQAAEAAAAAqAAAAAAAAAAAAAAAAAAAAAAAAAYAAAA> >That attribute is exposed to Windows by samba and even travels with the file to a local NTFS file system which explains my third observation from above. Verified with a small program I wrote leveraging BackupRead (attached): > >C:\Users\user>get_ea "blorf\blarf" >LUSTRE.LOV=<binary garbage because I couldn't be bothered to base64 encode it> > >While I can also see a QueryEAFile call in ProcMon on Windows, its UI unfortunately is not able to show me any information about what extended attributes are returned from it or passed to the failing CreateFile. > >While exposing the lustre.lov attribute to unprivileged users, Lustre (at least by default) does not support user xattrs and throws ENOTSUP when trying anything of that sort, particularly setting them: > >> getfattr -n user.foo blarf >blarf: user.foo: Operation not supported >> setfattr -n user.foo -v dummy blarf >setfattr: blarf: Operation not supported > >This would explain why samba gets to see the attribute but runs into ENOTSUP when trying to set it, even after trying very hard to make it work by mapping it from that (non-existant) lustre.* namespace into the user namespace as user.lustre.lov. > >Finally, I was also able to avoid the problem by writing a small VFS module which hooks getxattr and listxattr and filters out the lustre.lov attribute so it's never exposed to the client (attached). > >What's your take on all this? >What could a proper solution look like? > >Additional info: For debugging purposes I was able to replicate the exact same behaviour with fuse and sshfs: sshfs does not normally implement extended attributes. Adding getxattr and listxattr functions which synthesize a dummy lustre.lov attribute one can observe exactly the same behaviour without spinning up a whole Lustre. (attached as well)Great analysis Michael ! As we're emulating NTFS CreateFile we can't do the 'create with EA's' atomically. Lutre really should not be exposing EA's to callers if it doesn't actually support EA's. An elegant solution might be to add a Samba VFS module vfs_lustre.c that intercepts fgetxattr/fsetxattr/flistxattr calls and simply strips out the lustre.lov EA's from being seen. Can you log a bug in our bugzilla and upload all this info so we can track it ? Thanks, Jeremy.