Hi,

We've got a system where staff use Samba mounts on their Windows desktops to drop files into a Linux directory for further processing. Some of those files are large, and take time for the file copy across Samba to complete.

The problem is that looking at the directory from the Linux side, to see if there are new files to process, the directory listing for the files-copied-across-by-Samba looks the same for complete files as for partial ones - same file name, same perms. We have been handling this by a script which checks for files whose size hasn't increased in the last X seconds. That's not only an ugly kludge, but fails if system load or network congestion stalls the file transfer for too long - the partial file then gets "recognized" as complete when it's not, and taken for further processing when it shouldn't be yet.

There's got to be a better way. Looking at the docs I see options for different ways to lock files against being written to by two users at once. But I don't see anything to prevent a file from being read by one user as it's being written to by another - or in the initial process of being written to the directory location through copying, as in our case.

I've tried, of course, googling the list archives. Maybe my search terms are poorly chosen. This has to be a problem that's been solved before, right?

Whit

On 20 Aug 2010, at 21:49, Whit Blauvelt <whit+samba at transpect.com> wrote:
> Hi,
>
> We've got a system where staff use Samba mounts on their Windows desktops to
> drop files into a Linux directory for further processing. Some of those
> files are large, and take time for the file copy across Samba to complete.

Look into inotifywait. It can trigger actions on file close events. Your file will be closed when Samba has finished writing to it.
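
A minimal sketch of the inotifywait suggestion, wrapped in Python so the handler can be ordinary script code. The function names and the example directory are made up for illustration; the inotifywait flags used (`-m`, `-q`, `-e close_write`, `--format`) are the standard ones from inotify-tools. It assumes the Windows client keeps the file open for the whole copy. If a client opens and closes the file more than once, the handler will fire more than once, so treat the event as a stronger hint than size polling rather than a guarantee.

```python
import subprocess
from typing import Callable, List


def inotifywait_command(directory: str) -> List[str]:
    """Build an inotifywait command that prints one path per close-after-write."""
    return [
        "inotifywait",
        "-m",                 # monitor forever instead of exiting after one event
        "-q",                 # suppress the startup messages
        "-e", "close_write",  # a file that was open for writing has been closed
        "--format", "%w%f",   # watched directory followed by the file name
        directory,
    ]


def watch(directory: str, handle: Callable[[str], None]) -> None:
    """Call handle(path) each time a file in directory is closed after writing."""
    with subprocess.Popen(
        inotifywait_command(directory), stdout=subprocess.PIPE, text=True
    ) as proc:
        for line in proc.stdout:
            handle(line.rstrip("\n"))


# Example (hypothetical path): print every file as soon as its writer closes it.
# watch("/srv/incoming", print)
```

The watch is not recursive; inotifywait's `-r` flag would be needed for subdirectories.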

On Fri, Aug 20, 2010 at 04:49:11PM -0400, Whit Blauvelt wrote:
> Hi,
>
> We've got a system where staff use Samba mounts on their Windows desktops to
> drop files into a Linux directory for further processing. Some of those
> files are large, and take time for the file copy across Samba to complete.
>
> The problem is that looking at the directory from the Linux side, to see if
> there are new files to process, the directory listing for the
> files-copied-across-by-Samba looks the same for complete files as for
> partial ones - same file name, same perms. We have been handling this by a
> script which checks for files whose size hasn't increased in the last X
> seconds. That's not only an ugly kludge, but fails if system load or network
> congestion stalls the file transfer for too long - the partial file then
> gets "recognized" as complete when it's not, and taken for further processing
> when it shouldn't be yet.
>
> There's got to be a better way. Looking at the docs I see options for
> different ways to lock files against being written to by two users at once.
> But I don't see anything to prevent a file from being read by one user as
> it's being written to by another - or in the initial process of being
> written to the directory location through copying, as in our case.
>
> I've tried, of course, googling the list archives. Maybe my search terms are
> poorly chosen. This has to be a problem that's been solved before, right?

Outside of Samba, there's no way to know when Samba is finished with a file. You could write a Windows app to do the copies, which does a share mode DENY_ALL over the file, but there's no guarantee that local POSIX apps on the Linux side will see it.

One way of testing that Samba is finished with writing a file from a POSIX shell script is to write a custom program using the libsmbclient library to open the required file, and set the share mode using smbc_setOptionOpenShareMode() to be DENY_ALL. Such a program will only allow an open to succeed if there are no other openers in Samba.

Otherwise you can write a VFS module in Samba that notifies an external program when a file in a particular share is closed, with no other openers. That might fit your specific case best.

Jeremy.

On 20 August 2010 22:49, Whit Blauvelt <whit+samba at transpect.com> wrote:
> Hi,
>
> We've got a system where staff use Samba mounts on their Windows desktops to
> drop files into a Linux directory for further processing. Some of those
> files are large, and take time for the file copy across Samba to complete.
>
> The problem is that looking at the directory from the Linux side, to see if
> there are new files to process, the directory listing for the
> files-copied-across-by-Samba looks the same for complete files as for
> partial ones - same file name, same perms. We have been handling this by a
> script which checks for files whose size hasn't increased in the last X
> seconds. That's not only an ugly kludge, but fails if system load or network
> congestion stalls the file transfer for too long - the partial file then
> gets "recognized" as complete when it's not, and taken for further processing
> when it shouldn't be yet.
>
> There's got to be a better way. Looking at the docs I see options for
> different ways to lock files against being written to by two users at once.
> But I don't see anything to prevent a file from being read by one user as
> it's being written to by another - or in the initial process of being
> written to the directory location through copying, as in our case.
>
> I've tried, of course, googling the list archives. Maybe my search terms are
> poorly chosen. This has to be a problem that's been solved before, right?

I think the simplest way to do this, if you have some control over the clients, is to get them to upload to a temporary name and rename to the real name once the upload has completed. Then get the process that looks for files to ignore the ones with the temporary names.

If you can't control the filenames, try using something like incron to process the files only when they have been closed. Or use lsof to check if Samba still has the files open. I don't know how reliable that will be, but perhaps it's another heuristic to add to your "not changed size for a while" check.

Otherwise, you'll have to do as Jeremy suggests, of course.

--
Michael Wood <esiotrot at gmail.com>
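
The temporary-name idea can be sketched roughly as follows. The `.part` suffix and both function names are assumptions chosen for illustration, and `upload()` only stands in for whatever the Windows client would actually do. The point is that the final name appears only after the copy has finished, so the Linux-side scan can skip anything still carrying the temporary suffix.

```python
import os
import shutil
from typing import List

TEMP_SUFFIX = ".part"  # hypothetical marker for uploads still in progress


def upload(src: str, dest_dir: str) -> str:
    """Copy src into dest_dir under a temporary name, then rename it."""
    final_path = os.path.join(dest_dir, os.path.basename(src))
    temp_path = final_path + TEMP_SUFFIX
    shutil.copyfile(src, temp_path)    # slow part: data arrives under the temp name
    os.replace(temp_path, final_path)  # quick part: rename within the same directory
    return final_path


def ready_files(directory: str) -> List[str]:
    """List regular files in directory that do not carry the temporary suffix."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and not entry.name.endswith(TEMP_SUFFIX)
    )
```

The processing script would then call `ready_files()` on the drop directory and never touch a name ending in the temporary suffix.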

Thanks much for the suggestions. This may be a dead end then, but trying to get just a bit clearer on the implications....

On Fri, Aug 20, 2010 at 04:46:20PM -0700, Jeremy Allison wrote:
> Outside of Samba, there's no way to know when Samba is finished
> with a file. You could write a Windows app to do the copies, which
> does a share mode DENY_ALL over the file, but there's no guarantee
> that local POSIX apps on the Linux side will see it.

So inside Samba there's a way to know? Would it be possible, with the right hooks into Samba, to query: "I see a file. Are you done with it?"

> One way of testing that Samba is finished with writing a file
> from a POSIX shell script is to write a custom program using
> the libsmbclient library to open the required file, and set
> the share mode using smbc_setOptionOpenShareMode() to be
> DENY_ALL. Such a program will only allow an open to succeed
> if there are no other openers in Samba.

Do I take it that this would only apply if the "open" were made through Samba? I ask since the file is being placed there through Samba, but taken for other uses by local *nix methods.

> Otherwise you can write a VFS module in Samba that notifies
> an external program when a file in a particular share is
> closed, with no other openers. That might fit your specific
> case best.

Probably true, aside from the "you can write" part - that's beyond my C skills. At http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html there's mention of the "audit" and "extd_audit" modules that can log among other things "file close." If we can have a log of when files are closed, we can have our Linux-side scripting only grab those files which are logged as closed.

For that matter, we can have a primary and secondary directory, with logic like "if file in primary directory, if logged as closed, move to secondary directory; if file in secondary directory, process." There'd need to be some comparison of file time and log time, so that if a file by the same name comes in twice the check to the log doesn't misidentify it as closed, but on first glance this looks workable.

Thanks.

Whit
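
A rough sketch of the primary/secondary directory logic described above. It deliberately leaves out log parsing, since the exact format written by the audit modules is not shown in this thread: `closed_at` is a hypothetical mapping from file name to the latest logged close time (epoch seconds) that some other code would have to build from the log. The `slack` parameter is also an assumption, there to absorb a log that records whole seconds while file modification times carry fractions.

```python
import os
import shutil
from typing import Dict, List


def promote_closed(primary: str, secondary: str,
                   closed_at: Dict[str, float],
                   slack: float = 1.0) -> List[str]:
    """Move files from primary to secondary once the log says they were closed.

    closed_at maps file name -> latest logged close time (epoch seconds).
    Returns the names that were moved.
    """
    moved = []
    for name in sorted(os.listdir(primary)):
        path = os.path.join(primary, name)
        if not os.path.isfile(path):
            continue
        logged = closed_at.get(name)
        # A close entry older than the file's last modification belongs to an
        # earlier upload of the same name, so it is ignored.
        if logged is not None and logged + slack >= os.path.getmtime(path):
            shutil.move(path, os.path.join(secondary, name))
            moved.append(name)
    return moved
```

Anything left in the primary directory is either still being written or has no matching close entry yet; the processing step would only ever read from the secondary directory.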