Hello everyone,

I am having a problem with rsync that I hope someone can help me figure out. We are using rsync to sync files between our staging and production FTP servers. Internal users are allowed to upload files via a Samba share to a staging server. Those files are then synced out every 15 minutes via cron to our production FTP servers.

The problem occurs when a large file is being uploaded from a Windows machine via the Samba share. If an rsync starts while the file is being uploaded, the destination machines get the file with the correct file size and timestamp. Unfortunately, even though the file shows the correct size, it is not a good copy of the file. An md5sum of the file on the source and destination machines returns different results. I believe this occurs because Windows automatically "reserves" the full size of the file, fills it with 0s, and then overwrites this as the copy proceeds. This wouldn't be a big deal, except that subsequent runs of rsync (even with -c) fail to overwrite the file.

I have tested running an rsync while I was copying the file to the source directory locally (not via Samba), and the file was corrupted. However, after running an rsync again, the file was updated. So the problem only occurs when the file is uploaded via the Samba share.

I know I could use the -I flag to ignore times and file sizes, but from my understanding this would result in resyncing every file every time, and this is not what we want. We are currently using rsync 2.5.5 on the source machine and 2.5.7 on the 2 destination machines, but I have also tested using 2.6.8 on both source and destination with the same results.

Has anyone else experienced this problem before, or have any ideas for a fix? Let me know if I'm not being clear enough, or if there is any other information I can provide.
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
Mark Osborne
Web Systems Engineer
mark.osborne@ni.com
(512) 683-5019
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
Cheap shot that might be effective. Something like this might work.

On the Samba share, after the rsync has finished, run a script that touches any file that was last modified in the last few minutes. This marks files that should be retransmitted because their contents might have changed without the directory entry changing.

Basically, you want to run rsync on the file while Windows is writing to it and then (once) run rsync on the file after Windows has finished.

It's a fundamental difference in approach: Windows vs. Unix.

-----Original Message-----
From: rsync-bounces+tony=servacorp.com@lists.samba.org [mailto:rsync-bounces+tony=servacorp.com@lists.samba.org] On Behalf Of Mark Osborne
Sent: Monday, November 06, 2006 2:37 PM
To: rsync@lists.samba.org
Subject: rsync not updating files even with checksum flag
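The "touch recently modified files" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `touch_recent` and the five-minute default window are made up for the example, and nudging the mtime forward by one second (rather than setting it to the current time) is a variation chosen here so that the same file is not picked up again on every run. Nobody in the thread tested this, and it only helps if the upload actually leaves the file with a recent mtime.

```python
import os
import time


def touch_recent(root, window_seconds=300, now=None):
    """Bump the mtime of files under root that were modified recently.

    Returns the list of paths whose mtime was changed.
    """
    now = time.time() if now is None else now
    touched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            if now - st.st_mtime <= window_seconds:
                # Assumption of this sketch: move mtime forward by one
                # second (atime is left alone) so that a size+mtime
                # comparison sees the file as changed on the next run.
                os.utime(path, (st.st_atime, st.st_mtime + 1))
                touched.append(path)
    return touched
```

Called from the same cron job right after the rsync run, for example as `touch_recent("/srv/staging")` (a hypothetical path), this would cause the next scheduled rsync to treat those files as changed.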
Mark Osborne wrote:
>
> Hello everyone,
>
> I am having a problem with rsync that I hope someone can help me
> figure out. We are using rsync to sync up files between our staging
> and production ftp servers. Basically internal users are allowed to
> upload files via a samba share to a staging server. Those files are
> then synced out every 15 minutes via cron to our production ftp servers.
>
> The problem occurs when a large file is being upload from a windows
> machine via the samba share. If a rsync is instantiated during the
> time that the file is being uploaded the destination machines get the
> file with the correct filesize and timestamp. Unfortunately, even
> though the file shows the correct size it is not a good copy of the
> file. An md5sum of the files on both the source and destination
> machine returns different results. I believe this occurs because
> windows automatically "reserves" the full size of the file and fills
> it out with 0s and then overwrites this as it goes along copying.
> This wouldn't be a big deal except, subsequent runs of rsync (even
> with -c) fail to overwrite the file.
>

You could do what we do. We use "smbstatus" first to check that there are no SMB locks present (on files only - you can ignore the ones on dirs), then rename the new files out to a staging dir - then rsync that instead.

It guarantees the files have been finished, and even lets the Windows users know the files have been picked up (as they disappear).

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
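A rough sketch of the smbstatus-then-rename approach described above, not Jason's actual script. The function names and the two directories are hypothetical. `smbstatus -L` limits the output to the lock list, but the layout of that output differs between Samba versions, so the example does not parse it and simply treats a file as locked if its name appears anywhere in the text. That errs on the side of leaving a file where it is.

```python
import os
import shutil
import subprocess


def lock_listing():
    """Return the text printed by `smbstatus -L` (locks only)."""
    return subprocess.run(
        ["smbstatus", "-L"], capture_output=True, text=True, check=True
    ).stdout


def unlocked_files(upload_dir, lock_text):
    """List regular files in upload_dir whose names are absent from lock_text.

    Assumption of this sketch: a plain substring match on the file name
    stands in for real parsing of the smbstatus output.
    """
    result = []
    for name in sorted(os.listdir(upload_dir)):
        path = os.path.join(upload_dir, name)
        if os.path.isfile(path) and name not in lock_text:
            result.append(path)
    return result


def move_finished(upload_dir, outgoing_dir, lock_text):
    """Move files that appear unlocked into outgoing_dir; return new paths."""
    moved = []
    for path in unlocked_files(upload_dir, lock_text):
        dest = os.path.join(outgoing_dir, os.path.basename(path))
        shutil.move(path, dest)
        moved.append(dest)
    return moved
```

A cron job might call `move_finished(upload_dir, outgoing_dir, lock_listing())` and then point rsync at `outgoing_dir` instead of the share. Only the top level of `upload_dir` is examined here, and an upload that starts between the smbstatus call and the move is not guarded against.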
On Mon, Nov 06, 2006 at 02:36:58PM -0600, Mark Osborne wrote:
> This wouldn't be a big deal except, subsequent runs of rsync (even
> with -c) fail to overwrite the file.

The only way this should be able to happen is when both files have an identical md4 checksum. This is possible when the files aren't identical, but should be pretty rare. You might want to check the contents of the corrupted file to see how the data differs, as it might shed some light on what weirdness the samba server is causing.

..wayne..
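One way to look at how the data differs, as suggested above, is to compare the good and bad copies block by block and note whether a differing block is all zero bytes on either side, which would fit the "reserved and filled with 0s" theory from the original post. The sketch below is only an illustration: the function name and block size are arbitrary choices, and it assumes both copies are the same size, as reported in the thread.

```python
def differing_blocks(path_a, path_b, block_size=65536):
    """Compare two files block by block.

    Returns a list of (offset, length, a_all_zero, b_all_zero) tuples,
    one per block whose contents differ.
    """
    results = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                results.append(
                    (offset, max(len(a), len(b)), not any(a), not any(b))
                )
            offset += block_size
    return results
```

Running `differing_blocks(source_copy, destination_copy)` on the two copies and printing the result would show where the corruption starts and whether the destination holds zeros where the source holds real data.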
I get this with Outlook (.pst) files when the change is small (say, one additional text email).